On 11/27/18 9:59 AM, Kyotaro HORIGUCHI wrote:
>>
>> ...>>
For the main workload there's pretty much no difference, but for selects
from the stats catalogs there's ~20% drop in throughput. In absolute
numbers this means drop from ~670tps to ~550tps. I haven't investigated
this, but I suppose this is due to dshash seqscan being more expensive
than reading the data from file.

Thanks for finding that. The three seqscan loops in
pgstat_vacuum_stat cannot take such a long time, I think. I'll
investigate it.


OK. I'm not sure this is related to pgstat_vacuum_stat - the slowdown happens while querying the catalogs, so why would that trigger vacuum of the stats? I may be missing something, of course.

FWIW, the "query statistics" test simply does this:

  SELECT * FROM pg_stat_all_tables;
  SELECT * FROM pg_stat_all_indexes;
  SELECT * FROM pg_stat_user_indexes;
  SELECT * FROM pg_stat_user_tables;
  SELECT * FROM pg_stat_sys_tables;
  SELECT * FROM pg_stat_sys_indexes;

and the slowdown happened even it was running on it's own (nothing else running on the instance). Which mostly rules out concurrency issues with the hash table locking etc.

I don't think any of this is an issue in practice, though. The important
thing is that there's no measurable impact on the regular workload.

Now, a couple of comments regarding individual parts of the patch.


0001-0003
---------

I do think 0001 - 0003 are ready, with some minor cosmetic issues:

1) I'd rephrase the last part of dshash_seq_init comment more like this:

* If consistent is set for dshash_seq_init, the all hash table
* partitions are locked in the requested mode (as determined by the
* exclusive flag), and the locks are held until the end of the scan.
* Otherwise the partition locks are acquired and released as needed
* during the scan (up to two partitions may be locked at the same time).

Replaced with this.

Maybe it should briefly explain what the consistency guarantees are (and
aren't), but considering we're not materially changing the existing
behavior probably  is not really necessary.

Mmm. actually sequential scan is a new thing altogether, but..


Sure, there are new pieces. But does it significantly change consistency guarantees when reading the stats? I don't think so - there was no strict consistency guaranteed before (due to data interleaved with inquiries, UDP throwing away packets under load, etc.). Based on the discussion in this thread that seems to be the consensus.

2) I think the dshash_find_extended() signature should be more like
dshash_find(), i.e. just appending parameters instead of moving them
around unnecessarily. Perhaps we should add

Sure. It seems to have done by my off-lined finger;p Fixed.


;-)


0004 (+0005 and 0007)
---------------------

This seems fine, but I have my doubts about two changes - removing of
stats_temp_directory and the IDLE_STATS_UPDATE_TIMEOUT thingy.

There's a couple of issues with the stats_temp_directory. Firstly, I
don't understand why it's spread over multiple parts of the patch. The
GUC is removed in 0004, the underlying variable is removed in 0005 and
then the docs are updated in 0007. If we really want to do this, it
should happen in a single patch.

Sure.

But the main question is - do we really want to do that? I understand
this directory was meant for the stats data we're moving to shared
memory, so removing it seems natural. But clearly it's used by
pg_stat_statements - 0005 fixes that, of course, but I wonder if there
are other extensions using it to store files?
It's not just about how intensive I/O to those files is, but this also
means the files will now be included in backups / pg_rewind, and maybe
that's not really desirable?

Maybe it's fine but I'm not quite convinced about it ...

It was also in my mind. Anyway sorry for the strange separation.

I was confused about pgstat_stat_directory (the names are
actually very confusing..). Addition to that pg_stat_statements
does *not* use the variable stats_temp_directory, but using
PG_STAT_TMP_DIR. pgstat_stat_directory was used only by
basebackup.c.

The GUC base variable pgstat_temp_directory is not extern'ed so
we can just remove it along with the GUC
definition. pgstat_stat_directory (it actually stores *temporary*
stats directory) was extern'ed in pgstat.h and PG_STAT_TMP_DIR is
defined in pgstat.h. They are not removed in the new version.
Finally 0005 no longer breaks any other bins, contribs and
external extensions.


Great. I'll take a look.

I'm not sure I understand what IDLE_STATS_UPDATE_TIMEOUT does. You've
described it as

    This adds a new timeout IDLE_STATS_UPDATE_TIMEOUT. This works
    similarly to IDLE_IN_TRANSACTIION_SESSION_TIMEOUT. It fires in
    at most PGSTAT_STAT_MIN_INTERVAL(500)ms to clean up pending
    statistics update.

but I'm not sure what pending updates do you mean? Aren't we updating
the stats at the end of each transaction? At least that's what we've
been doing before, so maybe this patch changes that?

Without the timeout, updates on shared memory happens at the same
rate with transaction traffic and it easily causes congestion. So
the update frequency is limited to the timtout in this patch and
the local statistics made by trasactions committed within the
timeout interval will be merged into one shared stats update. It
is the "pending statistics".

With the socket-based stats collector, it doesn't update the
temporary stats file with the interval not shorter than the
timeout.  The update timeout seemingly behaves the same way with
the socket-based stats collector in the view of readers.

If local statistics is not fully processed at the end of the last
transaction. We don't have a chance to flush them before the next
transaction ends. So timeout is loaded if any "panding stats"
remains. (around postgres.c:4175) The pending stats are processed
forcibly in ProcessInterrupts().


OK, thanks for the explanation. So it's essentially a protection against stats from short transactions not being reported for a long time, when the next transaction is long. For example we might end up with a sequence of short transactions

    T1: short, does not trigger IDLE_STATS_UPDATE_TIMEOUT -> local
    T2: short, does not trigger IDLE_STATS_UPDATE_TIMEOUT -> local
    ...
    TN: short, does not trigger IDLE_STATS_UPDATE_TIMEOUT -> local
    T(N+1): long (say, several hours)

in which case stats from short ones are not reported until the end of the long one. That makes sense.

That however raises the question - won't that also report some of the stats from the last transaction? That would be a change compared to current behavior, although I'm not sure it's undesirable - it's often quite annoying that we don't receive stats from a transaction until it completes. But I wonder - doesn't this affect pg_stat_xact catalogs?


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply via email to