On 09/24/2016 06:06 AM, Amit Kapila wrote:
> On Fri, Sep 23, 2016 at 8:22 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
...
>>
>> So I'm using 16GB shared buffers (so with scale 300 everything fits into
>> shared buffers), min_wal_size=16GB, max_wal_size=128GB, checkpoint timeout
>> 1h etc. So no, there are no checkpoints during the 5-minute runs, only those
>> triggered explicitly before each run.
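
For reference, that setup translates to roughly the following - just a sketch,
shown via ALTER SYSTEM for illustration (the actual config may well be edited
directly in postgresql.conf):

  # settings described above; scale 300 is ~4.5GB, so it fits into 16GB
  psql -c "ALTER SYSTEM SET shared_buffers = '16GB'"
  psql -c "ALTER SYSTEM SET min_wal_size = '16GB'"
  psql -c "ALTER SYSTEM SET max_wal_size = '128GB'"
  psql -c "ALTER SYSTEM SET checkpoint_timeout = '1h'"
  pg_ctl -D "$PGDATA" restart    # shared_buffers requires a restart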


> Thanks for clarification.  Do you think we should try some different
> settings for the *_flush_after parameters, as those can help in reducing
> spikes in writes?


I don't see why those settings would matter. The tests are on unlogged tables, so there's almost no WAL traffic, and the checkpoints (triggered explicitly before each run) look like this:

checkpoint complete: wrote 17 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 13 recycled; write=0.062 s, sync=0.006 s, total=0.092 s; sync files=10, longest=0.004 s, average=0.000 s; distance=309223 kB, estimate=363742 kB

So I don't see how tuning the flushing would change anything, as we're barely doing any writes in the first place.

Moreover, the machine has a bunch of SSD drives (16 or 24, I don't remember at the moment), behind a RAID controller with 2GB of write cache on it.
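
To make it concrete, these are the knobs in question and the kind of run I
mean - a rough sketch only (the database name, client counts and the 9.6
Linux defaults shown are assumptions, not the exact values from my scripts):

  # flush-related GUCs Amit is asking about (0 disables the flush hints)
  psql -c "ALTER SYSTEM SET checkpoint_flush_after = '256kB'"
  psql -c "ALTER SYSTEM SET bgwriter_flush_after = '512kB'"
  psql -c "ALTER SYSTEM SET backend_flush_after = 0"
  psql -c "ALTER SYSTEM SET wal_writer_flush_after = '1MB'"
  psql -c "SELECT pg_reload_conf()"

  # read-write run on unlogged tables, so there's almost no WAL to flush anyway
  pgbench -i -s 300 --unlogged-tables bench
  pgbench -c 32 -j 32 -T 300 bench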

>>> Also, I think instead of 5 mins, read-write runs should be run for 15
>>> mins to get consistent data.


>> Where does the inconsistency come from?

> That's what I am also curious to know.

>> Lack of warmup?

> Can't say, but at least we should try to rule out the possibilities.
> I think one way to rule them out is to do slightly longer runs for
> Dilip's test cases, and for pgbench we might need to drop and
> re-create the database after each reading.


My point is that it's unlikely to be due to insufficient warmup, because the inconsistencies appear randomly - generally you get a bunch of slow runs, one significantly faster one, then slow ones again.

I believe the runs to be sufficiently long. I don't see why recreating the database would be useful - the whole point is to get the database and shared buffers into a stable state, and then do measurements on it.

I don't think bloat is a major factor here - I'm collecting some additional statistics during this run, including pg_database_size, and I can see the size oscillates between 4.8GB and 5.4GB. That's pretty negligible, I believe.
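
The size sampling is nothing fancy, roughly something like this (the dbname
and interval here are made up, the actual collection script differs):

  # sample database size every few seconds during the run
  while true; do
    psql -At -c "SELECT now(), pg_size_pretty(pg_database_size('pgbench'))" >> dbsize.log
    sleep 5
  done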

I'll let the current set of benchmarks complete - it's running on kernel 4.5.5 now, and I'll do tests on 3.2.80 too.

Then we can re-evaluate if longer runs are needed.
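
Just to be explicit about what a "run" means here, the loop is roughly this
(a sketch - the client/thread counts are not the exact values I use):

  # 10 read-write runs, each preceded by an explicit checkpoint, 5 minutes each
  for run in $(seq 1 10); do
    psql -c "CHECKPOINT"
    pgbench -c 32 -j 32 -T 300 bench > results-run$run.log
  done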

>> Considering how uniform the results from the 10 runs are (at least
>> on 4.5.5), I claim this is not an issue.


> It is quite possible that it is some kernel regression which might
> be fixed in a later version. For example, we are doing most tests on
> cthulhu, which has a 3.10 kernel, and we generally get consistent
> results. I am not sure whether a later kernel version, say 4.5.5, is
> a net win, because there is a considerable dip in performance in that
> version, though it produces quite stable results.


Well, the thing is - the 4.5.5 behavior is much nicer in general. In most cases I'll prefer lower but more consistent performance. In any case, we're stuck with whatever kernel version people are using, and they're likely to use the newer ones.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

