Hello Amit,

I have done some tests with both the patches (sort+flush), and below are
the results:

Thanks a lot for these runs on this great hardware!

Test - 1 (Data Fits in shared_buffers)

Rounded for easier comparison:

  flush/sort
  off off: 27480.4 ± 12791.1 [   0, 16009, 32109, 37629, 51671] (2.8%)
  off on : 27482.5 ± 12552.0 [   0, 16587, 31226, 37516, 51297] (2.8%)

The two cases above are pretty indistinguishable; sorting has no impact. The 2.8% means more than 1 minute offline per hour (2.8% of 3600 seconds is about 100 seconds), not necessarily in one whole stretch: it may be distributed over the whole hour.

  on  off: 25214.8 ± 11059.7 [5268, 14188, 26472, 35626, 51479] (0.0%)
  on  on : 26819.6 ± 10589.7 [5192, 16825, 29430, 35708, 51475] (0.0%)

For this test run, the best results are when both the sort and flush options are enabled: the lowest TPS value is increased substantially without sacrificing much on the average or median TPS values (though there is a ~9% dip in the median TPS value). When only sorting is enabled, there is neither a significant gain nor any loss. When only flush is enabled, there is significant degradation in both the average and median TPS values, ~8% and ~21% respectively.

I interpret the five numbers in brackets as an indicator of performance stability: they would all be equal under perfect stability. Once they show some stability, the next point for me is to focus on the average performance. I do not see a median decrease as a big issue if the average is reasonably good.
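Just to be explicit about how I compute these figures, here is a minimal sketch of the kind of summary I use (not the actual script from these runs, and the input format is an assumption): it reads one per-second tps value per line, e.g. as extracted from pgbench --progress=1 output, and prints the average, the standard deviation, the five numbers [min, q1, median, q3, max] and the fraction of seconds spent under a low-tps threshold (10 tps here), which is what I read as offline time.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* compare doubles for qsort */
static int
cmp_double(const void *a, const void *b)
{
    double  x = *(const double *) a;
    double  y = *(const double *) b;

    return (x > y) - (x < y);
}

int
main(void)
{
    double     *v = NULL, x, sum = 0.0, sum2 = 0.0;
    size_t      n = 0, cap = 0, low = 0;
    const double threshold = 10.0;      /* assumed "offline" cutoff, in tps */

    while (scanf("%lf", &x) == 1)
    {
        if (n == cap)                   /* grow the array as needed */
        {
            double *tmp;

            cap = cap ? 2 * cap : 1024;
            tmp = realloc(v, cap * sizeof(double));
            if (tmp == NULL)
                return 1;
            v = tmp;
        }
        v[n++] = x;
        sum += x;
        sum2 += x * x;
        if (x < threshold)
            low++;
    }
    if (n == 0)
        return 1;

    qsort(v, n, sizeof(double), cmp_double);

    double      avg = sum / n;
    double      var = sum2 / n - avg * avg;     /* population variance */
    double      stddev = sqrt(var > 0.0 ? var : 0.0);

    /* crude index-based quartiles, good enough for this purpose */
    printf("%.1f ± %.1f [%.0f, %.0f, %.0f, %.0f, %.0f] (%.1f%%)\n",
           avg, stddev,
           v[0], v[n / 4], v[n / 2], v[3 * n / 4], v[n - 1],
           100.0 * low / n);

    free(v);
    return 0;
}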

Thus I essentially note the -2.5% dip in average tps of on-on vs off-on. I would say that it is probably significant, although it might be within the error margin of the measure. I am not sure whether the small stddev reduction is really significant. Anyway the benefit is clear: 100% availability.

Flushing without sorting is a bad idea (tm), not a surprise.

Test - 2 (Data doesn't fit in shared_buffers, but fits in RAM)

 flush/sort
 off off: 5050.1 ± 4884.5 [   0,   98, 4699, 10126, 13631] ( 7.7%)
 off on : 6194.2 ± 4913.5 [   0,   98, 8982, 10558, 14035] (11.0%)
 on  off: 2771.3 ± 1861.0 [ 288, 2039, 2375,  2679, 12862] ( 0.0%)
 on  on : 6110.6 ± 1939.3 [1652, 5215, 5724,  6196, 13828] ( 0.0%)

I'm not sure that the -1.3% average tps dip of on-on vs off-on is significant, but it may be. With both flushing and sorting pg becomes fully available, and the standard deviation is divided by more than 2, so the benefit is clear.

For this test run, again the best results are when both the sort and flush
options are enabled: the lowest TPS value is increased substantially, and
the average and median TPS values have also increased, by ~21% and ~22%
respectively.  When only sorting is enabled, there is a significant gain in
average and median TPS values, but there is also an increase in the number
of times TPS drops below 10, which is bad.  When only flush is enabled,
there is significant degradation in both the average and median TPS values,
~82% and ~97% respectively.  I am not sure whether such a big degradation
could be expected for this case or whether it is just a problem with this
run; I have not repeated this test.

Yes, I agree that it is strange that sorting without flushing on its own improves performance (+20% tps) but seems to degrade availability at the same time. A rerun would have helped to check whether it is a fluke or reproducible.

Test - 3 (Data doesn't fit in shared_buffers, but fits in RAM)
----------------------------------------------------------------------------------------
Same configuration and settings as above, but this time I have forced the
flush to use posix_fadvise() rather than sync_file_range() (basically
changed the code to comment out sync_file_range() and enable posix_fadvise()).

On using posix_fadvise(), the results for the best case (both flush and sort
on) show significant degradation in the average and median TPS values, by
~48% and ~43%, which indicates that using posix_fadvise() with the current
options is probably not the best way to implement the flush.

Yes, indeed.

The way posix_fadvise is implemented on Linux is somewhere between no effect and a bad effect (the buffer is evicted). You hit the latter quite strongly... As you are doing a "does not fit in shared_buffers" test, it is essential that buffers are kept in RAM by the OS. However, posix_fadvise on Linux instructs the kernel to evict the buffer from memory once it has been passed to the I/O subsystem, which, given the probably large I/O device cache on your host, happens pretty quickly. Later reads must then fetch the data back from the device (either its cache or the disk), which means a drop in performance.

Note that the FreeBSD implementation of posix_fadvise seems more convincing, although still less effective than Linux's sync_file_range. I have no idea about other systems.
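To make the above concrete, the choice boils down to something like the sketch below. It is illustrative only, not the actual patch code: fd, offset and nbytes are placeholders for the relation segment and the range of blocks just written.

/* Illustrative sketch, not the patch code. */
#define _GNU_SOURCE                     /* for sync_file_range() on Linux */
#include <fcntl.h>
#include <stdbool.h>

static void
hint_writeback(int fd, off_t offset, off_t nbytes, bool use_fadvise)
{
    if (!use_fadvise)
    {
        /* Linux: start asynchronous write-back of the range, no eviction. */
        (void) sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
    }
    else
    {
        /*
         * Portable fallback, the one forced in Test 3: on Linux,
         * POSIX_FADV_DONTNEED also drops already-written pages from the
         * OS cache, hence the extra reads and the performance drop
         * described above.
         */
        (void) posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED);
    }
}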

Overall, I think this patch (sort+flush) brings a lot of value to the table in terms of stabilizing TPS during checkpoints. However, some cases, like the use of posix_fadvise() and the case where all data fits in shared_buffers and the median TPS regresses, could be investigated to see what can be done to improve them. I think more tests could be done to confirm the benefits or regressions of this patch, but for now this is the best I can do.

Thanks a lot, again, for these tests!

I think that we may conclude, from these runs:

(1) sorting seems not to harm performance, and may help a lot.

(2) Linux flushing with sync_file_range may degrade raw average tps a
    little in some cases, but definitely improves performance stability
    (always 100% availability when it is on!).

(3) posix_fadvise on Linux is a bad idea... the good news is that it
    is not needed there:-) How good or bad an idea it is on other systems
    is an open question...

These results are consistent with the current default values in the patch: sorting is on by default, and flushing is on under Linux and off otherwise (where posix_fadvise would be used).

Also, as the effect on other systems is unclear, I think it is best to keep both settings as GUCs for now.
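For illustration, both behaviors could then still be adjusted from postgresql.conf, along these lines (the setting names below are only indicative of the current patch version and may change):

  # patch defaults: sorting on everywhere, flushing on under Linux only
  checkpoint_sort = on                # sort buffers by file/block before writing
  checkpoint_flush_to_disk = off      # disable the write-back hints if they hurt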

--
Fabien.