Hello Amit,

I have done some tests with both the patches (sort+flush), and below are
the results:

Thanks a lot for these runs on this great hardware!

Test - 1 (Data Fits in shared_buffers)

Rounded for easier comparison:

  flush/sort
  off off: 27480.4 ± 12791.1 [   0, 16009, 32109, 37629, 51671] (2.8%)
  off on : 27482.5 ± 12552.0 [   0, 16587, 31226, 37516, 51297] (2.8%)

The two cases above are pretty indistinguishable; sorting has no impact. The 2.8% means more than 1 minute offline per hour (2.8% of 3600 seconds is about 100 seconds), not necessarily in one whole stretch: it may be distributed over the whole hour.

  on  off: 25214.8 ± 11059.7 [5268, 14188, 26472, 35626, 51479] (0.0%)
  on  on : 26819.6 ± 10589.7 [5192, 16825, 29430, 35708, 51475] (0.0%)

For this test run, the best results are when both the sort and flush options are enabled: the lowest TPS value is increased substantially without sacrificing much on the average or median TPS values (though there is a ~9% dip in the median TPS value). When only sorting is enabled, there is neither a significant gain nor any loss. When only flush is enabled, there is significant degradation in both the average and median TPS values, ~8% and ~21% respectively.

I interpret the five numbers in brackets as an indicator of performance stability: they would all be equal under perfect stability. Once they show some stability, the next point for me is to focus on the average performance. I do not see a median decrease as a big issue if the average is reasonably good.
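Just to be explicit about how I compute these figures, here is a minimal sketch of the kind of summary I use (not the actual script from these runs, and the input format is an assumption): it reads one per-second tps value per line, e.g. as extracted from pgbench --progress=1 output, and prints the average, the standard deviation, the five numbers [min, q1, median, q3, max] and the fraction of seconds spent under a low-tps threshold (10 tps here), which is what I read as offline time.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* compare doubles for qsort */
static int
cmp_double(const void *a, const void *b)
{
    double  x = *(const double *) a;
    double  y = *(const double *) b;

    return (x > y) - (x < y);
}

int
main(void)
{
    double     *v = NULL, x, sum = 0.0, sum2 = 0.0;
    size_t      n = 0, cap = 0, low = 0;
    const double threshold = 10.0;      /* assumed "offline" cutoff, in tps */

    while (scanf("%lf", &x) == 1)
    {
        if (n == cap)                   /* grow the array as needed */
        {
            double *tmp;

            cap = cap ? 2 * cap : 1024;
            tmp = realloc(v, cap * sizeof(double));
            if (tmp == NULL)
                return 1;
            v = tmp;
        }
        v[n++] = x;
        sum += x;
        sum2 += x * x;
        if (x < threshold)
            low++;
    }
    if (n == 0)
        return 1;

    qsort(v, n, sizeof(double), cmp_double);

    double      avg = sum / n;
    double      var = sum2 / n - avg * avg;     /* population variance */
    double      stddev = sqrt(var > 0.0 ? var : 0.0);

    /* crude index-based quartiles, good enough for this purpose */
    printf("%.1f ± %.1f [%.0f, %.0f, %.0f, %.0f, %.0f] (%.1f%%)\n",
           avg, stddev,
           v[0], v[n / 4], v[n / 2], v[3 * n / 4], v[n - 1],
           100.0 * low / n);

    free(v);
    return 0;
}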

Thus I essentially note the -2.5% dip in average tps of on-on vs off-on. I would say that it is probably significant, although it might be within the error margin of the measure. I am not sure whether the small stddev reduction is really significant. Anyway the benefit is clear: 100% availability.

Flushing without sorting is a bad idea (tm), not a surprise.

Test - 2 (Data doesn't fit in shared_buffers, but fits in RAM)

 flush/sort
 off off: 5050.1 ± 4884.5 [   0,   98, 4699, 10126, 13631] ( 7.7%)
 off on : 6194.2 ± 4913.5 [   0,   98, 8982, 10558, 14035] (11.0%)
 on  off: 2771.3 ± 1861.0 [ 288, 2039, 2375,  2679, 12862] ( 0.0%)
 on  on : 6110.6 ± 1939.3 [1652, 5215, 5724,  6196, 13828] ( 0.0%)

I'm not sure that the -1.3% average tps dip of on-on vs off-on is significant, but it may be. With both flushing and sorting pg becomes fully available, and the standard deviation is divided by more than 2, so the benefit is clear.

For this test run, again the best results are when both the sort and flush
options are enabled: the lowest TPS value is increased substantially, and
the average and median TPS values have also increased, by ~21% and ~22%
respectively.  When only sorting is enabled, there is a significant gain in
average and median TPS values, but there is also an increase in the number
of times TPS drops below 10, which is bad.  When only flush is enabled,
there is significant degradation in both the average and median TPS values,
~82% and ~97% respectively.  I am not sure whether such a big degradation
could be expected for this case or whether it is just a problem with this
run; I have not repeated this test.

Yes, I agree that it is strange that sorting without flushing on its own improves performance (+20% tps) but seems to degrade availability at the same time. A rerun would have helped to check whether it is a fluke or reproducible.

Test - 3 (Data doesn't fit in shared_buffers, but fits in RAM)
----------------------------------------------------------------------------------------
Same configuration and settings as above, but this time I have forced the
flush to use posix_fadvise() rather than sync_file_range() (basically
changed the code to comment out sync_file_range() and enable posix_fadvise()).

On using posix_fadvise(), the results for the best case (both flush and sort
on) show significant degradation in the average and median TPS values, by
~48% and ~43%, which indicates that using posix_fadvise() with the current
options is probably not the best way to implement the flush.

Yes, indeed.

The way posix_fadvise is implemented on Linux is somewhere between no effect and a bad effect (the buffer is evicted). You hit the latter quite strongly... As you are doing a "does not fit in shared_buffers" test, it is essential that buffers are kept in RAM by the OS. However, posix_fadvise on Linux instructs the kernel to evict the buffer from memory once it has been passed to the I/O subsystem, which, given the probably large I/O device cache on your host, happens pretty quickly. Later reads must then fetch the data back from the device (either its cache or the disk), which means a drop in performance.

Note that the FreeBSD implementation of posix_fadvise seems more convincing, although still less effective than Linux's sync_file_range. I have no idea about other systems.
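To make the above concrete, the choice boils down to something like the sketch below. It is illustrative only, not the actual patch code: fd, offset and nbytes are placeholders for the relation segment and the range of blocks just written.

/* Illustrative sketch, not the patch code. */
#define _GNU_SOURCE                     /* for sync_file_range() on Linux */
#include <fcntl.h>
#include <stdbool.h>

static void
hint_writeback(int fd, off_t offset, off_t nbytes, bool use_fadvise)
{
    if (!use_fadvise)
    {
        /* Linux: start asynchronous write-back of the range, no eviction. */
        (void) sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
    }
    else
    {
        /*
         * Portable fallback, the one forced in Test 3: on Linux,
         * POSIX_FADV_DONTNEED also drops already-written pages from the
         * OS cache, hence the extra reads and the performance drop
         * described above.
         */
        (void) posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED);
    }
}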

Overall, I think this patch (sort+flush) brings a lot of value to the table in terms of stabilizing TPS during checkpoints. However, some cases, like the use of posix_fadvise() and the case where all data fits in shared_buffers and the median TPS regresses, could be investigated to see what can be done to improve them. I think more tests could be done to confirm the benefits or regressions of this patch, but for now this is the best I can do.

Thanks a lot, again, for these tests!

I think that we may conclude, from these runs:

(1) sorting seems not to harm performance, and may help a lot.

(2) Linux flushing with sync_file_range may degrade raw average tps a
    little in some cases, but definitely improves performance stability
    (always 100% availability when it is on!).

(3) posix_fadvise on Linux is a bad idea... the good news is that it
    is not needed there:-) How good or bad an idea it is on other systems
    is an open question...

These results are consistent with the current default values in the patch: sorting is on by default, and flushing is on under Linux and off otherwise (where posix_fadvise would be used).

Also, as the effect on other systems is unclear, I think it is best to keep both settings as GUCs for now.
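For illustration, both behaviors could then still be adjusted from postgresql.conf, along these lines (the setting names below are only indicative of the current patch version and may change):

  # patch defaults: sorting on everywhere, flushing on under Linux only
  checkpoint_sort = on                # sort buffers by file/block before writing
  checkpoint_flush_to_disk = off      # disable the write-back hints if they hurt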

--
Fabien.