On 21/02/15 18:18, Tomas Vondra wrote:
Hi,

On 21.2.2015 02:06, Tomas Vondra wrote:
On 21.2.2015 02:00, Andrew Gierth wrote:
"Tomas" == Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
  >> Right...so don't test a datum sort case, since that isn't supported
  >> at all in the master branch. Your test case is invalid for that
  >> reason.

  Tomas> What do you mean by 'Datum sort case'?

A case where the code path goes via tuplesort_begin_datum rather than
tuplesort_begin_heap.

  Tomas> The test I was using is this:

  Tomas>    select percentile_disc(0) within group (order by randnum) from 
stuff;

Sorting single columns in aggregate calls uses the Datum sort path (in
fact I think it's currently the only place that does).

Do that test with _both_ the Datum and Numeric sort patches in place,
and you will see the effect. With only the Numeric patch, the numeric
abbrev code is not called.
D'oh! Thanks for the explanation.
OK, so I've repeated the benchmarks with both patches applied, and I
think the results are interesting. I extended the benchmark a bit - see
the SQL script attached.

   1) multiple queries

      select percentile_disc(0) within group (order by val) from stuff

      select count(distinct val) from stuff

      select * from
        (select * from stuff order by val offset 100000000000) foo

   2) multiple data types - int, float, text and numeric

   3) multiple scales - 1M, 2M, 3M, 4M and 5M rows

Each query was executed 10x, the timings were averaged. I do know some
of the data types don't benefit from the patches, but I included them to
get a sense of how noisy the results are.

I did the measurements for

   1) master
   2) master + datum_sort_abbrev.patch
   3) master + datum_sort_abbrev.patch + numeric_sortsup.patch

and then computed the speedup for each type/scale combination (the
impact on all the queries is almost exactly the same).

Complete results are available here: http://bit.ly/1EA4mR9

I'll post all the summary here, although some of the numbers are about
the other abbreviated keys patch.


1) datum_sort_abbrev.patch vs. master

     scale      float      int    numeric     text
     ---------------------------------------------
     1          101%       99%       105%     404%
     2          101%       98%        96%      98%
     3          101%      101%        99%      97%
     4          100%      101%        98%      95%
     5           99%       98%        93%      95%

2) numeric_sortsup.patch vs. master

     scale     float       int    numeric     text
     ---------------------------------------------
     1           97%       98%       374%     396%
     2          100%      101%       407%      96%
     3           99%      102%       407%      95%
     4           99%      101%       423%      92%
     5           95%       99%       411%      92%


I think the gains are pretty awesome - I mean, 400% speedup for Numeric
accross the board? Yes please!

The gains for text are also very nice, although in this case that only
happens for the smallest scale (1M rows), and for larger scales it's
actually slower than current master :-(

It's not just rainbows and unicorns, though. With both patches applied,
text sorts get even slower (up to ~8% slower than master), It also seems
to impact float (which gets ~5% slower, for some reason), but I don't
see how that could happen ... but I suspect this might be noise.

I'll repeat the tests on another machine after the weekend, and post an
update whether the results are the same or significantly different.

regards



What are the standard deviations?

Do the arithmetic means change much if you exclude the 2 fastest & 2 slowest?

How do the arithmetic means compare to their respective medians?

Essentially, how consistent are the results, or how great is the noise? There may be better indicators than the ones I've suggested above.



Cheers,
Gavin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to