On Thu, Aug 20, 2015 at 10:41 AM, Peter Geoghegan <p...@heroku.com> wrote:

> On Thu, Aug 20, 2015 at 8:15 AM, Simon Riggs <si...@2ndquadrant.com>
> wrote:
> > On 20 August 2015 at 03:24, Peter Geoghegan <p...@heroku.com> wrote:
> >>
> >>
> >> The patch is ~3.25x faster than master
> >
> >
> > I've tried to read this post twice and both times my work_mem overflowed.
> > ;-)
> >
> > Can you summarize what this patch does? I understand clearly what it
> doesn't
> > do...
>
> The most important thing that it does is always quicksort runs, which
> are formed by simply filling work_mem with tuples in no particular
> order, rather than trying to make runs that are on average twice as
> large as work_mem. That's what the ~3.25x improvement concerned.
> That's actually a significantly simpler algorithm than replacement
> selection, and appears to be much faster. You might even say that it's
> a dumb algorithm, because it is less sophisticated than replacement
> selection. However, replacement selection tends to use CPU caches very
> poorly, while its traditional advantages have become dramatically less
> important due to large main memory sizes in particular. Also, it hurts
> that we don't currently dump tuples in batches, for several reasons.
> It's better to do memory-intensive operations in batches, rather than
> in one huge inner loop, in order to minimize or prevent instruction
> cache misses. And we can better take advantage of asynchronous I/O.
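[To make the two run-formation strategies above concrete, here is an illustrative sketch, not PostgreSQL code: `work_mem` is counted in tuples rather than bytes, and the function names are invented for the example. On random input, replacement selection's heap produces runs averaging about twice work_mem, while the patch's approach simply sorts one work_mem-sized batch at a time.]

```python
import heapq
import itertools
import random

def quicksort_runs(tuples, work_mem):
    # The patch's approach: fill memory, quicksort the whole batch,
    # dump it as a run, repeat.
    return [sorted(tuples[i:i + work_mem])
            for i in range(0, len(tuples), work_mem)]

def replacement_selection_runs(tuples, work_mem):
    # Classic replacement selection: a heap of (run_number, value) pairs
    # emits the smallest value eligible for the current run; an incoming
    # value smaller than the one just emitted must wait for the next run.
    it = iter(tuples)
    heap = [(0, v) for v in itertools.islice(it, work_mem)]
    heapq.heapify(heap)
    runs, out, current_run = [], [], 0
    while heap:
        run, val = heapq.heappop(heap)
        if run != current_run:        # heap top belongs to the next run
            runs.append(out)
            out, current_run = [], run
        out.append(val)
        nxt = next(it, None)
        if nxt is not None:
            heapq.heappush(heap, (run if nxt >= val else run + 1, nxt))
    if out:
        runs.append(out)
    return runs

random.seed(42)
data = [random.random() for _ in range(10000)]
q = quicksort_runs(data, 100)
r = replacement_selection_runs(data, 100)
# Replacement selection forms fewer (roughly half as many) but longer
# runs; the quicksort approach trades that for much better cache behavior.
```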
>
> The complicated aspect of considering the patch is whether or not it's
> okay to not use replacement selection anymore -- is that an
> appropriate trade-off?
>
> The reason that the code has not actually been simplified by this
> patch is that I still want to use replacement selection for one
> specific case: when it is anticipated that a "quicksort with
> spillover" can occur, which is only possible with incremental
> spilling. That may avoid most I/O, by spilling just a few tuples using
> a heap/priority queue, and quicksorting everything else. That's
> compelling when you can manage it, but no reason to always use
> replacement selection for the first run in the common case where there
> will be several runs in total.
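[A sketch of the "quicksort with spillover" idea above, again illustrative rather than PostgreSQL's implementation: the in-memory list stands in for work_mem, the `spilled` list stands in for a tape, and only the overflow beyond work_mem ever touches "disk".]

```python
import heapq
import random

def quicksort_with_spillover(tuples, work_mem):
    # Keep the first work_mem tuples in memory as a heap/priority queue.
    buf = list(tuples[:work_mem])
    heapq.heapify(buf)
    spilled = []  # stands in for the few tuples written out to a tape
    for t in tuples[work_mem:]:
        # Spill the current minimum, keeping the larger tuple in memory,
        # so most of the input avoids I/O entirely.
        spilled.append(heapq.heappushpop(buf, t))
    buf.sort()  # quicksort everything that stayed in memory
    # Final on-the-fly merge of the small spilled portion with memory.
    # (The real approach writes the spilled tuples in run order as it
    # goes; sorting the small list here stands in for that.)
    return list(heapq.merge(sorted(spilled), buf))

random.seed(1)
data = [random.randint(0, 10**6) for _ in range(1100)]
result = quicksort_with_spillover(data, 1000)  # only ~100 tuples "spill"
```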
>
> Is that any clearer? To borrow a phrase from the processor
> architecture community, from a high level this is a "Brainiac versus
> Speed Demon" [1] trade-off. (I wish that there was a widely accepted
> name for this trade-off.)
>
> [1]
> http://www.lighterra.com/papers/modernmicroprocessors/#thebrainiacdebate
> --
> Peter Geoghegan
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Hi, Peter,

Just some quick anecdotal evidence. I did a similar experiment about three
years ago. The conclusion was that if you have an SSD, just do a quicksort
and forget the longer runs, but if you are using hard drives, longer runs
are the winner (and safer, to avoid performance cliffs). I did not
experiment with RAID0/5 on many spindles though.

Not limited to sorting: more generally, SSDs are different enough from HDDs
that it may be worth the effort for the backend to "guess" what storage
device it has, and then choose the right thing to do.

Cheers.
