On Thu, Aug 20, 2015 at 11:56 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> This will give more runs, so merging those needs some thought. It will also
> give a more predictable number of runs, so we'll be able to predict any
> merging issues ahead of time. We can more easily find out the min/max tuple
> in each run, so we only merge overlapping runs.
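To make the overlap idea concrete, here is a toy sketch (plain Python with lists of keys standing in for sorted runs of tuples, nothing like the actual tuplesort code): runs whose key ranges don't overlap can simply be concatenated in order, and only genuinely overlapping runs need to go through a real heap-based merge.

```python
import heapq

def merge_runs(runs):
    # Order runs by their minimum key, then walk them: a run whose
    # minimum is >= everything seen in the current group doesn't overlap
    # it, so the group can be flushed and the run appended afterwards;
    # only runs that genuinely overlap are fed through a real merge.
    runs = sorted((r for r in runs if r), key=lambda r: r[0])
    out, group, gmax = [], [], None
    for run in runs:
        if group and run[0] < gmax:
            group.append(run)
            gmax = max(gmax, run[-1])
        else:
            out.extend(heapq.merge(*group))
            group, gmax = [run], run[-1]
    out.extend(heapq.merge(*group))
    return out
```

With disjoint runs this degenerates to pure concatenation, which is the win Simon describes: no comparisons at merge time at all.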
I think that merging runs can be optimized to reduce the number of cache
misses. Poul-Henning Kamp, the FreeBSD guy, has described the problems that
binary heaps have with cache misses [1], and I think we could use his
solution for merging. But we should definitely still quicksort runs.

> Using a heapsort is known to be poor for large heaps. We previously
> discussed the idea of quicksorting the first chunk of memory, then
> reallocating the heap as a smaller chunk for the rest of the sort. That
> would solve the cache miss problem.
>
> I'd like to see some discussion of how we might integrate aggregation and
> sorting. A heap might work quite well for that, whereas quicksort doesn't
> sound like it would work as well.

If you're talking about deduplicating within tuplesort, then there are known
techniques for that. I don't think it needs to be an up-front priority of
this work, though.

> I think it's premature to retire that algorithm - I think we should keep
> it for a while yet. I suspect it may serve well in cases where we have low
> memory, though I accept that is no longer the case for larger servers that
> we would now call typical.

I have given one case where I think the first run should still use
replacement selection: where that enables a "quicksort with spillover". In
that sense, I have not actually proposed to retire the algorithm. In
principle, I agree with also using it under any other circumstances where it
is likely to be appreciably faster; it's just not in evidence that any other
such case exists. I did look at all the traditionally sympathetic cases, as
I went into, and replacement selection still did not seem to be worth it.
But by all means, if you think I missed something, please show me a test
case.

> This could cause particular issues in optimization, since heap sort is
> wonderfully predictable. We'd need a cost_sort() that was slightly
> pessimistic to cover the risk that a quicksort might not be as fast as we
> hope.

Wonderfully predictable? Really?
Heap sort is totally sensitive to CPU cache characteristics; I wouldn't call
it predictable at all. If you're alluding to the quicksort worst case, that
seems like the wrong thing to worry about. The risk there is often
overstated, or based on experience with third-rate implementations that
don't follow widely accepted recommendations from the research community.

> I'd like to see a more general and concise plan for how sorting evolves.
> We are close to having the infrastructure to perform intermediate
> aggregation, which would allow that to happen during sorting when required
> (aggregation, sort distinct). We also agreed some time back that parallel
> sorting would be the first incarnation of parallel operations, so we need
> to consider that also.

I agree with everything you say here, I think. It's appropriate for this
work to anticipate adding a number of other optimizations in the future, at
least including:

* Parallel sort using worker processes.

* Memory prefetching.

* Offset-value coding of runs, a compression technique that was used in
System R, IIRC. This can speed up merging a lot, and will save I/O bandwidth
when dumping out runs.

* Asynchronous I/O.

There should be an integrated approach to applying every possible
optimization, or at least to leaving the possibility open. A lot of these
techniques are complementary. For example, there are significant benefits
when the "onlyKey" optimization is used with external sorts, which you get
for free by using quicksort for runs. In short, I am absolutely on board
with the idea that these things need to be anticipated, at the very least.
For another, more speculative example: offset-value coding makes the merge
step cheaper, but the work of doing the offset coding can be offloaded to
worker processes, whereas the merge step proper cannot really be
parallelized effectively -- those two techniques together are greater than
the sum of their parts.
One big problem that I see with replacement selection is that it makes most
of these things impossible. In general, I think that parallel sort should be
an external sort technique first and foremost. If you can only parallelize
an internal sort, then running out of road when there isn't enough memory to
do the sort in memory becomes a serious issue. Besides, you need to
partition the input anyway, and external sorting naturally does that,
without precluding the case where runs are never actually dumped to disk.

[1] http://queue.acm.org/detail.cfm?id=1814327
--
Peter Geoghegan