On Tue, Dec 31, 2013 at 2:41 PM, Andreas Karlsson <andr...@proxel.se> wrote:
> On 12/29/2013 08:24 AM, David Rowley wrote: > >> If it was possible to devise some way to reuse any >> previous tuplesortstate perhaps just inventing a reset method which >> clears out tuples, then we could see performance exceed the standard >> seqscan -> sort. The code the way it is seems to lookup the sort >> functions from the syscache for each group then allocate some sort >> space, so quite a bit of time is also spent in palloc0() and pfree() >> >> If it was not possible to do this then maybe adding a cost to the number >> of sort groups would be better so that the optimization is skipped if >> there are too many sort groups. >> > > It should be possible. I have hacked a quick proof of concept for reusing > the tuplesort state. Can you try it and see if the performance regression > is fixed by this? > > One thing which have to be fixed with my patch is that we probably want to > close the tuplesort once we have returned the last tuple from ExecSort(). > > I have attached my patch and the incremental patch on Alexander's patch. > > Thanks, the attached is about 5 times faster than it was previously with my test case upthread. The times now look like: No pre-sortable index: Total runtime: 86.278 ms With pre-sortable index with partial sorting Total runtime: 47.500 ms With the query where there is no index the sort remained in memory. I spent some time trying to find a case where the partial sort is slower than the seqscan -> sort. The only places partial sort seems slower are when the number of estimated sort groups are around the crossover point where the planner would be starting to think about performing a seqscan -> sort instead. I'm thinking right now that it's not worth raising the costs around this as the partial sort is less likely to become a disk sort than the full sort is. I'll keep going with trying to break it. Regards David Rowley > -- > Andreas Karlsson >