On Sat, Dec 14, 2013 at 6:39 PM, Martijn van Oosterhout <klep...@svana.org>wrote:
> On Sat, Dec 14, 2013 at 06:21:18PM +0400, Alexander Korotkov wrote: > > > Is that actually all that beneficial when sorting with a bog standard > > > qsort() since that doesn't generally benefit from data being > pre-sorted? > > > I think we might need to switch to a different algorithm to really > > > benefit from mostly pre-sorted input. > > > > > > > In this patch I don't do full sort of dataset. For instance, index > returns > > data ordered by first column and we need to order them also by second > > column. Then this node sorts groups (assumed to be small) where values of > > the first column are same by value of second column. And with limit > clause > > only required number of such groups will be processed. But, I don't think > > we should expect pre-sorted values of second column inside a group. > > Nice. I imagine this would be mostly beneficial for fast-start plans, > since you no longer need to sort the whole table prior to returning the > first tuple. > > Reduced memory usage might be a factor, especially for large sorts > where you otherwise might need to spool to disk. > > You can now use an index on (a) to improve sorting for (a,b). > > Cost of sorting n groups of size l goes from O(nl log nl) to just O(nl > log l), useful for large n. > Agree. Your reasoning looks correct. > Minor comments: > > I find cmpTuple a bad name. That's what it's doing but perhaps > cmpSkipColumns would be clearer. > > I think it's worthwhile adding a seperate path for the skipCols = 0 > case, to avoid extra copies. > Thanks. I'll take care about. ------ With best regards, Alexander Korotkov.