On Fri, May 5, 2017 at 11:13 AM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote: > Incremental sort is faster in vast majority of cases. It appears to be > slower only when whose dataset is one sort group. In this case incremental > sort is useless, and it should be considered as misuse of incremental sort. > Slowdown is related to the fact that we anyway have to do extra comparisons, > unless we somehow push our comparison result into qsort itself and save some > cpu cycles (but that would be unreasonable break of encapsulation). Thus, > in such cases regression seems to be inevitable anyway. I think we could > evade this regression during query planning. If we see that there would be > only few groups, we should choose plain sort instead of incremental sort.
I'm sorry that I don't have time to review this in detail right now, but it sounds like you are doing good work to file down cases where this might cause regressions, which is great. Regarding the point in the paragraph above, I'd say that it's OK for the planner to be responsible for picking between Sort and Incremental Sort in some way. It is, after all, the planner's job to decide between different strategies for executing the same query and, of course, sometimes it will be wrong, but that's OK as long as it's not wrong too often (or by too much, hopefully). It may be a little difficult to get this right, though, because I'm not sure that the information you need actually exists (or is reliable). For example, consider the case where we need to sort 100m rows and there are 2 groups. If 1 group contains 1 row and the other group contains all of the rest, there is really no point in an incremental sort. On the other hand, if each group contains 50m rows and we can get the data presorted by the grouping column, there might be a lot of point to an incremental sort, because two 50m-row sorts might be a lot cheaper than one 100m sort. More generally, it's quite easy to imagine situations where the individual groups can be quicksorted but sorting all of the rows requires I/O, even when the number of groups isn't that big. On the other hand, the real sweet spot for this is probably the case where the number of groups is very large, with many single-row groups or many groups with just a few rows each, so if we can at least get this to work in those cases that may be good enough. On the third hand, when costing aggregation, I think we often underestimate the number of groups and there might well be similar problems here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers