Hi! On Tue, Jan 14, 2014 at 12:54 AM, Marti Raudsepp <ma...@juffo.org> wrote:
> First, thanks a lot for working on this feature. This PostgreSQL > shortcoming crops up in all the time in web applications that implement > paging by multiple sorted columns. > Thanks! I've been trying it out in a few situations. I implemented a new > enable_partialsort GUC to make it easier to turn on/off, this way it's a > lot easier to test. The attached patch applies on top of > partial-sort-5.patch > I though about such option. Generally not because of testing convenience, but because of overhead of planning. This way you implement it is quite naive :) For instance, merge join rely on partial sort which will be replaced with simple sort. > I will spend more time reviewing the patch, but some of this planner code > is over my head. If there's any way I can help to make sure this lands in > the next version, let me know. > > ---- > > The patch performs just as well as I would expect it to: > > marti=# select ac.name, r.name from artist_credit ac join release r on ( > ac.id=r.artist_credit) order by ac.name, r.name limit 1000; > Time: 9.830 ms > marti=# set enable_partialsort = off; > marti=# select ac.name, r.name from artist_credit ac join release r on ( > ac.id=r.artist_credit) order by ac.name, r.name limit 1000; > Time: 1442.815 ms > > A difference of almost 150x! > > There's a missed opportunity in that the code doesn't consider pushing new > Sort steps into subplans. For example, if there's no index on > language(name) then this query cannot take advantage partial sorts: > > marti=# explain select l.name, r.name from language l join release r on ( > l.id=r.language) order by l.name, r.name limit 1000; > Limit (cost=123203.20..123205.70 rows=1000 width=32) > -> Sort (cost=123203.20..126154.27 rows=1180430 width=32) > Sort Key: l.name, r.name > -> Hash Join (cost=229.47..58481.49 rows=1180430 width=32) > Hash Cond: (r.language = l.id) > -> Seq Scan on release r (cost=0.00..31040.10 > rows=1232610 width=26) > -> Hash (cost=131.43..131.43 rows=7843 width=14) > -> Seq Scan on language l (cost=0.00..131.43 > rows=7843 width=14) > > But because there are only so few languages, it would be a lot faster to > sort languages in advance and then do partial sort: > Limit (rows=1000 width=31) > -> Partial sort (rows=1180881 width=31) > Sort Key: l.name, r.name > Presorted Key: l.name > -> Nested Loop (rows=1180881 width=31) > -> Sort (rows=7843 width=10) > Sort Key: name > -> Seq Scan on language (rows=7843 width=14) > -> Index Scan using release_language_idx on release r > (rows=11246 width=25) > Index Cond: (language = l.id) > > Even an explicit sorted CTE cannot take advantage of partial sorts: > marti=# explain with sorted_lang as (select id, name from language order > by name) > marti-# select l.name, r.name from sorted_lang l join release r on > (l.id=r.language) > order by l.name, r.name limit 1000; > Limit (cost=3324368.83..3324371.33 rows=1000 width=240) > CTE sorted_lang > -> Sort (cost=638.76..658.37 rows=7843 width=14) > Sort Key: language.name > -> Seq Scan on language (cost=0.00..131.43 rows=7843 width=14) > -> Sort (cost=3323710.46..3439436.82 rows=46290543 width=240) > Sort Key: l.name, r.name > -> Merge Join (cost=664.62..785649.92 rows=46290543 width=240) > Merge Cond: (r.language = l.id) > -> Index Scan using release_language_idx on release r > (cost=0.43..87546.06 rows=1232610 width=26) > -> Sort (cost=664.19..683.80 rows=7843 width=222) > Sort Key: l.id > -> CTE Scan on sorted_lang l (cost=0.00..156.86 > rows=7843 width=222) > > But even with these limitations, this will easily be the killer feature of > the next release, for me at least. > I see. But I don't think it can be achieved by small changes in planner. Moreover, I didn't check but I think if you remove ordering by r.name you will still not get sorting languages in the inner node. So, this problem is not directly related to partial sort. ------ With best regards, Alexander Korotkov.