On Tue, Jan 14, 2014 at 11:16 PM, Marti Raudsepp <ma...@juffo.org> wrote:
> On Tue, Jan 14, 2014 at 5:49 PM, Alexander Korotkov <aekorot...@gmail.com> > wrote: > >> I implemented a new > >> enable_partialsort GUC to make it easier to turn on/off > > > I though about such option. Generally not because of testing convenience, > > but because of overhead of planning. This way you implement it is quite > > naive :) For instance, merge join rely on partial sort which will be > > replaced with simple sort. > > Oh, this actually highlights a performance regression with the partial > sort patch. I assumed the planner will discard the full sort because of > higher costs, but it looks like the new code always assumes that a Partial > sort will be cheaper than a Join Filter without considering costs. When > doing a join USING (unique_indexed_value, something), the new plan is > significantly worse. > > Unpatched: > marti=# explain analyze select * from release a join release b using (id, > name); > Merge Join (cost=0.85..179810.75 rows=12 width=158) (actual > time=0.011..1279.596 rows=1232610 loops=1) > Merge Cond: (a.id = b.id) > Join Filter: ((a.name)::text = (b.name)::text) > -> Index Scan using release_id_idx on release a (cost=0.43..79120.04 > rows=1232610 width=92) (actual time=0.005..211.928 rows=1232610 loops=1) > -> Index Scan using release_id_idx on release b (cost=0.43..79120.04 > rows=1232610 width=92) (actual time=0.004..371.592 rows=1232610 loops=1) > Total runtime: 1309.049 ms > > Patched: > Merge Join (cost=0.98..179810.87 rows=12 width=158) (actual > time=0.037..5034.158 rows=1232610 loops=1) > Merge Cond: ((a.id = b.id) AND ((a.name)::text = (b.name)::text)) > -> Partial sort (cost=0.49..82201.56 rows=1232610 width=92) (actual > time=0.013..955.938 rows=1232610 loops=1) > Sort Key: a.id, a.name > Presorted Key: a.id > Sort Method: quicksort Memory: 25kB > -> Index Scan using release_id_idx on release a > (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.007..449.332 > rows=1232610 loops=1) > -> Materialize (cost=0.49..85283.09 rows=1232610 width=92) (actual > time=0.019..1352.377 rows=1232610 loops=1) > -> Partial sort (cost=0.49..82201.56 rows=1232610 width=92) > (actual time=0.018..1223.251 rows=1232610 loops=1) > Sort Key: b.id, b.name > Presorted Key: b.id > Sort Method: quicksort Memory: 25kB > -> Index Scan using release_id_idx on release b > (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.004..597.258 > rows=1232610 loops=1) > Total runtime: 5166.906 ms > ---- > Interesting. Could you share the dataset? There's another "wishlist" kind of thing with top-N heapsort bounds; if I > do a query with LIMIT 1000 then every sort batch has Tuplesortstate.bound > set to 1000, but it could be reduced after each batch. If the first batch > is 900 rows then the 2nd batch only needs the top 100 rows at most. > Right. Just didn't implement it yet. > Also, I find the name "partial sort" a bit confusing; this feature is not > actually sorting *partially*, it's finishing the sort of partially-sorted > data. Perhaps "batched sort" would explain the feature better? Because it > does the sort in multiple batches instead of all at once. But maybe that's > just me. > I'm not sure. For me "batched sort" sounds like we're going to sort in batch something that we sorted separately before. Probably I'm wrong because I'm far from native english :) ------ With best regards, Alexander Korotkov.