Re: [HACKERS] PoC: Partial sort

Alexander Korotkov Tue, 14 Jan 2014 11:30:01 -0800

On Tue, Jan 14, 2014 at 11:16 PM, Marti Raudsepp <ma...@juffo.org> wrote:


> On Tue, Jan 14, 2014 at 5:49 PM, Alexander Korotkov <aekorot...@gmail.com>
> wrote:
> >> I implemented a new
> >> enable_partialsort GUC to make it easier to turn on/off
>
> > I though about such option. Generally not because of testing convenience,
> > but because of overhead of planning. This way you implement it is quite
> > naive :) For instance, merge join rely on partial sort which will be
> > replaced with simple sort.
>
> Oh, this actually highlights a performance regression with the partial
> sort patch. I assumed the planner will discard the full sort because of
> higher costs, but it looks like the new code always assumes that a Partial
> sort will be cheaper than a Join Filter without considering costs. When
> doing a join USING (unique_indexed_value, something), the new plan is
> significantly worse.
>
> Unpatched:
> marti=# explain analyze select * from release a join release b using (id,
> name);
>  Merge Join  (cost=0.85..179810.75 rows=12 width=158) (actual
> time=0.011..1279.596 rows=1232610 loops=1)
>    Merge Cond: (a.id = b.id)
>    Join Filter: ((a.name)::text = (b.name)::text)
>    ->  Index Scan using release_id_idx on release a  (cost=0.43..79120.04
> rows=1232610 width=92) (actual time=0.005..211.928 rows=1232610 loops=1)
>    ->  Index Scan using release_id_idx on release b  (cost=0.43..79120.04
> rows=1232610 width=92) (actual time=0.004..371.592 rows=1232610 loops=1)
>  Total runtime: 1309.049 ms
>
> Patched:
>  Merge Join  (cost=0.98..179810.87 rows=12 width=158) (actual
> time=0.037..5034.158 rows=1232610 loops=1)
>    Merge Cond: ((a.id = b.id) AND ((a.name)::text = (b.name)::text))
>    ->  Partial sort  (cost=0.49..82201.56 rows=1232610 width=92) (actual
> time=0.013..955.938 rows=1232610 loops=1)
>          Sort Key: a.id, a.name
>          Presorted Key: a.id
>          Sort Method: quicksort  Memory: 25kB
>          ->  Index Scan using release_id_idx on release a
>  (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.007..449.332
> rows=1232610 loops=1)
>    ->  Materialize  (cost=0.49..85283.09 rows=1232610 width=92) (actual
> time=0.019..1352.377 rows=1232610 loops=1)
>          ->  Partial sort  (cost=0.49..82201.56 rows=1232610 width=92)
> (actual time=0.018..1223.251 rows=1232610 loops=1)
>                Sort Key: b.id, b.name
>                Presorted Key: b.id
>                Sort Method: quicksort  Memory: 25kB
>                ->  Index Scan using release_id_idx on release b
>  (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.004..597.258
> rows=1232610 loops=1)
>  Total runtime: 5166.906 ms
> ----
>

Interesting. Could you share the dataset?

There's another "wishlist" kind of thing with top-N heapsort bounds; if I
> do a query with LIMIT 1000 then every sort batch has Tuplesortstate.bound
> set to 1000, but it could be reduced after each batch. If the first batch
> is 900 rows then the 2nd batch only needs the top 100 rows at most.
>

Right. Just didn't implement it yet.


> Also, I find the name "partial sort" a bit confusing; this feature is not
> actually sorting *partially*, it's finishing the sort of partially-sorted
> data. Perhaps "batched sort" would explain the feature better? Because it
> does the sort in multiple batches instead of all at once. But maybe that's
> just me.
>

I'm not sure. For me "batched sort" sounds like we're going to sort in
batch something that we sorted separately before. Probably I'm wrong
because I'm far from native english :)

------
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

Reply via email to