Re: [HACKERS] PoC: Partial sort

Alexander Korotkov Tue, 14 Jan 2014 07:52:22 -0800

Hi!

On Tue, Jan 14, 2014 at 12:54 AM, Marti Raudsepp <[email protected]> wrote:


> First, thanks a lot for working on this feature. This PostgreSQL
> shortcoming crops up in all the time in web applications that implement
> paging by multiple sorted columns.
>

Thanks!

I've been trying it out in a few situations. I implemented a new
> enable_partialsort GUC to make it easier to turn on/off, this way it's a
> lot easier to test. The attached patch applies on top of
> partial-sort-5.patch
>

I though about such option. Generally not because of testing convenience,
but because of overhead of planning. This way you implement it is quite
naive :) For instance, merge join rely on partial sort which will be
replaced with simple sort.


> I will spend more time reviewing the patch, but some of this planner code
> is over my head. If there's any way I can help to make sure this lands in
> the next version, let me know.
>
> ----
>
> The patch performs just as well as I would expect it to:
>
> marti=# select ac.name, r.name from artist_credit ac join release r on (
> ac.id=r.artist_credit) order by ac.name, r.name limit 1000;
> Time: 9.830 ms
> marti=# set enable_partialsort = off;
> marti=# select ac.name, r.name from artist_credit ac join release r on (
> ac.id=r.artist_credit) order by ac.name, r.name limit 1000;
> Time: 1442.815 ms
>
> A difference of almost 150x!
>
> There's a missed opportunity in that the code doesn't consider pushing new
> Sort steps into subplans. For example, if there's no index on
> language(name) then this query cannot take advantage partial sorts:
>
> marti=# explain select l.name, r.name from language l join release r on (
> l.id=r.language) order by l.name, r.name limit 1000;
>  Limit  (cost=123203.20..123205.70 rows=1000 width=32)
>    ->  Sort  (cost=123203.20..126154.27 rows=1180430 width=32)
>          Sort Key: l.name, r.name
>          ->  Hash Join  (cost=229.47..58481.49 rows=1180430 width=32)
>                Hash Cond: (r.language = l.id)
>                ->  Seq Scan on release r  (cost=0.00..31040.10
> rows=1232610 width=26)
>                ->  Hash  (cost=131.43..131.43 rows=7843 width=14)
>                      ->  Seq Scan on language l  (cost=0.00..131.43
> rows=7843 width=14)
>
> But because there are only so few languages, it would be a lot faster to
> sort languages in advance and then do partial sort:
>  Limit  (rows=1000 width=31)
>    ->  Partial sort  (rows=1180881 width=31)
>          Sort Key: l.name, r.name
>          Presorted Key: l.name
>          ->  Nested Loop  (rows=1180881 width=31)
>                ->  Sort  (rows=7843 width=10)
>                      Sort Key: name
>                      ->  Seq Scan on language  (rows=7843 width=14)
>                ->  Index Scan using release_language_idx on release r
> (rows=11246 width=25)
>                      Index Cond: (language = l.id)
>
> Even an explicit sorted CTE cannot take advantage of partial sorts:
> marti=# explain with sorted_lang as (select id, name from language order
> by name)
> marti-# select l.name, r.name from sorted_lang l join release r on 
> (l.id=r.language)
> order by l.name, r.name limit 1000;
>  Limit  (cost=3324368.83..3324371.33 rows=1000 width=240)
>    CTE sorted_lang
>      ->  Sort  (cost=638.76..658.37 rows=7843 width=14)
>            Sort Key: language.name
>            ->  Seq Scan on language  (cost=0.00..131.43 rows=7843 width=14)
>    ->  Sort  (cost=3323710.46..3439436.82 rows=46290543 width=240)
>          Sort Key: l.name, r.name
>          ->  Merge Join  (cost=664.62..785649.92 rows=46290543 width=240)
>                Merge Cond: (r.language = l.id)
>                ->  Index Scan using release_language_idx on release r
> (cost=0.43..87546.06 rows=1232610 width=26)
>                ->  Sort  (cost=664.19..683.80 rows=7843 width=222)
>                      Sort Key: l.id
>                      ->  CTE Scan on sorted_lang l  (cost=0.00..156.86
> rows=7843 width=222)
>
> But even with these limitations, this will easily be the killer feature of
> the next release, for me at least.
>

I see. But I don't think it can be achieved by small changes in planner.
Moreover, I didn't check but I think if you remove ordering by r.name you
will still not get sorting languages in the inner node. So, this problem is
not directly related to partial sort.

------
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

Reply via email to