Re: Parallelize correlated subqueries that execute within each worker

James Coleman Wed, 18 Jan 2023 18:34:51 -0800

On Wed, Jan 18, 2023 at 2:09 PM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:
>
> Hi,
>
> This patch hasn't been updated since September, and it got broken by
> 4a29eabd1d91c5484426bc5836e0a7143b064f5a, which changed the incremental sort
> stuff a little bit. But the breakage was rather limited, so I took a
> stab at fixing it - attached is the result, hopefully correct.


Thanks for fixing this up; the changes look correct to me.

> I also added a couple of minor comments about stuff I noticed while
> rebasing and skimming the patch; I kept those in separate commits.
> There are also a couple of pre-existing TODOs.

I started work on some of these, but wasn't able to finish this
evening, so I don't have an updated series yet.

> James, what's your plan with this patch? Do you intend to work on it for
> PG16, or are there some issues I missed in the thread?

I'd love to see it get into PG16. I don't know of any outstanding issues,
but review activity has been light. Robert originally had some concerns
about my approach; I think my updated approach resolves them, but it'd
be good to have his sign-off.

Beyond that, I'm mostly looking for review and evaluation of the
approach I've taken; in particular, see my description of it in [1].

> One of the queries in incremental_sort changed plans a little bit:
>
> explain (costs off) select distinct
>   unique1,
>   (select t.unique1 from tenk1 where tenk1.unique1 = t.unique1)
> from tenk1 t, generate_series(1, 1000);
>
> switched from
>
>  Unique  (cost=18582710.41..18747375.21 rows=10000 width=8)
>    ->  Gather Merge  (cost=18582710.41..18697375.21 rows=10000000 ...)
>          Workers Planned: 2
>          ->  Sort  (cost=18582710.39..18593127.06 rows=4166667 ...)
>                Sort Key: t.unique1, ((SubPlan 1))
>              ...
>
> to
>
>  Unique  (cost=18582710.41..18614268.91 rows=10000 ...)
>    ->  Gather Merge  (cost=18582710.41..18614168.91 rows=20000 ...)
>          Workers Planned: 2
>          ->  Unique  (cost=18582710.39..18613960.39 rows=10000 ...)
>                ->  Sort  (cost=18582710.39..18593127.06 ...)
>                      Sort Key: t.unique1, ((SubPlan 1))
>                    ...
>
> which probably makes sense, as the cost estimate decreases a bit.

Off the cuff, that seems fine. I'll read it over again when I send the
updated series.

James Coleman

[1] https://www.postgresql.org/message-id/CAAaqYe8m0DHUWk7gLKb_C4abTD4nMkU26ErE%3Dahow4zNMZbzPQ%40mail.gmail.com
