On Tue, Jun 6, 2023 at 4:36 AM Richard Guo <guofengli...@gmail.com> wrote:
>
>
> On Mon, Jan 23, 2023 at 10:00 PM James Coleman <jtc...@gmail.com> wrote:
>>
>> With this patch we do in fact now see (as expected) rels with
>> non-empty lateral_relids showing up in generate_[useful_]gather_paths.
>> And the partial paths can now have non-empty required outer rels. I'm
>> not able to come up with a plan that would actually be caught by those
>> checks; I theorize that because of the few places we actually call
>> generate_[useful_]gather_paths we are in practice already excluding
>> those, but for now I've left these as conditionals rather than
>> assertions because they seem like the kind of guard we'd want to keep
>> to ensure those methods are safe.
>
>
> I'm trying to understand this part.  AFAICS we will not create partial
> paths for a rel, base or join, if it has lateral references.  So it
> seems to me that in generate_[useful_]gather_paths after we've checked
> that there are partial paths, the checks for lateral_relids are not
> necessary because lateral_relids should always be empty in this case.
> Maybe I'm missing something.

At first I was thinking "isn't the point of the patch to generate
partial paths for rels with lateral references" given what I'd written
back in January, but I added "Assert(bms_is_empty(required_outer));"
to both of those functions and the assertion never fails running the
tests (including my newly parallelizable queries). I'm almost positive
I'd checked this back in January (not only had I explicitly written
that I'd confirmed we had non-empty lateral_relids there, but it was
also the entire basis of the alternate approach to the patch), but...I
can't go back five months and remember what I'd done.

Ah! Your comment about "after we've checked that there are partial
paths" triggered a thought. I think originally I'd had the
"bms_is_subset(required_outer, rel->relids)" check first in these
functions. And indeed, if I run the tests with the assertion moved
above the partial paths check, I get failures in
generate_useful_gather_paths specifically. Mystery solved!
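
For the archives, here's a rough sketch of the ordering I'm describing
(a paraphrase, not the actual patch hunks; required_outer stands in
for PATH_REQ_OUTER of the cheapest partial path):

    /* Paraphrase of generate_useful_gather_paths, not the patch code. */
    void
    generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
                                 bool override_rows)
    {
        Relids      required_outer;

        /*
         * Placing the lateral-refs check here (before the partial-path
         * check) is where I saw non-empty required_outer: it fires for
         * rels that never get partial paths at all.
         */

        /* If there are no partial paths, there's nothing to do here. */
        if (rel->partial_pathlist == NIL)
            return;

        required_outer =
            PATH_REQ_OUTER((Path *) linitial(rel->partial_pathlist));

        /*
         * Whereas here, after the partial-path check, the set is always
         * empty on the current tree, so this could be an Assert rather
         * than a conditional.
         */
        Assert(bms_is_empty(required_outer));

        /* ... build Gather / Gather Merge paths as before ... */
    }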

> And while trying the v9 patch I came across a crash with the query
> below.
>
> set min_parallel_table_scan_size to 0;
> set parallel_setup_cost to 0;
> set parallel_tuple_cost to 0;
>
> explain (costs off)
> select * from pg_description t1 where objoid in
>     (select objoid from pg_description t2 where t2.description = t1.description);
>                        QUERY PLAN
> --------------------------------------------------------
>  Seq Scan on pg_description t1
>    Filter: (SubPlan 1)
>    SubPlan 1
>      ->  Gather
>            Workers Planned: 2
>            ->  Parallel Seq Scan on pg_description t2
>                  Filter: (description = t1.description)
> (7 rows)
>
> select * from pg_description t1 where objoid in
>     (select objoid from pg_description t2 where t2.description = t1.description);
> WARNING:  terminating connection because of crash of another server process
>
> Seems something is wrong when extracting the argument from the Param in
> parallel worker.

With what I'm trying to change, I don't think this plan should ever be
generated since it means we'd have to pass a param from the outer seq
scan into the parallel subplan, which we can't do (currently).

I've attached the full backtrace to the email, but as you hinted at
the parallel worker is trying to get the param (in this case
detoasting it), but the param doesn't exist on the worker, so it seg
faults. Looking at this further I think there's an existing test case
that exposes the misplanning here (the one right under the comment
"Parallel Append is not to be used when the subpath depends on the
outer param" in select_parallel.sql), but it doesn't seg fault because
the param is an integer, doesn't need to be detoasted, and therefore
(I think) we skate by (but probably with wrong results depending on
the dataset).
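
To spell out why the param's datatype makes the difference between a
crash and silently wrong answers, here's a simplified contrast of the
two equality functions (the real code lives in
src/backend/utils/adt/varlena.c and int.c; this is just an
illustration, not the actual source):

    /* Simplified from varlena.c: text is pass-by-reference, so the
     * bogus all-zero param Datum gets dereferenced while computing the
     * detoasted length -- that's frame #0 in the backtrace below. */
    Datum
    texteq(PG_FUNCTION_ARGS)
    {
        Datum   arg1 = PG_GETARG_DATUM(0);  /* 0 in the worker */
        Datum   arg2 = PG_GETARG_DATUM(1);

        /* toast_raw_datum_size() reads the varlena header => SIGSEGV */
        if (toast_raw_datum_size(arg1) != toast_raw_datum_size(arg2))
            PG_RETURN_BOOL(false);
        /* ... full comparison elided ... */
    }

    /* Simplified from int.c: int4 is pass-by-value, so there's no
     * pointer to chase; we just compare whatever garbage the worker
     * has, which is how the select_parallel.sql case can "work". */
    Datum
    int4eq(PG_FUNCTION_ARGS)
    {
        PG_RETURN_BOOL(PG_GETARG_INT32(0) == PG_GETARG_INT32(1));
    }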

Interestingly this is one of the existing test queries my original
patch approach didn't change, so this gives me something specific to
work with in improving the patch. Thanks for testing this and bringing
it to my attention!

BTW are you by any chance testing on ARM macOS? I reproduced the issue
there, but for some reason I could not reproduce the error (and the
plan wasn't parallelized) when I tested this on Linux. Perhaps I
missed setting something up; it seems odd.

> BTW another rebase is needed as it no longer applies to HEAD.

Apologies; I'd rebased, but hadn't updated the thread. See attached
for an updated series (albeit still broken on your test query).

Thanks,
James
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000010449cab0 postgres`toast_raw_datum_size(value=0) at detoast.c:550:6
    frame #1: 0x0000000104cf9438 postgres`texteq(fcinfo=0x0000000112809c80) at varlena.c:1645:10
    frame #2: 0x00000001047ca7ac postgres`ExecInterpExpr(state=0x00000001128097e0, econtext=0x0000000112808c48, isnull=0x000000016b97e39f) at execExprInterp.c:758:8
    frame #3: 0x00000001047c98bc postgres`ExecInterpExprStillValid(state=0x00000001128097e0, econtext=0x0000000112808c48, isNull=0x000000016b97e39f) at execExprInterp.c:1870:9
    frame #4: 0x00000001047ee1b0 postgres`ExecEvalExprSwitchContext(state=0x00000001128097e0, econtext=0x0000000112808c48, isNull=0x000000016b97e39f) at executor.h:355:13
    frame #5: 0x00000001047edddc postgres`ExecQual(state=0x00000001128097e0, econtext=0x0000000112808c48) at executor.h:424:8
    frame #6: 0x00000001047ed964 postgres`ExecScan(node=0x0000000112808d68, accessMtd=(postgres`SeqNext at nodeSeqscan.c:51), recheckMtd=(postgres`SeqRecheck at nodeSeqscan.c:90)) at execScan.c:226:23
    frame #7: 0x0000000104837ea8 postgres`ExecSeqScan(pstate=0x0000000112808d68) at nodeSeqscan.c:112:9
    frame #8: 0x00000001047e8ca0 postgres`ExecProcNodeFirst(node=0x0000000112808d68) at execProcnode.c:464:9
    frame #9: 0x00000001047dfd90 postgres`ExecProcNode(node=0x0000000112808d68) at executor.h:273:9
    frame #10: 0x00000001047daa28 postgres`ExecutePlan(estate=0x0000000112808318, planstate=0x0000000112808d68, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x00000001130389d8, execute_once=true) at execMain.c:1661:10
    frame #11: 0x00000001047da8d4 postgres`standard_ExecutorRun(queryDesc=0x0000000113039738, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:365:3
    frame #12: 0x00000001047da644 postgres`ExecutorRun(queryDesc=0x0000000113039738, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:309:3
    frame #13: 0x00000001047e3230 postgres`ParallelQueryMain(seg=0x000000011300a5b8, toc=0x0000000105434000) at execParallel.c:1464:2
    frame #14: 0x000000010458a8d0 postgres`ParallelWorkerMain(main_arg=4065634722) at parallel.c:1521:2
    frame #15: 0x00000001049e70e4 postgres`StartBackgroundWorker at bgworker.c:861:2
    frame #16: 0x00000001049f64b8 postgres`do_start_bgworker(rw=0x0000000113804080) at postmaster.c:5762:4
    frame #17: 0x00000001049eed90 postgres`maybe_start_bgworkers at postmaster.c:5986:9
    frame #18: 0x00000001049f1764 postgres`process_pm_pmsignal at postmaster.c:5149:3
    frame #19: 0x00000001049eef7c postgres`ServerLoop at postmaster.c:1770:5
    frame #20: 0x00000001049edb48 postgres`PostmasterMain(argc=3, argv=0x00006000016453c0) at postmaster.c:1463:11
    frame #21: 0x00000001048782f4 postgres`main(argc=3, argv=0x00006000016453c0) at main.c:198:3
    frame #22: 0x00000001a08cff28 dyld`start + 2236

Attachment: v10-0001-Add-tests-before-change.patch
Description: Binary data

Attachment: v10-0002-Parallelize-correlated-subqueries.patch
Description: Binary data
