On Tue, Jun 6, 2023 at 4:36 AM Richard Guo <guofengli...@gmail.com> wrote:
>
> On Mon, Jan 23, 2023 at 10:00 PM James Coleman <jtc...@gmail.com> wrote:
>>
>> With this patch we do in fact now see (as expected) rels with
>> non-empty lateral_relids showing up in generate_[useful_]gather_paths.
>> And the partial paths can now have non-empty required outer rels. I'm
>> not able to come up with a plan that would actually be caught by those
>> checks; I theorize that because of the few places we actually call
>> generate_[useful_]gather_paths we are in practice already excluding
>> those, but for now I've left these as a conditional rather than an
>> assertion because it seems like the kind of guard we'd want to ensure
>> those methods are safe.
>
> I'm trying to understand this part. AFAICS we will not create partial
> paths for a rel, base or join, if it has lateral references. So it
> seems to me that in generate_[useful_]gather_paths, after we've checked
> that there are partial paths, the checks for lateral_relids are not
> necessary, because lateral_relids should always be empty in this case.
> Maybe I'm missing something.
At first I was thinking "isn't the point of the patch to generate
partial paths for rels with lateral references?" given what I'd written
back in January, but I added "Assert(bms_is_empty(required_outer));" to
both of those functions and the assertion never fails running the tests
(including my newly parallelizable queries).

I'm almost positive I'd checked this back in January (not only had I
explicitly written that I'd confirmed we had non-empty lateral_relids
there, but it was also the entire basis of the alternate approach to the
patch), but...I can't go back to 5 months ago and remember what I'd
done.

Ah! Your comment about "after we've checked that there are partial
paths" triggered a thought. I think originally I'd had the
"bms_is_subset(required_outer, rel->relids)" check first in these
functions. And indeed, if I run the tests with the assertion moved above
the partial paths check, I get failures in generate_useful_gather_paths
specifically. Mystery solved!

> And while trying the v9 patch I came across a crash with the query
> below.
>
> set min_parallel_table_scan_size to 0;
> set parallel_setup_cost to 0;
> set parallel_tuple_cost to 0;
>
> explain (costs off)
> select * from pg_description t1 where objoid in
> (select objoid from pg_description t2 where t2.description =
> t1.description);
>                        QUERY PLAN
> --------------------------------------------------------
>  Seq Scan on pg_description t1
>    Filter: (SubPlan 1)
>    SubPlan 1
>      ->  Gather
>            Workers Planned: 2
>            ->  Parallel Seq Scan on pg_description t2
>                  Filter: (description = t1.description)
> (7 rows)
>
> select * from pg_description t1 where objoid in
> (select objoid from pg_description t2 where t2.description =
> t1.description);
> WARNING:  terminating connection because of crash of another server process
>
> Seems something is wrong when extracting the argument from the Param in
> the parallel worker.
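To restate the ordering discovery above in miniature, here is a purely illustrative stand-in (FakeRel and lateral_guard_evaluated are invented names for this sketch, not anything in the PostgreSQL tree): whether an Assert on the outer-rel set can ever fire depends on whether it sits before or after the early return for rels lacking partial paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented stand-in for a RelOptInfo; not PostgreSQL code. */
typedef struct FakeRel
{
	bool		has_partial_paths;
	unsigned	lateral_relids;	/* 0 stands in for an empty bitmapset */
} FakeRel;

/*
 * Returns true if a lateral-relids guard (or an Assert in its place)
 * would ever be evaluated for this rel.  With check_lateral_first =
 * false -- the guard placed after the partial-paths check -- a rel
 * with no partial paths returns early and the guard is unreachable,
 * which is why the Assert never fired in that position.  With the
 * guard first (as in the earlier ordering of the patch), it is always
 * reached, matching the observed assertion failures.
 */
static bool
lateral_guard_evaluated(const FakeRel *rel, bool check_lateral_first)
{
	if (check_lateral_first)
		return true;			/* guard runs before anything else */
	if (!rel->has_partial_paths)
		return false;			/* early return: guard never reached */
	return true;
}
```

So for a rel with lateral references but no partial paths, the guard position alone determines whether it is ever exercised.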
With what I'm trying to change, I don't think this plan should ever be
generated, since it means we'd have to pass a param from the outer seq
scan into the parallel subplan, which we can't do (currently).

I've attached the full backtrace to the email, but, as you hinted at,
the parallel worker is trying to get the param (in this case detoasting
it), but the param doesn't exist on the worker, so it segfaults.

Looking at this further, I think there's an existing test case that
exposes the misplanning here (the one right under the comment "Parallel
Append is not to be used when the subpath depends on the outer param" in
select_parallel.sql), but it doesn't segfault because the param is an
integer, doesn't need to be detoasted, and therefore (I think) we skate
by (but probably with wrong results depending on the dataset).

Interestingly this is one of the existing test queries my original patch
approach didn't change, so this gives me something specific to work with
in improving the patch. Thanks for testing this and bringing it to my
attention!

BTW, are you by any chance testing on ARM macOS? I reproduced the issue
there, but for some reason I did not reproduce the error (and the plan
wasn't parallelized) when I tested this on Linux. Perhaps I missed
setting something up; it seems odd.

> BTW another rebase is needed as it no longer applies to HEAD.

Apologies; I'd rebased but hadn't updated the thread. See attached for
an updated series (albeit still broken on your test query).

Thanks,
James
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000010449cab0 postgres`toast_raw_datum_size(value=0) at detoast.c:550:6
    frame #1: 0x0000000104cf9438 postgres`texteq(fcinfo=0x0000000112809c80) at varlena.c:1645:10
    frame #2: 0x00000001047ca7ac postgres`ExecInterpExpr(state=0x00000001128097e0, econtext=0x0000000112808c48, isnull=0x000000016b97e39f) at execExprInterp.c:758:8
    frame #3: 0x00000001047c98bc postgres`ExecInterpExprStillValid(state=0x00000001128097e0, econtext=0x0000000112808c48, isNull=0x000000016b97e39f) at execExprInterp.c:1870:9
    frame #4: 0x00000001047ee1b0 postgres`ExecEvalExprSwitchContext(state=0x00000001128097e0, econtext=0x0000000112808c48, isNull=0x000000016b97e39f) at executor.h:355:13
    frame #5: 0x00000001047edddc postgres`ExecQual(state=0x00000001128097e0, econtext=0x0000000112808c48) at executor.h:424:8
    frame #6: 0x00000001047ed964 postgres`ExecScan(node=0x0000000112808d68, accessMtd=(postgres`SeqNext at nodeSeqscan.c:51), recheckMtd=(postgres`SeqRecheck at nodeSeqscan.c:90)) at execScan.c:226:23
    frame #7: 0x0000000104837ea8 postgres`ExecSeqScan(pstate=0x0000000112808d68) at nodeSeqscan.c:112:9
    frame #8: 0x00000001047e8ca0 postgres`ExecProcNodeFirst(node=0x0000000112808d68) at execProcnode.c:464:9
    frame #9: 0x00000001047dfd90 postgres`ExecProcNode(node=0x0000000112808d68) at executor.h:273:9
    frame #10: 0x00000001047daa28 postgres`ExecutePlan(estate=0x0000000112808318, planstate=0x0000000112808d68, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x00000001130389d8, execute_once=true) at execMain.c:1661:10
    frame #11: 0x00000001047da8d4 postgres`standard_ExecutorRun(queryDesc=0x0000000113039738, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:365:3
    frame #12: 0x00000001047da644 postgres`ExecutorRun(queryDesc=0x0000000113039738, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:309:3
    frame #13: 0x00000001047e3230 postgres`ParallelQueryMain(seg=0x000000011300a5b8, toc=0x0000000105434000) at execParallel.c:1464:2
    frame #14: 0x000000010458a8d0 postgres`ParallelWorkerMain(main_arg=4065634722) at parallel.c:1521:2
    frame #15: 0x00000001049e70e4 postgres`StartBackgroundWorker at bgworker.c:861:2
    frame #16: 0x00000001049f64b8 postgres`do_start_bgworker(rw=0x0000000113804080) at postmaster.c:5762:4
    frame #17: 0x00000001049eed90 postgres`maybe_start_bgworkers at postmaster.c:5986:9
    frame #18: 0x00000001049f1764 postgres`process_pm_pmsignal at postmaster.c:5149:3
    frame #19: 0x00000001049eef7c postgres`ServerLoop at postmaster.c:1770:5
    frame #20: 0x00000001049edb48 postgres`PostmasterMain(argc=3, argv=0x00006000016453c0) at postmaster.c:1463:11
    frame #21: 0x00000001048782f4 postgres`main(argc=3, argv=0x00006000016453c0) at main.c:198:3
    frame #22: 0x00000001a08cff28 dyld`start + 2236
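As a toy illustration of the failure mode in the backtrace above (FakeDatum and the helper names below are invented for this sketch; this is not PostgreSQL's Datum machinery): a pass-by-value param like an int4 interprets the Datum bits directly, so a garbage Datum in a worker just yields a garbage value, while a pass-by-reference type like text must follow the Datum as a pointer (as detoasting does), and a Datum of 0 -- as in the toast_raw_datum_size(value=0) frame -- dereferences NULL and crashes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invented stand-in for PostgreSQL's Datum: a pointer-sized integer. */
typedef uintptr_t FakeDatum;

/*
 * Pass-by-value access (like int4): the Datum's bits are the value.
 * Given a Datum that was never shipped to the worker, this silently
 * produces a wrong answer instead of crashing.
 */
static int32_t
fake_datum_get_int32(FakeDatum d)
{
	return (int32_t) d;
}

/*
 * Pass-by-reference access (like text): the Datum is a pointer that
 * must be followed, as detoasting does.  If the param never made it
 * to the worker and the Datum is 0, this dereferences NULL and
 * segfaults -- the failure seen in frame #0 above.
 */
static size_t
fake_datum_strlen(FakeDatum d)
{
	const char *p = (const char *) d;
	size_t		n = 0;

	while (p[n] != '\0')		/* crashes when d == 0 */
		n++;
	return n;
}
```

Which is consistent with the integer-param test query merely skating by (wrong results at worst) while the text-param query crashes outright.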
v10-0001-Add-tests-before-change.patch
Description: Binary data
v10-0002-Parallelize-correlated-subqueries.patch
Description: Binary data