Thomas Munro <[email protected]> writes:
>> This is explained by the early exit case in
>> ExecParallelHashEnsureBatchAccessors(). With just the right timing,
>> it finishes up not reporting the true nbatch number, and never calling
>> ExecParallelHashUpdateSpacePeak().
> Hi Tom,
> You mentioned that prairiedog sees the problem about one time in
> thirty. Would you mind checking if it goes away with this patch
> applied?
I've run 55 cycles of "make installcheck" without seeing a failure
with this patch installed.  That's not enough to be totally sure,
of course, but I think this probably fixes it.
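(For readers following along at home: the shape of that early exit is
roughly as below.  This is a simplified stand-in, not the actual
nodeHash.c code, and the ExecParallelHashUpdateSpacePeak() call in
particular is approximated from memory.)

/*
 * Simplified stand-in for ExecParallelHashEnsureBatchAccessors(), not
 * the real nodeHash.c code.  A worker that attaches very late can find
 * the shared batch state already torn down and take the early exit, so
 * it never copies the final batch count into its local hashtable and
 * never records the space peak; its instrumentation then gets reported
 * from stale local state.
 */
static void
ExecParallelHashEnsureBatchAccessors(HashJoinTable hashtable)
{
    ParallelHashJoinState *pstate = hashtable->parallel_state;

    /* Join already finished and shared batch array freed?  Bail out. */
    if (!DsaPointerIsValid(pstate->batches))
        return;                     /* skips both updates below */

    /* ... (re)build the per-batch accessor array ... */

    hashtable->nbatch = pstate->nbatch;            /* true nbatch */
    ExecParallelHashUpdateSpacePeak(hashtable, 0); /* approximate call */
}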
However ... I noticed that my other dinosaur, gaur, shows the other
failure mode we see in the buildfarm, the "increased_batches = t" diff,
and I can report that this patch does *not* help that.  The underlying
EXPLAIN output goes from something like
!  Finalize Aggregate  (cost=823.85..823.86 rows=1 width=8) (actual time=1378.102..1378.105 rows=1 loops=1)
!    ->  Gather  (cost=823.63..823.84 rows=2 width=8) (actual time=1377.909..1378.006 rows=3 loops=1)
!          Workers Planned: 2
!          Workers Launched: 2
!          ->  Partial Aggregate  (cost=823.63..823.64 rows=1 width=8) (actual time=1280.298..1280.302 rows=1 loops=3)
!                ->  Parallel Hash Join  (cost=387.50..802.80 rows=8333 width=0) (actual time=1070.179..1249.142 rows=6667 loops=3)
!                      Hash Cond: (r.id = s.id)
!                      ->  Parallel Seq Scan on simple r  (cost=0.00..250.33 rows=8333 width=4) (actual time=0.173..62.063 rows=6667 loops=3)
!                      ->  Parallel Hash  (cost=250.33..250.33 rows=8333 width=4) (actual time=454.305..454.305 rows=6667 loops=3)
!                            Buckets: 4096  Batches: 8  Memory Usage: 208kB
!                            ->  Parallel Seq Scan on simple s  (cost=0.00..250.33 rows=8333 width=4) (actual time=0.178..67.115 rows=6667 loops=3)
!  Planning time: 1.861 ms
!  Execution time: 1687.311 ms
to something like
!  Finalize Aggregate  (cost=823.85..823.86 rows=1 width=8) (actual time=1588.733..1588.737 rows=1 loops=1)
!    ->  Gather  (cost=823.63..823.84 rows=2 width=8) (actual time=1588.529..1588.634 rows=3 loops=1)
!          Workers Planned: 2
!          Workers Launched: 2
!          ->  Partial Aggregate  (cost=823.63..823.64 rows=1 width=8) (actual time=1492.631..1492.635 rows=1 loops=3)
!                ->  Parallel Hash Join  (cost=387.50..802.80 rows=8333 width=0) (actual time=1270.309..1451.501 rows=6667 loops=3)
!                      Hash Cond: (r.id = s.id)
!                      ->  Parallel Seq Scan on simple r  (cost=0.00..250.33 rows=8333 width=4) (actual time=0.219..158.144 rows=6667 loops=3)
!                      ->  Parallel Hash  (cost=250.33..250.33 rows=8333 width=4) (actual time=634.614..634.614 rows=6667 loops=3)
!                            Buckets: 4096 (originally 4096)  Batches: 16 (originally 8)  Memory Usage: 176kB
!                            ->  Parallel Seq Scan on simple s  (cost=0.00..250.33 rows=8333 width=4) (actual time=0.182..120.074 rows=6667 loops=3)
!  Planning time: 1.931 ms
!  Execution time: 2219.417 ms
so again we have a case where the plan didn't change but the execution
behavior did.  This isn't quite 100% reproducible on gaur/pademelon,
but it seems to fail more often than not, so I can poke into it
if you can say what info would be helpful.
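(For anyone wondering how the batch count can move with no plan change:
Parallel Hash repartitions at run time when the shared hash table would
overrun its memory budget, so the final number depends on how the
workers' insertions happen to interleave.  In rough, simplified form,
with the bookkeeping fields borrowed from the serial-hash case rather
than the real shared-state accounting:

/*
 * Simplified sketch, not the actual nodeHash.c logic.  Whichever
 * backend first notices that the next insertion would blow the memory
 * budget triggers a repartition that doubles nbatch for everyone.
 * Since the insert interleaving varies from run to run, the threshold
 * can be crossed at different points, and the same plan can finish
 * with 8 batches on one run and 16 on the next.
 */
if (hashtable->spaceUsed + tuple_size > hashtable->spaceAllowed)
    ExecParallelHashIncreaseNumBatches(hashtable);   /* nbatch *= 2 */

That's consistent with the "Batches: 16 (originally 8)" line above.)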
regards, tom lane