Hi David, Do you plan to work continue working on this patch? I did take a look, and on the whole it looks reasonable - it modifies the right places etc.
I think there are two things that may need an improvement: 1) Storing variable-length data in ParallelBitmapHeapState I agree with Robert the snapshot_and_stats name is not great. I see Dmitry mentioned phs_snapshot_off as used by ParallelTableScanDescData - the reasons are somewhat different (phs_snapshot_off exists because we don't know which exact struct will be allocated), while here we simply need to allocate two variable-length pieces of memory. But it seems like it would work nicely for this. That is, we could try adding an offset for each of those pieces of memory: - snapshot_off - stats_off I don't like the GetSharedSnapshotData name very much, it seems very close to GetSnapshotData - quite confusing, I think. Dmitry also suggested we might add a separate piece of shared memory. I don't quite see how that would work for ParallelBitmapHeapState, but I doubt it'd be simpler than having two offsets. I don't think the extra complexity (paid by everyone) would be worth it just to make EXPLAIN ANALYZE work. 2) Leader vs. worker counters It seems to me this does nothing to add the per-worker values from "Heap Blocks" into the leader, which means we get stuff like this: Heap Blocks: exact=102 lossy=10995 Worker 0: actual time=50.559..209.773 rows=215253 loops=1 Heap Blocks: exact=207 lossy=19354 Worker 1: actual time=50.543..211.387 rows=162934 loops=1 Heap Blocks: exact=161 lossy=14636 I think this is wrong / confusing, and inconsistent with what we do for other nodes. It's also inconsistent with how we deal e.g. with BUFFERS, where we *do* add the values to the leader: Heap Blocks: exact=125 lossy=10789 Buffers: shared hit=11 read=45420 Worker 0: actual time=51.419..221.904 rows=150437 loops=1 Heap Blocks: exact=136 lossy=13541 Buffers: shared hit=4 read=13541 Worker 1: actual time=56.610..222.469 rows=229738 loops=1 Heap Blocks: exact=209 lossy=20655 Buffers: shared hit=4 read=20655 Here it's not entirely obvious, because leader participates in the execution, but once we disable leader participation, it's clearer: Buffers: shared hit=7 read=45421 Worker 0: actual time=28.540..247.683 rows=309112 loops=1 Heap Blocks: exact=282 lossy=27806 Buffers: shared hit=4 read=28241 Worker 1: actual time=24.290..251.993 rows=190815 loops=1 Heap Blocks: exact=188 lossy=17179 Buffers: shared hit=3 read=17180 Not only is "Buffers" clearly a sum of per-worker stats, but the "Heap Blocks" simply disappeared because the leader does nothing and we don't print zeros. 3) I'm not sure dealing with various EXPLAIN flags may not be entirely correct. Consider this: EXPLAIN (ANALYZE): -> Parallel Bitmap Heap Scan on t (...) Recheck Cond: (a < 5000) Rows Removed by Index Recheck: 246882 Worker 0: Heap Blocks: exact=168 lossy=15648 Worker 1: Heap Blocks: exact=302 lossy=29337 EXPLAIN (ANALYZE, VERBOSE): -> Parallel Bitmap Heap Scan on public.t (...) Recheck Cond: (t.a < 5000) Rows Removed by Index Recheck: 246882 Worker 0: actual time=35.067..300.882 rows=282108 loops=1 Heap Blocks: exact=257 lossy=25358 Worker 1: actual time=32.827..302.224 rows=217819 loops=1 Heap Blocks: exact=213 lossy=19627 EXPLAIN (ANALYZE, BUFFERS): -> Parallel Bitmap Heap Scan on t (...) Recheck Cond: (a < 5000) Rows Removed by Index Recheck: 246882 Buffers: shared hit=7 read=45421 Worker 0: Heap Blocks: exact=236 lossy=21870 Worker 1: Heap Blocks: exact=234 lossy=23115 EXPLAIN (ANALYZE, VERBOSE, BUFFERS): -> Parallel Bitmap Heap Scan on public.t (...) Recheck Cond: (t.a < 5000) Rows Removed by Index Recheck: 246882 Buffers: shared hit=7 read=45421 Worker 0: actual time=28.265..260.381 rows=261264 loops=1 Heap Blocks: exact=260 lossy=23477 Buffers: shared hit=3 read=23478 Worker 1: actual time=28.224..261.627 rows=238663 loops=1 Heap Blocks: exact=210 lossy=21508 Buffers: shared hit=4 read=21943 Why should the per-worker buffer info be shown when combined with the VERBOSE flag, and not just with BUFFERS, when the patch shows the per-worker info always? 4) Now that I think about this, isn't the *main* problem really that we don't display the sum of the per-worker stats (which I think is wrong)? I mean, we already can get the worker details VERBOSEm right? So the only reason to display that by default seems to be that it the values in "Heap Blocks" are from the leader only. BTW doesn't this also suggest some of the code added to explain.c may not be quite necessary? Wouldn't it be enough to just "extend" the existing code printing per-worker stats. (I haven't tried, so maybe I'm wrong and we need the new code.) regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company