Hi,

@@ -747,7 +747,7 @@ try_hashjoin_path(PlannerInfo *root,
      * never have any output pathkeys, per comments in create_hashjoin_path.
      */
     initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
-                          outer_path, inner_path, extra);
+                          outer_path, inner_path, extra, false);
 
     if (add_path_precheck(joinrel,
                           workspace.startup_cost, workspace.total_cost,
@@ -761,6 +761,7 @@ try_hashjoin_path(PlannerInfo *root,
                                   extra,
                                   outer_path,
                                   inner_path,
+                                  false,    /* parallel_hash */
                                   extra->restrictlist,
                                   required_outer,
                                   hashclauses));
@@ -776,6 +777,10 @@ try_hashjoin_path(PlannerInfo *root,
  * try_partial_hashjoin_path
  *    Consider a partial hashjoin join path; if it appears useful, push it into
  *    the joinrel's partial_pathlist via add_partial_path().
+ *    The outer side is partial. If parallel_hash is true, then the inner path
+ *    must be partial and will be run in parallel to create one or more shared
+ *    hash tables; otherwise the inner path must be complete and a copy of it
+ *    is run in every process to create separate identical private hash tables.
  */

When do we have "or more shared hash tables" rather than one? Are you thinking about subordinate nodes? @@ -1839,6 +1846,10 @@ hash_inner_and_outer(PlannerInfo *root, * able to properly guarantee uniqueness. Similarly, we can't handle * JOIN_FULL and JOIN_RIGHT, because they can produce false null * extended rows. Also, the resulting path must not be parameterized. + * We should be able to support JOIN_FULL and JOIN_RIGHT for Parallel + * Hash, since in that case we're back to a single hash table with a + * single set of match bits for each batch, but that will require + * figuring out a deadlock-free way to wait for the probe to finish. */ s/should be able/would be able/? index 6a45b68e5df..2d38a5efae8 100644 --- a/src/backend/storage/ipc/barrier.c +++ b/src/backend/storage/ipc/barrier.c @@ -451,7 +451,6 @@ BarrierDetachImpl(Barrier *barrier, bool arrive) release = true; barrier->arrived = 0; ++barrier->phase; - Assert(barrier->selected); barrier->selected = false; } Uh, what? diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index 35523bd8065..40a076d976f 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -5821,6 +5821,9 @@ analyze extremely_skewed; insert into extremely_skewed select 42 as id, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' from generate_series(1, 19000); +-- Make a relation with a couple of enormous tuples. +create table wide as select generate_series(1, 2) as id, rpad('', 320000, 'x') as t; +alter table wide set (parallel_workers = 2); I'm doubtful this is actually going to be a wide tuple - this'll get compressed down quite a bit, no? postgres[26465][1]=# SELECT octet_length(t), pg_column_size(t) FROM wide ; ┌──────────────┬────────────────┐ │ octet_length │ pg_column_size │ ├──────────────┼────────────────┤ │ 320000 │ 3671 │ │ 320000 │ 3671 │ └──────────────┴────────────────┘ (2 rows) (and yes, it's ridiculous that a compressed datum of that size still takes up 3kb) +-- parallel with parallel-aware hash join +set max_parallel_workers_per_gather = 2; +set work_mem = '128kB'; +set enable_parallel_hash = on; I think it'd be better if we structured the file so we just sat guc's with SET LOCAL inside a transaction. +-- parallel with parallel-aware hash join +set max_parallel_workers_per_gather = 2; +set work_mem = '64kB'; +set enable_parallel_hash = on; +explain (costs off) + select count(*) from simple r join extremely_skewed s using (id); + QUERY PLAN +----------------------------------------------------------------------- + Finalize Aggregate + -> Gather + Workers Planned: 2 + -> Partial Aggregate + -> Parallel Hash Join + Hash Cond: (r.id = s.id) + -> Parallel Seq Scan on simple r + -> Parallel Hash + -> Parallel Seq Scan on extremely_skewed s +(9 rows) + +select count(*) from simple r join extremely_skewed s using (id); + count +------- + 20000 +(1 row) As written before, I think it'd be good if we extracted the number of batches from json output to be sure things are going sensibly. All-in-all this part looks fairly boring. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers