Hi hackers,

Here is a new version of my parallel-aware hash join patchset. I've dropped "shared" from the feature name and EXPLAIN output, since that's now implied by the word "Parallel" (the distinction only made sense in earlier versions that had both Shared Hash and Parallel Shared Hash, but a Shared Hash built by just one participant didn't turn out to be very useful, so I dropped it a few versions ago). I figured this new round deserved a new thread, but I took the liberty of copying the CC list from the previous one[1].
The main changes are:

1.  Implemented the skew optimisation for parallel-aware mode. The general approach is the same as for the regular hash table: insert with a CAS loop. The details of memory budget management are different though: budget is granted to participants in chunks as needed, even though allocation is still per-tuple, and the code has to deal with concurrent bucket removal. I also removed one level of indirection from the skew hash table: in this version hashtable->skewBucket is an array of HashSkewBucket instead of an array of pointers to separately allocated HashSkewBuckets. That makes the hash table array twice as big, but it avoids one pointer hop when probing an active bucket; the refactoring wasn't strictly necessary, but it made the changes to support parallel build simpler.

2.  Simplified costing. There is now just one control knob, "parallel_synchronization_cost", which I charge each time the participants will wait for each other at a barrier. It should be set high enough to dissuade the planner from using Parallel Hash for tiny hash tables that would be faster in a parallel-oblivious hash join. Earlier ideas about modelling the cost of shared memory access didn't work out.

Status: I think there are probably some thinkos in the new skew stuff. I need some new ideas about how to refactor things so that there isn't quite so much "if-shared-then-this-else-that". I should build some kind of test mode to control barriers so that I can exhaustively test the permutations of participant arrival phase. I need to propose an empirically derived default for the GUC. There are several other details I would like to tidy up and improve. That said, I wanted to post what I have as a checkpoint, now that the major remaining piece (the skew optimisation) is more-or-less working and the costing is at a place that I think makes sense.

I attach some queries that exercise various interesting cases. I would like to get something like these into fast-running regression test format.
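For anyone following along, the CAS-loop insertion mentioned in point 1 boils down to lock-free push onto a bucket's chain. Here is a minimal single-file sketch using C11 atomics, with hypothetical Node/bucket_push names; the real patch uses PostgreSQL's atomics and shared-memory offsets rather than raw pointers, and also handles the skew table's concurrent bucket removal, which this sketch does not:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical minimal chain node; the patch uses hash join tuples. */
typedef struct Node
{
    struct Node *next;
    int          value;
} Node;

/*
 * Push 'node' onto the head of 'bucket' with a CAS loop.  If another
 * participant changes the head between our load and our CAS, the CAS
 * fails, 'head' is reloaded, and we retry with the new head.
 */
static void
bucket_push(_Atomic(Node *) *bucket, Node *node)
{
    Node *head = atomic_load(bucket);

    do
    {
        node->next = head;      /* link node to the current head */
    } while (!atomic_compare_exchange_weak(bucket, &head, node));
}
```

In a single backend this behaves like an ordinary linked-list push; the CAS only matters when several participants insert into the same bucket concurrently.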
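The chunk-based budget granting from point 1 can be sketched like this (hypothetical names and chunk size; the real patch keeps the counter in shared memory alongside the rest of the hash table state): each participant draws a sizeable grant from a shared counter and then charges individual tuples against its local grant, so the shared counter isn't contended on every insertion:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical grant size; the actual patch's number may differ. */
#define BUDGET_CHUNK 32768L

typedef struct
{
    _Atomic long space_remaining;   /* shared budget, in bytes */
} SharedBudget;

typedef struct
{
    SharedBudget *shared;
    long          local_grant;      /* bytes left in this backend's grant */
} ParticipantBudget;

/*
 * Charge 'nbytes' for one tuple.  Only when the local grant runs dry do
 * we touch the shared counter, taking a whole chunk at a time.  Returns
 * false once the shared budget is exhausted.
 */
static bool
budget_charge(ParticipantBudget *p, long nbytes)
{
    if (p->local_grant < nbytes)
    {
        long before = atomic_fetch_sub(&p->shared->space_remaining,
                                       BUDGET_CHUNK);

        if (before < BUDGET_CHUNK)
        {
            /* Overdrawn: return the chunk and report failure. */
            atomic_fetch_add(&p->shared->space_remaining, BUDGET_CHUNK);
            return false;
        }
        p->local_grant += BUDGET_CHUNK;
    }
    p->local_grant -= nbytes;
    return true;
}
```

The trade-off is that a participant can strand up to one chunk of unused grant, which is why the accounting is approximate rather than exact per-tuple against the shared total.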
Note that this patch set requires the shared record typmod patch[2] in theory, since shared hash table tuples might reference blessed record types, but there is no API dependency, so you can use this patch set without applying that one. If anyone knows how to actually provoke a parallel hash join that puts RECORD types into the hash table, I'd be very interested to hear about it, but certainly for TPC and similar testing that other patch set is not necessary.

Of the TPC-H queries, I find that Q3, Q5, Q7, Q8, Q9, Q10, Q12, Q14, Q16, Q18, Q20 and Q21 make use of Parallel Hash nodes (I tested with neqjoinsel-fix-v3.patch[3] also applied, which avoids some but not all craziness in Q21). For examples that also include a parallel-oblivious Hash, see Q8 and Q10: in those queries you can see the planner deciding that it's not worth paying parallel_synchronization_cost = 10 to load the 25-row "nation" table.

I'll report on performance separately.

[1] https://www.postgresql.org/message-id/flat/CAEepm=2W=cokizxcg6qifqp-dhue09aqtremm7yjdrhmhdv...@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kzwqk3g5ygn3mdv7a8da...@mail.gmail.com
[3] https://www.postgresql.org/message-id/CAEepm%3D3%3DNHHko3oOzpik%2BggLy17AO%2Bpx3rGYrg3x_x05%2BBr9-A%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
parallel-hash-v16.patchset.tgz
Description: GNU Zip compressed data
hj-test-queries.sql
Description: Binary data
hj-skew.sql
Description: Binary data
hj-skew-unmatched.sql
Description: Binary data
hj-skew-overflow.sql
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers