Hi hackers,

Tuples can have type RECORDOID and a typmod number that identifies a "blessed" TupleDesc in a backend-private cache. To support the sharing of such tuples through shared memory and temporary files, I think we need a typmod registry in shared memory. Here's a proof-of-concept patch for discussion. I'd be grateful for any feedback and/or flames.
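To make the problem concrete, here is a minimal single-process sketch. It is not code from the patch. A hypothetical TypmodRegistry hands out typmods as indexes into an append-only array of descriptors. A hypothetical RecordDesc stands in for a TupleDesc and holds only the attribute type OIDs (23 is int4). The sketch is only loosely modeled on what BlessTupleDesc() and typcache.c do: the real cache uses a hash table, and shared memory, DSA and locking are not modeled at all. The point is that a typmod only means something relative to the registry that assigned it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTS 8
#define MAX_TYPMODS 64

/* Hypothetical stand-in for a TupleDesc: only the attribute type OIDs. */
typedef struct RecordDesc
{
    int          natts;
    unsigned int atttypids[MAX_ATTS];
} RecordDesc;

/*
 * Hypothetical typmod registry: typmod N is simply the index of the Nth
 * distinct descriptor blessed into it.  Entries are append-only, so a typmod
 * is never reused or unregistered.
 */
typedef struct TypmodRegistry
{
    int        ndescs;
    RecordDesc descs[MAX_TYPMODS];
} TypmodRegistry;

/* Two descriptors describe the same record type if their columns match. */
static bool
record_desc_equal(const RecordDesc *a, const RecordDesc *b)
{
    return a->natts == b->natts &&
        memcmp(a->atttypids, b->atttypids,
               sizeof(a->atttypids[0]) * (size_t) a->natts) == 0;
}

/*
 * "Bless" a descriptor: return the typmod already assigned to an identical
 * descriptor, or append it and return the new typmod.  A linear search keeps
 * the sketch short.  Returns -1 if the registry is full.
 */
static int
registry_bless(TypmodRegistry *reg, const RecordDesc *desc)
{
    assert(desc->natts >= 0 && desc->natts <= MAX_ATTS);

    for (int i = 0; i < reg->ndescs; i++)
        if (record_desc_equal(&reg->descs[i], desc))
            return i;

    if (reg->ndescs == MAX_TYPMODS)
        return -1;
    reg->descs[reg->ndescs] = *desc;
    return reg->ndescs++;       /* new typmod = old count */
}

/* Map a typmod back to its descriptor, or NULL if this registry never assigned it. */
static const RecordDesc *
registry_lookup(const TypmodRegistry *reg, int typmod)
{
    if (typmod < 0 || typmod >= reg->ndescs)
        return NULL;
    return &reg->descs[typmod];
}
```

For example, suppose the leader blesses (int4, int4) first, while a worker blesses (int4, int4, int4) and then (int4, int4), each in its own private registry. The leader assigns typmod 0 and the worker assigns typmod 1 to the same record type, and registry_lookup() in the leader finds nothing for the worker's typmod 1. If both bless through one shared registry instead, the numbers agree, and since entries are never removed there is nothing to invalidate.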
This is a problem I ran into in my parallel hash join project. Robert pointed it out to me and told me to go read tqueue.c for details, and my first reaction was: I'll code around this by teaching the planner to avoid sharing tuples from paths that produce transient record types, based on tlist analysis[1]. Aside from being a cop-out, that approach doesn't work, because the planner doesn't actually know what types the executor might come up with, since some amount of substitution for structurally-similar records seems to be allowed[2] (though I'm not sure I can explain that). So... we're gonna need a bigger boat.

The patch still uses typcache.c's backend-private cache, but if the backend is currently "attached" to a shared registry then the private cache functions as a write-through cache. There is no cache-invalidation problem because registered typmods are never unregistered. parallel.c exports the leader's existing record typmods into a shared registry, and workers attach to it. A DSM detach hook returns backends to private cache mode when parallelism ends.

Some thoughts:

* Maybe it would be better to have just one DSA area, rather than the one controlled by execParallel.c (for executor nodes to use) and this new one controlled by parallel.c (for the ParallelContext). Those scopes are approximately the same, at least in the parallel query case, but...

* It would be nice for the SharedRecordTypmodRegistry to be able to survive longer than a single parallel query, perhaps in a per-session DSM segment. Perhaps eventually we will want to consider a query-scoped area, a transaction-scoped area and a session-scoped area? I didn't investigate that for this POC.

* It seemed to be a reasonable goal to avoid allocating an extra DSM segment for every parallel query, so the new DSA area is created in-place.
  192KB turns out to be enough to hold an empty SharedRecordTypmodRegistry due to dsa.c's superblock allocation scheme (that's two 64KB size class superblocks + some DSA control information). It'll create a new DSM segment as soon as you start using blessed records, and will do so for every parallel query you start from then on with the same backend. Erm, maybe adding 192KB to every parallel query DSM segment won't be popular...

* Perhaps simplehash + an LWLock would be better than dht, but I haven't looked into that. Can it be convinced to work in DSA memory and to grow on demand?

Here's one way to hit the new code path, so that record types blessed in a worker are accessed from the leader:

CREATE TABLE foo AS SELECT generate_series(1, 10) AS x;

CREATE OR REPLACE FUNCTION make_record(n int)
RETURNS RECORD
LANGUAGE plpgsql
PARALLEL SAFE AS
$$
BEGIN
  RETURN CASE n
           WHEN 1 THEN ROW(1)
           WHEN 2 THEN ROW(1, 2)
           WHEN 3 THEN ROW(1, 2, 3)
           WHEN 4 THEN ROW(1, 2, 3, 4)
           ELSE ROW(1, 2, 3, 4, 5)
         END;
END;
$$;

SET force_parallel_mode = 1;

SELECT make_record(x) FROM foo;

PATCH

1. Apply dht-v3.patch[3].
2. Apply shared-record-typmod-registry-v1.patch.
3. Apply rip-out-tqueue-remapping-v1.patch.

[1] https://www.postgresql.org/message-id/CAEepm%3D2%2Bzf7L_-eZ5hPW5%3DUS%2Butdo%3D9tMVD4wt7ZSM-uOoSxWg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA+TgmoZMH6mJyXX=ylsovj8julfqggxwzcr_rbkc1nj+177...@mail.gmail.com
[3] https://www.postgresql.org/message-id/flat/CAEepm%3D3d8o8XdVwYT6O%3DbHKsKAM2pu2D6sV1S_%3D4d%2BjStVCE7w%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com
Attachments:
  rip-out-tqueue-remapping-v1.patch
  shared-record-typmod-registry-v1.patch
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers