On 2015-07-03 13:59:07 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>
> > 2) Replace relation forks, with the exception of the init fork which is
> >    special anyway, with separate relfilenodes. Stored in separate
> >    columns in pg_class.
>
> Different AMs have different fork needs; for heaps you want one main
> fork, one VM, one FSM. But for indexes, the VM fork is not necessary,
> and some AMs might want different ones. For instance, GIN would benefit
> from having separate forks to store the internal indexes, and BRIN would
> benefit from a separate fork for the revmap.
>
> What I'm saying is that I'm not sure it's okay to store the forks in
> pg_class columns
Right. Part of the point of this design is that you could easily add
further forks without system-wide knowledge, and that it is *not*
required anymore that all relfilenodes are in one column. I think it'd
probably make sense to have at least _vm and _fsm in pg_class, but we
could easily add further ones elsewhere.

> instead perhaps we should have a separate catalog on
> which each relation can have many forks, or perhaps have the pg_class
> entry store an array (ick). Or perhaps rather than "relvmfork" (the
> pg_class column for the relfilenode for the VM fork) we could store
> relfilenode1, relfilenode2 where the value for each N fork is
> AM-specific. (so for heaps 1 is main, 2 is FSM, 3 is VM; for BRIN 1 is
> main, 2 is revmap; and so forth).

None of these sound particularly pretty to me. An array of relfilenodes
would probably be the least ugly one. (A rough sketch of the
alternatives follows at the end of this mail.)

> FWIW the whole idea seems reasonable to me. I worry about concurrent
> traffic into the pg_relfilenode shared table -- if temp table creation
> is common across many databases, is it going to become a contention
> point?

I don't think it will be. It's essentially just inserts into a tiny
table with a single index, right? We can do a boatload of inserts into
one of these. In an *assert-enabled* build:

CREATE TABLE pg_relfilenode() WITH OIDS;
ALTER TABLE pg_relfilenode ADD CONSTRAINT pg_relfilenode_pkey PRIMARY KEY(oid);

which is pretty much what pg_relfilenode would look like, although we'd
not go through the whole executor for the inserts.

pgbench -h localhost -p 5440 postgres -n -f /tmp/pg_relfilenode.sql -j 16 -c 16 -T 20 -P 1

cat /tmp/pg_relfilenode.sql:
INSERT INTO pg_relfilenode DEFAULT VALUES;

progress: 5.0 s, 32168.4 tps, lat 0.495 ms stddev 1.728
progress: 6.0 s, 33719.6 tps, lat 0.473 ms stddev 0.773

andres@awork2:~/src/postgresql$ pgbench -h localhost -p 5440 postgres -n -f /tmp/temptable.sql -j 16 -c 16 -T 20 -P 1

CREATE TEMPORARY TABLE blarg() ON COMMIT DROP;

progress: 6.0 s, 5018.2 tps, lat 3.185 ms stddev 3.671
progress: 7.0 s, 4890.9 tps, lat 3.272 ms stddev 4.346

and that's with zero actual columns. If you instead add some:

CREATE TEMPORARY TABLE blarg(id serial primary key, data text, blarg int) ON COMMIT DROP;

progress: 7.0 s, 974.1 tps, lat 16.462 ms stddev 9.058
progress: 8.0 s, 999.9 tps, lat 16.045 ms stddev 7.011

So no, I don't think this'll be a relevant problem. We do so many
inserts for a single temp table that a single insert into another one
completely vanishes in comparison.
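
For illustration only, here's a rough sketch of the two alternatives
discussed above. None of these catalog or column names exist today;
they're hypothetical, just to make the shape of the options concrete:

-- Variant 1 (hypothetical): a separate catalog, one row per
-- (relation, fork), with AM-specific fork numbers as Alvaro suggests.
CREATE TABLE pg_relation_fork (
    forkrelid    oid  NOT NULL,  -- pg_class.oid of the owning relation
    forknum      int2 NOT NULL,  -- AM-specific fork number (0 = main, ...)
    forkfilenode oid  NOT NULL,  -- relfilenode allocated from pg_relfilenode
    PRIMARY KEY (forkrelid, forknum)
);

-- Variant 2 (hypothetical): an array column in pg_class, subscripted by
-- the AM-specific fork number:
-- ALTER TABLE pg_class ADD COLUMN relforkfilenodes oid[];

Looking up the relfilenode for a given fork would then be either an
index scan on (forkrelid, forknum), or a simple array subscript on the
pg_class row that's already in the relcache.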