On Fri, Sep 23, 2022 at 12:11 AM John Naylor <john.nay...@enterprisedb.com> wrote: > > > On Thu, Sep 22, 2022 at 11:46 AM John Naylor <john.nay...@enterprisedb.com> > wrote: > > One thing I want to try soon is storing fewer than 16/32 etc entries, so > > that the whole node fits comfortably inside a power-of-two allocation. That > > would allow us to use aset without wasting space for the smaller nodes, > > which would be faster and possibly would solve the fragmentation problem > > Andres referred to in > > > https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de > > While calculating node sizes that fit within a power-of-two size, I noticed > the current base node is a bit wasteful, taking up 8 bytes. The node kind > only has a small number of values, so it doesn't really make sense to use an > enum here in the struct (in fact, Andres' prototype used a uint8 for > node_kind). We could use a bitfield for the count and kind: > > uint16 -- kind and count bitfield > uint8 shift; > uint8 chunk; > > That's only 4 bytes. Plus, if the kind is ever encoded in a pointer tag, the > bitfield can just go back to being count only.
Good point, agreed. > > Here are the v6 node kinds: > > node4: 8 + 4 +(4) + 4*8 = 48 bytes > node16: 8 + 16 + 16*8 = 152 > node32: 8 + 32 + 32*8 = 296 > node128: 8 + 256 + 128/8 + 128*8 = 1304 > node256: 8 + 256/8 + 256*8 = 2088 > > And here are the possible ways we could optimize nodes for space using aset > allocation. Parentheses are padding bytes. Even if my math has mistakes, the > numbers shouldn't be too far off: > > node3: 4 + 3 +(1) + 3*8 = 32 bytes > node6: 4 + 6 +(6) + 6*8 = 64 > node13: 4 + 13 +(7) + 13*8 = 128 > node28: 4 + 28 + 28*8 = 256 > node31: 4 + 256 + 32/8 + 31*8 = 512 (XXX not good) > node94: 4 + 256 + 96/8 + 94*8 = 1024 > node220: 4 + 256 + 224/8 + 220*8 = 2048 > node256: = 4096 > > The main disadvantage is that node256 would balloon in size. Yeah, node31 and node256 are bloated. We probably could use slab for node256 independently. It's worth trying a benchmark to see how it affects the performance and the tree size. BTW We need to consider not only aset/slab but also DSA since we allocate dead tuple TIDs on DSM in parallel vacuum cases. FYI DSA uses the following size classes: static const uint16 dsa_size_classes[] = { sizeof(dsa_area_span), 0, /* special size classes */ 8, 16, 24, 32, 40, 48, 56, 64, /* 8 classes separated by 8 bytes */ 80, 96, 112, 128, /* 4 classes separated by 16 bytes */ 160, 192, 224, 256, /* 4 classes separated by 32 bytes */ 320, 384, 448, 512, /* 4 classes separated by 64 bytes */ 640, 768, 896, 1024, /* 4 classes separated by 128 bytes */ 1280, 1560, 1816, 2048, /* 4 classes separated by ~256 bytes */ 2616, 3120, 3640, 4096, /* 4 classes separated by ~512 bytes */ 5456, 6552, 7280, 8192 /* 4 classes separated by ~1024 bytes */ }; node256 will be classed as 2616, which is still not good. Anyway, I'll implement DSA support for radix tree. Regards, -- Masahiko Sawada PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com