Hannu Krosing wrote:
BTW, I'm pretty sure I have figured out a method for storing the minimal
required binary tree as an array, where adding each page adds exactly
one upper node. The node added is often not the immediate parent, but it
is always the one needed to cover all nodes.

I just have to write it up in an understandable way and then you all can
look at it and tell if it is something well-known from Knuth or Date ;)

Find sample code attached:

It's not optimal at all; it's meant as a proof of concept.

Just run the file to see how it works; there are some comments at the
beginning.
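
(The attached file isn't reproduced inline. As a rough reconstruction of
one layout with the property described above, here is a minimal C sketch;
it is not necessarily the same scheme as the attachment, and all names
and sizes in it are made up. Leaves sit at even array indexes and
internal nodes at odd, "in-order" indexes, so appending one page appends
exactly one internal node; every node holds the max free space of the
leaves it covers.)

#include <stdio.h>

#define MAXNODES 1024          /* sketch only: no overflow checks */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

static int node[MAXNODES];
static int nnodes;             /* always 2 * nleaves - 1 */

/* height of a node = number of trailing 1-bits in its index */
static int
height(int j)
{
    int h = 0;

    for (; j & 1; j >>= 1)
        h++;
    return h;
}

/* in-order parent: one step left or right, depending on bit h + 1 */
static int
parent(int j)
{
    int h = height(j);

    return (j & (2 << h)) ? j - (1 << h) : j + (1 << h);
}

/*
 * A nominal index beyond the end of the array denotes a subtree that is
 * only partially present; descend to left children to find the node
 * that actually covers it.
 */
static int
effective(int j)
{
    while (j >= nnodes)
        j -= 1 << (height(j) - 1);
    return j;
}

/* root = the lowest all-ones index whose span covers every leaf */
static int
root(void)
{
    int span = 1;

    while (span < nnodes + 1)
        span <<= 1;
    return span / 2 - 1;
}

/* recompute an internal node from its (effective) children */
static void
recompute(int p)
{
    int half = 1 << (height(p) - 1);

    node[p] = MAX(node[p - half], node[effective(p + half)]);
}

/* set the free space of leaf i and fix every node covering it */
static void
update(int i, int value)
{
    int r = root();
    int p = 2 * i;

    node[p] = value;
    while (p != r)
    {
        p = parent(p);
        while (p >= nnodes)    /* skip ancestors that don't exist yet */
            p = parent(p);
        recompute(p);
    }
}

/* add one heap page: grows the array by one leaf plus one internal node */
static void
append(int value)
{
    int i = (nnodes + 1) / 2;  /* number of the new leaf */

    nnodes = 2 * i + 1;
    update(i, value);
}

int
main(void)
{
    int avail[] = {10, 40, 20, 50, 30};
    int i;

    for (i = 0; i < 5; i++)
        append(avail[i]);
    printf("max = %d\n", node[root()]);  /* 50 */
    update(3, 5);                        /* page 3 filled up */
    printf("max = %d\n", node[root()]);  /* 40 */
    return 0;
}

(Appending leaf number 2, for example, adds internal node 3, whose
nominal right child doesn't exist yet; node 3 is not the new leaf's
immediate parent, but it is the one node needed to cover everything, as
described above.)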

Currently it interleaves leaf and internal nodes, but it may be better
for some uses (like seqscanning the leaf nodes when searching for a
clustered position) to separate them, for example having the first 4k
for leaf nodes and the second for internal nodes.
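
(To illustrate, a hypothetical split layout, with the leaves packed
contiguously in the first half of the page: the clustered-position
search becomes a plain loop over the leaf half and never touches the
internal nodes. The array and function names here are made up.)

static int leaf_area[4096];    /* leaf_area[i] = free space on heap page i */

/* first page at or after 'start' with at least 'need' bytes free, or -1 */
static int
find_clustered(int start, int need, int nleaves)
{
    int i;

    for (i = start; i < nleaves; i++)
        if (leaf_area[i] >= need)
            return i;
    return -1;
}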

Also, I think that having a fan-out factor above 2 (making it more like
a b-tree than a binary tree) would be more efficient due to CPU caches,
but it takes some more work to figure that out.

At least it would be more storage-efficient, as you wouldn't need as many non-leaf nodes. You would need more comparisons when traversing or updating the tree, but as you pointed out, that might be very cheap because of cache effects.
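
(Quick arithmetic on that, assuming a full f-ary tree over the ~4k leaf
slots of one FSM page: a tree with n leaves needs about (n - 1)/(f - 1)
internal nodes and ceil(log_f n) levels.)

#include <stdio.h>

int
main(void)
{
    const int n = 4096;        /* assumed leaf slots per FSM page */
    int f;

    for (f = 2; f <= 32; f *= 2)
    {
        int levels = 0;
        int c;

        for (c = 1; c < n; c *= f)
            levels++;
        printf("fanout %2d: ~%4d internal nodes, %2d levels\n",
               f, (n - 1) / (f - 1), levels);
    }
    return 0;
}

This prints, e.g., ~4095 internal nodes and 12 levels for fanout 2,
versus ~585 internal nodes and 4 levels for fanout 8.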

Within a page, the traditional array representation, where the root is at position 0 and its children are at 1, 2 and so forth, is actually OK. When the depth of the tree changes, you'll need to memcpy data around and rebuild parent nodes, but that's acceptable. Or we can fix the depth of the tree to the maximum that fits on a page and fill it with zeros when it's not full; all but the rightmost pages are always full anyway.
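
(A minimal sketch of that layout, assuming one-byte free-space values so
that ~4k leaves fit on a page; the names and the fixed depth are
assumptions, not the real thing.)

#include <stdint.h>

#define FSM_DEPTH   12                    /* assumed */
#define FSM_LEAVES  (1 << FSM_DEPTH)      /* 4096 leaf slots */
#define FSM_NODES   (2 * FSM_LEAVES - 1)  /* zero-filled when not full */
#define MAX(a, b)   ((a) > (b) ? (a) : (b))

typedef struct
{
    uint8_t value[FSM_NODES]; /* value[0] is the root; children of node i
                               * are at 2*i + 1 and 2*i + 2; leaves start
                               * at index FSM_LEAVES - 1 */
} FSMPage;

/* record the free space of a heap page and bubble the max up to the root */
static void
fsm_set_avail(FSMPage *page, int slot, uint8_t avail)
{
    int i = (FSM_LEAVES - 1) + slot;

    page->value[i] = avail;
    while (i > 0)
    {
        int p = (i - 1) / 2;
        uint8_t newmax = MAX(page->value[2 * p + 1],
                             page->value[2 * p + 2]);

        if (page->value[p] == newmax)
            break;             /* the nodes above are already correct */
        page->value[p] = newmax;
        i = p;
    }
}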

Scaling to multiple pages, we don't want to move nodes across page boundaries when extending the relation, so we do need something like your scheme for that. Taking the upper nodes into account, one FSM page can hold free space information for ~4k heap pages (or lower-level FSM pages). IOW, we have a fan-out of 4k across pages. With a fan-out like that, we only need 3-4 levels to address over 500 TB of data.
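
(The arithmetic behind that figure, assuming 8 kB blocks and a fan-out
of 4096 across pages:)

#include <stdio.h>

int
main(void)
{
    const unsigned long long slots = 4096;  /* leaf slots per FSM page */
    const unsigned long long blksz = 8192;  /* heap block size in bytes */
    unsigned long long pages = 1;
    int level;

    for (level = 1; level <= 4; level++)
    {
        pages *= slots;
        printf("%d level(s): %llu heap pages, %.3g TB\n",
               level, pages, (double) (pages * blksz) / 1e12);
    }
    return 0;
}

Three levels already cover 4096^3 pages * 8 kB, roughly 560 TB.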

I've attached a diagram illustrating what I have in mind. In the diagram, each page can hold only 7 nodes, but in reality that would be ~BLCKSZ/2, or ~4k with the default block size. The heap consists of 10 pages, and the bottom leaf nodes correspond to the heap pages. The leaf nodes of the upper FSM page store the maxima of the lower FSM pages; they should match the top nodes of the lower FSM pages. The rightmost nodes in the tree, colored grey, are unused.

I'm pretty happy with this structure. The concurrency seems reasonably good, though we'll need to figure out exactly how the locking should work.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

<<inline: fsm-drawing.png>>

