On Tuesday, 17 December 2013 at 20:54:56 UTC, H. S. Teoh wrote:
big it got before I killed the process). Suffice it to say that this is a combinatorial problem, so the number of entries grows exponentially; anything that can help reduce the storage requirements / I/O latency
would be a big help.

If you can buffer the queries in a queue, then you can issue a prefetch request to the OS to bring the memory page in from disk at the moment you put the query into the queue, so the process doesn't get put to sleep on a page fault when the query is actually executed. The length of the queue has to be tailored to how fast the OS can load the pages.
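As a rough sketch (not working code for your case): assuming the table lives in one big file that you mmap read-only, and assuming druntime exposes posix_madvise / POSIX_MADV_WILLNEED in core.sys.posix.sys.mman, the enqueue step could hint the kernel like this:

import std.mmfile : MmFile;
import core.sys.posix.sys.mman : posix_madvise, POSIX_MADV_WILLNEED;

enum size_t pageSize = 4096;

struct PendingQuery
{
    size_t offset;  // byte offset of the record this query needs
    size_t length;
}

struct PrefetchQueue
{
    MmFile file;            // e.g. new MmFile("table.bin"), read-only
    PendingQuery[] queue;

    void enqueue(PendingQuery q)
    {
        // Align the advised range down to a page boundary and tell the
        // kernel we'll need it soon, then park the query in the queue.
        auto start = q.offset & ~(pageSize - 1);
        auto len = (q.offset + q.length) - start;
        auto mem = cast(ubyte[]) file[];
        posix_madvise(mem.ptr + start, len, POSIX_MADV_WILLNEED);
        queue ~= q;
    }
}

posix_fadvise on a plain file descriptor would do the same job without the mapping.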

If the data is uniformly distributed, then perhaps you could partition the disk space with an n-dimensional grid and use a key-value store that you page into memory?
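Something like this for the key, assuming non-negative coordinates and a made-up dimension count / cell size (the key then indexes whatever key-value store holds the page offsets):

enum dims = 4;          // hypothetical dimensionality
enum bitsPerDim = 16;   // cells per axis = 2^16, so 4*16 bits fit in a ulong

ulong gridKey(const double[dims] point, double cellSize)
{
    ulong key = 0;
    foreach (i, coord; point)
    {
        // Quantise to a cell index along this axis (assumes coord >= 0),
        // clipped to bitsPerDim bits, and pack it into the key.
        ulong cell = cast(ulong)(coord / cellSize) & ((1UL << bitsPerDim) - 1);
        key |= cell << (i * bitsPerDim);
    }
    return key;
}

// e.g. diskOffset = store[gridKey(p, 0.5)], where store is an in-memory
// ulong[ulong] or a disk-backed key-value database.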

If you can do some queries out of order, then you probably should set up some "buckets"/"bins" in the queue and prefetch the page that is referenced by the fullest/oldest bucket, just to avoid pages being pushed in and out of memory all the time.
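Roughly like this (names are made up, "fullest" only; adding an age stamp per bucket is easy):

struct QueryBuckets(Query)
{
    Query[][size_t] buckets;   // page id -> queued queries touching that page

    void add(size_t pageId, Query q)
    {
        if (auto p = pageId in buckets)
            *p ~= q;
        else
            buckets[pageId] = [q];
    }

    // Pick the page whose bucket has accumulated the most queries
    // (assumes at least one bucket exists).
    size_t fullestPage()
    {
        size_t best;
        size_t bestLen = 0;
        foreach (pageId, qs; buckets)
        {
            if (qs.length > bestLen)
            {
                best = pageId;
                bestLen = qs.length;
            }
        }
        return best;
    }

    // Remove and return all queries for a page once it is resident.
    Query[] drain(size_t pageId)
    {
        auto qs = buckets.get(pageId, null);
        buckets.remove(pageId);
        return qs;
    }
}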

Otherwise a kd-tree with a scaled grid per leaf that stays within a memory page could probably work, but that sounds like a lot of work. Implementing n-dimensional data structures is conceptually easy, but tedious in reality (can you trust it to be bug-free? It is hard to visualize in text.).
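The node layout I have in mind is roughly this (2-D and a 16x16 grid just for illustration, sized so a leaf fits in a 4 KB page):

struct KdNode
{
    // Interior node: split position along one axis; splitAxis == -1 marks a leaf.
    int splitAxis;
    double splitValue;
    KdNode* left, right;

    // Leaf: a small grid of disk offsets, scaled to the leaf's bounding box.
    double[2] lo, hi;       // bounding box of this leaf
    ulong[16][16] cells;    // offsets into the on-disk data, 2 KB per leaf
}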

If you have to build the spatial data structure yourself, then perhaps an octree, or its n-dimensional generalisation, would be helpful. You could make it shallow and in-memory and use it both to buffer queries and to store indexes into the on-disk data. It would have 2^N children per node (4 for 2D data, 8 for 3D).
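Picking the child is just one bit per dimension, e.g. (N is illustrative, and the node fields are just a guess at what you would need):

enum N = 3;                      // 3-D -> 8 children, 2-D -> 4, etc.

struct HyperOctNode
{
    double[N] centre;
    HyperOctNode*[1 << N] children;
    ulong[] diskOffsets;         // indexes into the on-disk data, and/or
                                 // buffered queries waiting for this region
}

// Which of the 2^N children does this point fall into?
// One bit per dimension: above or below the node's centre along that axis.
size_t childIndex(const ref HyperOctNode node, const double[N] p)
{
    size_t idx = 0;
    foreach (i; 0 .. N)
        if (p[i] >= node.centre[i])
            idx |= size_t(1) << i;
    return idx;
}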

I once read a paper on mapping GIS data into a regular database (mapping 2D to 1D) using Hilbert curves. Not sure if that is of any help.
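The 2D-to-1D mapping itself is short; this is the usual bit-twiddling xy2d routine (as given in e.g. the Wikipedia article on Hilbert curves), transcribed to D:

// Map an (x, y) cell on an n x n grid (n a power of two) to its Hilbert index.
ulong xy2d(ulong n, ulong x, ulong y)
{
    ulong d = 0;
    for (ulong s = n / 2; s > 0; s /= 2)
    {
        ulong rx = (x & s) > 0 ? 1 : 0;
        ulong ry = (y & s) > 0 ? 1 : 0;
        d += s * s * ((3 * rx) ^ ry);
        rot(n, x, y, rx, ry);
    }
    return d;
}

// Rotate/flip a quadrant so the lower-order bits are traversed
// in the right orientation.
void rot(ulong n, ref ulong x, ref ulong y, ulong rx, ulong ry)
{
    if (ry == 0)
    {
        if (rx == 1)
        {
            x = n - 1 - x;
            y = n - 1 - y;
        }
        // swap x and y
        auto t = x;
        x = y;
        y = t;
    }
}

Points that are close in 2D tend to end up with nearby 1D keys, which is what makes it usable as a database index.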

Hm.
