On Tuesday, 17 December 2013 at 20:54:56 UTC, H. S. Teoh wrote:
big it got before I killed the process). Suffice it to say that this is a combinatorial problem, so the number of entries grows exponentially; anything that can help reduce the storage requirements / I/O latency
would be a big help.

If you can buffer the queries in a queue, then you can issue a prefetch request to the OS to bring the memory page in from disk at the moment you put the query into the queue, so the process doesn't get put to sleep on a page fault when the query is actually executed. The length of the queue has to be tailored to how fast the OS can load the pages.
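As a rough sketch (not working code for your case): assuming the table lives in one big file that you mmap read-only, and assuming druntime exposes posix_madvise / POSIX_MADV_WILLNEED in core.sys.posix.sys.mman, the enqueue step could hint the kernel like this:

import std.mmfile : MmFile;
import core.sys.posix.sys.mman : posix_madvise, POSIX_MADV_WILLNEED;

enum size_t pageSize = 4096;

struct PendingQuery
{
    size_t offset;  // byte offset of the record this query needs
    size_t length;
}

struct PrefetchQueue
{
    MmFile file;            // e.g. new MmFile("table.bin"), read-only
    PendingQuery[] queue;

    void enqueue(PendingQuery q)
    {
        // Align the advised range down to a page boundary and tell the
        // kernel we'll need it soon, then park the query in the queue.
        auto start = q.offset & ~(pageSize - 1);
        auto len = (q.offset + q.length) - start;
        auto mem = cast(ubyte[]) file[];
        posix_madvise(mem.ptr + start, len, POSIX_MADV_WILLNEED);
        queue ~= q;
    }
}

posix_fadvise on a plain file descriptor would do the same job without the mapping.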

If the data is uniformly distributed, then perhaps you could partition the disk space with an n-dimensional grid and use a key-value store that you page into memory?
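Something like this for the key, assuming non-negative coordinates and a made-up dimension count / cell size (the key then indexes whatever key-value store holds the page offsets):

enum dims = 4;          // hypothetical dimensionality
enum bitsPerDim = 16;   // cells per axis = 2^16, so 4*16 bits fit in a ulong

ulong gridKey(const double[dims] point, double cellSize)
{
    ulong key = 0;
    foreach (i, coord; point)
    {
        // Quantise to a cell index along this axis (assumes coord >= 0),
        // clipped to bitsPerDim bits, and pack it into the key.
        ulong cell = cast(ulong)(coord / cellSize) & ((1UL << bitsPerDim) - 1);
        key |= cell << (i * bitsPerDim);
    }
    return key;
}

// e.g. diskOffset = store[gridKey(p, 0.5)], where store is an in-memory
// ulong[ulong] or a disk-backed key-value database.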

If you can do some queries out of order, then you probably should set up some "buckets"/"bins" in the queue and prefetch the page that is referenced by the fullest/oldest bucket, just to avoid pages being pushed in and out of memory all the time.
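Roughly like this (names are made up, "fullest" only; adding an age stamp per bucket is easy):

struct QueryBuckets(Query)
{
    Query[][size_t] buckets;   // page id -> queued queries touching that page

    void add(size_t pageId, Query q)
    {
        if (auto p = pageId in buckets)
            *p ~= q;
        else
            buckets[pageId] = [q];
    }

    // Pick the page whose bucket has accumulated the most queries
    // (assumes at least one bucket exists).
    size_t fullestPage()
    {
        size_t best;
        size_t bestLen = 0;
        foreach (pageId, qs; buckets)
        {
            if (qs.length > bestLen)
            {
                best = pageId;
                bestLen = qs.length;
            }
        }
        return best;
    }

    // Remove and return all queries for a page once it is resident.
    Query[] drain(size_t pageId)
    {
        auto qs = buckets.get(pageId, null);
        buckets.remove(pageId);
        return qs;
    }
}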

Otherwise a kd-tree with a scaled grid per leaf that stays within a memory page could probably work, but that sounds like a lot of work. Implementing n-dimensional data structures is conceptually easy, but tedious in reality (can you trust it to be bug-free? It is hard to visualize in text.).
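The node layout I have in mind is roughly this (2-D and a 16x16 grid just for illustration, sized so a leaf fits in a 4 KB page):

struct KdNode
{
    // Interior node: split position along one axis; splitAxis == -1 marks a leaf.
    int splitAxis;
    double splitValue;
    KdNode* left, right;

    // Leaf: a small grid of disk offsets, scaled to the leaf's bounding box.
    double[2] lo, hi;       // bounding box of this leaf
    ulong[16][16] cells;    // offsets into the on-disk data, 2 KB per leaf
}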

If you have to build the spatial data structure yourself, then perhaps an octree, or its n-dimensional generalisation, would be helpful. You could make it shallow and in-memory and use it both to buffer queries and to store indexes into the on-disk data. It would have 2^N children per node (4 for 2D data, 8 for 3D).
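Picking the child is just one bit per dimension, e.g. (N is illustrative, and the node fields are just a guess at what you would need):

enum N = 3;                      // 3-D -> 8 children, 2-D -> 4, etc.

struct HyperOctNode
{
    double[N] centre;
    HyperOctNode*[1 << N] children;
    ulong[] diskOffsets;         // indexes into the on-disk data, and/or
                                 // buffered queries waiting for this region
}

// Which of the 2^N children does this point fall into?
// One bit per dimension: above or below the node's centre along that axis.
size_t childIndex(const ref HyperOctNode node, const double[N] p)
{
    size_t idx = 0;
    foreach (i; 0 .. N)
        if (p[i] >= node.centre[i])
            idx |= size_t(1) << i;
    return idx;
}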

I once read a paper on mapping GIS data into a regular database (mapping 2D to 1D) using Hilbert curves. Not sure if that is of any help.
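The 2D-to-1D mapping itself is short; this is the usual bit-twiddling xy2d routine (as given in e.g. the Wikipedia article on Hilbert curves), transcribed to D:

// Map an (x, y) cell on an n x n grid (n a power of two) to its Hilbert index.
ulong xy2d(ulong n, ulong x, ulong y)
{
    ulong d = 0;
    for (ulong s = n / 2; s > 0; s /= 2)
    {
        ulong rx = (x & s) > 0 ? 1 : 0;
        ulong ry = (y & s) > 0 ? 1 : 0;
        d += s * s * ((3 * rx) ^ ry);
        rot(n, x, y, rx, ry);
    }
    return d;
}

// Rotate/flip a quadrant so the lower-order bits are traversed
// in the right orientation.
void rot(ulong n, ref ulong x, ref ulong y, ulong rx, ulong ry)
{
    if (ry == 0)
    {
        if (rx == 1)
        {
            x = n - 1 - x;
            y = n - 1 - y;
        }
        // swap x and y
        auto t = x;
        x = y;
        y = t;
    }
}

Points that are close in 2D tend to end up with nearby 1D keys, which is what makes it usable as a database index.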

Hm.
