On Friday, 15 November 2013 at 18:43:12 UTC, H. S. Teoh wrote:
This isn't directly related to D (though the code will be in D), and I
thought this would be a good place to ask.

I'm trying to implement an algorithm that traverses a very large graph, and I need some kind of data structure to keep track of which nodes have been visited, that (1) allows reasonably fast lookups (preferably O(1)), and (2) doesn't require GBs of storage (i.e., some kind of compression would be nice).

A while ago I set out to write a solver for a group of problems which can be described as breadth-first traversal of extremely large implicit graphs. Some code here (C++): https://github.com/CyberShadow/DDD. The project uses delayed duplicate detection to allow the number of nodes to exceed available RAM.
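The core idea, roughly: instead of probing a hash table for every node you generate, you accumulate a batch of candidate nodes, then remove duplicates in one sorted merge pass against the (also sorted) visited list, which can live on disk. Here's a minimal in-memory D sketch of that batched filtering step (illustrative only, not the actual DDD code; it assumes a node encodes into a fixed-width integer key):

import std.algorithm : sort, uniq, setDifference;
import std.array : array;

alias Node = ulong; // assumption: a node encodes into a fixed-width key

// Filter one batch of candidates against the sorted visited list.
// Returns the genuinely new nodes, sorted.
Node[] freshNodes(const(Node)[] visited, Node[] batch)
{
    sort(batch);                 // one batched sort instead of per-node probes
    return batch
        .uniq                    // drop duplicates within the batch itself
        .setDifference(visited)  // drop nodes we've already visited
        .array;                  // materialize; merge into `visited` afterwards
}

void main()
{
    Node[] visited = [1, 3, 5];  // kept sorted between batches
    Node[] batch   = [3, 7, 2, 3, 5];
    assert(freshNodes(visited, batch) == [2, 7]);
}

After each batch you'd merge the fresh nodes back into the visited list and enqueue them for expansion; a disk-based version would do the same merge with external sorting, which is where the "delayed" in delayed duplicate detection comes from.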

What we found was that in certain cases, delayed duplicate detection beat hash tables even when the duplicate filtering was done entirely in memory, I think mainly because the batched sort-and-merge approach parallelizes better than hash table lookups.

You mentioned compression - perhaps a Bloom filter would fit your needs, as an optimization?
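Something like this (a minimal sketch; the bit count and the 4 probes are arbitrary illustrative choices, and the hashing is just two multiplicative mixes combined Kirsch-Mitzenmacher style as h1 + i*h2):

struct BloomFilter
{
    private ulong[] bits;   // the bit array, packed into 64-bit words
    private size_t nbits;   // total number of bits
    private enum k = 4;     // number of hash probes (illustrative choice)

    this(size_t nbits_)
    {
        nbits = nbits_;
        bits = new ulong[(nbits + 63) / 64];
    }

    // i-th probe position for this key.
    private size_t probe(ulong key, uint i) const
    {
        immutable ulong h1 = key * 0x9E3779B97F4A7C15UL;
        immutable ulong h2 = ((key ^ (key >> 33)) * 0xC2B2AE3D27D4EB4FUL) | 1;
        return cast(size_t)((h1 + i * h2) % nbits);
    }

    void insert(ulong key)
    {
        foreach (uint i; 0 .. k)
        {
            immutable b = probe(key, i);
            bits[b / 64] |= 1UL << (b % 64);
        }
    }

    // false means "definitely not visited"; true means "probably visited".
    bool maybeContains(ulong key) const
    {
        foreach (uint i; 0 .. k)
        {
            immutable b = probe(key, i);
            if (!(bits[b / 64] & (1UL << (b % 64))))
                return false;
        }
        return true;
    }
}

void main()
{
    auto seen = BloomFilter(1 << 20);   // ~1M bits = 128 KiB, illustrative
    seen.insert(42);
    assert(seen.maybeContains(42));     // no false negatives
}

Note the one-sided error: false positives are possible, false negatives are not. So a "false" answer lets you skip the expensive exact lookup entirely, while a "true" answer still needs to be confirmed against the real visited structure (otherwise you'd occasionally skip nodes you never actually visited).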
