On Tue, Jun 22, 2021 at 6:51 PM David Rowley <dgrowle...@gmail.com> wrote:
> I guess we could also ask ourselves how many join algorithms we need.
David and I discussed this a bit off-list, and I just wanted to share how I understand the idea so far in case it helps someone else. There are essentially three subcomponents working together:

1. A data structure similar in some ways to a C++ std::deque<T>: it gives O(1) access to elements by index, is densely packed to allow cache-friendly scanning of all elements, has stable addresses (as long as you only add new elements at the end or overwrite existing slots), and is internally backed by an array of pointers to a set of chunks.

2. A bitmapset that tracks unused elements in 1, making it easy to find the lowest-index hole for a new element by a linear search for a 1 bit, so that we tend towards maximum density despite having random frees from time to time (seems good; the same idea is used in kernels to allocate the lowest unused file descriptor number).

3. A hash table whose elements are indexes into 1. It somehow hides the difference between keys (what callers look things up with) and keys reachable by following an index into 1 (where elements' keys live).

One thought is that you could do 1 as a separate component as the "primary" data structure, and use a plain old simplehash for 3 as a kind of index into it, but with pointers (rather than indexes) to objects in 1 as elements. I don't know if it's better.
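
To make that a bit more concrete, here's a rough standalone sketch of how 1 and 2 could fit together, as I understand it. This isn't code from any patch; the names (chunked_store, cs_alloc, etc.) and the chunk size are made up, and it uses a raw uint64_t bitmap plus __builtin_ctzll() instead of the real Bitmapset machinery, just to show the shape of the thing:

    /* Sketch of components 1 and 2: a chunk-backed array with stable
     * element addresses, plus a free-slot bitmap that always hands out
     * the lowest unused index.  Hypothetical names, no error handling. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define CHUNK_BITS 8                    /* 256 elements per chunk */
    #define CHUNK_SIZE (1 << CHUNK_BITS)

    typedef struct element
    {
        uint32_t    key;                    /* what lookups hash/compare */
        uint32_t    payload;
    } element;

    typedef struct chunked_store
    {
        element   **chunks;                 /* array of pointers to chunks */
        uint64_t   *free_bitmap;            /* 1 bit = slot is free */
        uint32_t    nchunks;
        uint32_t    capacity;               /* nchunks * CHUNK_SIZE */
    } chunked_store;

    /* O(1) access by index; addresses are stable because chunks never move. */
    static inline element *
    cs_element(chunked_store *cs, uint32_t index)
    {
        return &cs->chunks[index >> CHUNK_BITS][index & (CHUNK_SIZE - 1)];
    }

    /* Allocate the lowest free index, growing by one chunk if needed. */
    static uint32_t
    cs_alloc(chunked_store *cs)
    {
        uint32_t    index;

        /* Linear search for the lowest 1 bit, i.e. the lowest-index hole. */
        for (uint32_t w = 0; w < cs->capacity / 64; w++)
        {
            if (cs->free_bitmap[w] != 0)
            {
                int         bit = __builtin_ctzll(cs->free_bitmap[w]);

                cs->free_bitmap[w] &= ~(UINT64_C(1) << bit);
                return w * 64 + bit;
            }
        }

        /* No hole: append a new chunk, mark all but its first slot free. */
        cs->chunks = realloc(cs->chunks, (cs->nchunks + 1) * sizeof(element *));
        cs->chunks[cs->nchunks] = malloc(CHUNK_SIZE * sizeof(element));
        cs->free_bitmap = realloc(cs->free_bitmap,
                                  (cs->capacity + CHUNK_SIZE) / 64 * sizeof(uint64_t));
        memset(cs->free_bitmap + cs->capacity / 64, 0xff, CHUNK_SIZE / 8);
        cs->free_bitmap[cs->capacity / 64] &= ~UINT64_C(1);
        cs->nchunks++;
        index = cs->capacity;
        cs->capacity += CHUNK_SIZE;
        return index;
    }

    /* Freeing just sets the bit back; later allocations refill the hole,
     * keeping the store dense for sequential scans. */
    static void
    cs_free(chunked_store *cs, uint32_t index)
    {
        cs->free_bitmap[index / 64] |= UINT64_C(1) << (index % 64);
    }

For 3, if I'm reading the idea right, one way would be to declare a simplehash whose element type holds just the uint32 index, with SH_HASH_KEY and SH_EQUAL defined to follow the index into the store (cs_element() above) before hashing/comparing, so the hash table itself never stores a copy of the key.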