hash tables with prototype chaining: failed thought experiment

Kragen Javier Sitaker Thu, 10 Jun 2010 00:37:06 -0700

JavaScript has a built-in inheritance feature where a slot lookup in one
hash table ("object") can "fail through" to another one (its
"prototype") in order to factor out commonalities between many objects.


I was reading <http://lambda-the-ultimate.org/node/3851#comment-57693>
when it occurred to me that there was a clever, simple and relatively
efficient way to handle this.

Represent each object as a smallish hash table with separate chaining;
say, 8, 16, or 32 entries. But tag each non-null pointer with an
"indirection bit" like the one in the Burroughs 5000 hardware, which
indicates whether it points to a hash table entry or to a pointer to
one. So the logic for following a pointer looks like this:

        inline hashent *follow(hashent *pointer) {
                while (indirect_tagged(pointer)) pointer = *untag(pointer);
                return pointer;
        }

One possible definition for indirect_tagged and untag, on a 32-bit
machine, using the LSB for the tag, would be:

        inline int indirect_tagged(hashent *pointer) { 
                return (uint32)pointer & 1; 
        }
        inline hashent **untag(hashent *pointer) {
                return (uint32)pointer & ~1;
        }

These are used both for the pointers in the hash table slots themselves,
and the next pointers on the hash entries.

The purpose of this indirection bit is to permit a hash entry chain to
terminate, not at null, but on the corresponding hash table slot of a
different hash table, from which the lookup process can continue.

So when you create a new hash table, you just link each of its hash
table slots to the corresponding slot of its prototype. Insertions,
deletions, and lookups work correctly on either table. Resizing a hash
table to make it bigger only works if there are no such prototype
references to its hash slots, so to support resizing, you need to have
space in the table somewhere to track whether it may be so referenced.
Overwriting requires detecting whether any indirect-tagged pointers were
followed.

So, is this "clever" approach better or worse than the straightforward
data structure? The straightforward data structure simply has a
"prototype pointer" on the hash table, and its lookup function uses a
nested loop --- an outer loop over a particular hash chain, and an inner
loop over the prototype chain. That's somewhat backwards from this
approach, where the inner loop is essentially a loop over the prototype
chain (although invoked on every hash entry). Which loop is better to
have on the inside?

The straightforward data structure should lead to clearer code, and I
can't tell a priori that the "clever" approach has any real efficiency
advantages.
-- 
To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-tol

hash tables with prototype chaining: failed thought experiment

Reply via email to