Hi all,

Apropos hash tables.

For some years now, we have generic lists in gnulib. "Generic" means that the
programmer can switch the implementation easily, because he's programming
against an abstract API that has a number of different implementations.

Here is my draft for applying the same approach to unordered sets. The normal
implementation would be a hash table, but there are important variations:
  - Is the key a pointer, or a memory block of arbitrary size?
  - Is the value stored, or implicit?
  - Does inserting a (key,value) pair make a copy of the key?
  - Does the table allow removals?

Feedback is highly appreciated! Do you see some interesting use case that I
have overlooked?


There shall be three abstract types:
  * Unordered set
  * Mapping from an unordered set to a 'void *' value
  * Unordered multiset


Unordered Set
=============

There are two flavours, depending on the key type: either a pointer
(no copying involved upon insertion), or a memory block of varying size
(copied during insertion).

1) Pointers.                                  Files: gl_set.[hc]
   The constructor takes an equals and hashcode method.

   Variants:
   a) array                                   Files: gl_array_set.[hc]
      arrayhc   with cache of hash code       Files: gl_arrayhc_set.[hc]

   b) hash                                    Files: gl_hash_set.[hc]


   Operation                  ARRAY     HASH
                             ARRAYHC

   gl_set_size                 O(1)     O(1)
   gl_set_add                  O(n)     O(1)
   gl_set_remove               O(n)     O(1)
   gl_set_search               O(n)     O(1)
   gl_set_iterator             O(1)     O(1)
   gl_set_iterator_next        O(1)     O(1)

   Here the search method returns a void *, with value (void*)-1
   denoting "not found". Hmm, or should the search method better take a
   'bool *' argument???

2) Memory block of varying size               Files: gl_blobset.[hc]
   It has the same API, with different parameters (address and size of blob).
   The constructor uses a vtable parameter that specifies the allocator:
     a) for a set that allows removals: malloc-based
     b) for a set with no removals (hoarding): obstack-based

    
Mapping from an unordered set to a 'void *' value
=================================================

Likewise, it depends on the key type.

1) Pointers.                                  Files: gl_map.[hc]
   The constructor takes an equals and hashcode method and a free method
   for the values.

   Variants: as above.
   a) array                                   Files: gl_array_map.[hc]
      arrayhc   with cache of hash code       Files: gl_arrayhc_map.[hc]

   b) hash                                    Files: gl_hash_map.[hc]

   Here the search method returns a void**, pointer to the value cell,
   or NULL if not found.

2) Memory block of varying size               Files: gl_blobmap.[hc]
   It has the same API, with different parameters (address and size of blob).
   The constructor uses a vtable parameter that specifies the allocator:
     a) for a set that allows removals: malloc-based
     b) for a set with no removals (hoarding): obstack-based


Unordered multiset
==================

They are implemented as a mapping from an unordered set to an 'uintptr_t' value.


Bruno

Reply via email to