Hash Table pattern:
Links to objects - Hashtable implemented not as a tree where each hash-value is
a knot refering to list of objects with same hash-values. It is represented as
linear list of links (indexes) - data_links where for each hash-value several
consecutive cells may be reserved. So hash-table element refers to first cell
in that sequence and previous element in hash-table refers to next sequence and
first element refers to last position of data_link list. The last element of
hash-table refers to first item in data-list and first element of hash table
refers to the end of data-list, hash-function never return 0 and 0 value always
refer to the end on data_link list - so this internal agreement save me some
checks - so this increase performance bit more and simplify algorithm. If
element has no objects attached to its hash-value, no any consecutive cells,
not even one cell - then next element will have reference to same list cell.
Initially hash-table initialized with zero - so all elements are empty and
refers to the last cell in data_links.
static long *data_links;
data_links = xcalloc(sizeof (long));
static long data_index[HASH_SIZE];
bzero(zone_index, HASH_SIZE * sizeof (long));
Object name lengths - to increase performance and not to calculate string
lengths every time I have data_links shadow which keeps all data_name lengths
(calculating string length (strlen() or similar) takes sometime as compare two
strings - with this shadow array we can speed up this part twice).
static int *data_lens;
zone_lens = xcalloc(sizeof (int));
To insert new object in hashtable:
0. Resize data_links and data_lens - this happen when I add new object with all
other object attributes. Obviously number of links from hash-table to object is
same as number of objects.
data_links = xrealloc(data_links, (sizeof (long)) * data_realloc_limit);
1. Calculate hash-value (as a side effect it also calculate name length):
key = hash_key(package, &len);
2. Take index refering this hash-value to data_links.
index = data_index[key];
3. Insert link to data_links list to the index. So if this hash-value not empty
or if it is empty - does not matter it will now refers to new object.
(void) memmove(data_links + index + 1, data_links + index, (data_num - index -
1) * sizeof (long));
data_links[index] = data;
4. Update index - all indexes from previous to first will be simple
incremented, so if they empty or not they will be just shifted one item up.
for (k = key - 1; k >= 0; k --) {
package_index[k] ++;
}
This is pretty simple, small and elegant hash-table implementation. It is
serves their purposes very well. It has bit heavy operation - memmove. but
assumption is that I create it once at the beginning and then do not add new
elements, so most important is efficient access not efficient update.
And then access looks like this:
key = hash_key(data_name, &len);
for (k = data_index[key]; k != data_index[key - 1]; k ++) {
if (len == package_lens[k])
if (namecmp( data_names[data_links[k]],
data_name, len)) {
return (data_links[k]);
}
}
}
return (-1);
And this works very fast - exactly what we need.
This looks like assembler programming where both data and algorithm created as
one solid solution which delivers performance.
And, of course, hash-key function:
static long
hash_key(char *key_string, int *len)
{
long key = 0;
for ((*len) = 0; (*len) < PATH_MAX; (*len)++) {
if (key_string[(*len)] == '\0') {
break;
}
key <<= 1;
key += (key_string[(*len)]);
}
key += HASH_MASK;
return (key + 1);
}
Shift and addition...
vassun
This message posted from opensolaris.org