[install-discuss] Re: pdo.c - introduction

Vasiliy Tue, 31 Oct 2006 10:45:56 PST

Hash Table pattern:

Links to objects - Hashtable implemented not as a tree where each hash-value is 
a knot refering to list of objects with same hash-values. It is represented as 
linear list of links (indexes) - data_links where for each hash-value several 
consecutive cells may be reserved. So hash-table element refers to first cell 
in that sequence and previous element in hash-table refers to next sequence and 
first element refers to last position of data_link list. The last element of 
hash-table refers to first item in data-list and first element of hash table 
refers to the end of data-list, hash-function never return 0 and 0 value always 
refer to the end on data_link list - so this internal agreement save me some 
checks - so this increase performance bit more and simplify algorithm. If 
element has no objects attached to its hash-value, no any consecutive cells, 
not even one cell - then next element will have reference to same list cell. 
Initially hash-table initialized with zero - so all elements are empty and 
refers to the last cell in data_links.


static long    *data_links;
data_links = xcalloc(sizeof (long));

static long    data_index[HASH_SIZE];
bzero(zone_index, HASH_SIZE * sizeof (long));

Object name lengths - to increase performance and not to calculate string 
lengths every time I have data_links shadow which keeps all data_name lengths 
(calculating string length (strlen() or similar) takes sometime as compare two 
strings - with this shadow array we can speed up this part twice).

static int       *data_lens;
zone_lens  = xcalloc(sizeof (int));

To insert new object in hashtable:

0. Resize data_links and data_lens - this happen when I add new object with all 
other object attributes. Obviously number of links from hash-table to object is 
same as number of objects.

data_links  = xrealloc(data_links,  (sizeof (long)) * data_realloc_limit);

1. Calculate hash-value (as a side effect it also calculate name length):

key = hash_key(package, &len);

2. Take index refering this hash-value to data_links.

index = data_index[key];

3. Insert link to data_links list to the index. So if this hash-value not empty 
or if it is empty - does not matter it will now refers to new object.

(void) memmove(data_links + index + 1, data_links + index,  (data_num - index - 
1) * sizeof (long));
data_links[index] = data;

4. Update index - all indexes from previous to first will be simple 
incremented, so if they empty or not they will be just shifted one item up.

       for (k = key - 1; k >= 0; k --) {
               package_index[k] ++;
       }

This is pretty simple, small and elegant hash-table implementation. It is 
serves their purposes very well. It has bit heavy operation - memmove. but 
assumption is that I create it once at the beginning and then do not add new 
elements, so most important is efficient access not efficient update.

And then access looks like this:

key = hash_key(data_name, &len);

for (k = data_index[key]; k != data_index[key - 1]; k ++) {
      if (len == package_lens[k])
              if (namecmp( data_names[data_links[k]],
                       data_name, len)) {
                       return (data_links[k]);
              }
      }
}

return (-1);

And this works very fast - exactly what we need.

This looks like assembler programming where both data and algorithm created as 
one solid solution which delivers performance.

And, of course, hash-key function:

static long
hash_key(char *key_string, int *len)
{
       long key = 0;

       for ((*len) = 0; (*len) < PATH_MAX; (*len)++) {
               if (key_string[(*len)] == '\0') {
                       break;
               }
               key <<= 1;
               key += (key_string[(*len)]);
       }
       key += HASH_MASK;
       return (key + 1);
}

Shift and addition...

vassun
 
 
This message posted from opensolaris.org

[install-discuss] Re: pdo.c - introduction

Reply via email to