On Wed, 2006-03-15 at 20:28 -0500, jamal wrote:
> The variables again are:
> a) the mask/mask size 
> b) the number of buckets available. For example, if you are masking
> on 2 bits, then it doesn't matter if you have 256 buckets - only 4
> get used. So creating more than 4 is a waste of memory.
> c) The offset within the 32 bits. I don't think this is a big factor
> if you keep shifting by increments of a byte.
> 
> The last one is the range of values in your test data. For example,
> if you are masking on only 8 bits, then you don't need an input
> dataset of more than 256 values - it adds no value.
> 
> IIRC, the old 2.4.x hash is not only inefficient over a wide set of
> values, it would be considered plain _buggy_ for lower values.
> 
> I think over the weekend I will try to write a little program to
> simulate the different cases myself.

Hmmm, you triggered an idea.  You seem to be interested in
one-byte values. I think you will have trouble proving
that 2.4 isn't better for multi-byte values.

Which leaves the issue of one-byte values.  Up till now I
could not see how 2.4 could be worse than 2.6 in that case,
because 2.4 and 2.6 are identical for byte-aligned one-byte
values.

OK, so assume they are not aligned.  2.6 effectively does
an N-bit shift, and 2.4 does not.  So assume you have an
8-bit value like this sitting in a 16-bit word:

   +--------------------------------------------------+
   | .  .  .  . B0 B1 B2 B3 | B4 B5 B6 B7 .  .  .  .  |
   +--------------------------------------------------+

The rest of the bits are masked off.  2.6 gets at the bits
by shifting, so the hashed value will be:

   +-------------------------+
   | B0 B1 B2 B3 B4 B5 B6 B7 |
   +-------------------------+

In 2.4 we compute the hash by XOR'ing the two bytes, so the
result will be:

   +-------------------------+
   | B4 B5 B6 B7 B0 B1 B2 B3 |
   +-------------------------+

In this case 2.4 and 2.6 compute different values, but the
effectiveness of the hash will be the same.  At this point
I gave up: I didn't see how 2.6 could ever be better than
2.4.  But then I am more interested in multi-byte values.
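
Spelling the two schemes out as code may make that clearer.  This is
just a standalone sketch with my own function names, not the actual
kernel code:

  #include <stdio.h>
  #include <stdint.h>

  /* 2.6-style: mask, then shift the field down so it starts at bit 0. */
  static unsigned hash26(uint32_t key, uint32_t mask, int shift)
  {
      return (key & mask) >> shift;
  }

  /* 2.4-style: mask, then XOR-fold the bytes together; no shift. */
  static unsigned hash24(uint32_t key, uint32_t mask)
  {
      uint32_t h = key & mask;
      h ^= h >> 16;
      h ^= h >> 8;
      return h & 0xFF;
  }

  int main(void)
  {
      /* B0..B7 straddling a byte boundary, i.e. bits 4..11 of the
       * word as in the picture above; 0xA5 is an arbitrary pattern. */
      printf("2.6: %02x  2.4: %02x\n",
             hash26(0x0A50, 0x0FF0, 4), hash24(0x0A50, 0x0FF0));
      return 0;
  }

For the layout above this prints "2.6: a5  2.4: 5a" - the same 8 bits,
just nibble-swapped, which is why the effectiveness is the same.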

But I see now you are interested in patterns of less than
8 bits.  I bet it is the TOS bits in the IP header - you
are using "hashkey u8 0x1C at 1".  At first sight this is
a simple one-byte hash - but everything changes if you use
a divisor of 8 - which I presume you are using.  In that
case the finished hash for 2.6 will be spread evenly across
the 8 buckets.  The 2.4 hash will use only 2 buckets, with
4 values in each.  This arises because 2.6 shifts and 2.4
XOR's.

The 2.4 problem disappears if you use a divisor of 32 for
this case: with a divisor of 32, 2.4 runs as fast as 2.6.
The overhead of doing so is an extra 96 bytes (= (32-8)*4;
each hash bucket costs a pointer - 4 bytes) in the kernel.
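
To put numbers on that, here is the kind of little simulation jamal
was talking about (my own quick sketch, nothing official).  It runs
every possible TOS byte through the 0x1C mask and counts how many
buckets each scheme actually touches:

  #include <stdio.h>

  int main(void)
  {
      for (int divisor = 8; divisor <= 32; divisor *= 4) {
          int used26[32] = {0}, used24[32] = {0};
          int n26 = 0, n24 = 0;

          for (unsigned tos = 0; tos < 256; tos++) {
              unsigned masked = tos & 0x1C;
              used26[(masked >> 2) % divisor] = 1; // 2.6 shifts to the mask
              used24[masked % divisor] = 1;        // 2.4's fold is a no-op
                                                   // for a one-byte value
          }
          for (int i = 0; i < divisor; i++) {
              n26 += used26[i];
              n24 += used24[i];
          }
          printf("divisor %2d: 2.6 fills %d buckets, 2.4 fills %d\n",
                 divisor, n26, n24);
      }
      return 0;
  }

It prints 8 vs 2 buckets for a divisor of 8, and 8 vs 8 for a divisor
of 32 - exactly the behaviour described above.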

So the trade-offs appear to be:
  For < 1 byte values: 2.4 may require more memory than
                       2.6 to run at the same speed.
  For 1 byte values:   The hashes are identical.
  For > 1 byte values: 2.6 will usually run slower than 2.4.

Given today's technology, I think trading memory for speed
was the right trade-off.

But there is room for compromise.  We could have the best
of both worlds by combining the 2.4 and 2.6 hashes, i.e.
something like this:

  hash = (value & MASK) >> shift;     // Identical to 2.6
  hash = (hash >> 16) ^ hash;         // Identical to 2.4
  hash = (hash >> 8) ^ hash;          // Identical to 2.4
  return hash & 0xFF;                 // Identical to both.
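
As a sanity check, here is that compromise fold in a standalone test
harness (again my own sketch, with MASK and shift as parameters):

  #include <stdio.h>
  #include <stdint.h>

  static unsigned combined(uint32_t value, uint32_t mask, int shift)
  {
      uint32_t hash = (value & mask) >> shift;  // Identical to 2.6
      hash = (hash >> 16) ^ hash;               // Identical to 2.4
      hash = (hash >> 8) ^ hash;                // Identical to 2.4
      return hash & 0xFF;                       // Identical to both.
  }

  int main(void)
  {
      // Sub-byte case: the TOS example.  Mask 0x1C, shift 2 gives
      // values 0..7, so a divisor of 8 fills evenly, as in 2.6.
      for (unsigned tos = 0; tos < 32; tos += 4)
          printf("tos %2u -> bucket %u\n", tos, combined(tos, 0x1C, 2) % 8);

      // Multi-byte case: shift 0, so the XOR fold mixes all four
      // bytes into one, as in 2.4.
      printf("0x12345678 -> %02x\n", combined(0x12345678, 0xFFFFFFFFu, 0));
      return 0;
  }

The sub-byte case lands in all 8 buckets, and the multi-byte case
still gets the 2.4 fold, so nothing is lost either way.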

