On Thu, 2006-03-16 at 15:47 +1000, Russell Stuart wrote:
> On Wed, 2006-03-15 at 22:07 -0500, jamal wrote: 

[..]
> With a divisor of 64, the 2.6 hash produces a 6-bit
> quantity, which enumerates to 64 unique values, and the 2.4
> hash produces 4 bits, which enumerates to 16 unique values.
> Ergo, each 2.4 bucket will hold 4 values (= 64 / 16), whereas
> the 2.6 buckets will hold one each (= 64 / 64).
> 
> Thus in this case, we can say that either:
>    2.4 and 2.6 use the same amount of memory, but 2.4 runs slower.
>    2.4 and 2.6 run at the same speed, but 2.4 uses more memory.
> Take your pick.
> 


Indeed. I think you understood me, and I don't have to bother finding
my scripts ;-> We probably don't even need to do the test - we'll
see.
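
For the archives, here is roughly what we are arguing about (a
simplified sketch from memory - the real u32_hash_fold() in
net/sched/cls_u32.c has the endianness and selector plumbing; the
function names and the 0xfc mask below are just mine, picked to match
your divisor-64 example):

/* 2.4-style: XOR-fold the masked key down into the low byte */
static unsigned hash_fold_24(unsigned key, unsigned hmask)
{
	unsigned h = key & hmask;

	h ^= h >> 16;
	h ^= h >> 8;
	return h;		/* caller applies h & (divisor - 1) */
}

/* 2.6-style: shift the masked bits down so they start at bit 0 */
static unsigned hash_fold_26(unsigned key, unsigned hmask, unsigned fshift)
{
	/* fshift = number of zero bits below the mask, e.g. 2 for 0xfc */
	return (key & hmask) >> fshift;
}

With hmask 0xfc and divisor 64, 2.4 keeps only 4 useful bits (16
buckets used) while 2.6 keeps all 6 (64 buckets used) - your numbers
above. With a multibyte mask and divisor 256 (your case), the 2.4 fold
mixes the upper byte in, which is the property you are defending.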

Now - let's compute RAM requirements.
Did you say 968 bytes saved going from 256 to 64 buckets?
In the scheme I was using, this could easily add up to many megs in a
short time. Think of looking at a different octet at each horizontal
hash table level.
In one setup this went to at least 256x256 hash tables [without
details: assume two different octets carrying a lot of entropy, and
that I have a lot of users]. Looking at the traffic patterns, I was
able to deduce that I didn't need more than 6 bits at the first level
and 5 bits at the second (which only needs 32 buckets).
This means all I needed was 64x32 hash tables instead of 256x256.
If we say that going from 256 to 32 buckets also saves roughly
1 Kbyte per table, then the math says we use about 60MB less RAM by
reducing the hash table sizes - without even counting that the number
of hash tables has also gone down dramatically. 60MB, even by my
standards, is a lot when it is unnecessary.
Initially, I remember just cutting it down to 128 hash buckets to try
to compensate, but that left the lookups no better and I was still
spending a lot of RAM.
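
To spell the arithmetic out (my reading of the figures above, taking
256x256 to mean on the order of 64K hash tables): 65536 tables, each
saving roughly 1 Kbyte when cut from 256 down to 32 buckets, is about
65536 x 1KB = 64MB, i.e. the "about 60MB" - and that is before
counting that the table count itself drops from 65536 to 64x32 = 2048.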
 
So I am willing to trade lookups for RAM, but not when I can avoid it ;->

> BTW, in this example, the new hash I suggested would be as good 
> as the 2.6 case.
> 

Yes, if you used 256 buckets per hash table ;->
Of which 75% will never, ever be used. I wouldn't call that good
enough. That's _really really really bad_ ;->
Also, try reducing the bits to 5 or 4 and it gets even worse. And
these are real-world, valid setups. Wait till you start getting into
IPv6. Or try building an effective fast lookup scheme for a five-tuple
lookup using u32; the five tuple being {src/dst IP, IP protocol,
src/dst port}.
So in simple setups where you, say, don't exceed a few hundred hash
tables and a few hundred to a thousand filters, the old hash was fine.

> Now let's take the case where we hash a number of bytes with
> a 256 divisor (my case).  If these bytes contain truly random
> values, then again 2.4 and 2.6 will be the same.

But they are not.

> This is
> because the hash value 2.6 uses - the low-order byte - is already
> perfectly random, so it can't be improved on.

That's not the right philosophy for building good classification
schemes. You can't have a universal classifier - for that you would
need infinite memory. You can do two things to balance memory vs
lookups:
a) know your traffic patterns.
b) go over all possible values.

Whether the traffic patterns seen are deterministic or random is not
important.

> The 2.4 XOR's
> the two values together.  XOR has the property that it "adds"
> the randomness of the bits together, unless they are correlated.
> So if you take two partially random bits, and XOR them together,
> then the resulting bit will be more random than the original two
> bits.  An illustration of this from crypto is a stream cypher 
> like rc4.  rc4 effectively produces a random stream of bits.
> To use rc4, you XOR your plain text with this random stream.
> Even though your plain text is highly non-random, the cypher
> text is at least as random as the rc4 stream - and thus looks
> like gibberish.  Anyway, the end result for 2.4 is that if you
> XOR two perfectly random bytes, the result is a perfectly random
> byte.  So for random data 2.6 and 2.4 are the same.
> 

To use a term which makes sense up here, you are treading on thin ice
now ;->  You don't wanna skate on this river ;->

Randomness has nothing to do with this. It doesn't matter whether
random traffic patterns arrive at all.
- assume that in 8 bits of data, only 6 bits carry the entropy.
- assume 256 hosts (just to cover the range of the 8 bits).

a) If I built a hash table with 256 buckets, I can guarantee that I
will find any host using either scheme in 4 lookups.
Except 75% of the buckets will not be used by either.

b) If I built a hash table with 128 buckets, I can guarantee I will
find any host in 4 lookups with the new scheme, but it will take up
to 8 lookups with the old scheme.
Except 50% of the buckets will not be used by the new scheme and
75% will not be used by the old scheme.

c) If I built a hash table with 64 buckets, I can guarantee that I
will find any host in 4 lookups with the new scheme and 16 with the
old scheme.
100% of the buckets will be used by the new scheme; only 25% will be
used by the old scheme.

d) If I built a hash table with 32 buckets, I can guarantee that I
will find any host in 8 lookups with the new scheme and _32_ with the
old.
100% used by the new scheme and 25% by the old.

See the pattern? But this is not the worst part. The nasty part is
that if I build another level of hash tables to reduce the lookups,
it totally reduces my effectiveness; I need to figure out which
buckets are actually being hit, etc.
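
Here is a quick user-space sketch that reproduces the numbers above.
The assumptions are mine, since I did not spell them out: the two dead
bits are the low-order bits of the keyed octet, the old 2.4 fold ends
up indexing with the raw low bits, and the new 2.6 hash shifts the
dead bits out first. This is not the kernel code, it just models the
bucket spread:

#include <stdio.h>

/* old 2.4-style fold: for a single octet the XOR fold is a no-op,
 * so we effectively index with the raw low bits of the key */
static unsigned old_hash(unsigned octet, unsigned divisor)
{
	return octet & (divisor - 1);
}

/* new 2.6-style: shift the two dead (masked-off) bits out first */
static unsigned new_hash(unsigned octet, unsigned divisor)
{
	return (octet >> 2) & (divisor - 1);
}

int main(void)
{
	unsigned divisors[] = { 256, 128, 64, 32 };
	unsigned cnt_old[256], cnt_new[256];
	unsigned d, i;

	for (d = 0; d < 4; d++) {
		unsigned divisor = divisors[d];
		unsigned used_old = 0, used_new = 0;
		unsigned max_old = 0, max_new = 0;

		for (i = 0; i < 256; i++)
			cnt_old[i] = cnt_new[i] = 0;

		for (i = 0; i < 256; i++) {
			/* 256 hosts, but only 6 bits of entropy in the
			 * keyed octet: the low two bits are fixed */
			unsigned octet = ((i % 64) << 2) | 0x1;

			cnt_old[old_hash(octet, divisor)]++;
			cnt_new[new_hash(octet, divisor)]++;
		}

		for (i = 0; i < divisor; i++) {
			if (cnt_old[i]) {
				used_old++;
				if (cnt_old[i] > max_old)
					max_old = cnt_old[i];
			}
			if (cnt_new[i]) {
				used_new++;
				if (cnt_new[i] > max_new)
					max_new = cnt_new[i];
			}
		}

		printf("%3u buckets: old %2u lookups %3u%% used, "
		       "new %2u lookups %3u%% used\n",
		       divisor, max_old, 100 * used_old / divisor,
		       max_new, 100 * used_new / divisor);
	}
	return 0;
}

It prints exactly the lookup counts and bucket usage from a) to d)
above.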

[deleted: same argument as above]

> 
> And you can go on, but the only cases where 2.6 will get a
> hash as good as 2.4 are when the entire value is perfectly
> random, or when it is fixed.  Neither is very likely.
> 

No, that's _not_ how people build classifiers - go and read the
literally hundreds of classifier papers out there. The _primary_ goal
of all classifiers is faster lookups. You make observations of how the
traffic looks and then try to improve the lookups and cut down on RAM
use. When you build classifiers for an edge or a core, different
characteristics are used.

> I hope I haven't lost you in there somewhere.  The bottom
> line is that unless there are correlated bits (and there
> haven't been in any real life examples either you or I 
> have presented so far),

And this is an important piece you are missing, I think. People
actually go to the trouble of collecting traffic and looking at
patterns.
As an example, assume I was doing a 5-tuple lookup for my laptop.
Clearly, on outgoing traffic the src IP adds no entropy, and on
incoming the dst adds none. If I was doing this for a router, that is
no longer true. Etc.

>  the 2.4 hash will always be better than the 2.6 one for multibyte values.

I think this is the only thing you have going for you at the
moment - but your old data was unfortunately insufficient.
This is why I said more tests are needed.

> But the new hash I suggested behaves as well as the 2.4
> version on multibyte values, and as well as the 2.6
> version on sub-byte values.  Replacing the current
> hash function with it would make us both happy.
> 

You have not convinced me that, in the multibyte case, the old one is
good for anything other than the one example you had, with a fixed
mask and a fixed number of buckets.
I hope you now see the value of varying the parameters I mentioned
(it should be very clear for the hash bucket count and mask, I hope).
Sorry, but a lot more work is needed, and you seem to be in a rush to
get there ;->

cheers,
jamal
