Re: [Cake] an experiment with an alternate hasher

2017-03-26 Thread Dave Taht
On Sun, Mar 26, 2017 at 9:16 AM, Jonathan Morton  wrote:
>
>> On 26 Mar, 2017, at 19:00, Dave Taht  wrote:
>>
>> popcount is, regrettably, an sse4.2-only instruction
>
> A read through the ARM ISA Quick Reference Card:
>
> http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf
>
> …shows that there is no equivalent instruction on ARM CPUs at least up to 
> ARMv7, which I think covers all current-generation consumer-grade routers.

All the x86_64 routing platforms at my command have it, notably the
pcengines apu2.

Finding suitable algorithm(s) for ARM and MIPS remains on my mind.

>
> However, the operation can be constructed using log2(N) operations on any 
> modern CPU as a sequence of masks, shifts and adds.  GCC has a “builtin” 
> intrinsic function to use a popcnt instruction where present, and this 
> algorithm otherwise.

Yes, I have the __builtin_popcount version under test too. It comes out
to something like 20 instructions without -msse4.2. :(

There is a wide variety of popcnt implementations for SSE and NEON:

https://github.com/WojciechMula/sse-popcount.git

The great value of the SSE4.2 popcnt instruction is that it works on the
main (general-purpose) register set, not the SSE registers (it can be
live-patched in, too), and it only takes a clock.

One thing that really irks me about all these sorts of benchmarks
(there's a good one for hashes, too) is that, for us, the startup cost
really dominates: we do three hashes and move on.

>
> Obviously this will only be of any use if the resulting hash is of good 
> quality.

Yep, I need to run this through some real data. I just really enjoyed
fitting the whole routine into 28 bytes.

> An obvious problem with popcnt is that inputs of 1, 2, 4, 8, etc have the 
> same popcnt (1),

srcport, dstport, and protocol have plenty of bits.

Not really sure what the distribution would look like on real data,
but (as one example) dnsmasq tries to hand out IPs based on your MAC
address rather than sequentially, so you get a somewhat better
distribution than sequential assignment would give. Maybe.

>and it is trivial for an attacker to exploit this property.

cake uses a set-associative hash. Any "attacker" merely has to send 1k+
distinct flows to saturate it.
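A minimal sketch of what "set-associative" means here, assuming 1024 queues grouped into sets of 8 ways. The sizes are assumptions for illustration, `struct flow_table` and `lookup_queue` are hypothetical names, and this is not cake's actual code. The hash only chooses a set; within the set, flows are told apart by their full hash, so distinct flows collide only once every way in their set is taken.

```c
#include <stdbool.h>
#include <stdint.h>

#define N_QUEUES 1024u  /* assumption: total number of flow queues */
#define N_WAYS      8u  /* assumption: queues per set */

/* Hypothetical flow table: one tag (full flow hash) per queue. */
struct flow_table {
    uint32_t tag[N_QUEUES];
    bool     used[N_QUEUES];
};

/* Simplified set-associative lookup. The hash selects a set of N_WAYS
 * adjacent queues; an existing flow is found by comparing full hashes,
 * a new flow takes the first free way, and only when every way in the
 * set is busy does the new flow share a queue (here, simply the
 * directly indexed one). */
static inline uint32_t lookup_queue(struct flow_table *t, uint32_t hash)
{
    uint32_t idx  = hash % N_QUEUES;
    uint32_t base = idx - idx % N_WAYS;   /* first queue of the set */

    for (uint32_t i = 0; i < N_WAYS; i++)          /* known flow? */
        if (t->used[base + i] && t->tag[base + i] == hash)
            return base + i;

    for (uint32_t i = 0; i < N_WAYS; i++)          /* free way? */
        if (!t->used[base + i]) {
            t->used[base + i] = true;
            t->tag[base + i]  = hash;
            return base + i;
        }

    return idx;                                     /* set full: collide */
}
```

In this sketch, 1024 queues means at most 1024 flows can each have a queue of their own; a weak hash mainly concentrates flows into fewer sets, so those sets fill up sooner.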


>  - Jonathan Morton
>



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] an experiment with an alternate hasher

2017-03-26 Thread Jonathan Morton

> On 26 Mar, 2017, at 19:00, Dave Taht  wrote:
> 
> popcount is, regrettably, an sse4.2-only instruction

A read through the ARM ISA Quick Reference Card:

http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf

…shows that there is no equivalent instruction on ARM CPUs at least up to 
ARMv7, which I think covers all current-generation consumer-grade routers.

However, the operation can be constructed using log2(N) operations on any 
modern CPU as a sequence of masks, shifts and adds.  GCC has a “builtin” 
intrinsic function to use a popcnt instruction where present, and this 
algorithm otherwise.
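The masks-shifts-adds construction can be sketched in C as below. This is the textbook SWAR popcount for a 32-bit word (N = 32, so five steps); `popcount32_swar` is a hypothetical name, and GCC's own fallback for `__builtin_popcount` may use a different sequence.

```c
#include <stdint.h>

/* Population count from masks, shifts and adds only. Each of the
 * log2(32) = 5 steps adds neighbouring bit fields into fields twice as
 * wide: 1 -> 2 -> 4 -> 8 -> 16 -> 32 bits. No field can overflow,
 * because a w-bit field never has to hold a count larger than w. */
static inline uint32_t popcount32_swar(uint32_t x)
{
    x = (x & 0x55555555u) + ((x >> 1)  & 0x55555555u); /* 16 x 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2)  & 0x33333333u); /*  8 x 4-bit sums */
    x = (x & 0x0f0f0f0fu) + ((x >> 4)  & 0x0f0f0f0fu); /*  4 x 8-bit sums */
    x = (x & 0x00ff00ffu) + ((x >> 8)  & 0x00ff00ffu); /*  2 x 16-bit sums */
    x = (x & 0x0000ffffu) + ((x >> 16) & 0x0000ffffu); /*  1 x 32-bit sum */
    return x;
}
```

On any target, `__builtin_popcount(x)` should agree with this function for every input, which makes a quick cross-check easy.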

Obviously this will only be of any use if the resulting hash is of good 
quality.  An obvious problem with popcnt is that inputs of 1, 2, 4, 8, etc have 
the same popcnt (1), and it is trivial for an attacker to exploit this property.
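To make the collision problem concrete: popcount keeps only how many bits are set, so a 32-bit input can only produce 33 distinct values (0 through 32). The hypothetical helper below counts how many hosts in one IPv4 /24 share a given popcount; it relies on the GCC/Clang `__builtin_popcount` builtin.

```c
#include <stdint.h>

/* Popcount keeps only how many bits are set, not which ones. */
static inline unsigned popcount32(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);   /* GCC/Clang builtin */
}

/* Hypothetical helper: how many of the 256 addresses in the IPv4 /24
 * starting at `net` (host byte order, low octet zero) have popcount
 * `target`? */
static inline unsigned hosts_with_popcount(uint32_t net, unsigned target)
{
    unsigned n = 0;

    for (uint32_t host = 0; host < 256; host++)
        if (popcount32(net | host) == target)
            n++;
    return n;
}
```

For 10.0.0.0/24 (`0x0a000000`, which already has two bits set), `hosts_with_popcount(0x0a000000u, 6)` returns 70: every host octet with exactly four bits set, C(8,4) = 70 of the 256, lands on the same value.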

 - Jonathan Morton



[Cake] an experiment with an alternate hasher

2017-03-26 Thread Dave Taht
I am trying to see if I can get an adequate avalanche distribution using
a hash of popcount(src), popcount(dst), srcport, dstport, protocol, and a seed.

popcount is, regrettably, an SSE4.2-only instruction. This version
of the assembly routine can actually popcount up to 8 IPv6 addresses
in a row, but you typically just pass it 2. I had a great deal of fun
writing this, though I haven't got around to actually seeing how good a
resulting hash would be!

The IPv4 version does both src and dst with a single popcnt.

The startup cost to hash this further is pretty insane: 30 cycles in
SpookyHash (http://burtleburtle.net/bob/hash/spooky.html). XOR? CRC?
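One cheap middle ground between a bare XOR and a full hash like SpookyHash is to XOR the fields with the seed and then run a short integer finalizer. The sketch below swaps in MurmurHash3's 32-bit finalizer (`fmix32`) purely for illustration; the struct layout and the names `pc_key` and `pc_hash` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical key holding the fields from the proposal above. */
struct pc_key {
    uint8_t  pc_src;   /* popcount of the source address (0..128) */
    uint8_t  pc_dst;   /* popcount of the destination address */
    uint16_t sport;
    uint16_t dport;
    uint8_t  proto;
};

/* MurmurHash3's 32-bit finalizer: xor-shifts and multiplies by odd
 * constants, so it is a bijection on 32-bit values and is designed so
 * that each input bit influences many output bits. */
static inline uint32_t fmix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* XOR each 32-bit word of the key into the state, mixing after each. */
static inline uint32_t pc_hash(const struct pc_key *k, uint32_t seed)
{
    uint32_t ports = ((uint32_t)k->sport << 16) | k->dport;
    uint32_t rest  = ((uint32_t)k->pc_src << 16) |
                     ((uint32_t)k->pc_dst << 8) | k->proto;
    uint32_t h = fmix32(seed ^ ports);

    return fmix32(h ^ rest);
}
```

Because each `fmix32` call is a bijection, the same key under two different seeds always hashes differently; whether the output is well distributed on real traffic still needs measuring.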

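004005a0 :
  4005a0:   31 c0                 xor    %eax,%eax         # rax = 0 (accumulator)
  4005a2:   89 f1                 mov    %esi,%ecx         # rcx = number of addresses

004005a4 :
  4005a4:   48 c1 e0 08           shl    $0x8,%rax         # make room for the next byte
  4005a8:   f3 4c 0f b8 07        popcnt (%rdi),%r8        # popcount of bytes 0-7
  4005ad:   48 83 c7 10           add    $0x10,%rdi        # advance to the next 16-byte address
  4005b1:   4c 09 c0              or     %r8,%rax          # low byte = first half's count
  4005b4:   f3 4c 0f b8 47 f8     popcnt -0x8(%rdi),%r8    # popcount of bytes 8-15
  4005ba:   4c 01 c0              add    %r8,%rax          # low byte += second half (max 128)
  4005bd:   e2 e5                 loop   4005a4            # --rcx; repeat while nonzero
  4005bf:   c3                    retq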
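A portable C rendering of the same loop, for targets such as ARM and MIPS without a popcnt instruction: the GCC/Clang builtin `__builtin_popcountll` compiles to popcnt with -mpopcnt or -msse4.2 and to a software sequence elsewhere. This is a sketch read off the disassembly, assuming the SysV x86-64 calling convention (%rdi points at consecutive 16-byte IPv6 addresses, %esi holds the count); the name `popcount_addrs` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* C rendering of the popcnt loop above. `addrs` points at n consecutive
 * 16-byte IPv6 addresses. For each one, the accumulator is shifted left
 * 8 bits and the address's popcount (0..128, so it fits in a byte) goes
 * into the low byte; with more than 8 addresses the oldest counts are
 * shifted out, as in the assembly. */
static inline uint64_t popcount_addrs(const uint8_t *addrs, unsigned n)
{
    uint64_t acc = 0;

    for (unsigned i = 0; i < n; i++, addrs += 16) {
        uint64_t lo, hi;

        /* memcpy avoids unaligned/aliasing problems; byte order does
         * not matter for a popcount. */
        memcpy(&lo, addrs, 8);        /* popcnt (%rdi),%r8 */
        memcpy(&hi, addrs + 8, 8);    /* popcnt -0x8(%rdi),%r8 after the add */
        acc = (acc << 8) |
              (uint64_t)(__builtin_popcountll(lo) + __builtin_popcountll(hi));
    }
    return acc;
}
```

One behavioural difference: `loop` decrements and tests %rcx only after the body has run, so the assembly called with a count of 0 wraps %rcx to 2^64 - 1 and keeps reading past the buffer, whereas this version simply returns 0.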

