On Sun, Mar 26, 2017 at 9:16 AM, Jonathan Morton wrote:
>
>> On 26 Mar, 2017, at 19:00, Dave Taht wrote:
>>
>> popcount is, regrettably, an sse4.2-only instruction
>
> A read through the ARM ISA Quick Reference Card:
>
> http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf
>
> …shows that there is no equivalent instruction on ARM CPUs at least up to
> ARMv7, which I think covers all current-generation consumer-grade routers.
All the x86_64 routing platforms at my command have it, notably the
pcengines apu2.

Finding suitable algorithms for arm and mips remains on my mind.
>
> However, the operation can be constructed using log2(N) operations on any
> modern CPU as a sequence of masks, shifts and adds. GCC has a “builtin”
> intrinsic function to use a popcnt instruction where present, and this
> algorithm otherwise.
Yes, I have the __builtin_popcount version under test too. It comes to
something like 20 instructions without -msse4.2. :(
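
For reference, the mask/shift/add fallback Jonathan describes looks
roughly like the sketch below for a 32-bit word. This is an illustration
only, not the code I have under test: the function names are made up, and
what gcc actually emits for __builtin_popcount when no popcnt instruction
is available depends on the target and compiler version (it may be a
table lookup or a library call rather than this sequence).

```c
#include <assert.h>
#include <stdint.h>

/* Portable popcount for a 32-bit word: log2(32) = 5 mask/shift/add
 * steps. Each step adds neighbouring fields of twice the previous width. */
uint32_t popcount32_swar(uint32_t x)
{
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);  /* 2-bit sums  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  /* 4-bit sums  */
    x = (x & 0x0f0f0f0fu) + ((x >> 4) & 0x0f0f0f0fu);  /* 8-bit sums  */
    x = (x & 0x00ff00ffu) + ((x >> 8) & 0x00ff00ffu);  /* 16-bit sums */
    x = (x & 0x0000ffffu) + (x >> 16);                 /* 32-bit sum  */
    return x;
}

/* Let the compiler pick: a popcnt instruction where the target has one
 * (e.g. with -msse4.2 or -mpopcnt on x86), a fallback otherwise. */
uint32_t popcount32_builtin(uint32_t x)
{
    return (uint32_t)__builtin_popcount(x);
}

/* Hypothetical self-check: both versions must agree on a few samples. */
void popcount32_selfcheck(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x80000000u, 0x0000ffffu, 0xdeadbeefu, 0xffffffffu
    };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(popcount32_swar(samples[i]) ==
               popcount32_builtin(samples[i]));
}
```

Comparing the output of gcc -O2 -S with and without -msse4.2 on something
like this is an easy way to see the instruction count difference.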
There is a wide variety of popcnt implementations for sse and neon:
https://github.com/WojciechMula/sse-popcount.git
The big advantage of the sse4.2 popcnt instruction is that it works in
the main register set rather than the sse regs (so it can be live
patched in, too), and it only takes a clock.
One thing that really irks me about all these sorts of benchmarks
(there's a good one for hashes, too) is that the startup cost really
dominates - we do three hashes, and move on.
>
> Obviously this will only be of any use if the resulting hash is of good
> quality.
Yep, I need to run this through some real data. I just really enjoyed
fitting the whole routine into 28 bytes.
> An obvious problem with popcnt is that inputs of 1, 2, 4, 8, etc have the
> same popcnt (1),
srcport, dstport, and protocol have plenty of bits.
Not really sure what the distribution would look like on real data,
but (as one example) dnsmasq tries to hand out IPs not sequentially
but based on your MAC address, so you get a somewhat better
distribution than sequential. Maybe.
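
To put a rough number on the concern, here is a quick sketch that bins
every 16-bit value (think of a port number) by its popcount alone. It
assumes popcount is used by itself as the hash, which is a simplification
for illustration and not what the routine under test does; the function
name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Fill hist[k] with the number of 16-bit values whose popcount is k.
 * There are only 17 possible results (0..16) for 65536 inputs. */
void popcount_histogram16(uint32_t hist[17])
{
    uint32_t total = 0;

    for (int k = 0; k <= 16; k++)
        hist[k] = 0;

    for (uint32_t v = 0; v <= 0xffffu; v++)
        hist[__builtin_popcount(v)]++;

    /* Sanity check: every input landed in exactly one bin. */
    for (int k = 0; k <= 16; k++)
        total += hist[k];
    assert(total == 65536u);
}
```

The bins follow the binomial coefficients C(16, k), so the middle of the
range is crowded and the ends are nearly empty. That is why the popcount
needs to be mixed with other bits before it is any use as a hash.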
> and it is trivial for an attacker to exploit this property.
cake's flow hash is set associative. Any "attacker" merely has to send
1k+ distinct flows to saturate it.
> - Jonathan Morton
>
--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake