Hi Jason,
> I'd be inclined to roll with your implementation if it can eventually
> become competitive with Andy Polyakov's, [...]
I think it is competitive on the SSSE3/AVX2 code paths; for small sizes
in particular it is faster, which matters when implementing layer 3
VPNs.
> there are still no AVX-512 paths, which means it's considerably
> slower on all newer generation Intel chips. Andy's has the AVX-512VL
> implementation for Skylake (using ymm, so as not to hit throttling)
> and AVX-512F for Cannon Lake and beyond (using zmm).
I don't think AVX-512F support is that important until it is really
usable on CPUs available on the market.
Adding AVX-512VL support is relatively simple. I have a patchset mostly
ready that is more than competitive with the code from Zinc. I'll clean
that up and do more testing before posting it later this week.
Best regards,
Martin