> >> In modern CPUs, there's always an issue with cache lines. But for a
> >> parallel implementation, it really isn't going to matter. The CPU
> >> that finishes last and needs to check the ICV isn't particularly
> >> likely to be the CPU that processed the initial header anyway.
>
> While that would be possible for some algorithms, I've never seen
> a single cipher request being handled by multiple CPUs. I guess that
> would lead to cacheline bouncing, and for GCM an atomic synchronization
> of the counter would be needed.
>
> Usually parallelization is achieved by using AVX registers/instructions
> where multiple cipher blocks can be handled simultaneously with a single
> instruction.
>
> So it might make sense to have the ICV at the end because it is
> likely cache hot when needed.
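For concreteness (my sketch, not code from the thread): the reason a single CTR/GCM request parallelizes across cipher blocks rather than across CPUs is that each keystream block is a pure function of the base counter plus a block index, so no shared mutable counter is needed. A minimal C illustration, assuming a GCM-style 32-bit big-endian counter, with toy_cipher standing in for the real AES block call:

/*
 * Sketch only: each keystream block depends solely on the base counter
 * and its block index, so blocks 0..3 can be enciphered simultaneously
 * (e.g. interleaved AES-NI in wide registers) with no atomic counter.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static void toy_cipher(const uint8_t in[16], uint8_t out[16])
{
    /* Placeholder for the real AES block encryption; only the
     * data flow matters for this illustration. */
    for (int i = 0; i < 16; i++)
        out[i] = in[i] ^ 0xAA;
}

/* GCM-style inc32: 32-bit big-endian counter in the last 4 bytes. */
static void make_ctr_block(const uint8_t base[16], uint32_t idx,
                           uint8_t out[16])
{
    uint32_t ctr;

    memcpy(out, base, 16);
    ctr = ((uint32_t)out[12] << 24) | ((uint32_t)out[13] << 16) |
          ((uint32_t)out[14] << 8)  |  (uint32_t)out[15];
    ctr += idx;
    out[12] = ctr >> 24;
    out[13] = ctr >> 16;
    out[14] = ctr >> 8;
    out[15] = ctr;
}

int main(void)
{
    uint8_t base[16] = { 0 };   /* would be J0 + 1 in real GCM */
    uint8_t ks[4][16];
    uint8_t blk[16];

    base[15] = 2;

    /* These four iterations are fully independent -- a SIMD
     * implementation runs them as one interleaved instruction stream. */
    for (uint32_t i = 0; i < 4; i++) {
        make_ctr_block(base, i, blk);
        toy_cipher(blk, ks[i]);
    }

    for (int i = 0; i < 4; i++)
        printf("keystream block %d starts with %02x\n", i, ks[i][0]);
    return 0;
}

Because the blocks are consumed front to back, such an implementation naturally finishes at the tail of the buffer, which is the basis of the cache-hot-ICV argument above.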
That cache-hotness effect is not what we have seen in measurements. Caches are usually much larger than a single packet; this is especially true for small packets, which are exactly where current software en/decryptors have performance problems. Since the packet is passed further up the stack, the header needs to be "hot" anyway. What we have seen, however, is that adding data at both the front and the end leads to situations where a new cache line is required at both ends. Even more cumbersome, in my opinion, is the whole zero-copy/segment-list handling when things have to be added both in front and at the end.
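To make the segment-list point concrete, here is a minimal mbuf-style sketch (the struct seg is hypothetical, not taken from any particular stack). Prepending the ESP header is only cheap while the first segment has headroom, and appending the ICV is only cheap while the last segment has tailroom; otherwise a new segment must be allocated and chained at that end, which is exactly the copying/allocation a zero-copy path tries to avoid:

/*
 * Sketch only: an mbuf-like segment list showing why growing a packet
 * at both ends is awkward for zero-copy processing.
 */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SEG_BUF 256

struct seg {
    struct seg *next;
    unsigned char buf[SEG_BUF];
    size_t off;   /* start of valid data (headroom = off) */
    size_t len;   /* valid data length (tailroom = SEG_BUF - off - len) */
};

/* Grow at the front: cheap only when the first segment has headroom. */
static unsigned char *seg_prepend(struct seg *s, size_t n)
{
    if (s->off < n)
        return NULL;              /* would need a new first segment */
    s->off -= n;
    s->len += n;
    return s->buf + s->off;
}

/* Grow at the back: cheap only when the *last* segment has tailroom. */
static unsigned char *seg_append(struct seg *s, size_t n)
{
    while (s->next)
        s = s->next;              /* walk to the tail segment */
    if (SEG_BUF - s->off - s->len < n)
        return NULL;              /* would need to chain a new segment */
    unsigned char *p = s->buf + s->off + s->len;
    s->len += n;
    return p;
}

int main(void)
{
    struct seg s = { .next = NULL, .off = 64, .len = 100 };

    unsigned char *hdr = seg_prepend(&s, 8);   /* e.g. ESP header */
    unsigned char *icv = seg_append(&s, 16);   /* e.g. 16-byte ICV */

    if (hdr) memset(hdr, 0, 8);
    if (icv) memset(icv, 0, 16);
    printf("prepend %s, append %s\n",
           hdr ? "ok" : "needs new segment",
           icv ? "ok" : "needs new segment");
    return 0;
}

With trailer-only growth, a single tailroom check suffices; with growth at both ends, both checks can fail independently, and each failure forces segment allocation and list surgery at a different end of the chain.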