> 
>> In modern CPUs, there's always an issue with cache lines.  But for a
>> parallel implementation, it really isn't going to matter.  The CPU
>> that finishes last and needs to check the ICV isn't particularly
>> likely to be the CPU that processed the initial header anyway.
> 
> While that would be possible for some algorithms, I've never seen a
> single cipher request handled by multiple CPUs. I guess that would
> lead to cacheline bouncing, and for GCM an atomic synchronization
> of the counter would be needed.
> 
> Usually parallelization is achieved by using AVX registers/instructions
> where multiple cipher blocks can be handled simultaneously with a single
> instruction.
> 
> So it might make sense to have the ICV at the end because it is
> likely cache hot when needed.
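
For reference, the block-parallel processing described above looks
roughly like the sketch below: several independent counter blocks are
keystreamed per loop iteration, which is what AES-NI/VAES
implementations map onto wide registers. The block cipher here is a
placeholder stub and the 4-lane width is only an example, not taken
from this thread.

/* Sketch of block-parallel CTR/GCM-style encryption: several
 * independent counter blocks are keystreamed per loop iteration.
 * With AES-NI/VAES the per-lane calls below collapse into
 * wide-register instructions that handle multiple blocks at once;
 * here a scalar stub stands in for the real AES round function. */
#include <stdint.h>
#include <string.h>

#define BLK   16
#define LANES 4   /* example width only */

/* Placeholder single-block "cipher" (XOR with the key, NOT real AES). */
static void aes_encrypt_block(const uint8_t key[BLK],
                              const uint8_t in[BLK], uint8_t out[BLK])
{
    for (int i = 0; i < BLK; i++)
        out[i] = in[i] ^ key[i];
}

static void store_ctr(uint8_t blk[BLK], const uint8_t nonce[12], uint32_t ctr)
{
    memcpy(blk, nonce, 12);
    blk[12] = (uint8_t)(ctr >> 24);
    blk[13] = (uint8_t)(ctr >> 16);
    blk[14] = (uint8_t)(ctr >> 8);
    blk[15] = (uint8_t)ctr;
}

/* Encrypt len bytes of payload in place, LANES counter blocks per pass. */
void ctr_encrypt(const uint8_t key[BLK], const uint8_t nonce[12],
                 uint8_t *payload, size_t len)
{
    uint32_t ctr = 1;
    size_t off = 0;

    while (off < len) {
        uint8_t ctrblk[LANES][BLK], ks[LANES][BLK];

        /* The counter blocks are independent: no data dependency
         * between lanes, so they can be pipelined or packed into one
         * wide register. */
        for (int l = 0; l < LANES; l++)
            store_ctr(ctrblk[l], nonce, ctr + (uint32_t)l);
        for (int l = 0; l < LANES; l++)
            aes_encrypt_block(key, ctrblk[l], ks[l]);
        ctr += LANES;

        for (int l = 0; l < LANES && off < len; l++) {
            size_t n = (len - off < BLK) ? len - off : BLK;
            for (size_t i = 0; i < n; i++)
                payload[off + i] ^= ks[l][i];
            off += n;
        }
    }
}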

That placing the ICV at the end would matter for cache hotness is not
what we have seen in measurements. Caches are usually much larger than
a single packet. This is especially true for small packets, which are
exactly where current software en-/decryptors have performance
problems. Since the packet is passed further through the stack, the
header needs to be "hot" anyway.

However, what we have seen is that appending at the front and at the
end leads to situations where a new cache line is required at both
ends.
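
As a toy illustration of that point (the 64-byte line size, address
and packet length below are made-up example numbers, not from our
measurements):

/* Count 64-byte cache lines touched by a buffer before and after
 * growing it at the front and at the back.  With these example
 * numbers the results are 2, 3, 3 and 4 lines. */
#include <stdio.h>
#include <stdint.h>

#define LINE 64

static unsigned lines_touched(uintptr_t start, size_t len)
{
    uintptr_t first = start / LINE;
    uintptr_t last  = (start + len - 1) / LINE;
    return (unsigned)(last - first + 1);
}

int main(void)
{
    uintptr_t pkt = 0x1000 + 4;   /* payload not line-aligned */
    size_t    len = 120;          /* small packet */

    printf("packet only:           %u lines\n", lines_touched(pkt, len));
    printf("+8B header in front:   %u lines\n", lines_touched(pkt - 8, len + 8));
    printf("+16B ICV at the end:   %u lines\n", lines_touched(pkt, len + 16));
    printf("header and ICV (both): %u lines\n", lines_touched(pkt - 8, len + 8 + 16));
    return 0;
}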

More cumbersome, in my opinion, is the whole zero-copy/segment-list
handling issue when things have to be added both in front and at the
end.
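
To make that concrete, here is a rough sketch with a made-up
chained-buffer structure (struct seg, the headroom handling and the
function names are illustrative only, not any particular stack's API):
appending the ICV trailer is a single segment chained at the tail,
while prepending a header is only zero-copy if the first segment
happens to have headroom, and otherwise means a new head segment and
pointer fix-ups.

/* Segment-list (scatter/gather) sketch; error handling omitted. */
#include <stdlib.h>
#include <string.h>

struct seg {
    unsigned char *data;       /* start of valid bytes */
    size_t         len;
    size_t         headroom;   /* free bytes before data in this buffer */
    struct seg    *next;
};

/* Appending at the end: walk to the tail and chain one new segment. */
static void append_trailer(struct seg *head, unsigned char *icv, size_t icv_len)
{
    struct seg *tail = head;
    while (tail->next)
        tail = tail->next;

    struct seg *t = malloc(sizeof(*t));
    t->data = icv;
    t->len = icv_len;
    t->headroom = 0;
    t->next = NULL;
    tail->next = t;
}

/* Prepending at the front: only zero-copy if the first segment has
 * headroom; otherwise a new head segment must be allocated and every
 * caller-held pointer to the old head becomes stale. */
static struct seg *prepend_header(struct seg *head,
                                  const unsigned char *hdr, size_t hdr_len)
{
    if (head->headroom >= hdr_len) {
        head->data -= hdr_len;
        head->len += hdr_len;
        head->headroom -= hdr_len;
        memcpy(head->data, hdr, hdr_len);
        return head;
    }
    struct seg *h = malloc(sizeof(*h));
    h->data = malloc(hdr_len);
    memcpy(h->data, hdr, hdr_len);
    h->len = hdr_len;
    h->headroom = 0;
    h->next = head;
    return h;   /* new list head: callers must update their references */
}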


