Here is a series of performance improvements to the Internet checksum. With these changes applied, I get about 20-30% better performance on x86 and PowerPC.
Even though we got off on the wrong foot, I got curious enough to investigate further, and I learned more about "add with carry" and how gcc handles it. Feel free to ignore these patches; I just wanted to at least document my findings in patch form.

Joakim Tjernlund (5):
  checksum: improve add32
  checksum: Optimize add32() for PowerPC
  checksum: use pre increment.
  checksum: optimize loop and get rid of add16()
  checksum: Optimize first addition.

 lib/checksum.c |   61 ++++++++++++++++++++++++++++++++----------------------
 1 files changed, 35 insertions(+), 26 deletions(-)
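
As background on the "add with carry" idiom mentioned above, here is a minimal portable C sketch of a one's-complement (end-around carry) 32-bit addition, plus the usual fold to 16 bits. This is only an illustration, not the code from the patches: the names add32_with_carry_sketch() and fold32_sketch() are made up for this example, and whether a compiler turns the comparison into a real add-with-carry instruction depends on the compiler and target.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not taken from lib/checksum.c):
 * one's-complement addition of two 32-bit words.
 */
static inline uint32_t add32_with_carry_sketch(uint32_t a, uint32_t b)
{
	uint32_t sum = a + b;

	/*
	 * Unsigned addition wrapped around iff sum < a.  In that case the
	 * carry out of bit 31 was lost, so add it back in at bit 0.
	 */
	return sum + (sum < a);
}

/*
 * Hypothetical helper: fold a 32-bit partial sum down to 16 bits.
 * Two rounds are needed because the first fold can itself carry.
 */
static inline uint16_t fold32_sketch(uint32_t x)
{
	x = (x & 0xffff) + (x >> 16);
	x = (x & 0xffff) + (x >> 16);
	return (uint16_t)x;
}
```

Writing the carry as a comparison keeps the code in plain C and leaves the choice of instruction to the compiler, at the cost of relying on it to recognize the pattern.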