I actually benchmarked my last suggestion, and it came out 20% faster on an ad-hoc microbenchmark. We can go faster still on x86 if we use unaligned 4-byte (uint32) reads. Maybe in a future change.
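For concreteness, here is a minimal sketch of what a 4-byte comparison might look like, in place of the four byte-wise comparisons in the macro quoted below. It is not taken from the actual change: the names load_u32_unaligned, PKZIP_SIG32 and pkzip_signature_at_u32 are made up for illustration, and the constant assumes a little-endian host such as x86. The load goes through memcpy rather than a pointer cast, which optimizing compilers generally turn into a single load on x86 while keeping the code well-defined C.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 4 bytes from a possibly unaligned address.
   memcpy avoids the undefined behavior of dereferencing a misaligned
   uint32_t pointer. */
static uint32_t load_u32_unaligned(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical constant builder: packs 'P', 'K', b2, b3 in the order a
   little-endian machine would see them in memory (ASSUMPTION: the host
   is little-endian, as x86 is). */
#define PKZIP_SIG32(b2, b3)                          \
    ((uint32_t)'P' | ((uint32_t)'K' << 8) |          \
     ((uint32_t)(b2) << 16) | ((uint32_t)(b3) << 24))

/* One 32-bit compare instead of four byte compares. */
static int pkzip_signature_at_u32(const unsigned char *p,
                                  unsigned char b2, unsigned char b3) {
    return load_u32_unaligned(p) == PKZIP_SIG32(b2, b3);
}
```

A call such as pkzip_signature_at_u32(p, 3, 4) would then test for "PK\003\004" at p. A big-endian build would need the bytes packed in the opposite order, so any real version would have to keep the byte-wise macro as a fallback.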
In general, hardware is moving in the direction of allowing unaligned reads without performance penalty, especially on x86, and it would be good to start taking advantage of that.

On Fri, Mar 6, 2015 at 9:50 PM, Martin Buchholz <marti...@google.com> wrote:
> Err...
>
> #define PKZIP_SIGNATURE_AT(p, b2, b3) \
>   (((p)[0] == 'P') & ((p)[1] == 'K') & ((p)[2] == b2) & ((p)[3] == b3))
>