On 10/11/2015 10:13, Juan Quintela wrote:
>> > I rewrote buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo 
>> > length'
>> > then wrote a test program to check a large number of zero pages, and
>> > used 'time' to
>> > record the time taken by each optimization. The test results are:
>> >
>> > Time (s):
>> > ------------------------------------------
>> >                   |  test 1  |  test 2
>> > ------------------------------------------
>> > SSE2              |  13.696  |  13.533
>> > AVX2              |  10.583  |  10.306
>> > memeqzero4_paolo  |   9.718  |   9.817
>> > ------------------------------------------
>> >
>> >
>> > Paolo's implementation has the best performance. It seems that we can
>> > remove the SSE2-related intrinsics.

Note that you can simplify my implementation a lot, because
buffer_find_nonzero_offset already assumes that the buffer is aligned to
sizeof(VECTYPE), i.e. 16 bytes.  For example you can just check the
first 4 unsigned longs against zero and then call memcmp.
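
A minimal sketch of that idea, not the actual QEMU code: once the first
4 unsigned longs are known to be zero, comparing the buffer against
itself shifted by that many bytes proves the rest is zero too.  The name
is_zero_sketch() is hypothetical, and the byte loop for short buffers is
an addition for safety that buffer_find_nonzero_offset would not need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only. */
static bool is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    unsigned long head[4];
    size_t i;

    /* Short buffers: plain byte loop (an assumption; the real caller
     * always passes at least a vector-sized, aligned buffer). */
    if (len < sizeof(head)) {
        for (i = 0; i < len; i++) {
            if (p[i]) {
                return false;
            }
        }
        return true;
    }

    /* Check the first 4 unsigned longs against zero.  memcpy keeps the
     * load valid whatever the alignment or effective type of buf. */
    memcpy(head, p, sizeof(head));
    if (head[0] | head[1] | head[2] | head[3]) {
        return false;
    }

    /* p[i] == p[i + sizeof(head)] for every i, plus a zero head,
     * implies by induction that every byte is zero. */
    return memcmp(p, p + sizeof(head), len - sizeof(head)) == 0;
}
```

The overlapping memcmp leaves the bulk of the work to libc, which
usually has its own vectorized implementation.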

Paolo

> How should I understand that comment?  That you are about to send an
> email to remove the sse2 support and that I can forget about this patch?
