On 10/11/2015 10:13, Juan Quintela wrote:
>> > I rewrite the buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo
>> > length'
>> > then write a test program to check a large amount of zero pages, and
>> > use the 'time' to
>> > recode the time takes by different optimization. Test result is like this:
>> >
>> > SSE2:
>> > ------------------------------
>> >           | test 1 | test 2
>> > ------------------------------
>> > Time (S): | 13.696 | 13.533
>> > ------------------------------
>> >
>> >
>> > AVX2:
>> > ------------------------------
>> >           | test 1 | test 2
>> > ------------------------------
>> > Time (S): | 10.583 | 10.306
>> > ------------------------------
>> >
>> > memeqzero4_paolo:
>> > ------------------------------
>> >           | test 1 | test 2
>> > ------------------------------
>> > Time (S): | 9.718  | 9.817
>> > ------------------------------
>> >
>> >
>> > Paolo's implementation has the best performance. It seems that we can
>> > remove the SSE2 related Intrinsics.

Note that you can simplify my implementation a lot, because
buffer_find_nonzero_offset already assumes that the buffer is aligned to
sizeof(VECTYPE), i.e. 16 bytes.  For example you can just check the
first 4 unsigned longs against zero and then call memcmp.

Paolo

> How should I understand that comment?  That you are about to send an
> email to remove the sse2 support and that I can forget about this patch?
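
For illustration only, here is one way the simplification Paolo describes above could look. This is a sketch and not code from the patch. The function name buffer_is_zero_sketch is hypothetical. The mail does not say what memcmp is compared against, so comparing the buffer with itself shifted by the size of the checked head is an assumption. The sketch also assumes that the buffer is aligned for unsigned long accesses, which the 16-byte alignment mentioned in the mail would guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if all len bytes of buf are zero.
 * Assumption: buf is suitably aligned for unsigned long accesses.
 */
static bool buffer_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *bytes = buf;
    const unsigned long *words = buf;
    const size_t head = 4 * sizeof(unsigned long);

    assert((uintptr_t)buf % sizeof(unsigned long) == 0);

    /* Buffers shorter than the head are checked byte by byte. */
    if (len < head) {
        for (size_t i = 0; i < len; i++) {
            if (bytes[i] != 0) {
                return false;
            }
        }
        return true;
    }

    /* Step 1: check the first 4 unsigned longs against zero. */
    if ((words[0] | words[1] | words[2] | words[3]) != 0) {
        return false;
    }

    /*
     * Step 2: compare the buffer with itself shifted by head bytes.
     * If bytes[i] == bytes[i + head] for every i and the first head
     * bytes are zero, then every byte is zero.
     */
    return memcmp(bytes, bytes + head, len - head) == 0;
}
```

The alignment assumption is what removes the need for any unaligned prologue before the word-sized loads, and the bulk of the work is left to the C library's memcmp.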