> >> > I use your new code:
> >> > -------------------------------------------------
> >> > unsigned long *p = ...
> >> > if (p[0] || p[1] || p[2] || p[3]
> >> >     || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0)
> >> >     return BUFFER_NOT_ZERO;
> >> > else
> >> >     return BUFFER_ZERO;
> >> > ---------------------------------------------------
> >> > and the result is almost the same. I also tried checking 8 and 16
> >> > longs at the beginning, with the same result.
> >>
> >> Interesting... Well, all I can say is that I applaud you for testing
> >> your hypothesis with the benchmark.
> >>
> >> Probably the setup cost of memcmp is too high, because the testing
> >> loop is already very optimized.
> >>
> >> Please submit the AVX2 version if it helps!
>
> I read the email in the wrong order. Forget about my other email.
>
> Sorry, Juan.
>
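For anyone who wants to reproduce the comparison outside the VM, here is a minimal standalone sketch of the memcmp variant quoted above. The function name, the bool return type, and the byte-wise fallback for short or oddly sized buffers are assumptions made for illustration; this is not the code in the tree. It also assumes the buffer is aligned for unsigned long.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (name and fallback path are illustrative only).
 * Returns true if all `size` bytes of `buf` are zero.
 * Assumes `buf` is suitably aligned for unsigned long.
 */
static bool buffer_is_zero_memcmp(const void *buf, size_t size)
{
    const unsigned long *p = buf;
    const size_t head = 4 * sizeof(unsigned long);

    /* Fallback for buffers the word-based check does not cover. */
    if (size < head || size % sizeof(unsigned long) != 0) {
        const unsigned char *b = buf;
        for (size_t i = 0; i < size; i++) {
            if (b[i]) {
                return false;
            }
        }
        return true;
    }

    /* Check the first four words directly. */
    if (p[0] || p[1] || p[2] || p[3]) {
        return false;
    }

    /*
     * Overlapping compare: a result of 0 means byte i + head equals
     * byte i for every i. Since the head is zero, the rest must be
     * zero as well. memcmp only reads, so the overlap is harmless.
     */
    return memcmp(p + 4, p, size - head) == 0;
}
```

Timing this against the existing loop on the host and again inside the guest, with the same buffer size, should show whether the difference comes from the VM or from the setup cost of memcmp.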
One thing I still can't understand: why does the unit test in the host
environment show that 'memcmp()' has better performance?

Liang

> > Yes, the AVX2 version really helps. I have already submitted it, could
> > you help to review it?
> >
> > I am curious about the original intention behind adding the SSE2
> > intrinsics, was it for the same reason?
> >
> > I even suspect the VM may impact the 'memcmp()' performance, is it
> > possible?
> >
> > Liang
> >
> >> Paolo