On 12/11/2015 09:53, Li, Liang Z wrote:
>> On 12/11/2015 03:49, Li, Liang Z wrote:
>>> I am very surprised about the live migration performance result when
>>> I use your 'memeqzero4_paolo' instead of the SSE2 intrinsics to
>>> check the zero pages.
>>
>> What code were you using? Remember I suggested using only unsigned long
>> checks, like
>>
>>     unsigned long *p = ...
>>     if (p[0] || p[1] || p[2] || p[3]
>>         || memcmp(p + 4, p, size - 4 * sizeof(unsigned long)) != 0)
>>         return BUFFER_NOT_ZERO;
>>     else
>>         return BUFFER_ZERO;
>>
>
> I used the following code:
>
> bool memeqzero4_paolo(const void *data, size_t length)
> {
>     ...
> }
The code you used is very generic and not optimized for the kind of data
you see during migration, hence the existing code in QEMU fares better.

>>> The total live migration time increased by about 8%! Not decreased.
>>> Although in the unit test your 'memeqzero4_paolo' has better
>>> performance. Any idea?
>>
>> You only tested the case of zero pages. But real pages usually are not
>> zero, even if they have a few zero bytes at the beginning. It's very
>> important to optimize the initial check before the memcmp call.
>>
>
> In the unit test, I only tested zero pages too, and the performance of
> 'memeqzero4_paolo' was better. But when merged into QEMU, it caused a
> performance drop. Why?

Because QEMU is not migrating only zero pages.

Paolo
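[For readers following this thread: the unsigned-long-prefix check Paolo suggests can be written out as a complete function. This is a minimal sketch; the function name is hypothetical and it assumes the buffer is word-aligned and at least 4 unsigned longs large, which is not necessarily how QEMU's actual buffer_is_zero is structured.]

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of the check discussed above (illustrative name, not QEMU's
 * API).  Assumes 'buf' is suitably aligned and 'size' is a multiple of
 * sizeof(unsigned long) with at least 4 words.
 */
static bool buffer_is_zero_sketch(const void *buf, size_t size)
{
    const unsigned long *p = buf;

    /*
     * Real (non-zero) pages usually fail right here, so the expensive
     * memcmp below is rarely reached -- this is the "initial check"
     * whose cost dominates on non-zero pages.
     */
    if (p[0] || p[1] || p[2] || p[3]) {
        return false;
    }

    /*
     * The first 4 words are known to be zero.  Comparing the rest of
     * the buffer against its own start (shifted by 4 words) succeeds
     * only if every word equals the 4 zero words before it, i.e. the
     * whole buffer is zero.
     */
    return memcmp(p + 4, p, size - 4 * sizeof(unsigned long)) == 0;
}
```

The point of the thread is that the branch on `p[0] || p[1] || p[2] || p[3]` rejects most real pages cheaply, while a generic memeqzero pays extra setup cost on every call.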