Hi, Juan. Thanks for your review. > -----Original Message----- > From: Juan Quintela [mailto:quint...@redhat.com] > Sent: Tuesday, February 25, 2014 11:25 PM > To: Gonglei (Arei) > Cc: qemu-devel@nongnu.org; Dr. David Alan Gilbert; owass...@redhat.com; > chenliang (T); Huangweidong (C) > Subject: Re: [PATCH v3] XBZRLE: Fix qemu crash when resize the xbzrle cache > > "Gonglei (Arei)" <arei.gong...@huawei.com> wrote: > > Resizing the xbzrle cache during migration causes qemu-crash, > > because the main-thread and migration-thread modify the xbzrle > > cache size concurrently without lock-protection. > > > > Signed-off-by: ChenLiang <chenlian...@huawei.com> > > Signed-off-by: Gonglei <arei.gong...@huawei.com> > > Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com> > > Sorry for the late review. > > > --- > > Changes against the previous version: > > *Remove the temporary variable cache in the xbzrle_cache_resize. > > > > arch_init.c | 58 > +++++++++++++++++++++++++++++++++++++++++++++++++++++++--- > > 1 file changed, 55 insertions(+), 3 deletions(-) > > > > diff --git a/arch_init.c b/arch_init.c > > index 80574a0..003640f 100644 > > --- a/arch_init.c > > +++ b/arch_init.c > > @@ -164,26 +164,69 @@ static struct { > > uint8_t *encoded_buf; > > /* buffer for storing page content */ > > uint8_t *current_buf; > > - /* Cache for XBZRLE */ > > + /* Cache for XBZRLE, Protected by lock. */ > > PageCache *cache; > > + QemuMutex lock; > > } XBZRLE = { > > .encoded_buf = NULL, > > .current_buf = NULL, > > .cache = NULL, > > + /* use the lock carefully */ > > + .lock = {PTHREAD_MUTEX_INITIALIZER}, > > }; > > Have you tried to compile for windows? I think this would make it > break. We need to init it with qemu_mutex_init() in ram_save_setup() > after the call to cache_init()?
Yeah,it isn't compatible with windows. There is a problem that we must initialize the lock at qemu startup. Because the xbzrle_cache_resize can be called anytime. I don't know how to handle it in windows. Do you have any suggestion? > > > > > /* buffer used for XBZRLE decoding */ > > static uint8_t *xbzrle_decoded_buf; > > > > +static void XBZRLE_cache_lock(void) > > +{ > > + qemu_mutex_lock(&XBZRLE.lock); > > +} > > + > > if we change this to: > > if (migrate_use_xbzrle()) > qemu_mutex_lock(&XBZRLE.lock); > > On both cache operations, we would remove the overhead in the no XBRLE > case, right? Check. > > > +static void XBZRLE_cache_unlock(void) > > +{ > > + qemu_mutex_unlock(&XBZRLE.lock); > > +} > > + > > int64_t xbzrle_cache_resize(int64_t new_size) > > { > > + PageCache *new_cache, *old_cache; > > + > > if (new_size < TARGET_PAGE_SIZE) { > > return -1; > > } > > > > + XBZRLE_cache_lock(); > > if (XBZRLE.cache != NULL) { > > - return cache_resize(XBZRLE.cache, new_size / > TARGET_PAGE_SIZE) * > > - TARGET_PAGE_SIZE; > > + /* check XBZRLE.cache again later */ > > + XBZRLE_cache_unlock(); > > + if (pow2floor(new_size) == migrate_xbzrle_cache_size()) { > > + return pow2floor(new_size); > > + } > > + new_cache = cache_init(new_size / TARGET_PAGE_SIZE, > > + TARGET_PAGE_SIZE); > > + if (!new_cache) { > > + DPRINTF("Error creating cache\n"); > > + return -1; > > + } > > + > > + XBZRLE_cache_lock(); > > + /* the XBZRLE.cache may have be destroyed, check it again */ > > + if (XBZRLE.cache != NULL) { > > + old_cache = XBZRLE.cache; > > + XBZRLE.cache = new_cache; > > + new_cache = NULL; > > + } > > + XBZRLE_cache_unlock(); > > + > > + if (NULL == new_cache) { > > + cache_fini(old_cache); > > + } else { > > + cache_fini(new_cache); > > + } > > + } else { > > + XBZRLE_cache_unlock(); > > } > > Can we change this to: > > if (pow2floor(new_size) == migrate_xbzrle_cache_size()) { > return pow2floor(new_size); > } > > new_cache = cache_init(new_size / TARGET_PAGE_SIZE, > TARGET_PAGE_SIZE); > if (!new_cache) { > DPRINTF("Error creating cache\n"); > return -1; > } > > XBZRLE_cache_lock(); > /* the XBZRLE.cache may have be destroyed, check it again */ > if (XBZRLE.cache != NULL) { > cache_to_free = XBZRLE.cache; > XBZRLE.cache = new_cache; > } else { > cache_to_free = new_cache; > } > XBZRLE_cache_unlock(); > > cache_fini(cache_to_free); > > I think that looking is clearer this way. BTW, we are losing > _everything_ that is on the cache if we resize it. I don't know if that > is what we want. If we have a big cache, and make it bigger because we > are having too many misses, just dropping the whole cache looks like a > bad idea :-( > Dropping the whole cache is a pity. I have thought about reuse the cache_resize in the xbzrle_cache_resize. But the cache_resize may consume a little time, and it will block migration thread too long. > > And my understanding is that we should do a: > > migrate_get_current()->xbzrle_cache_size = new_size; > > independently of what CACHE is equal or different from NULL? > (this was already wrong before your patch, just asking because you are > looking at the code). We should do it definitely. The old code do it too. But the old code don't handle the return value of the xbzrle_cache_resize. The return value may be -1. I will fix it . In addition, I also do some work to fix another XBZRLE bug and optimize XBZRLE. The patches will be committed later. > > Thanks, Juan. Best regards, -Gonglei