On Fri, 27 Mar 2020 at 01:14, Emilio G. Cota <c...@braap.org> wrote:
>
> (Apologies if I missed some Cc's; I was not Cc'ed in patch 0
> so I'm blindly crafting a reply.)
Sorry I forgot to include you in patch 0, my bad.
Will be sure to include you in the future.

> On Thu, Mar 26, 2020 at 15:30:43 -0400, Robert Foley wrote:
> > This is a continuation of the series created by Emilio Cota.
> > We are picking up this patch set with the goal to apply
> > any fixes or updates needed to get this accepted.
>
> Thanks for picking this up!
>
> > Listed below are the changes for this version of the patch,
> > aside from the merge related changes.
> >
> > Changes for V8:
> > - Fixed issue where in rr mode we could destroy the BQL twice.
>
> I remember doing little to no testing in record-replay mode, so
> there should be more bugs hiding in there :-)

Thanks for the tip! We will give record-replay some extra testing
to hopefully shake some things out. :)

> > - Found/fixed bug that had been hit in testing previously during
> > the last consideration of this patch.
> > We reproduced the issue hit in the qtest: bios-tables-test.
> > The issue was introduced by dropping the BQL, and found us
> > (very rarely) missing the condition variable wakeup in
> > qemu_tcg_rr_cpu_thread_fn().
>
> Aah, this one:
> https://patchwork.kernel.org/patch/10838149/#22516931
> How did you identify the problem? Was it code inspection or using a tool
> like rr? I remember this being hard to reproduce reliably.

Same here, it was hard to reproduce. I did try to use rr on some shorter
runs, but no luck there.

We ran it overnight on one of our ARM servers and it eventually reproduced
after about 12 hours in a loop across all the bios-table-test(s) (no rr).
We never got it to reproduce on an x86 server.
It was fairly consistent on that same ARM host, though: it always reproduced
within 8-12 hrs or so, and we were able to reproduce it several times.

Thanks & Regards,
-Rob