Jason Wang <jasow...@redhat.com> writes:

> On 2019/1/24 5:51 PM, Peter Xu wrote:
>> On Thu, Jan 24, 2019 at 09:11:15AM +0000, Dr. David Alan Gilbert wrote:
>>> * Jason Wang (jasow...@redhat.com) wrote:
>>>> On 2019/1/24 3:53 AM, Dr. David Alan Gilbert wrote:
>>>>> * Jason Wang (jasow...@redhat.com) wrote:
>>>>>> On 2019/1/22 2:56 AM, Peter Maydell wrote:
>>>>>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang <jasow...@redhat.com> wrote:
>>>>>>>> On 2019/1/15 12:33 AM, Zhang Chen wrote:
>>>>>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>>>>>> <dgilb...@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>     * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>>>>>>>>     > Recently I've noticed that test-filter-mirror has been hanging
>>>>>>>>>     > intermittently, typically when run on some other TCG architecture.
>>>>>>>>>     > In the instance I've just looked at, this was with s390x guest on
>>>>>>>>>     > x86-64 host, though I've also seen it on other host archs and
>>>>>>>>>     > perhaps with other guests.
>>>>>>>>>
>>>>>>>>>     Watch out to see if you really do see it for other guests;
>>>>>>>>>     it carefully avoids using virtio-net to avoid vhost; but on s390x it
>>>>>>>>>     uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
>>>>>>>>>
>>>>>>>>>     > Below is a backtrace, though it seems to be pretty unhelpful.
>>>>>>>>>     > Anybody got any theories ? Does the mirror test rely on dirty
>>>>>>>>>     > memory bitmaps like the migration test (which also hangs
>>>>>>>>>     > occasionally with TCG due to some bug I'm sure we've investigated
>>>>>>>>>     > in the past) ?
>>>>>>>>>
>>>>>>>>>     I don't think it relies on the CPU at all.
>>>>>>>>>
>>>>>>>>> I have no idea about this currently, but Jason and I designed the
>>>>>>>>> test case.
>>>>>>>>> Add Jason: Have any comments about this ?
>>>>>>>>
>>>>>>>> I can't reproduce this locally with s390x-softmmu. It looks to me the
>>>>>>>> test should be independent to any kinds of emulation. It should pass
>>>>>>>> when mainloop work.
>>>>>>>
>>>>>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>>>>>> indeed not specific to s390x guest (and so not specific to
>>>>>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>>>>>
>>>>>>> thanks
>>>>>>> -- PMM
>>>>>>
>>>>>> Finally reproduced locally after hundreds (sometimes thousands) times of
>>>>>> running.
>>>>>>
>>>>>> Bisection points to OOB monitor[1].
>>>>>>
>>>>>> It looks to me after OOB is used unconditionally we lose a barrier to make
>>>>>> sure socket is connected before sending packets in test-filter-mirror.c. Is
>>>>>> there any other similar and simple thing that we could do to kick the
>>>>>> mainloop?
>>>>>
>>>>> Do you mean the:
>>>>>
>>>>>     /* send a qmp command to guarantee that 'connected' is setting to
>>>>>     true. */
>>>>>     qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>>>
>>>> Yes.
>>>>
>>>>> why was that ever sufficient to know the socket was ready?
>>>>
>>>> It was suggested by Fam, I don't remember the details. Can we make sure all
>>>> pending events has been processed (UNIX socket was set to connected) after
>>>> query-status is returned with an non OOB monitor?
>>>
>>> I'm not sure - it doesn't sound like a 'query-status' should ensure
>>> anything else.
>>> How about something like a 'query-chardev' - can that tell you what you
>>> need and loop until it's ready?
>>
>> Yeah it sounds hacky to use "query status" to make sure a specific
>> chardev is connected even before the OOB...
>
> Probably, but anyway it works before OOB.

I don't doubt it worked. Relying on inappropriate assumptions always works just fine right until the assumptions become invalid :) [...]
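
For illustration only, the "loop until it's ready" idea quoted above could look roughly like the sketch below. This is not the actual fix and not code from test-filter-mirror.c. The names wait_for_chardev, chardev_probe_fn, fake_chardev and fake_probe are all made up for this example. A real probe would presumably issue 'query-chardev' over QMP and inspect the reply for the chardev in question; that part is deliberately left out, and a stand-in probe is used so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical probe callback: returns true once the chardev is
 * considered connected.  In a real test this is where 'query-chardev'
 * would be sent and its reply examined (not shown here).
 */
typedef bool (*chardev_probe_fn)(void *opaque);

/*
 * Poll the probe until it reports success or max_attempts is used up.
 * Returns the number of attempts it took (>= 1) on success, 0 on failure.
 */
static unsigned wait_for_chardev(chardev_probe_fn probe, void *opaque,
                                 unsigned max_attempts)
{
    assert(probe != NULL);

    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        if (probe(opaque)) {
            return attempt;
        }
        /* A real test would likely sleep briefly here before retrying. */
    }
    return 0;
}

/* Stand-in for illustration: "connects" after a fixed number of probes. */
struct fake_chardev {
    unsigned calls_until_connected;
    unsigned calls;
};

static bool fake_probe(void *opaque)
{
    struct fake_chardev *c = opaque;

    c->calls++;
    return c->calls >= c->calls_until_connected;
}
```

The point of the explicit loop is that the test waits on the condition it actually cares about (the chardev being connected), and gives up after a bounded number of attempts instead of hanging, rather than relying on an unrelated command happening to flush the main loop.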