Re: [PATCH 1/2] audio: remove qemu_spice_audio_init()
On Tue, Dec 15, 2020 at 09:07:19AM +0100, Gerd Hoffmann wrote:
> > > +    if (using_spice) {
> > > +        /*
> > > +         * When using spice allow the spice audio driver being picked
> > > +         * as default.
> > > +         *
> > > +         * Temporary hack.  Using audio devices without explicit
> > > +         * audiodev= property is already deprecated.  Same goes for
> > > +         * the -soundhw switch.  Once this support gets finally
> > > +         * removed we can also drop the concept of a default audio
> > > +         * backend and this can go away.
> > > +         */
> > > +        driver = audio_driver_lookup("spice");
> > > +        driver->can_be_default = 1;
> >
> > fyi, one of my libvirt/QEMU guests now segfaults here.
> > See: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=977301
>
> Hmm, surely doesn't hurt to add a "if (driver)" check here.
>
> I'm wondering though how you end up with spice being enabled
> but the spiceaudio driver not being available. There is no separate
> config switch, so you should have both spice + spiceaudio or
> none of them ...

Beats me. I'm seeing this for all of my guests, which I believe were
just created with virt-install or virtinst.
Here's the log, in case it helps:

2020-12-13 17:49:15.028+: starting up libvirt version: 6.9.0, package:
1+b2 (amd64 / i386 Build Daemon (x86-ubc-01) Mon, 07 Dec 2020 09:45:52
+), qemu version: 5.2.0Debian 1:5.2+dfsg-2, kernel: 5.9.0-1-amd64,
hostname: xps13.dannf
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
HOME=/var/lib/libvirt/qemu/domain-4-debian10 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-4-debian10/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-4-debian10/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-4-debian10/.config \
QEMU_AUDIO_DRV=spice \
/usr/bin/qemu-system-x86_64 \
-name guest=debian10,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-4-debian10/master-key.aes \
-machine pc-q35-5.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off,memory-backend=pc.ram \
-cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hle=off,rtm=off \
-m 1024 \
-object memory-backend-ram,id=pc.ram,size=1073741824 \
-overcommit mem-lock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid 623816ab-33f1-420d-9fc6-2a11afb5715d \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=36,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot menu=on,strict=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/debian10.raw","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-3-format","read-only":false,"driver":"raw","file":"libvirt-3-storage"}' \
-device virtio-blk-pci,bus=pci.4,addr=0x0,drive=libvirt-3-format,id=virtio-disk0,bootindex=1 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/debian10-seed.img","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"raw","file":"libvirt-2-storage"}' \
-device virtio-blk-pci,bus=pci.7,addr=0x0,drive=libvirt-2-format,id=virtio-disk1 \
-device ide-cd,bus=ide.0,id=sata0-0-0 \
-netdev tap,fd=38,id=hostnet0,vhost=on,vhostfd=39 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:86:da:65,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=40,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-chardev spicevmc,id=charchannel1,name=vdagent \
-device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-spice
Re: [PATCH 1/2] audio: remove qemu_spice_audio_init()
On Wed, Sep 16, 2020 at 2:42 AM Gerd Hoffmann wrote:
>
> Handle the spice special case in audio_init instead.
>
> With the qemu_spice_audio_init() symbol dependency being
> gone we can build spiceaudio as module.
>
> Signed-off-by: Gerd Hoffmann
> ---
>  include/ui/qemu-spice.h |  1 -
>  audio/audio.c           | 16 ++++++++++++++++
>  audio/spiceaudio.c      |  5 -----
>  ui/spice-core.c         |  1 -
>  4 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/include/ui/qemu-spice.h b/include/ui/qemu-spice.h
> index 8c23dfe71797..12474d88f40e 100644
> --- a/include/ui/qemu-spice.h
> +++ b/include/ui/qemu-spice.h
> @@ -29,7 +29,6 @@ extern int using_spice;
>
>  void qemu_spice_init(void);
>  void qemu_spice_input_init(void);
> -void qemu_spice_audio_init(void);
>  void qemu_spice_display_init(void);
>  int qemu_spice_display_add_client(int csock, int skipauth, int tls);
>  int qemu_spice_add_interface(SpiceBaseInstance *sin);
> diff --git a/audio/audio.c b/audio/audio.c
> index ce8c6dec5f47..76cdba0943d1 100644
> --- a/audio/audio.c
> +++ b/audio/audio.c
> @@ -34,6 +34,7 @@
>  #include "qemu/module.h"
>  #include "sysemu/replay.h"
>  #include "sysemu/runstate.h"
> +#include "ui/qemu-spice.h"
>  #include "trace.h"
>
>  #define AUDIO_CAP "audio"
> @@ -1658,6 +1659,21 @@ static AudioState *audio_init(Audiodev *dev, const char *name)
>      /* silence gcc warning about uninitialized variable */
>      AudiodevListHead head = QSIMPLEQ_HEAD_INITIALIZER(head);
>
> +    if (using_spice) {
> +        /*
> +         * When using spice allow the spice audio driver being picked
> +         * as default.
> +         *
> +         * Temporary hack.  Using audio devices without explicit
> +         * audiodev= property is already deprecated.  Same goes for
> +         * the -soundhw switch.  Once this support gets finally
> +         * removed we can also drop the concept of a default audio
> +         * backend and this can go away.
> +         */
> +        driver = audio_driver_lookup("spice");
> +        driver->can_be_default = 1;

fyi, one of my libvirt/QEMU guests now segfaults here.
See: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=977301

 -dann

> +    }
> +
>      if (dev) {
>          /* -audiodev option */
>          legacy_config = false;
> diff --git a/audio/spiceaudio.c b/audio/spiceaudio.c
> index b6b5da4812f2..aae420cff997 100644
> --- a/audio/spiceaudio.c
> +++ b/audio/spiceaudio.c
> @@ -310,11 +310,6 @@ static struct audio_driver spice_audio_driver = {
>      .voice_size_in  = sizeof (SpiceVoiceIn),
>  };
>
> -void qemu_spice_audio_init (void)
> -{
> -    spice_audio_driver.can_be_default = 1;
> -}
> -
>  static void register_audio_spice(void)
>  {
>      audio_driver_register(&spice_audio_driver);
> diff --git a/ui/spice-core.c b/ui/spice-core.c
> index ecc2ec2c55c2..10aa309f78f7 100644
> --- a/ui/spice-core.c
> +++ b/ui/spice-core.c
> @@ -804,7 +804,6 @@ void qemu_spice_init(void)
>      qemu_spice_add_interface(&spice_migrate.base);
>
>      qemu_spice_input_init();
> -    qemu_spice_audio_init();
>
>      qemu_add_vm_change_state_handler(vm_change_state_handler, NULL);
>      qemu_spice_display_stop();
> --
> 2.27.0
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
** Changed in: kunpeng920/ubuntu-18.04-hwe
       Status: Triaged => Fix Committed

** Changed in: kunpeng920/ubuntu-18.04
       Status: Triaged => Fix Committed

** Changed in: kunpeng920
       Status: Triaged => Fix Committed

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920: Fix Committed
Status in kunpeng920 ubuntu-18.04 series: Fix Committed
Status in kunpeng920 ubuntu-18.04-hwe series: Fix Committed
Status in kunpeng920 ubuntu-19.10 series: Fix Released
Status in kunpeng920 ubuntu-20.04 series: Fix Released
Status in kunpeng920 upstream-kernel series: Invalid
Status in QEMU: Fix Released
Status in qemu package in Ubuntu: Fix Released
Status in qemu source package in Bionic: Fix Committed
Status in qemu source package in Eoan: Fix Released
Status in qemu source package in Focal: Fix Released

Bug description:
  SRU TEAM REVIEWER: This has already been SRUed for Focal, Eoan and
  Bionic. Unfortunately the Bionic SRU did not work and we had to revert
  the change. Since then we had another update and now I'm retrying the
  SRU.

  After discussing with @paelzer (and @dannf as a reviewer) extensively,
  Christian and I agreed to scope this SRU as Aarch64 only AND to be
  much, much more conservative about what is changed in the QEMU AIO
  code. The new code has been tested against the initial Test Case and
  against the new one that regressed on Bionic. More information (about
  tests and discussion) can be found in the MR at
  ~rafaeldtinoco/ubuntu/+source/qemu:lp1805256-bionic-refix

  BIONIC REGRESSION BUG:
  https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419

  [Impact]

  * QEMU locking primitives might face a race condition in QEMU Async
    I/O bottom halves scheduling. This leads to a deadlock, making
    either QEMU or one of its tools hang indefinitely.

  [Test Case]

  INITIAL

  * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely in approximately 30% of the runs on Aarch64.

  [Regression Potential]

  * This is a change to a core part of QEMU: the AIO scheduling. It
    works like a "kernel" scheduler: whereas the kernel schedules OS
    tasks, the QEMU AIO code is responsible for scheduling QEMU
    coroutines and event listener callbacks.

  * There was a long discussion upstream about primitives and Aarch64.
    After quite some time Paolo released this patch and it solves the
    issue. Tested platforms were amd64 and aarch64, based on his commit
    log.

  * Christian suggests that this fix stay a little longer in -proposed
    to make sure it won't cause any regressions.

  * dannf suggests we also check for performance regressions; e.g. how
    long it takes to convert a cloud image on high-core systems.

  BIONIC REGRESSED ISSUE
  https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419

  [Other Info]

  * Original description below:

  Command:
  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely in approximately 30% of the runs.

  Workaround:
  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with a single coroutine to avoid this issue.

  (gdb) thread 1
  ...
  (gdb) bt
  #0  0xbf1ad81c in __GI_ppoll
  #1  0xaabcf73c in ppoll
  #2  qemu_poll_ns
  #3  0xaabd0764 in os_host_main_loop_wait
  #4  main_loop_wait
  ...
  (gdb) thread 2
  ...
  (gdb) bt
  #0  syscall ()
  #1  0xaabd41cc in qemu_futex_wait
  #2  qemu_event_wait (ev=ev@entry=0xaac86ce8)
  #3  0xaabed05c in call_rcu_thread
  #4  0xaabd34c8 in qemu_thread_start
  #5  0xbf25c880 in start_thread
  #6  0xbf1b6b9c in thread_start ()
  (gdb) thread 3
  ...
  (gdb) bt
  #0  0xbf11aa20 in __GI___sigtimedwait
  #1  0xbf2671b4 in __sigwait
  #2  0xaabd1ddc in sigwait_compat
  #3  0xaabd34c8 in qemu_thread_start
  #4  0xbf25c880 in start_thread
  #5  0xbf1b6b9c in thread_start

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2
  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]
  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
Verified w/ over 500 successful iterations on a m6g.metal instance, and
over 300 in an armhf chroot on the same.

** Tags removed: verification-needed verification-needed-bionic
** Tags added: verification-done verification-done-bionic

--
https://bugs.launchpad.net/bugs/1805256

Status in kunpeng920: Triaged
Status in kunpeng920 ubuntu-18.04 series: Triaged
Status in kunpeng920 ubuntu-18.04-hwe series: Triaged
Status in kunpeng920 ubuntu-19.10 series: Fix Released
Status in kunpeng920 ubuntu-20.04 series: Fix Released
Status in kunpeng920 upstream-kernel series: Invalid
Status in QEMU: Fix Released
Status in qemu package in Ubuntu: Fix Released
Status in qemu source package in Bionic: Fix Committed
Status in qemu source package in Eoan: Fix Released
Status in qemu source package in Focal: Fix Released

Bug description:
  [...]
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
I ran the new PPA build (1:2.11+dfsg-1ubuntu7.29~ppa01) on both a
ThunderX2 system and a Hi1620 system overnight, and both survived (6574
& 12919 iterations, respectively).

--
https://bugs.launchpad.net/bugs/1805256

Status in kunpeng920: Triaged
Status in kunpeng920 ubuntu-18.04 series: Triaged
Status in kunpeng920 ubuntu-18.04-hwe series: Triaged
Status in kunpeng920 ubuntu-19.10 series: Fix Released
Status in kunpeng920 ubuntu-20.04 series: Fix Released
Status in kunpeng920 upstream-kernel series: Invalid
Status in QEMU: Fix Released
Status in qemu package in Ubuntu: Fix Released
Status in qemu source package in Bionic: In Progress
Status in qemu source package in Eoan: Fix Released
Status in qemu source package in Focal: Fix Released

Bug description:
  [...]

  All the tasks left are blocked in a system call, so no task is left to
  call qemu_futex_wake() to unblock thread #2 (in futex()), which would
  unblock thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes at
  the beginning, sometimes at the end).
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every
  time it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760,
      timeout=, timeout@entry=0x0, sigmask=0xc123b950)
      at ../sysdeps/unix/sysv/linux/ppoll.c:39
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
Ike's backport in
https://launchpad.net/~ikepanhc/+archive/ubuntu/lp1805256 tests well for
me on Cavium Sabre.

One minor note is that the function in_aio_context_home_thread() is
being called in aio-win32.c, but that function didn't exist in 2.11. We
probably want to change that to aio_context_in_iothread(). It was
renamed in
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=d2b63ba8dd20c1091b3f1033e6a95ef95b18149d

--
https://bugs.launchpad.net/bugs/1805256

Status in kunpeng920: Triaged
Status in kunpeng920 ubuntu-18.04 series: Triaged
Status in kunpeng920 ubuntu-18.04-hwe series: Triaged
Status in kunpeng920 ubuntu-19.10 series: Triaged
Status in kunpeng920 ubuntu-20.04 series: Triaged
Status in kunpeng920 upstream-kernel series: Fix Committed
Status in QEMU: Fix Released
Status in qemu package in Ubuntu: In Progress
Status in qemu source package in Bionic: In Progress
Status in qemu source package in Disco: In Progress
Status in qemu source package in Eoan: In Progress
Status in qemu source package in Focal: In Progress

Bug description:
  [...]
Re: [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
On Wed, May 6, 2020 at 1:20 PM Philippe Mathieu-Daudé
<1805...@bugs.launchpad.net> wrote:
>
> Isn't this fixed by commit 5710a3e09f9?

See comment #43. The discussions hence are about testing/integration of
that fix.

 -dann

--
https://bugs.launchpad.net/bugs/1805256

Bug description:
  [...]
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
** Description changed:

[Impact]

 * QEMU locking primitives might face a race condition in QEMU Async I/O
   bottom halves scheduling. This leads to a deadlock, making either QEMU
   or one of its tools hang indefinitely.

[Test Case]

 * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
   Hangs indefinitely approximately 30% of the runs in Aarch64.

[Regression Potential]

 * This is a change to a core part of QEMU: the AIO scheduling. It works
   like a "kernel" scheduler: whereas the kernel schedules OS tasks, the
   QEMU AIO code is responsible for scheduling QEMU coroutines and event
   listener callbacks.

 * There was a long discussion upstream about primitives and Aarch64.
   After quite some time Paolo released this patch and it solves the
   issue. Tested platforms were amd64 and aarch64, based on his commit
   log.

 * Christian suggests that this fix stay a little longer in -proposed to
   make sure it won't cause any regressions.

+ * dannf suggests we also check for performance regressions; e.g. how
+ long it takes to convert a cloud image on high-core systems.
+
[Other Info]

- * Original Description bellow:
-
+ * Original Description bellow:

Command:
qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Hangs indefinitely approximately 30% of the runs.

Workaround:
qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Run "qemu-img convert" with "a single coroutine" to avoid this issue.

(gdb) thread 1
...
(gdb) bt
#0 0xbf1ad81c in __GI_ppoll
#1 0xaabcf73c in ppoll
#2 qemu_poll_ns
#3 0xaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...

(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0xaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaac86ce8 <rcu_call_ready_event>)
#3 0xaabed05c in call_rcu_thread
#4 0xaabd34c8 in qemu_thread_start
#5 0xbf25c880 in start_thread
#6 0xbf1b6b9c in thread_start ()

(gdb) thread 3
...
(gdb) bt
#0 0xbf11aa20 in __GI___sigtimedwait
#1 0xbf2671b4 in __sigwait
#2 0xaabd1ddc in sigwait_compat
#3 0xaabd34c8 in qemu_thread_start
#4 0xbf25c880 in start_thread
#5 0xbf1b6b9c in thread_start

(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2 ./disk01.ext4.qcow2 ./output.qcow2
[New Thread 0xbec5ad90 (LWP 72839)]
[New Thread 0xbe459d90 (LWP 72840)]
[New Thread 0xbdb57d90 (LWP 72841)]
[New Thread 0xacac9d90 (LWP 72859)]
[New Thread 0xa7ffed90 (LWP 72860)]
[New Thread 0xa77fdd90 (LWP 72861)]
[New Thread 0xa6ffcd90 (LWP 72862)]
[New Thread 0xa67fbd90 (LWP 72863)]
[New Thread 0xa5ffad90 (LWP 72864)]
[Thread 0xa5ffad90 (LWP 72864) exited]
[Thread 0xa6ffcd90 (LWP 72862) exited]
[Thread 0xa77fdd90 (LWP 72861) exited]
[Thread 0xbdb57d90 (LWP 72841) exited]
[Thread 0xa67fbd90 (LWP 72863) exited]
[Thread 0xacac9d90 (LWP 72859) exited]
[Thread 0xa7ffed90 (LWP 72860) exited]

All the tasks left are blocked in a system call, so no task is left to
call qemu_futex_wake() to unblock thread #2 (in futex()), which would
unblock thread #1 (doing poll() in a pipe with thread #2).

Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).

On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
frequently hangs (~50% of the time) with this command:

qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
qcow2->qcow2 conversion happens to be something uvtool does every time
it fetches images.

Once hung, attaching gdb gives the following backtrace:

(gdb) bt
#0 0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760,
   timeout=<optimized out>, timeout@entry=0x0, sigmask=0xc123b950)
   at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
   __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
   timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0xbbefbf80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0xbbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975

Reproduced w/ latest QEMU git (@ 53744e0a182)

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
fyi, I backported that fix also to focal/groovy and eoan, and tested
with those builds. On my test systems the hang reliably occurs within 20
iterations; after the fix, they have survived > 500 iterations thus far.
I'll leave them running overnight just to be sure.

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920: Triaged
Status in kunpeng920 ubuntu-18.04 series: New
Status in kunpeng920 ubuntu-18.04-hwe series: New
Status in kunpeng920 ubuntu-19.10 series: New
Status in kunpeng920 ubuntu-20.04 series: New
Status in kunpeng920 upstream-kernel series: Fix Committed
Status in QEMU: In Progress
Status in qemu package in Ubuntu: Incomplete
Status in qemu source package in Bionic: Incomplete
Status in qemu source package in Disco: Incomplete
Status in qemu source package in Eoan: Incomplete
Status in qemu source package in Focal: Incomplete
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
fyi, what I tested in Comment #35 was upstream QEMU (@ aceeaa69d2) with
a port of the patch in Comment #34 applied. I've attached that patch
here. While it did avoid the issue in my testing, I agree with Rafael's
Comment #36 that it does not appear to address the root cause (as I
understand it), and is therefore unlikely something we'd ship in Ubuntu.

** Patch added: "comment-34-ported-to-upstream.patch"
   https://bugs.launchpad.net/qemu/+bug/1805256/+attachment/5313631/+files/comment-34-ported-to-upstream.patch
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
I tested the patch in Comment #34, and it was able to pass 500
iterations.
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
** Changed in: kunpeng920
   Status: New => Confirmed

** Changed in: qemu (Ubuntu Bionic)
   Status: New => Confirmed

** Changed in: qemu (Ubuntu Disco)
   Status: New => Confirmed

** Changed in: qemu (Ubuntu Focal)
   Status: New => Confirmed
Re: [PATCH-for-5.0] roms/edk2-funcs.sh: Use available GCC for ARM/Aarch64 targets
On Fri, Dec 06, 2019 at 06:07:58AM +0100, Philippe Mathieu-Daudé wrote:
> On 12/5/19 8:35 PM, Laszlo Ersek wrote:
> > On 12/05/19 17:50, Ard Biesheuvel wrote:
> > > On Thu, 5 Dec 2019 at 16:27, Philippe Mathieu-Daudé wrote:
> > > >
> > > > On 12/5/19 5:13 PM, Laszlo Ersek wrote:
> > > > > Hi Phil,
> > > > >
> > > > > (+Ard)
> > > > >
> > > > > On 12/04/19 23:12, Philippe Mathieu-Daudé wrote:
> > > > > > Centos 7.7 only provides cross GCC 4.8.5, but the script forces
> > > > > > us to use GCC5. Since the same machinery is valid to check the
> > > > > > GCC version, remove the $emulation_target check.
> > > > > >
> > > > > > $ cat /etc/redhat-release
> > > > > > CentOS Linux release 7.7.1908 (Core)
> > > > > >
> > > > > > $ aarch64-linux-gnu-gcc -v 2>&1 | tail -1
> > > > > > gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
> > > > >
> > > > > this patch is not correct, in my opinion. ARM / AARCH64 support
> > > > > in edk2 requires GCC5 as a minimum. It was never tested with an
> > > > > earlier toolchain, to my understanding. Not on my part, anyway.
> > > > >
> > > > > To be more precise: when I tested cross-gcc toolchains earlier
> > > > > than that, the ArmVirtQemu builds always failed. Minimally, those
> > > > > toolchains didn't recognize some of the AARCH64 system registers.
> > > > >
> > > > > If CentOS 7.7 does not provide a suitable (>=GCC5) toolchain,
> > > > > then we can't build ArmVirtQemu binaries on CentOS 7.7, in my
> > > > > opinion.
> > > > >
> > > > > Personally, on my RHEL7 laptop, over time I've used the following
> > > > > toolchains, to satisfy the GCC5 requirement of ArmVirtQemu (which
> > > > > requirement I took as experimental evidence):
> > > > >
> > > > > - Initially (last quarter of 2014), I used binary distributions
> > > > >   -- tarballs -- of cross-binutils and cross-gcc, from Linaro.
> > > > >
> > > > > - Later (last quarter of 2016), I rebuilt some SRPMs that were at
> > > > >   the time Fedora-only for RHEL7. Namely:
> > > > >
> > > > >   - cross-binutils-2.27-3.fc24
> > > > >     https://koji.fedoraproject.org/koji/buildinfo?buildID=801348
> > > > >
> > > > >   - gcc-6.1.1-2.fc24
> > > > >     https://koji.fedoraproject.org/koji/buildinfo?buildID=761767
> > > > >
> > > > > - Most recently, I've been using cross-binutils updated from
> > > > >   EPEL7:
> > > > >
> > > > >   - cross-binutils-2.27-9.el7.1
> > > > >     https://koji.fedoraproject.org/koji/buildinfo?buildID=918474
> > > > >
> > > > > To my knowledge, there is still no suitable cross-compiler
> > > > > available on RHEL7, from any trustworthy RPM repository. So, to
> > > > > this day, I use gcc-6.1.1-2 for cross-building ArmVirtQemu, on my
> > > > > RHEL7 laptop.
> > > > >
> > > > > Again: I believe it does not matter if the gcc-4.8.5-based
> > > > > cross-compiler in CentOS 7 "happens" to work. That's a compiler
> > > > > that I have never tested with, or vetted for, upstream
> > > > > ArmVirtQemu.
> > > > >
> > > > > Now, I realize that in edk2, we have stuff like
> > > > >
> > > > > GCC48_AARCH64_CC_FLAGS
> > > > >
> > > > > in "BaseTools/Conf/tools_def.template" -- coming from commit
> > > > > 7a9dbf2c94d1 ("BaseTools/Conf/tools_def.template: drop ARM/AARCH
> > > > > support from GCC46/GCC47", 2019-01-08). That doesn't change the
> > > > > fact that I've never built or tested ArmVirtQemu with such a
> > > > > compiler. And so this patch makes me quite uncomfortable.
> > > > >
> > > > > If that rules out CentOS 7 as a QEMU project build / CI platform
> > > > > for the bundled ArmVirtQemu binaries, then we need a more recent
> > > > > platform (perhaps CentOS 8, not sure).
> > > >
> > > > Unfortunately CentOS 8 is not available as a Docker image, which is
> > > > a convenient way to build EDK2 in a CI.
> > > >
> > > > > I think it's also educational to check the origin of the code
> > > > > that your patch proposes to remove. Most recently it was moved
> > > > > around from a different place, in QEMU commit 65a109ab4b1a
> > > > > ('roms: lift "edk2-funcs.sh" from
> > > > > "tests/uefi-test-tools/build.sh"', 2019-04-17).
> > > > >
> > > > > In that commit, for some reason I didn't keep the original code
> > > > > comments (perhaps it would have been too difficult or messy to
> > > > > preserve the comments sanely with the restructured / factored-out
> > > > > code). But, they went like this (originally from commit
> > > > > 77db55fc8155, "tests/uefi-test-tools: add build scripts",
> > > > > 2019-02-21):
> > > > >
> > > > > # Expose cross_prefix (which is possibly empty) to the edk2
> > > > > # tools. While at it, determine the suitable edk2 toolchain as
> > > > > # well.
> > > > > # - For ARM and AARCH64, edk2 only offers the GCC5 toolchain tag,
> > > > > #   which covers the gcc-5+ releases.
> > > > > # - For IA32 and X64, edk2 offers the GCC44 through GCC49
> > > > > #   toolchain tags, in addition
Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
On Fri, Oct 11, 2019 at 06:05:25AM +0000, Jan Glauber wrote:
> On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> > On 09/10/19 10:02, Jan Glauber wrote:
> > > I'm still not sure what the actual issue is here, but could it be
> > > some bad interaction between the notify_me and the list_lock? They
> > > are both 4 byte and side-by-side:
> > >
> > > address notify_me: 0xdb528aa0  sizeof notify_me: 4
> > > address list_lock: 0xdb528aa4  sizeof list_lock: 4
> > >
> > > AFAICS the generated code looks OK (all load/store exclusive done
> > > with 32 bit size):
> > >
> > >   e6c:  885ffc01   ldaxr  w1, [x0]
> > >   e70:  11000821   add    w1, w1, #0x2
> > >   e74:  8802fc01   stlxr  w2, w1, [x0]
> > >
> > > ...but if I bump notify_me size to uint64_t the issue goes away.
> >
> > Ouch. :) Is this with or without my patch(es)?
> >
> > Also, what if you just add a dummy uint32_t after notify_me?
>
> With the dummy the testcase also runs fine for 500 iterations.
>
> Dann, can you try if this works on the Hi1620 too?

On Hi1620, it hung on the first iteration. Here's the complete patch
I'm running with:

diff --git a/include/block/aio.h b/include/block/aio.h
index 6b0d52f732..e6fd6f1a1a 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,7 +82,7 @@ struct AioContext {
      * Instead, the aio_poll calls include both the prepare and the
      * dispatch phase, hence a simple counter is enough for them.
      */
-    uint32_t notify_me;
+    uint64_t notify_me;

     /* A lock to protect between QEMUBH and AioHandler adders and deleter,
      * and to ensure that no callbacks are removed while we're walking and
diff --git a/util/async.c b/util/async.c
index ca83e32c7f..024c4c567d 100644
--- a/util/async.c
+++ b/util/async.c
@@ -242,7 +242,7 @@ aio_ctx_check(GSource *source)
     aio_notify_accept(ctx);

     for (bh = ctx->first_bh; bh; bh = bh->next) {
-        if (bh->scheduled) {
+        if (atomic_mb_read(&bh->scheduled)) {
             return true;
         }
     }
@@ -342,12 +342,12 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)

 void aio_notify(AioContext *ctx)
 {
-    /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
-     * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
+    /* Using atomic_mb_read ensures that e.g. bh->scheduled is written
+     * before ctx->notify_me is read.  Pairs with atomic_or in
+     * aio_ctx_prepare or atomic_add in aio_poll.
      */
-    smp_mb();
-    if (ctx->notify_me) {
-        event_notifier_set(&ctx->notifier);
+    if (atomic_mb_read(&ctx->notify_me)) {
+        event_notifier_set(&ctx->notifier);
         atomic_mb_set(&ctx->notified, true);
     }
 }
[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
On Fri, Oct 11, 2019 at 08:30:02AM +0000, Jan Glauber wrote:
> On Fri, Oct 11, 2019 at 10:18:18AM +0200, Paolo Bonzini wrote:
> > On 11/10/19 08:05, Jan Glauber wrote:
> > > On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> > >>> ...but if I bump notify_me size to uint64_t the issue goes away.
> > >>
> > >> Ouch. :) Is this with or without my patch(es)?
> >
> > You didn't answer this question.
>
> Oh, sorry... I did but the mail probably didn't make it out.
> I have both of your changes applied (as I think they make sense).
>
> > >> Also, what if you just add a dummy uint32_t after notify_me?
> > >
> > > With the dummy the testcase also runs fine for 500 iterations.
> >
> > You might be lucky and causing list_lock to be in another cache line.
> > What if you add __attribute__((aligned(16)) to notify_me (and keep the
> > dummy)?
>
> Good point. I'll try to force both into the same cacheline.

On the Hi1620, this still hangs in the first iteration:

diff --git a/include/block/aio.h b/include/block/aio.h
index 6b0d52f732..00e56a5412 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,7 +82,7 @@ struct AioContext {
      * Instead, the aio_poll calls include both the prepare and the
      * dispatch phase, hence a simple counter is enough for them.
      */
-    uint32_t notify_me;
+    __attribute__((aligned(16))) uint64_t notify_me;

     /* A lock to protect between QEMUBH and AioHandler adders and deleter,
      * and to ensure that no callbacks are removed while we're walking and
diff --git a/util/async.c b/util/async.c
index ca83e32c7f..024c4c567d 100644
--- a/util/async.c
+++ b/util/async.c
@@ -242,7 +242,7 @@ aio_ctx_check(GSource *source)
     aio_notify_accept(ctx);

     for (bh = ctx->first_bh; bh; bh = bh->next) {
-        if (bh->scheduled) {
+        if (atomic_mb_read(&bh->scheduled)) {
             return true;
         }
     }
@@ -342,12 +342,12 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)

 void aio_notify(AioContext *ctx)
 {
-    /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
-     * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
+    /* Using atomic_mb_read ensures that e.g. bh->scheduled is written
+     * before ctx->notify_me is read.  Pairs with atomic_or in
+     * aio_ctx_prepare or atomic_add in aio_poll.
      */
-    smp_mb();
-    if (ctx->notify_me) {
-        event_notifier_set(&ctx->notifier);
+    if (atomic_mb_read(&ctx->notify_me)) {
+        event_notifier_set(&ctx->notifier);
         atomic_mb_set(&ctx->notified, true);
     }
 }
Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've looked into this on ThunderX2. The arm64 code generated for the
> > atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> > memory barriers. It is just plain ldaxr/stlxr.
> >
> > From my understanding this is not sufficient for SMP sync.
> >
> > If I read this comment correctly:
> >
> > void aio_notify(AioContext *ctx)
> > {
> >     /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> >      * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> >      */
> >     smp_mb();
> >     if (ctx->notify_me) {
> >
> > it points out that the smp_mb() should be paired. But as
> > I said the atomics used don't generate any barriers at all.
>
> Based on the rest of the thread, this patch should also fix the bug:
>
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
>      aio_notify_accept(ctx);
>
>      for (bh = ctx->first_bh; bh; bh = bh->next) {
> -        if (bh->scheduled) {
> +        if (atomic_mb_read(&bh->scheduled)) {
>              return true;
>          }
>      }
>
> And also the memory barrier in aio_notify can actually be replaced
> with a SEQ_CST load:
>
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>
>  void aio_notify(AioContext *ctx)
>  {
> -    /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> -     * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> +    /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
> +     * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
> +     * atomic_add in aio_poll.
>      */
> -    smp_mb();
> -    if (ctx->notify_me) {
> +    if (atomic_mb_read(&ctx->notify_me)) {
>          event_notifier_set(&ctx->notifier);
>          atomic_mb_set(&ctx->notified, true);
>      }
>
> Would you be able to test these (one by one possibly)?

Paolo,

I tried them both separately and together on a Hi1620 system; each time
it hung in the first iteration. Here's a backtrace of a run with both
patches applied:

(gdb) thread apply all bt

Thread 3 (Thread 0x8154b820 (LWP 63900)):
#0 0x8b9402cc in __GI___sigtimedwait (set=<optimized out>, set@entry=0xf1e08070, info=info@entry=0x8154ad98, timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
#1 0x8ba77fac in __sigwait (set=set@entry=0xf1e08070, sig=sig@entry=0x8154ae74) at ../sysdeps/unix/sysv/linux/sigwait.c:28
#2 0xb7dc1610 in sigwait_compat (opaque=0xf1e08070) at util/compatfd.c:35
#3 0xb7dc3e80 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
#4 0x8ba6d088 in start_thread (arg=0xceefbf4f) at pthread_create.c:463
#5 0x8b9dd4ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 (Thread 0x81d4c820 (LWP 63899)):
#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1 0xb7dc4cd8 in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0xb7e48708 <rcu_call_ready_event>) at util/qemu-thread-posix.c:459
#3 0xb7ddf44c in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:260
#4 0xb7dc3e80 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
#5 0x8ba6d088 in start_thread (arg=0xceefc05f) at pthread_create.c:463
#6 0x8b9dd4ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 1 (Thread 0x81e83010 (LWP 63898)):
#0 0x8b9d4154 in __GI_ppoll (fds=0xf1e0dbc0, nfds=187650205809964, timeout=<optimized out>, timeout@entry=0x0, sigmask=0xceefbef0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0xb7dbedb0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:340
#3 0xb7dbfd2c in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:236
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:517
#5 0xb7ce86e8 in convert_do_copy (s=0xceefc068) at qemu-img.c:2028
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2520
#7 0xb7ce1e54 in main (argc=8, argv=<optimized out>) at qemu-img.c:5097

> > I've tried to verify my theory with this patch and didn't run into the
> > issue for ~500 iterations (usually I would trigger the issue within
> > ~20 iterations).
>
> Sorry for asking the obvious---500 iterations of what?

$ for i in $(seq 1 500); do echo "==$i=="; ./qemu/qemu-img convert -p -f qcow2 -O qcow2 bionic-server-cloudimg-arm64.img out.img; done
==1==
    (37.19/100%)

  -dann
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
** Also affects: kunpeng920
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920: New
Status in QEMU: In Progress
Status in qemu package in Ubuntu: In Progress
Status in qemu source package in Bionic: New
Status in qemu source package in Disco: New
Status in qemu source package in Eoan: In Progress
Status in qemu source package in FF-Series: New

Bug description:
  Command: qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
  Hangs indefinitely in approximately 30% of the runs.

  Workaround: qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
  Run "qemu-img convert" with a single coroutine (-m 1) to avoid this issue.

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...
  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 <rcu_call_ready_event>)
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()
  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2 ./disk01.ext4.qcow2 ./output.qcow2
  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]
  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  All the tasks left are blocked in a system call, so no task is left to
  call qemu_futex_wake() to unblock thread #2 (in futex()), which would
  unblock thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes at
  the beginning, sometimes at the end).

  [ Original Description ]

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

    qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.
  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0 0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760, timeout=<optimized out>, timeout@entry=0x0, sigmask=0xc123b950) at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1 0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3 0xbbefbf80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233
  #4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
  #5 0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at qemu-img.c:1980
  #6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
  #7 0xbbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions
Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
On Wed, Sep 11, 2019 at 04:09:25PM -0300, Rafael David Tinoco wrote:
> > Zhengui's theory that notify_me doesn't work properly on ARM is more
> > promising, but he couldn't provide a clear explanation of why he thought
> > notify_me is involved. In particular, I would have expected notify_me to
> > be wrong if the qemu_poll_ns call came from aio_ctx_dispatch, for example:
> >
> >   glib_pollfds_fill
> >     g_main_context_prepare
> >       aio_ctx_prepare
> >         atomic_or(&ctx->notify_me, 1)
> >   qemu_poll_ns
> >   glib_pollfds_poll
> >     g_main_context_check
> >       aio_ctx_check
> >         atomic_and(&ctx->notify_me, ~1)
> >     g_main_context_dispatch
> >       aio_ctx_dispatch
> >         /* do something for event */
> >   qemu_poll_ns
>
> Paolo,
>
> I tried confining execution to a single NUMA domain (cpu & mem) and
> still faced the issue. Then I added a mutex "ctx->notify_me_lcktest"
> to the context to protect "ctx->notify_me", as shown below, and it
> seems to have either fixed or mitigated the problem.
>
> Before, I was able to cause the hang once every 3 or 4 runs. I have now
> run qemu-img convert more than 30 times and couldn't reproduce it again.
>
> Next step is to play with the barriers and check why the existing ones
> aren't enough for ordering access to ctx->notify_me ... or should I
> try/do something else in your opinion?
>
> This arch/machine (Huawei D06):
>
> $ lscpu
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              96
> On-line CPU(s) list: 0-95
> Thread(s) per core:  1
> Core(s) per socket:  48
> Socket(s):           2
> NUMA node(s):        4
> Vendor ID:           0x48
> Model:               0
> Stepping:            0x0
> CPU max MHz:         2000.
> CPU min MHz:         200.
> BogoMIPS:            200.00
> L1d cache:           64K
> L1i cache:           64K
> L2 cache:            512K
> L3 cache:            32768K
> NUMA node0 CPU(s):   0-23
> NUMA node1 CPU(s):   24-47
> NUMA node2 CPU(s):   48-71
> NUMA node3 CPU(s):   72-95
> Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>                      cpuid asimdrdm dcpop

Note that I'm also seeing this on a ThunderX2 (same calltrace):

$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              224
On-line CPU(s) list: 0-223
Thread(s) per core:  4
Core(s) per socket:  28
Socket(s):           2
NUMA node(s):        2
Vendor ID:           Cavium
Model:               1
Model name:          ThunderX2 99xx
Stepping:            0x1
BogoMIPS:            400.00
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            32768K
NUMA node0 CPU(s):   0-111
NUMA node1 CPU(s):   112-223
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
                     cpuid asimdrdm

  -dann

> diff --git a/include/block/aio.h b/include/block/aio.h
> index 0ca25dfec6..0724086d91 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -84,6 +84,7 @@ struct AioContext {
>       * dispatch phase, hence a simple counter is enough for them.
>       */
>      uint32_t notify_me;
> +    QemuMutex notify_me_lcktest;
>
>      /* A lock to protect between QEMUBH and AioHandler adders and deleter,
>       * and to ensure that no callbacks are removed while we're walking and
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index 51c41ed3c9..031d6e2997 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -529,7 +529,9 @@ static bool run_poll_handlers(AioContext *ctx, int64_t max_ns, int64_t *timeout)
>      bool progress;
>      int64_t start_time, elapsed_time;
>
> +    qemu_mutex_lock(&ctx->notify_me_lcktest);
>      assert(ctx->notify_me);
> +    qemu_mutex_unlock(&ctx->notify_me_lcktest);
>      assert(qemu_lockcnt_count(&ctx->list_lock) > 0);
>
>      trace_run_poll_handlers_begin(ctx, max_ns, *timeout);
> @@ -601,8 +603,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
>       * so disable the optimization now.
>       */
>      if (blocking) {
> +        qemu_mutex_lock(&ctx->notify_me_lcktest);
>          assert(in_aio_context_home_thread(ctx));
>          atomic_add(&ctx->notify_me, 2);
> +        qemu_mutex_unlock(&ctx->notify_me_lcktest);
>      }
>
>      qemu_lockcnt_inc(&ctx->list_lock);
> @@ -647,8 +651,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      }
>
>      if (blocking) {
> +        qemu_mutex_lock(&ctx->notify_me_lcktest);
>          atomic_sub(&ctx->notify_me, 2);
>          aio_notify_accept(ctx);
> +        qemu_mutex_unlock(&ctx->notify_me_lcktest);
>      }
>
>      /* Adjust polling time */
> diff --git a/util/async.c b/util/async.c
> index c10642a385..140e1e86f5 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -221,7 +221,9 @@ aio_ctx_prepare(GSource *source, gint *timeout)
>  {
>      AioContext *ctx = (AioContext *) source;
>
> +    qemu_mutex_lock(&ctx->notify_me_lcktest);
>      atomic_or(&ctx->notify_me, 1);
> +    qemu_mutex_unlock(&ctx->notify_me_lcktest);
[Qemu-devel] [Bug 1824053] Re: Qemu-img convert appears to be stuck on aarch64 host with low probability
*** This bug is a duplicate of bug 1805256 ***
    https://bugs.launchpad.net/bugs/1805256

** This bug has been marked a duplicate of bug 1805256
   qemu-img hangs on high core count ARM system

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1824053

Title:
  Qemu-img convert appears to be stuck on aarch64 host with low
  probability

Status in QEMU: Confirmed

Bug description:
  Hi, I found a problem where qemu-img convert appears to be stuck on an
  aarch64 host with low probability.

  The convert command line is "qemu-img convert -f qcow2 -O raw
  disk.qcow2 disk.raw".

  The backtrace is below:

  Thread 2 (Thread 0x4b776e50 (LWP 27215)):
  #0 0x4a3f2994 in sigtimedwait () from /lib64/libc.so.6
  #1 0x4a39c60c in sigwait () from /lib64/libpthread.so.0
  #2 0xaae82610 in sigwait_compat (opaque=0xc5163b00) at util/compatfd.c:37
  #3 0xaae85038 in qemu_thread_start (args=args@entry=0xc5163b90) at util/qemu_thread_posix.c:496
  #4 0x4a3918bc in start_thread () from /lib64/libpthread.so.0
  #5 0x4a492b2c in thread_start () from /lib64/libc.so.6

  Thread 1 (Thread 0x4b573370 (LWP 27214)):
  #0 0x4a489020 in ppoll () from /lib64/libc.so.6
  #1 0xaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
  #2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu_timer.c:391
  #3 0xaadae014 in os_host_main_loop_wait (timeout=<optimized out>) at main_loop.c:272
  #4 0xaadae190 in main_loop_wait (nonblocking=<optimized out>) at main_loop.c:534
  #5 0xaad97be0 in convert_do_copy (s=0xdc32eb48) at qemu-img.c:1923
  #6 0xaada2d70 in img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2414
  #7 0xaad99ac4 in main (argc=7, argv=<optimized out>) at qemu-img.c:5305

  The problem seems to be very similar to the phenomenon described by
  this patch
  (https://resources.ovirt.org/pub/ovirt-4.1/src/qemu-kvm-ev/0025-aio_notify-force-main-loop-wakeup-with-SIGIO-aarch64.patch),
  which forces a main loop wakeup with SIGIO. But that patch was
  reverted by this one
  (http://ovirt.repo.nfrance.com/src/qemu-kvm-ev/kvm-Revert-aio_notify-force-main-loop-wakeup-with-SIGIO-.patch).

  I can reproduce this problem with qemu.git/master; it still exists
  there.

  I found that when an IO completes in a worker thread and wants to call
  aio_notify to wake up the main_loop, it may find that ctx->notify_me
  has already been cleared to 0 by the main_loop in aio_ctx_check, via
  atomic_and(&ctx->notify_me, ~1). So the worker thread won't write the
  eventfd to notify the main_loop. If such a scene happens, the
  main_loop will hang:

  main loop                        worker thread1                         worker thread2
  ---------                        --------------                         --------------
  qemu_poll_ns
                                   aio_worker
                                   qemu_bh_schedule(pool->completion_bh)
  glib_pollfds_poll
    g_main_context_check
      aio_ctx_check
                                                                          aio_worker
        atomic_and(&ctx->notify_me, ~1)
                                                                          qemu_bh_schedule(pool->completion_bh)
  /* do something for event */
  qemu_poll_ns   /* hangs !!! */

  As is known, ctx->notify_me is accessed by both the worker threads and
  the main loop. I think we should add lock protection for
  ctx->notify_me to avoid this happening.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1824053/+subscriptions
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
** Also affects: qemu (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: qemu (Ubuntu)
       Status: New => Confirmed

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU: Confirmed
Status in qemu package in Ubuntu: Confirmed

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

    qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0 0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760, timeout=<optimized out>, timeout@entry=0x0, sigmask=0xc123b950) at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1 0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3 0xbbefbf80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233
  #4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
  #5 0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at qemu-img.c:1980
  #6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
  #7 0xbbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
No, sorry - this bug still persists w/ latest upstream (@ afccfc0).

I found a report of similar symptoms:

https://patchwork.kernel.org/patch/10047341/
https://bugzilla.redhat.com/show_bug.cgi?id=1524770#c13

To be clear, ^ is already fixed upstream, so it is not the *same*
issue - but perhaps related.

** Bug watch added: Red Hat Bugzilla #1524770
   https://bugzilla.redhat.com/show_bug.cgi?id=1524770

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU: Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
** Changed in: qemu
       Status: New => Confirmed

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU: Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
ext4 filesystem, SATA drive:

(gdb) thread apply all bt

Thread 3 (Thread 0x9bffc9a0 (LWP 9015)):
#0 0xaaa462cc in __GI___sigtimedwait (set=<optimized out>, set@entry=0xe725c070, info=info@entry=0x9bffbf18, timeout=0x3ff1, timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
#1 0xaab7dfac in __sigwait (set=set@entry=0xe725c070, sig=sig@entry=0x9bffbff4) at ../sysdeps/unix/sysv/linux/sigwait.c:28
#2 0xd998a628 in sigwait_compat (opaque=0xe725c070) at util/compatfd.c:36
#3 0xd998bce0 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:498
#4 0xaab73088 in start_thread (arg=0xc528531f) at pthread_create.c:463
#5 0xaaae34ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 (Thread 0xa0e779a0 (LWP 9014)):
#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1 0xd998c9e8 in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0xd9a091c0 <rcu_call_ready_event>) at util/qemu-thread-posix.c:442
#3 0xd99a6834 in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:261
#4 0xd998bce0 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:498
#5 0xaab73088 in start_thread (arg=0xc528542f) at pthread_create.c:463
#6 0xaaae34ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 1 (Thread 0xa0fa8010 (LWP 9013)):
#0 0xaaada154 in __GI_ppoll (fds=0xe7291dc0, nfds=187650771816320, timeout=<optimized out>, timeout@entry=0x0, sigmask=0xc52852e0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0xd9987f00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0xd9988f80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0xd98b7a30 in convert_do_copy (s=0xc52854e8) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0xd98b033c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions
[Qemu-devel] [Bug 1805256] [NEW] qemu-img hangs on high core count ARM system
Public bug reported:

On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
qcow2->qcow2 conversion happens to be something uvtool does every time
it fetches images.

Once hung, attaching gdb gives the following backtrace:

(gdb) bt
#0 0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760, timeout=<optimized out>, timeout@entry=0x0, sigmask=0xc123b950) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0xbbefbf80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0xbbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975

Reproduced w/ latest QEMU git (@ 53744e0a182)

** Affects: qemu
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions
[Qemu-devel] [Bug 1719196] Re: [arm64 ocata] newly created instances are unable to raise network interfaces
** Tags removed: verification-needed-zesty
** Tags added: verification-done-zesty

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1719196

Title:
  [arm64 ocata] newly created instances are unable to raise network
  interfaces

Status in Ubuntu Cloud Archive: Fix Released
Status in Ubuntu Cloud Archive ocata series: Fix Committed
Status in libvirt: New
Status in QEMU: Fix Released
Status in libvirt package in Ubuntu: Invalid
Status in qemu package in Ubuntu: Fix Released
Status in qemu source package in Zesty: Fix Committed

Bug description:
  [Impact]

  * A change in qemu 2.8 (83d768b "virtio: set ISR on dataplane
    notifications") broke virtio handling on platforms without an MSI
    controller. Those encounter flaky networking due to missed IRQs.
  * The fix is a backport of the upstream fix b4b9862b: "virtio: Fix no
    interrupt when not creating msi controller".

  [Test Case]

  * On Arm with Zesty (or Ocata) run a guest without PCI based devices.
  * Example in e.g. c#23.
  * Without the fix the networking does not work reliably (as it loses
    IRQs); with the fix it works fine.

  [Regression Potential]

  * Changing the IRQ handling of virtio could affect virtio in general.
    But when reviewing the patch you'll see that it is small and only
    changes things to enable the IRQ in one more place. That could
    cause more IRQs than needed in the worst case, but those usually
    aren't breaking, only slowing things down. Also this fix has been
    upstream for quite a while, increasing confidence.

  [Other Info]

  * There is currently 1720397 in flight in the SRU queue, so acceptance
    of this upload has to wait until that completes.

  ---

  arm64 Ocata. I'm testing to see if I can get Ocata running on arm64,
  using the openstack-base bundle to deploy it. I have added the bundle
  to the log file attached to this bug.

  When I create a new instance via nova, the VM comes up and runs;
  however, it fails to raise its eth0 interface.
  This occurs on both internal and external networks.

  ubuntu@openstackaw:~$ nova list
  | ID | Name | Status | Task State | Power State | Networks |
  | dcaf6d51-f81e-4cbd-ac77-0c5d21bde57c | sfeole1 | ACTIVE | - | Running | internal=10.5.5.3 |
  | aa0b8aee-5650-41f4-8fa0-aeccdc763425 | sfeole2 | ACTIVE | - | Running | internal=10.5.5.13 |

  ubuntu@openstackaw:~$ nova show aa0b8aee-5650-41f4-8fa0-aeccdc763425
  | Property | Value |
  | OS-DCF:diskConfig | MANUAL |
  | OS-EXT-AZ:availability_zone | nova |
  | OS-EXT-SRV-ATTR:host | awrep3 |
  | OS-EXT-SRV-ATTR:hypervisor_hostname | awrep3.maas |
  | OS-EXT-SRV-ATTR:instance_name | instance-0003 |
  | OS-EXT-STS:power_state | 1 |
  | OS-EXT-STS:task_state | - |
  | OS-EXT-STS:vm_state | active |
  | OS-SRV-USG:launched_at | 2017-09-24T14:23:08.00 |
  | OS-SRV-USG:terminated_at | - |
  | accessIPv4 | |
  | accessIPv6 | |
  | config_drive | |
  | created | 2017-09-24T14:22:41Z |
  | flavor | m1.small (717660ae-0440-4b19-a762-ffeb32a0575c) |
  | hostId | 5612a00671c47255d2ebd6737a64ec9bd3a5866d1233ecf3e988b025 |
  | id | aa0b8aee-5650-41f4-8fa0-aeccdc763425 |
[Qemu-devel] [Bug 1719196] Re: [arm64 ocata] newly created instances are unable to raise network interfaces
Thanks Christian - I've now verified this. I took a stepwise approach:

1) We originally observed this issue w/ the ocata cloud archive on
xenial, so I redeployed that. I verified that I was still seeing the
problem. I then created a PPA[*] w/ an arm64 build of QEMU from the
ocata-staging PPA, which is a backport of the zesty-proposed package,
and upgraded my nova-compute nodes to this version. I rebooted my test
guests, and the problem was resolved.

2) I then updated my sources.list to point to zesty (w/ proposed
enabled), and upgraded qemu-system-arm. This way I could test the
actual build in zesty-proposed, as opposed to my backport. This
continued to work.

3) Finally, I dist-upgraded this system from xenial to zesty - so that
I'm actually testing the zesty build in a zesty environment, and
rebooted. Still worked :)

[*] https://launchpad.net/~dannf/+archive/ubuntu/lp1719196

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1719196

Title:
  [arm64 ocata] newly created instances are unable to raise network
  interfaces

Status in Ubuntu Cloud Archive: Fix Released
Status in Ubuntu Cloud Archive ocata series: Fix Committed
Status in libvirt: New
Status in QEMU: Fix Released
Status in libvirt package in Ubuntu: Invalid
Status in qemu package in Ubuntu: Fix Released
Status in qemu source package in Zesty: Fix Committed
[Qemu-devel] [Bug 1719196] Re: [arm64 ocata] newly created instances are unable to raise network interfaces
Thanks so much for doing that Sean. Omitting expected changes (uuid, mac address, etc), here are the significant changes I see:

1) N uses the QEMU 'virt' model, O uses 'virt-2.8'
2) N and O both expose a pci root, but N also exposed 2 PCI bridges that O does not.
3) N exposes an additional serial device.
4) N and O both use an apparmor seclabel. However, O also has a DAC model.

#4 is the most interesting to me. Is there a way to configure ocata nova to not enable DAC?

https://bugs.launchpad.net/bugs/1719196
Title: [arm64 ocata] newly created instances are unable to raise network interfaces
Status in libvirt: New
Status in QEMU: New
ubuntu@openstackaw:~$ nova list
| ID                                   | Name    | Status | Task State | Power State | Networks           |
| dcaf6d51-f81e-4cbd-ac77-0c5d21bde57c | sfeole1 | ACTIVE | -          | Running     | internal=10.5.5.3  |
| aa0b8aee-5650-41f4-8fa0-aeccdc763425 | sfeole2 | ACTIVE | -          | Running     | internal=10.5.5.13 |

ubuntu@openstackaw:~$ nova show aa0b8aee-5650-41f4-8fa0-aeccdc763425
| Property                             | Value                                                    |
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | awrep3                                                   |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | awrep3.maas                                              |
| OS-EXT-SRV-ATTR:instance_name        | instance-0003                                            |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2017-09-24T14:23:08.00                                   |
| OS-SRV-USG:terminated_at             | -                                                        |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         |                                                          |
| created                              | 2017-09-24T14:22:41Z                                     |
| flavor                               | m1.small (717660ae-0440-4b19-a762-ffeb32a0575c)          |
| hostId                               | 5612a00671c47255d2ebd6737a64ec9bd3a5866d1233ecf3e988b025 |
| id                                   | aa0b8aee-5650-41f4-8fa0-aeccdc763425                     |
| image                                | zestynosplash (e88fd1bd-f040-44d8-9e7c-c462ccf4b945)     |
| internal network                     | 10.5.5.13                                                |
| key_name                             | mykey                                                    |
| metadata                             | {}                                                       |
| name                                 | sfeole2                                                  |
| os-extended-volumes:volumes_attached | []                                                       |
| progress                             | 0                                                        |
| security_groups                      | default                                                  |
| status                               | ACTIVE                                                   |
| tenant_id                            | 9f7a21c1ad264fec81abc09f3960ad1d
[Qemu-devel] [PATCH] seccomp: loosen library version dependency
Drop the libseccomp required version back to 2.1.0, restoring the ability to build w/ --enable-seccomp on Ubuntu 14.04.

Commit 4cc47f8b3cc4f32586ba2f7fce1dc267da774a69 tightened the dependency on libseccomp from version 2.1.0 to 2.1.1. This broke building on Ubuntu 14.04, the current Ubuntu LTS release. The commit message didn't mention any specific functional need for 2.1.1, just that it was the most recent stable version at the time. I reviewed the changes between 2.1.0 and 2.1.1, but it looks like that update just contained minor fixes and cleanups - no obvious (to me) new interfaces or critical bug fixes.

Signed-off-by: dann frazier <dann.fraz...@canonical.com>
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 7d5aab2..8a9794b 100755
--- a/configure
+++ b/configure
@@ -1878,7 +1878,7 @@ fi
 if test "$seccomp" != "no" ; then
     case "$cpu" in
     i386|x86_64)
-        libseccomp_minver="2.1.1"
+        libseccomp_minver="2.1.0"
         ;;
     arm|aarch64)
         libseccomp_minver="2.2.3"
--
2.6.1
[Qemu-devel] [PATCH] seccomp: loosen library version dependency
Drop the libseccomp required version back to 2.1.0, restoring the ability to build w/ --enable-seccomp on Ubuntu 14.04.

Commit 4cc47f8b3cc4f32586ba2f7fce1dc267da774a69 tightened the dependency on libseccomp from version 2.1.0 to 2.1.1. This broke building on Ubuntu 14.04, the current Ubuntu LTS release. The commit message didn't mention any specific functional need for 2.1.1, just that it was the most recent stable version at the time. I reviewed the changes between 2.1.0 and 2.1.1, but it looks like that update just contained minor fixes and cleanups - no obvious (to me) new interfaces or critical bug fixes.

Signed-off-by: dann frazier <dann.fraz...@canonical.com>
---
 configure | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 913ae4a..b6f4694 100755
--- a/configure
+++ b/configure
@@ -1873,13 +1873,13 @@ fi
 if test "$seccomp" != "no" ; then
     if test "$cpu" = "i386" || test "$cpu" = "x86_64" &&
-        $pkg_config --atleast-version=2.1.1 libseccomp; then
+        $pkg_config --atleast-version=2.1.0 libseccomp; then
         libs_softmmu="$libs_softmmu `$pkg_config --libs libseccomp`"
         QEMU_CFLAGS="$QEMU_CFLAGS `$pkg_config --cflags libseccomp`"
         seccomp="yes"
     else
         if test "$seccomp" = "yes"; then
-            feature_not_found "libseccomp" "Install libseccomp devel >= 2.1.1"
+            feature_not_found "libseccomp" "Install libseccomp devel >= 2.1.0"
         fi
         seccomp="no"
     fi
--
2.6.1
[Qemu-devel] [Bug 1289527] Re: qemu-aarch64-static: java dies with SIGILL
Using 2.1+dfsg-2ubuntu2 from utopic within a trusty ubuntu core:

# /usr/bin/qemu-aarch64-static -d unimp /usr/bin/java
host mmap_min_addr=0x1
Reserved 0x12000 bytes of guest address space
Relocating guest address space from 0x0040 to 0x40
guest_base 0x0
start end size prot
0040-00401000 1000 r-x
0041-00412000 2000 rw-
0040-00401000 1000 ---
00401000-004000801000 0080 rw-
004000801000-00400081c000 0001b000 r-x
00400081c000-00400082c000 0001 ---
00400082c000-00400082f000 3000 rw-
start_brk 0x
end_code 0x00400834
start_code 0x0040
start_data 0x00410db0
end_data 0x00411030
start_stack 0x0040008007b0
brk 0x00411038
entry 0x004000801f80
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)

https://bugs.launchpad.net/bugs/1289527
Title: qemu-aarch64-static: java dies with SIGILL
Status in QEMU: New
Status in “qemu” package in Ubuntu: Incomplete

Bug description: qemu-aarch64-static from qemu-user-static 1.7.0+dfsg-3ubuntu5 (I haven't tried reproducing w/ upstream git yet)

In an arm64 trusty chroot on an amd64 system:

dannf@server-75e0210e-4f99-4c86-9277-3201ab7b6afd:~$ java
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGILL (0x4) at pc=0x0040098e8070, pid=15034, tid=274902467056
#
# JRE version: (7.0_51-b31) (build )
# Java VM: OpenJDK 64-Bit Server VM (25.0-b59 mixed mode linux-aarch64 compressed oops)
# Problematic frame:
# v ~BufferBlob::flush_icache_stub
#
# Failed to write core dump. Core dumps have been disabled.
To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/dannf/hs_err_pid15034.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted (core dumped)
dannf@server-75e0210e-4f99-4c86-9277-3201ab7b6afd:~$

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1289527/+subscriptions
[Qemu-devel] [Bug 1289527] Re: qemu-aarch64-static: java dies with SIGILL
I'm also seeing a SEGV (not a SIGILL) when testing the version of QEMU that shipped in trusty. So, we might just consider this bug fixed and track this segfault issue separately.

https://bugs.launchpad.net/bugs/1289527
Title: qemu-aarch64-static: java dies with SIGILL
Status in QEMU: New
Status in “qemu” package in Ubuntu: Incomplete
Re: [Qemu-devel] Call for testing QEMU aarch64-linux-user emulation
On Tue, Feb 25, 2014 at 1:39 AM, Alex Bennée alex.ben...@linaro.org wrote:

Dann Frazier dann.fraz...@canonical.com writes:

On Mon, Feb 17, 2014 at 6:40 AM, Alex Bennée alex.ben...@linaro.org wrote:

Hi,

Thanks to all involved for your work here! After a solid few months of work the QEMU master branch [1] has now reached instruction feature parity with the suse-1.6 [6] tree that a lot of people have been using to build various aarch64 binaries. In addition to the snip I've tested against the following aarch64 rootfs:
* SUSE [2]
* Debian [3]
* Ubuntu Saucy [4]

fyi, I've been doing my testing with Ubuntu Trusty.

Good stuff, I shall see if I can set one up. Is the package coverage between trusty and saucy much different? I noticed for example I couldn't find zile and various build-deps for llvm.

Oops, must've missed this question before. I'd say in general they should be quite similar, but there are obviously exceptions (zile is one). I'm not sure why it was omitted.

Also - I've found an issue with running OpenJDK in the latest upstream git:

root@server-75e0210e-4f99-4c86-9277-3201ab7b6afd:/root# java
# [thread 274902467056 also had an error]
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x, pid=9960, tid=297441608176
#
# JRE version: OpenJDK Runtime Environment (7.0_51-b31) (build 1.7.0_51-b31)
# Java VM: OpenJDK 64-Bit Server VM (25.0-b59 mixed mode linux-aarch64 compressed oops)
# Problematic frame:
#
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted (core dumped)

This is openjdk-7-jre-headless, version 7u51-2.4.5-1ubuntu1. I'm not sure what debug info would be useful here, but let me know and I can collect it.
[Qemu-devel] [Bug 1289527] Re: qemu-aarch64-static: java dies with SIGILL
** Also affects: qemu
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1289527
Title: qemu-aarch64-static: java dies with SIGILL
Status in QEMU: New
Status in “qemu” package in Ubuntu: Confirmed
[Qemu-devel] [Bug 1285363] Re: qemu-aarch64-static segfaults
@Serge: I can confirm that this is fixed in 1.7.0+dfsg-3ubuntu5sig1 from your ppa.

https://bugs.launchpad.net/bugs/1285363
Title: qemu-aarch64-static segfaults
Status in QEMU: New
Status in “qemu” package in Ubuntu: Confirmed

Bug description: I've found a couple of conditions that cause qemu-user-static to core dump fairly reliably - same with upstream git - while a binary built from suse's aarch64-1.6 branch seems to consistently work fine. Testing suggests they are resolved by the sigprocmask wrapper patches included in suse's tree.

1) dh_fixperms is a script that commonly runs at the end of a package build. It's basically doing a `find | xargs chmod`.

2) debootstrap --second-stage. This is used to configure an arm64 chroot that was built using debootstrap on a non-native host. It is basically invoking a bunch of shell scripts (postinst, etc).

When it blows up, the stack consistently looks like this:

Core was generated by `/usr/bin/qemu-aarch64-static /bin/sh -e /debootstrap/debootstrap --second-stage'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x60058e55 in memcpy (__len=8, __src=0x7fff62ae34e0, __dest=0x400082c330) at /usr/include/x86_64-linux-gnu/bits/string3.h:51
51      return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
(gdb) bt
#0  0x60058e55 in memcpy (__len=8, __src=0x7fff62ae34e0, __dest=0x400082c330) at /usr/include/x86_64-linux-gnu/bits/string3.h:51
#1  stq_p (v=274886476624, ptr=0x400082c330) at /mnt/qemu.upstream/include/qemu/bswap.h:280
#2  stq_le_p (v=274886476624, ptr=0x400082c330) at /mnt/qemu.upstream/include/qemu/bswap.h:315
#3  target_setup_sigframe (set=0x7fff62ae3530, env=0x62d9c678, sf=0x400082b0d0) at /mnt/qemu.upstream/linux-user/signal.c:1167
#4  target_setup_frame (usig=usig@entry=17, ka=ka@entry=0x604ec1e0 <sigact_table+512>, info=info@entry=0x0, set=set@entry=0x7fff62ae3530, env=env@entry=0x62d9c678) at /mnt/qemu.upstream/linux-user/signal.c:1286
#5  0x60059f46 in setup_frame (env=0x62d9c678, set=0x7fff62ae3530, ka=0x604ec1e0 <sigact_table+512>, sig=17) at /mnt/qemu.upstream/linux-user/signal.c:1322
#6  process_pending_signals (cpu_env=cpu_env@entry=0x62d9c678) at /mnt/qemu.upstream/linux-user/signal.c:5747
#7  0x60056e60 in cpu_loop (env=env@entry=0x62d9c678) at /mnt/qemu.upstream/linux-user/main.c:1082
#8  0x60005079 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /mnt/qemu.upstream/linux-user/main.c:4374

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1285363/+subscriptions
Re: [Qemu-devel] Call for testing QEMU aarch64-linux-user emulation
[Adding Alex Barcelo to the CC]

On Thu, Feb 27, 2014 at 6:20 AM, Michael Matz m...@suse.de wrote:

Hi,

On Wed, 26 Feb 2014, Dann Frazier wrote:

I've narrowed down the changes that seem to prevent both types of segfaults to the following changes that introduce a wrapper around sigprocmask:
https://github.com/susematz/qemu/commit/f1542ae9fe10d5a241fc2624ecaef5f0948e3472
https://github.com/susematz/qemu/commit/4e5e1607758841c760cda4652b0ee7a6bc6eb79d
https://github.com/susematz/qemu/commit/63eb8d3ea58f58d5857153b0c632def1bbd05781
I'm not sure if this is a real fix or just papering over my issue -

It's fixing the issue, but strictly speaking introduces a QoI problem. SIGSEGV must not be controllable by the guest; it needs to be always deliverable to qemu - that is what's fixed. The QoI problem introduced is that with the implementation as is, the fiddling with SIGSEGV is detectable by the guest. E.g. if it installs a segv handler, blocks segv, then forces a segfault, checks that it didn't arrive, then unblocks segv and checks that it now arrives, such a testcase would be able to detect that in fact it couldn't block SIGSEGV. Luckily for us, the effects of blocking SIGSEGV and then generating one in ways other than kill/sigqueue/raise (e.g. by writing to NULL) are undefined, so in practice that QoI issue doesn't matter.

To also fix that latter part it'd need a further per-thread flag segv_blocked_p, which needs to be checked before actually delivering a guest-directed SIGSEGV (in comparison to a qemu-directed SEGV), and otherwise requeue it. That's made a bit complicated when the SEGV was process-directed (not thread-directed), because in that case it needs to be delivered as long as there's _any_ thread which has it unblocked. So given the above undefinedness for sane uses of SEGVs, it didn't seem worth the complication of having an undetectable virtualization of SIGSEGV.

Thanks for the explanation.

but either way, are these changes reasonable for upstream submission?
IIRC the first two commits (from Alex Barcelo) were submitted once in the past, but fell through the cracks. Alex: are you interested in resubmitting these - or would you like me to attempt to on your behalf? -dann Ciao, Michael.
Re: [Qemu-devel] Call for testing QEMU aarch64-linux-user emulation
On Tue, Feb 25, 2014 at 1:39 AM, Alex Bennée alex.ben...@linaro.org wrote: Dann Frazier dann.fraz...@canonical.com writes: On Mon, Feb 17, 2014 at 6:40 AM, Alex Bennée alex.ben...@linaro.org wrote: Hi, Thanks to all involved for your work here! After a solid few months of work the QEMU master branch [1] has now reached instruction feature parity with the suse-1.6 [6] tree that a lot of people have been using to build various aarch64 binaries. In addition to the snip I've tested against the following aarch64 rootfs: * SUSE [2] * Debian [3] * Ubuntu Saucy [4] fyi, I've been doing my testing with Ubuntu Trusty. Good stuff, I shall see if I can set one up. Is the package coverage between trusty and saucy much different? I noticed for example I couldn't find zile and various build-deps for llvm. snip Feedback I'm interested in == * Any instruction failure (please include the log line with the unsupported message) * Any aarch64 specific failures (i.e. not generic QEMU threading flakeiness). I'm not sure if this qualifies as generic QEMU threading flakiness or not. I've found a couple conditions that causes master to core dump fairly reliably, while the aarch64-1.6 branch seems to consistently work fine. 1) dh_fixperms is a script that commonly runs at the end of a package build. Its basically doing a `find | xargs chmod`. 2) debootstrap --second-stage This is used to configure an arm64 chroot that was built using debootstrap on a non-native host. It is basically invoking a bunch of shell scripts (postinst, etc). 
When it blows up, the stack consistently looks like this:

I've narrowed down the changes that seem to prevent both types of segfaults to the following changes that introduce a wrapper around sigprocmask:
https://github.com/susematz/qemu/commit/f1542ae9fe10d5a241fc2624ecaef5f0948e3472
https://github.com/susematz/qemu/commit/4e5e1607758841c760cda4652b0ee7a6bc6eb79d
https://github.com/susematz/qemu/commit/63eb8d3ea58f58d5857153b0c632def1bbd05781

I'm not sure if this is a real fix or just papering over my issue - but either way, are these changes reasonable for upstream submission?

-dann

Core was generated by `/usr/bin/qemu-aarch64-static /bin/sh -e /debootstrap/debootstrap --second-stage'.
Program terminated with signal SIGSEGV, Segmentation fault.
snip

There are some pretty large differences between these trees with respect to signal syscalls - is that the likely culprit?

Quite likely. We explicitly concentrated on the aarch64 specific instruction emulation leaving more generic patches to flow in from SUSE as they matured. I guess it's time to go through the remaining patches and see what's up-streamable.

Alex/Michael, are any of these patches in flight now?

Cheers,
--
Alex Bennée
QEMU/KVM Hacker for Linaro
Re: [Qemu-devel] Call for testing QEMU aarch64-linux-user emulation
On Mon, Feb 17, 2014 at 6:40 AM, Alex Bennée alex.ben...@linaro.org wrote:

Hi,

Thanks to all involved for your work here!

After a solid few months of work the QEMU master branch [1] has now reached instruction feature parity with the suse-1.6 [6] tree that a lot of people have been using to build various aarch64 binaries. In addition to the SUSE work we have fixed numerous edge cases and finished off classes of instructions. All instructions have been verified with Peter's RISU random instruction testing tool. I have also built and run many packages as well as built gcc and passed most of the aarch64 specific tests.

I've tested against the following aarch64 rootfs:
* SUSE [2]
* Debian [3]
* Ubuntu Saucy [4]

fyi, I've been doing my testing with Ubuntu Trusty.

In my tree the remaining insns that the GCC aarch64 tests need to implement are: FRECPE FRECPX CLS (2 misc variant) CLZ (2 misc variant) FSQRT FRINTZ FCVTZS. Which I'm currently working through now. However for most build tasks I expect the instructions in master [1] will be enough.

If you want the latest instructions working their way to mainline you are free to use my tree [5] which currently has:
* Additional NEON/SIMD instructions
* sendmsg syscall
* Improved helper scripts for setting up binfmt_misc
* The ability to set QEMU_LOG_FILENAME to /path/to/something-%d.log - this is useful when tests are failing N-levels deep as %d is replaced with the pid

Feedback I'm interested in
==
* Any instruction failure (please include the log line with the unsupported message)
* Any aarch64 specific failures (i.e. not generic QEMU threading flakiness).

I'm not sure if this qualifies as generic QEMU threading flakiness or not. I've found a couple of conditions that cause master to core dump fairly reliably, while the aarch64-1.6 branch seems to consistently work fine.

1) dh_fixperms is a script that commonly runs at the end of a package build. It's basically doing a `find | xargs chmod`.
2) debootstrap --second-stage. This is used to configure an arm64 chroot that was built using debootstrap on a non-native host. It is basically invoking a bunch of shell scripts (postinst, etc).

When it blows up, the stack consistently looks like this:

Core was generated by `/usr/bin/qemu-aarch64-static /bin/sh -e /debootstrap/debootstrap --second-stage'.
Program terminated with signal SIGSEGV, Segmentation fault.
snip

There are some pretty large differences between these trees with respect to signal syscalls - is that the likely culprit?
-dann

If you need to catch me in real time I'm available on #qemu (stsquad) and #linaro-virtualization (ajb-linaro). Many thanks to the SUSE guys for getting the aarch64 train rolling. I hope you're happy with the final result ;-)

Cheers,
--
Alex Bennée
QEMU/KVM Hacker for Linaro

[1] git://git.qemu.org/qemu.git master
[2] http://download.opensuse.org/ports/aarch64/distribution/13.1/appliances/openSUSE-13.1-ARM-JeOS.aarch64-rootfs.aarch64-1.12.1-Build32.1.tbz
[3] http://people.debian.org/~wookey/bootstrap/rootfs/debian-unstable-arm64.tar.gz
[4] http://people.debian.org/~wookey/bootstrap/rootfs/saucy-arm64.tar.gz
[5] https://github.com/stsquad/qemu/tree/ajb-a64-working
[6] https://github.com/susematz/qemu/tree/aarch64-1.6
[Qemu-devel] [PATCH] e1000: Don't set the Capabilities List bit
[Originally sent to qemu-kvm list, but I was redirected here]

The Capabilities Pointer is NULL, so this bit shouldn't be set. The state of this bit doesn't appear to change any behavior on Linux/Windows versions we've tested, but it does cause Windows' PCI/PCI Express Compliance Test to balk. I happen to have a physical 82540EM controller, and it also sets the Capabilities Bit, but it actually has items on the capabilities list to go with it :)

Signed-off-by: dann frazier <dann.fraz...@canonical.com>
---
 hw/e1000.c | 2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 6a3a941..ce8fc8b 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1151,8 +1151,6 @@ static int pci_e1000_init(PCIDevice *pci_dev)
     pci_conf = d->dev.config;

-    /* TODO: we have no capabilities, so why is this bit set? */
-    pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_CAP_LIST);
     /* TODO: RST# value should be 0, PCI spec 6.2.4 */
     pci_conf[PCI_CACHE_LINE_SIZE] = 0x10;
--
1.7.6.3