Re: [PATCH 1/2] audio: remove qemu_spice_audio_init()

2020-12-15 Thread dann frazier
On Tue, Dec 15, 2020 at 09:07:19AM +0100, Gerd Hoffmann wrote:
> > > +if (using_spice) {
> > > +/*
> > > + * When using spice allow the spice audio driver being picked
> > > + * as default.
> > > + *
> > > + * Temporary hack.  Using audio devices without explicit
> > > + * audiodev= property is already deprecated.  Same goes for
> > > + * the -soundhw switch.  Once this support gets finally
> > > + * removed we can also drop the concept of a default audio
> > > + * backend and this can go away.
> > > + */
> > > +driver = audio_driver_lookup("spice");
> > > +driver->can_be_default = 1;
> > 
> > fyi, one of my libvirt/QEMU guests now segfaults here.
> > See: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=977301
> 
> Hmm, surely doesn't hurt to add a "if (driver)" check here.
> 
> I'm wondering though how you end up with spice being enabled
> but spiceaudio driver not being available.  There is no separate
> config switch so you should have both spice + spiceaudio or
> none of them ...
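
For illustration only, a minimal sketch of the guarded lookup being
discussed (the placement inside audio_init() is abbreviated and the
fallback behaviour shown here is an assumption, not the committed fix):

    if (using_spice) {
        driver = audio_driver_lookup("spice");
        if (driver) {
            driver->can_be_default = 1;
        } else {
            /* spiceaudio module missing: warn and fall back to the normal
             * default-driver selection instead of dereferencing NULL. */
            dolog("warning: spice audio driver not available\n");
        }
    }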

Beats me. I'm seeing this for all of my guests, which I believe were
just created with virt-install or virtinst. Here's the log, in
case it helps:

2020-12-13 17:49:15.028+: starting up libvirt version: 6.9.0, package: 1+b2 (amd64 / i386 Build Daemon (x86-ubc-01)  Mon, 07 Dec 2020 09:45:52 +), qemu version: 5.2.0Debian 1:5.2+dfsg-2, kernel: 5.9.0-1-amd64, hostname: xps13.dannf
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
HOME=/var/lib/libvirt/qemu/domain-4-debian10 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-4-debian10/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-4-debian10/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-4-debian10/.config \
QEMU_AUDIO_DRV=spice \
/usr/bin/qemu-system-x86_64 \
-name guest=debian10,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-4-debian10/master-key.aes \
-machine pc-q35-5.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off,memory-backend=pc.ram \
-cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hle=off,rtm=off \
-m 1024 \
-object memory-backend-ram,id=pc.ram,size=1073741824 \
-overcommit mem-lock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid 623816ab-33f1-420d-9fc6-2a11afb5715d \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=36,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot menu=on,strict=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/debian10.raw","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-3-format","read-only":false,"driver":"raw","file":"libvirt-3-storage"}' \
-device virtio-blk-pci,bus=pci.4,addr=0x0,drive=libvirt-3-format,id=virtio-disk0,bootindex=1 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/debian10-seed.img","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"raw","file":"libvirt-2-storage"}' \
-device virtio-blk-pci,bus=pci.7,addr=0x0,drive=libvirt-2-format,id=virtio-disk1 \
-device ide-cd,bus=ide.0,id=sata0-0-0 \
-netdev tap,fd=38,id=hostnet0,vhost=on,vhostfd=39 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:86:da:65,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=40,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-chardev spicevmc,id=charchannel1,name=vdagent \
-device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-spice 

Re: [PATCH 1/2] audio: remove qemu_spice_audio_init()

2020-12-13 Thread dann frazier
On Wed, Sep 16, 2020 at 2:42 AM Gerd Hoffmann  wrote:
>
> Handle the spice special case in audio_init instead.
>
> With the qemu_spice_audio_init() symbol dependency being
> gone we can build spiceaudio as module.
>
> Signed-off-by: Gerd Hoffmann 
> ---
>  include/ui/qemu-spice.h |  1 -
>  audio/audio.c           | 16 ++++++++++++++++
>  audio/spiceaudio.c      |  5 -----
>  ui/spice-core.c         |  1 -
>  4 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/include/ui/qemu-spice.h b/include/ui/qemu-spice.h
> index 8c23dfe71797..12474d88f40e 100644
> --- a/include/ui/qemu-spice.h
> +++ b/include/ui/qemu-spice.h
> @@ -29,7 +29,6 @@ extern int using_spice;
>
>  void qemu_spice_init(void);
>  void qemu_spice_input_init(void);
> -void qemu_spice_audio_init(void);
>  void qemu_spice_display_init(void);
>  int qemu_spice_display_add_client(int csock, int skipauth, int tls);
>  int qemu_spice_add_interface(SpiceBaseInstance *sin);
> diff --git a/audio/audio.c b/audio/audio.c
> index ce8c6dec5f47..76cdba0943d1 100644
> --- a/audio/audio.c
> +++ b/audio/audio.c
> @@ -34,6 +34,7 @@
>  #include "qemu/module.h"
>  #include "sysemu/replay.h"
>  #include "sysemu/runstate.h"
> +#include "ui/qemu-spice.h"
>  #include "trace.h"
>
>  #define AUDIO_CAP "audio"
> @@ -1658,6 +1659,21 @@ static AudioState *audio_init(Audiodev *dev, const char *name)
>  /* silence gcc warning about uninitialized variable */
>  AudiodevListHead head = QSIMPLEQ_HEAD_INITIALIZER(head);
>
> +if (using_spice) {
> +/*
> + * When using spice allow the spice audio driver being picked
> + * as default.
> + *
> + * Temporary hack.  Using audio devices without explicit
> + * audiodev= property is already deprecated.  Same goes for
> + * the -soundhw switch.  Once this support gets finally
> + * removed we can also drop the concept of a default audio
> + * backend and this can go away.
> + */
> +driver = audio_driver_lookup("spice");
> +driver->can_be_default = 1;

fyi, one of my libvirt/QEMU guests now segfaults here.
See: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=977301

  -dann

> +}
> +
>  if (dev) {
>  /* -audiodev option */
>  legacy_config = false;
> diff --git a/audio/spiceaudio.c b/audio/spiceaudio.c
> index b6b5da4812f2..aae420cff997 100644
> --- a/audio/spiceaudio.c
> +++ b/audio/spiceaudio.c
> @@ -310,11 +310,6 @@ static struct audio_driver spice_audio_driver = {
>  .voice_size_in  = sizeof (SpiceVoiceIn),
>  };
>
> -void qemu_spice_audio_init (void)
> -{
> -spice_audio_driver.can_be_default = 1;
> -}
> -
>  static void register_audio_spice(void)
>  {
>  audio_driver_register(&spice_audio_driver);
> diff --git a/ui/spice-core.c b/ui/spice-core.c
> index ecc2ec2c55c2..10aa309f78f7 100644
> --- a/ui/spice-core.c
> +++ b/ui/spice-core.c
> @@ -804,7 +804,6 @@ void qemu_spice_init(void)
>  qemu_spice_add_interface(&spice_migrate.base);
>
>  qemu_spice_input_init();
> -qemu_spice_audio_init();
>
>  qemu_add_vm_change_state_handler(vm_change_state_handler, NULL);
>  qemu_spice_display_stop();
> --
> 2.27.0
>
>



[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2020-08-14 Thread dann frazier
** Changed in: kunpeng920/ubuntu-18.04-hwe
   Status: Triaged => Fix Committed

** Changed in: kunpeng920/ubuntu-18.04
   Status: Triaged => Fix Committed

** Changed in: kunpeng920
   Status: Triaged => Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Fix Committed
Status in kunpeng920 ubuntu-18.04 series:
  Fix Committed
Status in kunpeng920 ubuntu-18.04-hwe series:
  Fix Committed
Status in kunpeng920 ubuntu-19.10 series:
  Fix Released
Status in kunpeng920 ubuntu-20.04 series:
  Fix Released
Status in kunpeng920 upstream-kernel series:
  Invalid
Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Bionic:
  Fix Committed
Status in qemu source package in Eoan:
  Fix Released
Status in qemu source package in Focal:
  Fix Released

Bug description:
  
  SRU TEAM REVIEWER: This has already been SRUed for Focal, Eoan and Bionic. 
Unfortunately the Bionic SRU did not work and we had to reverse the change. 
Since then we had another update and now I'm retrying the SRU.

  After discussing with @paelzer (and @dannf as a reviewer) extensively,
  Christian and I agreed that we should scope this SRU as Aarch64 only
  AND I was much, much more conservative in question of what is being
  changed in the AIO qemu code.

  New code has been tested against the initial Test Case and the new
  one that regressed for Bionic. More information (about tests and
  discussion) can be found in the MR at
  ~rafaeldtinoco/ubuntu/+source/qemu:lp1805256-bionic-refix

  BIONIC REGRESSION BUG:

  https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419

  [Impact]

  * QEMU locking primitives might face a race condition in QEMU Async
  I/O bottom halves scheduling. This leads to a deadlock, making either
  QEMU or one of its tools hang indefinitely.

  [Test Case]

  INITIAL

  * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs in Aarch64.

  [Regression Potential]

  * This is a change to a core part of QEMU: the AIO scheduling. It
  works like a "kernel" scheduler: whereas the kernel schedules OS tasks,
  the QEMU AIO code is responsible for scheduling QEMU coroutines and
  event listener callbacks.

  * There was a long discussion upstream about primitives and Aarch64.
  After quite some time, Paolo released this patch and it solves the
  issue. Tested platforms were amd64 and aarch64, based on his commit
  log.

  * Christian suggests that this fix stay a little longer in -proposed to
  make sure it won't cause any regressions.

  * dannf suggests we also check for performance regressions; e.g. how
  long it takes to convert a cloud image on high-core systems.

  BIONIC REGRESSED ISSUE

  https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419

  [Other Info]

   * Original Description below:

  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 

[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2020-08-07 Thread dann frazier
Verified w/ over 500 successful iterations on a m6g.metal instance, and
over 300 in an armhf chroot on the same.

** Tags removed: verification-needed verification-needed-bionic
** Tags added: verification-done verification-done-bionic

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Triaged
Status in kunpeng920 ubuntu-18.04 series:
  Triaged
Status in kunpeng920 ubuntu-18.04-hwe series:
  Triaged
Status in kunpeng920 ubuntu-19.10 series:
  Fix Released
Status in kunpeng920 ubuntu-20.04 series:
  Fix Released
Status in kunpeng920 upstream-kernel series:
  Invalid
Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Bionic:
  Fix Committed
Status in qemu source package in Eoan:
  Fix Released
Status in qemu source package in Focal:
  Fix Released

Bug description:
  
  SRU TEAM REVIEWER: This has already been SRUed for Focal, Eoan and Bionic. 
Unfortunately the Bionic SRU did not work and we had to reverse the change. 
Since then we had another update and now I'm retrying the SRU.

  After discussing with @paelzer (and @dannf as a reviewer) extensively,
  Christian and I agreed that we should scope this SRU as Aarch64 only
  AND I was much, much more conservative in question of what is being
  changed in the AIO qemu code.

  New code has been tested against the initial Test Case and the new
  one that regressed for Bionic. More information (about tests and
  discussion) can be found in the MR at
  ~rafaeldtinoco/ubuntu/+source/qemu:lp1805256-bionic-refix

  BIONIC REGRESSION BUG:

  https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419

  [Impact]

  * QEMU locking primitives might face a race condition in QEMU Async
  I/O bottom halves scheduling. This leads to a deadlock, making either
  QEMU or one of its tools hang indefinitely.

  [Test Case]

  INITIAL

  * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs in Aarch64.

  [Regression Potential]

  * This is a change to a core part of QEMU: the AIO scheduling. It
  works like a "kernel" scheduler: whereas the kernel schedules OS tasks,
  the QEMU AIO code is responsible for scheduling QEMU coroutines and
  event listener callbacks.

  * There was a long discussion upstream about primitives and Aarch64.
  After quite some time, Paolo released this patch and it solves the
  issue. Tested platforms were amd64 and aarch64, based on his commit
  log.

  * Christian suggests that this fix stay a little longer in -proposed to
  make sure it won't cause any regressions.

  * dannf suggests we also check for performance regressions; e.g. how
  long it takes to convert a cloud image on high-core systems.

  BIONIC REGRESSED ISSUE

  https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419

  [Other Info]

   * Original Description below:

  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) 

[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2020-07-15 Thread dann frazier
I ran the new PPA build (1:2.11+dfsg-1ubuntu7.29~ppa01) on both a
ThunderX2 system and a Hi1620 system overnight, and both survived (6574
& 12919 iterations, respectively).

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Triaged
Status in kunpeng920 ubuntu-18.04 series:
  Triaged
Status in kunpeng920 ubuntu-18.04-hwe series:
  Triaged
Status in kunpeng920 ubuntu-19.10 series:
  Fix Released
Status in kunpeng920 ubuntu-20.04 series:
  Fix Released
Status in kunpeng920 upstream-kernel series:
  Invalid
Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Bionic:
  In Progress
Status in qemu source package in Eoan:
  Fix Released
Status in qemu source package in Focal:
  Fix Released

Bug description:
  [Impact]

  * QEMU locking primitives might face a race condition in QEMU Async
  I/O bottom halves scheduling. This leads to a deadlock, making either
  QEMU or one of its tools hang indefinitely.

  [Test Case]

  * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs in Aarch64.

  [Regression Potential]

  * This is a change to a core part of QEMU: the AIO scheduling. It
  works like a "kernel" scheduler: whereas the kernel schedules OS tasks,
  the QEMU AIO code is responsible for scheduling QEMU coroutines and
  event listener callbacks.

  * There was a long discussion upstream about primitives and Aarch64.
  After quite some time, Paolo released this patch and it solves the
  issue. Tested platforms were amd64 and aarch64, based on his commit
  log.

  * Christian suggests that this fix stay a little longer in -proposed to
  make sure it won't cause any regressions.

  * dannf suggests we also check for performance regressions; e.g. how
  long it takes to convert a cloud image on high-core systems.

  [Other Info]

   * Original Description below:

  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).
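
  (For illustration, a hedged sketch of the wait/wake pairing involved,
  using the QemuEvent API that appears in the backtraces above; the
  simplified call sites below are illustrative, not QEMU's actual code.)

      /* Thread #2 above: the RCU thread parks until callbacks arrive.
       * qemu_event_wait() ends up in qemu_futex_wait(), as in the backtrace. */
      qemu_event_wait(&rcu_call_ready_event);

      /* Producer side: after enqueuing an RCU callback, the event is set.
       * qemu_event_set() is what eventually issues qemu_futex_wake(); if every
       * remaining thread is already blocked in a syscall, this never runs and
       * the process hangs, which is the situation described above. */
      qemu_event_set(&rcu_call_ready_event);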

  

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760,
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  

[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2020-05-07 Thread dann frazier
Ike's backport in
https://launchpad.net/~ikepanhc/+archive/ubuntu/lp1805256 tests well for
me on Cavium Sabre. One minor note is that the function
in_aio_context_home_thread() is being called in aio-win32.c, but that
function didn't exist in 2.11. We probably want to change that to
aio_context_in_iothread(). It was renamed in
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=d2b63ba8dd20c1091b3f1033e6a95ef95b18149d
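
For illustration, a sketch of the adjustment described above for the
2.11 backport (the surrounding aio-win32.c condition is an assumption,
not a quote of the actual call site):

    /* QEMU 2.11 has no in_aio_context_home_thread(); call its pre-rename
     * equivalent instead (renamed in commit d2b63ba8dd20, linked above). */
    if (aio_context_in_iothread(ctx)) {
        /* ... fast path when running in the AioContext's home thread ... */
    }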

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Triaged
Status in kunpeng920 ubuntu-18.04 series:
  Triaged
Status in kunpeng920 ubuntu-18.04-hwe series:
  Triaged
Status in kunpeng920 ubuntu-19.10 series:
  Triaged
Status in kunpeng920 ubuntu-20.04 series:
  Triaged
Status in kunpeng920 upstream-kernel series:
  Fix Committed
Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  In Progress
Status in qemu source package in Bionic:
  In Progress
Status in qemu source package in Disco:
  In Progress
Status in qemu source package in Eoan:
  In Progress
Status in qemu source package in Focal:
  In Progress

Bug description:
  [Impact]

  * QEMU locking primitives might face a race condition in QEMU Async
  I/O bottom halves scheduling. This leads to a deadlock, making either
  QEMU or one of its tools hang indefinitely.

  [Test Case]

  * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs in Aarch64.

  [Regression Potential]

  * This is a change to a core part of QEMU: the AIO scheduling. It
  works like a "kernel" scheduler: whereas the kernel schedules OS tasks,
  the QEMU AIO code is responsible for scheduling QEMU coroutines and
  event listener callbacks.

  * There was a long discussion upstream about primitives and Aarch64.
  After quite some time, Paolo released this patch and it solves the
  issue. Tested platforms were amd64 and aarch64, based on his commit
  log.

  * Christian suggests that this fix stay a little longer in -proposed to
  make sure it won't cause any regressions.

  * dannf suggests we also check for performance regressions; e.g. how
  long it takes to convert a cloud image on high-core systems.

  [Other Info]

   * Original Description below:

  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does 

Re: [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2020-05-06 Thread dann frazier
On Wed, May 6, 2020 at 1:20 PM Philippe Mathieu-Daudé
<1805...@bugs.launchpad.net> wrote:
>
> Isn't this fixed by commit 5710a3e09f9?

See comment #43. The discussion since then has been about testing and
integration of that fix.

  -dann

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Triaged
Status in kunpeng920 ubuntu-18.04 series:
  Triaged
Status in kunpeng920 ubuntu-18.04-hwe series:
  Triaged
Status in kunpeng920 ubuntu-19.10 series:
  Triaged
Status in kunpeng920 ubuntu-20.04 series:
  Triaged
Status in kunpeng920 upstream-kernel series:
  Fix Committed
Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  In Progress
Status in qemu source package in Bionic:
  In Progress
Status in qemu source package in Disco:
  In Progress
Status in qemu source package in Eoan:
  In Progress
Status in qemu source package in Focal:
  In Progress

Bug description:
  [Impact]

  * QEMU locking primitives might face a race condition in QEMU Async
  I/O bottom halves scheduling. This leads to a deadlock, making either
  QEMU or one of its tools hang indefinitely.

  [Test Case]

  * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs in Aarch64.

  [Regression Potential]

  * This is a change to a core part of QEMU: the AIO scheduling. It
  works like a "kernel" scheduler: whereas the kernel schedules OS tasks,
  the QEMU AIO code is responsible for scheduling QEMU coroutines and
  event listener callbacks.

  * There was a long discussion upstream about primitives and Aarch64.
  After quite some time, Paolo released this patch and it solves the
  issue. Tested platforms were amd64 and aarch64, based on his commit
  log.

  * Christian suggests that this fix stay a little longer in -proposed to
  make sure it won't cause any regressions.

  * dannf suggests we also check for performance regressions; e.g. how
  long it takes to convert a cloud image on high-core systems.

  [Other Info]

   * Original Description below:

  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760,
  

[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2020-05-06 Thread dann frazier
** Description changed:

  [Impact]
  
  * QEMU locking primitives might face a race condition in QEMU Async I/O
  bottom halves scheduling. This leads to a dead lock making either QEMU
  or one of its tools to hang indefinitely.
  
  [Test Case]
  
  * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
  
  Hangs indefinitely approximately 30% of the runs in Aarch64.
  
  [Regression Potential]
  
  * This is a change to a core part of QEMU: The AIO scheduling. It works
  like a "kernel" scheduler, whereas kernel schedules OS tasks, the QEMU
  AIO code is responsible to schedule QEMU coroutines or event listeners
  callbacks.
  
  * There was a long discussion upstream about primitives and Aarch64.
  After quite sometime Paolo released this patch and it solves the issue.
  Tested platforms were: amd64 and aarch64 based on his commit log.
  
  * Christian suggests that this fix stay little longer in -proposed to
  make sure it won't cause any regressions.
  
+ * dannf suggests we also check for performance regressions; e.g. how
+ long it takes to convert a cloud image on high-core systems.
+ 
  [Other Info]
  
-  * Original Description bellow:
- 
+  * Original Description bellow:
  
  Command:
  
  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
  
  Hangs indefinitely approximately 30% of the runs.
  
  
  
  Workaround:
  
  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
  
  Run "qemu-img convert" with "a single coroutine" to avoid this issue.
  
  
  
  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...
  
  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()
  
  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start
  
  
  
  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2
  
  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]
  
  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]
  
  
  """
  
  All the tasks left are blocked in a system call, so no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).
  
  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).
  
  
  
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:
  
  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
  
  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.
  
  Once hung, attaching gdb gives the following backtrace:
  
  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760,
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=,
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=,
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975
  
  Reproduced w/ latest QEMU git (@ 53744e0a182)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  

[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2020-05-05 Thread dann frazier
fyi, I backported that fix also to focal/groovy and eoan, and tested with
those builds. On my test systems the hang reliably occurs within 20
iterations; after the fix, they have survived > 500 iterations thus far.
I'll leave them running overnight just to be sure.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Triaged
Status in kunpeng920 ubuntu-18.04 series:
  New
Status in kunpeng920 ubuntu-18.04-hwe series:
  New
Status in kunpeng920 ubuntu-19.10 series:
  New
Status in kunpeng920 ubuntu-20.04 series:
  New
Status in kunpeng920 upstream-kernel series:
  Fix Committed
Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  Incomplete
Status in qemu source package in Bionic:
  Incomplete
Status in qemu source package in Disco:
  Incomplete
Status in qemu source package in Eoan:
  Incomplete
Status in qemu source package in Focal:
  Incomplete

Bug description:
  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  

  [ Original Description ]

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760,
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=,
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=,
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions



[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2019-12-18 Thread dann frazier
fyi, what I tested in Comment #35 was upstream QEMU (@ aceeaa69d2) with
a port of the patch in Comment #34 applied. I've attached that patch
here. While it did avoid the issue in my testing, I agree with Rafael's
Comment #36 that it does not appear to address the root cause (as I
understand it), and is therefore unlikely to be something we'd ship in Ubuntu.

** Patch added: "comment-34-ported-to-upstream.patch"
   
https://bugs.launchpad.net/qemu/+bug/1805256/+attachment/5313631/+files/comment-34-ported-to-upstream.patch

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Confirmed
Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Bionic:
  Confirmed
Status in qemu source package in Disco:
  Confirmed
Status in qemu source package in Eoan:
  In Progress
Status in qemu source package in Focal:
  Confirmed

Bug description:
  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  

  [ Original Description ]

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760,
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=,
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=,
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions



[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2019-12-17 Thread dann frazier
I tested the patch in Comment #34, and it was able to pass 500
iterations.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Confirmed
Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Bionic:
  Confirmed
Status in qemu source package in Disco:
  Confirmed
Status in qemu source package in Eoan:
  In Progress
Status in qemu source package in Focal:
  Confirmed

Bug description:
  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  

  [ Original Description ]

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760,
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=,
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=,
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions



[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2019-12-13 Thread dann frazier
** Changed in: kunpeng920
   Status: New => Confirmed

** Changed in: qemu (Ubuntu Bionic)
   Status: New => Confirmed

** Changed in: qemu (Ubuntu Disco)
   Status: New => Confirmed

** Changed in: qemu (Ubuntu Focal)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Confirmed
Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Bionic:
  Confirmed
Status in qemu source package in Disco:
  Confirmed
Status in qemu source package in Eoan:
  In Progress
Status in qemu source package in Focal:
  Confirmed

Bug description:
  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  

  [ Original Description ]

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760,
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=,
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=,
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions



Re: [PATCH-for-5.0] roms/edk2-funcs.sh: Use available GCC for ARM/Aarch64 targets

2019-12-08 Thread dann frazier
On Fri, Dec 06, 2019 at 06:07:58AM +0100, Philippe Mathieu-Daudé wrote:
> On 12/5/19 8:35 PM, Laszlo Ersek wrote:
> > On 12/05/19 17:50, Ard Biesheuvel wrote:
> > > On Thu, 5 Dec 2019 at 16:27, Philippe Mathieu-Daudé  
> > > wrote:
> > > > 
> > > > On 12/5/19 5:13 PM, Laszlo Ersek wrote:
> > > > > Hi Phil,
> > > > > 
> > > > > (+Ard)
> > > > > 
> > > > > On 12/04/19 23:12, Philippe Mathieu-Daudé wrote:
> > > > > > Centos 7.7 only provides cross GCC 4.8.5, but the script forces
> > > > > > us to use GCC5. Since the same machinery is valid to check the
> > > > > > GCC version, remove the $emulation_target check.
> > > > > > 
> > > > > > $ cat /etc/redhat-release
> > > > > > CentOS Linux release 7.7.1908 (Core)
> > > > > > 
> > > > > > $ aarch64-linux-gnu-gcc -v 2>&1 | tail -1
> > > > > > gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
> > > > > 
> > > > > this patch is not correct, in my opinion. ARM / AARCH64 support in 
> > > > > edk2
> > > > > requires GCC5 as a minimum. It was never tested with an earlier
> > > > > toolchain, to my understanding. Not on my part, anyway.
> > > > > 
> > > > > To be more precise: when I tested cross-gcc toolchains earlier than
> > > > > that, the ArmVirtQemu builds always failed. Minimally, those 
> > > > > toolchains
> > > > > didn't recognize some of the AARCH64 system registers.
> > > > > 
> > > > > If CentOS 7.7 does not provide a suitable (>=GCC5) toolchain, then we
> > > > > can't build ArmVirtQemu binaries on CentOS 7.7, in my opinion.
> > > > > 
> > > > > Personally, on my RHEL7 laptop, over time I've used the following
> > > > > toolchains, to satisfy the GCC5 requirement of ArmVirtQemu (which
> > > > > requirement I took as experimental evidence):
> > > > > 
> > > > > - Initially (last quarter of 2014), I used binary distributions --
> > > > > tarballs -- of cross-binutils and cross-gcc, from Linaro.
> > > > > 
> > > > > - Later (last quarter of 2016), I rebuilt some SRPMs that were at the
> > > > > time Fedora-only for RHEL7. Namely:
> > > > > 
> > > > > - cross-binutils-2.27-3.fc24
> > > > >   https://koji.fedoraproject.org/koji/buildinfo?buildID=801348
> > > > > 
> > > > > - gcc-6.1.1-2.fc24
> > > > >   https://koji.fedoraproject.org/koji/buildinfo?buildID=761767
> > > > > 
> > > > > - Most recently, I've been using cross-binutils updated from EPEL7:
> > > > > 
> > > > > - cross-binutils-2.27-9.el7.1
> > > > >   https://koji.fedoraproject.org/koji/buildinfo?buildID=918474
> > > > > 
> > > > > To my knowledge, there is still no suitable cross-compiler available 
> > > > > on
> > > > > RHEL7, from any trustworthy RPM repository. So, to this day, I use
> > > > > gcc-6.1.1-2 for cross-building ArmVirtQemu, on my RHEL7 laptop.
> > > > > 
> > > > > Again: I believe it does not matter if the gcc-4.8.5-based
> > > > > cross-compiler in CentOS 7 "happens" to work. That's a compiler that I
> > > > > have never tested with, or vetted for, upstream ArmVirtQemu.
> > > > > 
> > > > > Now, I realize that in edk2, we have stuff like
> > > > > 
> > > > > GCC48_AARCH64_CC_FLAGS
> > > > > 
> > > > > in "BaseTools/Conf/tools_def.template" -- coming from commit
> > > > > 7a9dbf2c94d1 ("BaseTools/Conf/tools_def.template: drop ARM/AARCH 
> > > > > support
> > > > > from GCC46/GCC47", 2019-01-08). That doesn't change the fact that I've
> > > > > never built or tested ArmVirtQemu with such a compiler. And so this
> > > > > patch makes me quite uncomfortable.
> > > > > 
> > > > > If that rules out CentOS 7 as a QEMU project build / CI platform for 
> > > > > the
> > > > > bundled ArmVirtQemu binaries, then we need a more recent platform
> > > > > (perhaps CentOS 8, not sure).
> > > > 
> > > > Unfortunately CentOS 8 is not available as a Docker image, which is a
> > > > convenient way to build EDK2 in a CI.
> > > > 
> > > > > I think it's also educational to check the origin of the code that 
> > > > > your
> > > > > patch proposes to remove. Most recently it was moved around from a
> > > > > different place, in QEMU commit 65a109ab4b1a ('roms: lift
> > > > > "edk2-funcs.sh" from "tests/uefi-test-tools/build.sh"', 2019-04-17).
> > > > > 
> > > > > In that commit, for some reason I didn't keep the original code 
> > > > > comments
> > > > > (perhaps it would have been too difficult or messy to preserve the
> > > > > comments sanely with the restructured / factored-out code). But, they
> > > > > went like this (originally from commit 77db55fc8155,
> > > > > "tests/uefi-test-tools: add build scripts", 2019-02-21):
> > > > > 
> > > > > # Expose cross_prefix (which is possibly empty) to the edk2 tools. 
> > > > > While at it,
> > > > > # determine the suitable edk2 toolchain as well.
> > > > > # - For ARM and AARCH64, edk2 only offers the GCC5 toolchain tag, 
> > > > > which covers
> > > > > #   the gcc-5+ releases.
> > > > > # - For IA32 and X64, edk2 offers the GCC44 through GCC49 toolchain 
> > > > > tags, in
> > > > > #   addition 

Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-11 Thread dann frazier
On Fri, Oct 11, 2019 at 06:05:25AM +, Jan Glauber wrote:
> On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> > On 09/10/19 10:02, Jan Glauber wrote:
> 
> > > I'm still not sure what the actual issue is here, but could it be some bad
> > > interaction between the notify_me and the list_lock? They are both 4 bytes
> > > and side-by-side:
> > > 
> > > address notify_me: 0xdb528aa0  sizeof notify_me: 4
> > > address list_lock: 0xdb528aa4  sizeof list_lock: 4
> > > 
> > > AFAICS the generated code looks OK (all load/store exclusive done
> > > with 32 bit size):
> > > 
> > >  e6c:   885ffc01        ldaxr   w1, [x0]
> > >  e70:   11000821        add     w1, w1, #0x2
> > >  e74:   8802fc01        stlxr   w2, w1, [x0]
> > > 
> > > ...but if I bump notify_me size to uint64_t the issue goes away.
> > 
> > Ouch. :)  Is this with or without my patch(es)?
> > 
> > Also, what if you just add a dummy uint32_t after notify_me?
> 
> With the dummy the testcase also runs fine for 500 iterations.
> 
> Dann, can you try if this works on the Hi1620 too?

On Hi1620, it hung on the first iteration. Here's the complete patch
I'm running with:

diff --git a/include/block/aio.h b/include/block/aio.h
index 6b0d52f732..e6fd6f1a1a 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,7 +82,7 @@ struct AioContext {
  * Instead, the aio_poll calls include both the prepare and the
  * dispatch phase, hence a simple counter is enough for them.
  */
-uint32_t notify_me;
+uint64_t notify_me;
 
 /* A lock to protect between QEMUBH and AioHandler adders and deleter,
  * and to ensure that no callbacks are removed while we're walking and
diff --git a/util/async.c b/util/async.c
index ca83e32c7f..024c4c567d 100644
--- a/util/async.c
+++ b/util/async.c
@@ -242,7 +242,7 @@ aio_ctx_check(GSource *source)
 aio_notify_accept(ctx);
 
 for (bh = ctx->first_bh; bh; bh = bh->next) {
-if (bh->scheduled) {
+if (atomic_mb_read(&bh->scheduled)) {
 return true;
 }
 }
@@ -342,12 +342,12 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
 
 void aio_notify(AioContext *ctx)
 {
-/* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
- * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
+/* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
+ * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
+ * atomic_add in aio_poll.
  */
-smp_mb();
-if (ctx->notify_me) {
-event_notifier_set(&ctx->notifier);
+if (atomic_mb_read(&ctx->notify_me)) {
+   event_notifier_set(&ctx->notifier);
 atomic_mb_set(&ctx->notified, true);
 }
 }
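
As a side note, the exclusive load/store sequences Jan quoted earlier in the thread can be reproduced outside the QEMU tree. The standalone sketch below is illustrative only (the file name and function names are invented, and an aarch64 cross toolchain is assumed); built without LSE atomics it typically compiles to a comparable ldaxr/stlxr loop:

/* notify_codegen.c - standalone sketch, not QEMU code.
 * Build and disassemble, e.g.:
 *   aarch64-linux-gnu-gcc -O2 -c notify_codegen.c
 *   aarch64-linux-gnu-objdump -d notify_codegen.o
 */
#include <stdint.h>

uint32_t notify_me;   /* stands in for ctx->notify_me */

void poll_enter(void)
{
    /* analogous to atomic_add(&ctx->notify_me, 2) in aio_poll() */
    __atomic_fetch_add(&notify_me, 2, __ATOMIC_SEQ_CST);
}

void poll_exit(void)
{
    /* analogous to atomic_sub(&ctx->notify_me, 2) */
    __atomic_fetch_sub(&notify_me, 2, __ATOMIC_SEQ_CST);
}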



[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-11 Thread dann frazier
On Fri, Oct 11, 2019 at 08:30:02AM +, Jan Glauber wrote:
> On Fri, Oct 11, 2019 at 10:18:18AM +0200, Paolo Bonzini wrote:
> > On 11/10/19 08:05, Jan Glauber wrote:
> > > On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> > >>> ...but if I bump notify_me size to uint64_t the issue goes away.
> > >>
> > >> Ouch. :)  Is this with or without my patch(es)?
> > 
> > You didn't answer this question.
> 
> Oh, sorry... I did but the mail probably didn't make it out.
> I have both of your changes applied (as I think they make sense).
> 
> > >> Also, what if you just add a dummy uint32_t after notify_me?
> > > 
> > > With the dummy the testcase also runs fine for 500 iterations.
> > 
> > You might be lucky and causing list_lock to be in another cache line.
> > What if you add __attribute__((aligned(16)) to notify_me (and keep the
> > dummy)?
> 
> Good point. I'll try to force both into the same cacheline.

On the Hi1620, this still hangs in the first iteration:

diff --git a/include/block/aio.h b/include/block/aio.h
index 6b0d52f732..00e56a5412 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,7 +82,7 @@ struct AioContext {
  * Instead, the aio_poll calls include both the prepare and the
  * dispatch phase, hence a simple counter is enough for them.
  */
-uint32_t notify_me;
+__attribute__((aligned(16))) uint64_t notify_me;
 
 /* A lock to protect between QEMUBH and AioHandler adders and deleter,
  * and to ensure that no callbacks are removed while we're walking and
diff --git a/util/async.c b/util/async.c
index ca83e32c7f..024c4c567d 100644
--- a/util/async.c
+++ b/util/async.c
@@ -242,7 +242,7 @@ aio_ctx_check(GSource *source)
 aio_notify_accept(ctx);
 
 for (bh = ctx->first_bh; bh; bh = bh->next) {
-if (bh->scheduled) {
+if (atomic_mb_read(&bh->scheduled)) {
 return true;
 }
 }
@@ -342,12 +342,12 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
 
 void aio_notify(AioContext *ctx)
 {
-/* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
- * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
+/* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
+ * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
+ * atomic_add in aio_poll.
  */
-smp_mb();
-if (ctx->notify_me) {
-event_notifier_set(&ctx->notifier);
+if (atomic_mb_read(&ctx->notify_me)) {
+   event_notifier_set(&ctx->notifier);
 atomic_mb_set(&ctx->notified, true);
 }
 }

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  New
Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  In Progress
Status in qemu source package in Bionic:
  New
Status in qemu source package in Disco:
  New
Status in qemu source package in Eoan:
  In Progress
Status in qemu source package in FF-Series:
  New

Bug description:
  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 

Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-07 Thread dann frazier
On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've looked into this on ThunderX2. The arm64 code generated for the
> > atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> > memory barriers. It is just plain ldaxr/stlxr.
> > 
> > From my understanding this is not sufficient for SMP sync.
> > 
> > If I read this comment correct:
> > 
> > void aio_notify(AioContext *ctx)
> > {
> > /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> >  * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> >  */
> > smp_mb();
> > if (ctx->notify_me) {
> > 
> > it points out that the smp_mb() should be paired. But as
> > I said the used atomics don't generate any barriers at all.
> 
> Based on the rest of the thread, this patch should also fix the bug:
> 
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
>  aio_notify_accept(ctx);
>  
>  for (bh = ctx->first_bh; bh; bh = bh->next) {
> -if (bh->scheduled) {
> +if (atomic_mb_read(&bh->scheduled)) {
>  return true;
>  }
>  }
> 
> 
> And also the memory barrier in aio_notify can actually be replaced
> with a SEQ_CST load:
> 
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>  
>  void aio_notify(AioContext *ctx)
>  {
> -/* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> - * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> +/* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
> + * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
> + * atomic_add in aio_poll.
>   */
> -smp_mb();
> -if (ctx->notify_me) {
> +if (atomic_mb_read(&ctx->notify_me)) {
>  event_notifier_set(&ctx->notifier);
>  atomic_mb_set(&ctx->notified, true);
>  }
> 
> 
> Would you be able to test these (one by one possibly)?

Paolo,
  I tried them both separately and together on a Hi1620 system, each
time it hung in the first iteration. Here's a backtrace of a run with
both patches applied:

(gdb) thread apply all bt

Thread 3 (Thread 0x8154b820 (LWP 63900)):
#0  0x8b9402cc in __GI___sigtimedwait (set=<optimized out>, set@entry=0xf1e08070,
    info=info@entry=0x8154ad98, timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
#1  0x8ba77fac in __sigwait (set=set@entry=0xf1e08070, sig=sig@entry=0x8154ae74)
    at ../sysdeps/unix/sysv/linux/sigwait.c:28
#2  0xb7dc1610 in sigwait_compat (opaque=0xf1e08070) at util/compatfd.c:35
#3  0xb7dc3e80 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
#4  0x8ba6d088 in start_thread (arg=0xceefbf4f) at pthread_create.c:463
#5  0x8b9dd4ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 (Thread 0x81d4c820 (LWP 63899)):
#0  syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1  0xb7dc4cd8 in qemu_futex_wait (val=<optimized out>, f=<optimized out>)
    at /home/ubuntu/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0xb7e48708 <rcu_call_ready_event>) at util/qemu-thread-posix.c:459
#3  0xb7ddf44c in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:260
#4  0xb7dc3e80 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
#5  0x8ba6d088 in start_thread (arg=0xceefc05f) at pthread_create.c:463
#6  0x8b9dd4ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 1 (Thread 0x81e83010 (LWP 63898)):
#0  0x8b9d4154 in __GI_ppoll (fds=0xf1e0dbc0, nfds=187650205809964, timeout=<optimized out>,
    timeout@entry=0x0, sigmask=0xceefbef0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0xb7dbedb0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:340
#3  0xb7dbfd2c in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:236
#4  main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:517
#5  0xb7ce86e8 in convert_do_copy (s=0xceefc068) at qemu-img.c:2028
#6  img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2520
#7  0xb7ce1e54 in main (argc=8, argv=<optimized out>) at qemu-img.c:5097

> > I've tried to verify me theory with this patch and didn't run into the
> > issue for ~500 iterations (usually I would trigger the issue ~20 
> > iterations).
> 
> Sorry for asking the obvious---500 iterations of what?

$ for i in $(seq 1 500); do echo "==$i=="; ./qemu/qemu-img convert -p -f qcow2 
-O qcow2 bionic-server-cloudimg-arm64.img out.img; done
==1==
(37.19/100%)

  -dann
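
The ordering being debated here is the classic store-buffering pattern: the poller publishes notify_me and then checks for scheduled work, while the notifier publishes the work and then checks notify_me. The standalone C11 model below is a sketch only (it is not QEMU code, and all names are illustrative); it just spells out the two access pairs that have to be ordered against each other:

#include <stdatomic.h>
#include <stdio.h>

static atomic_uint notify_me;   /* stands in for ctx->notify_me */
static atomic_uint scheduled;   /* stands in for bh->scheduled  */

/* poller side: set notify_me, then re-check for pending work */
static int poller_would_block(void)
{
    atomic_fetch_or_explicit(&notify_me, 1, memory_order_seq_cst);
    return !atomic_load_explicit(&scheduled, memory_order_seq_cst);
}

/* notifier side: publish the work, then decide whether to wake the poller */
static int notifier_would_wake(void)
{
    atomic_store_explicit(&scheduled, 1, memory_order_seq_cst);
    return atomic_load_explicit(&notify_me, memory_order_seq_cst) != 0;
}

int main(void)
{
    /* With seq_cst on all four accesses, two threads running these
     * concurrently cannot both miss each other; if either pair is
     * allowed to reorder, both can miss and the poller sleeps forever. */
    printf("block=%d wake=%d\n", poller_would_block(), notifier_would_wake());
    return 0;
}

Whether QEMU's atomic_add/atomic_mb_read wrappers actually provide that pairing on ARM64 is exactly what the rest of this thread is trying to pin down.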



[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2019-10-03 Thread dann frazier
** Also affects: kunpeng920
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  New
Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  In Progress
Status in qemu source package in Bionic:
  New
Status in qemu source package in Disco:
  New
Status in qemu source package in Eoan:
  In Progress
Status in qemu source package in FF-Series:
  New

Bug description:
  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so there is no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  

  [ Original Description ]

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760,
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=,
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=,
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions



Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-09-24 Thread dann frazier
On Wed, Sep 11, 2019 at 04:09:25PM -0300, Rafael David Tinoco wrote:
> > Zhengui's theory that notify_me doesn't work properly on ARM is more
> > promising, but he couldn't provide a clear explanation of why he thought
> > notify_me is involved.  In particular, I would have expected notify_me to
> > be wrong if the qemu_poll_ns call came from aio_ctx_dispatch, for example:
> > 
> > 
> > glib_pollfds_fill
> >   g_main_context_prepare
> > aio_ctx_prepare
> >   atomic_or(>notify_me, 1)
> > qemu_poll_ns
> > glib_pollfds_poll
> >   g_main_context_check
> > aio_ctx_check
> >   atomic_and(>notify_me, ~1)
> >   g_main_context_dispatch
> > aio_ctx_dispatch
> >   /* do something for event */
> > qemu_poll_ns 
> > 
> 
> Paolo,
> 
> I tried confining execution in a single NUMA domain (cpu & mem) and
> still faced the issue, then, I added a mutex "ctx->notify_me_lcktest"
> into context to protect "ctx->notify_me", like showed bellow, and it
> seems to have either fixed or mitigated it.
> 
> I was able to cause the hang once every 3 or 4 runs. I have already run
> qemu-img convert more than 30 times now and couldn't reproduce it again.
> 
> Next step is to play with the barriers and check why existing ones
> aren't enough for ordering access to ctx->notify_me ... or should I
> try/do something else in your opinion ?
> 
> This arch/machine (Huawei D06):
> 
> $ lscpu
> Architecture:aarch64
> Byte Order:  Little Endian
> CPU(s):  96
> On-line CPU(s) list: 0-95
> Thread(s) per core:  1
> Core(s) per socket:  48
> Socket(s):   2
> NUMA node(s):4
> Vendor ID:   0x48
> Model:   0
> Stepping:0x0
> CPU max MHz: 2000.
> CPU min MHz: 200.
> BogoMIPS:200.00
> L1d cache:   64K
> L1i cache:   64K
> L2 cache:512K
> L3 cache:32768K
> NUMA node0 CPU(s):   0-23
> NUMA node1 CPU(s):   24-47
> NUMA node2 CPU(s):   48-71
> NUMA node3 CPU(s):   72-95
> Flags:   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
> cpuid asimdrdm dcpop

Note that I'm also seeing this on a ThunderX2 (same calltrace):

$ lscpu
Architecture:aarch64
Byte Order:  Little Endian
CPU(s):  224
On-line CPU(s) list: 0-223
Thread(s) per core:  4
Core(s) per socket:  28
Socket(s):   2
NUMA node(s):2
Vendor ID:   Cavium
Model:   1
Model name:  ThunderX2 99xx
Stepping:0x1
BogoMIPS:400.00
L1d cache:   32K
L1i cache:   32K
L2 cache:256K
L3 cache:32768K
NUMA node0 CPU(s):   0-111
NUMA node1 CPU(s):   112-223
Flags:   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid 
asimdrdm

  -dann

> 
> 
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 0ca25dfec6..0724086d91 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -84,6 +84,7 @@ struct AioContext {
>   * dispatch phase, hence a simple counter is enough for them.
>   */
>  uint32_t notify_me;
> +QemuMutex notify_me_lcktest;
> 
>  /* A lock to protect between QEMUBH and AioHandler adders and deleter,
>   * and to ensure that no callbacks are removed while we're walking and
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index 51c41ed3c9..031d6e2997 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -529,7 +529,9 @@ static bool run_poll_handlers(AioContext *ctx,
> int64_t max_ns, int64_t *timeout)
>  bool progress;
>  int64_t start_time, elapsed_time;
> 
> +qemu_mutex_lock(&ctx->notify_me_lcktest);
>  assert(ctx->notify_me);
> +qemu_mutex_unlock(&ctx->notify_me_lcktest);
>  assert(qemu_lockcnt_count(&ctx->list_lock) > 0);
> 
>  trace_run_poll_handlers_begin(ctx, max_ns, *timeout);
> @@ -601,8 +603,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
>   * so disable the optimization now.
>   */
>  if (blocking) {
> +qemu_mutex_lock(&ctx->notify_me_lcktest);
>  assert(in_aio_context_home_thread(ctx));
>  atomic_add(&ctx->notify_me, 2);
> +qemu_mutex_unlock(&ctx->notify_me_lcktest);
>  }
> 
>  qemu_lockcnt_inc(&ctx->list_lock);
> @@ -647,8 +651,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
>  }
> 
>  if (blocking) {
> +qemu_mutex_lock(&ctx->notify_me_lcktest);
>  atomic_sub(&ctx->notify_me, 2);
>  aio_notify_accept(ctx);
> +qemu_mutex_unlock(&ctx->notify_me_lcktest);
>  }
> 
>  /* Adjust polling time */
> diff --git a/util/async.c b/util/async.c
> index c10642a385..140e1e86f5 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -221,7 +221,9 @@ aio_ctx_prepare(GSource *source, gint*timeout)
>  {
>  AioContext *ctx = (AioContext *) source;
> 
> +qemu_mutex_lock(&ctx->notify_me_lcktest);
>  atomic_or(&ctx->notify_me, 1);
> +qemu_mutex_unlock(&ctx->notify_me_lcktest);
> 
>  

[Qemu-devel] [Bug 1824053] Re: Qemu-img convert appears to be stuck on aarch64 host with low probability

2019-06-06 Thread dann frazier
*** This bug is a duplicate of bug 1805256 ***
https://bugs.launchpad.net/bugs/1805256

** This bug has been marked a duplicate of bug 1805256
   qemu-img hangs on high core count ARM system

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1824053

Title:
  Qemu-img convert appears to be stuck on aarch64 host with low
  probability

Status in QEMU:
  Confirmed

Bug description:
  Hi,  I found a problem that qemu-img convert appears to be stuck on
  aarch64 host with low probability.

  The convert command  line is  "qemu-img convert -f qcow2 -O raw
  disk.qcow2 disk.raw ".

  The bt is below:

  Thread 2 (Thread 0x4b776e50 (LWP 27215)):
  #0  0x4a3f2994 in sigtimedwait () from /lib64/libc.so.6
  #1  0x4a39c60c in sigwait () from /lib64/libpthread.so.0
  #2  0xaae82610 in sigwait_compat (opaque=0xc5163b00) at util/compatfd.c:37
  #3  0xaae85038 in qemu_thread_start (args=args@entry=0xc5163b90) at util/qemu_thread_posix.c:496
  #4  0x4a3918bc in start_thread () from /lib64/libpthread.so.0
  #5  0x4a492b2c in thread_start () from /lib64/libc.so.6

  Thread 1 (Thread 0x4b573370 (LWP 27214)):
  #0  0x4a489020 in ppoll () from /lib64/libc.so.6
  #1  0xaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
  #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu_timer.c:391
  #3  0xaadae014 in os_host_main_loop_wait (timeout=<optimized out>) at main_loop.c:272
  #4  0xaadae190 in main_loop_wait (nonblocking=<optimized out>) at main_loop.c:534
  #5  0xaad97be0 in convert_do_copy (s=0xdc32eb48) at qemu-img.c:1923
  #6  0xaada2d70 in img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2414
  #7  0xaad99ac4 in main (argc=7, argv=<optimized out>) at qemu-img.c:5305

  The problem seems to be very similar to the phenomenon described by
  this patch (https://resources.ovirt.org/pub/ovirt-4.1/src/qemu-kvm-
  ev/0025-aio_notify-force-main-loop-wakeup-with-SIGIO-aarch64.patch),

  which forces main loop wakeup with SIGIO.  But this patch was reverted
  by the patch (http://ovirt.repo.nfrance.com/src/qemu-kvm-ev/kvm-
  Revert-aio_notify-force-main-loop-wakeup-with-SIGIO-.patch).

  I can reproduce this problem with qemu.git/master; it still exists there.
  I found that when an I/O request completes, the worker thread wants to call
  aio_notify to wake up the main loop, but it finds that ctx->notify_me has
  already been cleared to 0 by the main loop in aio_ctx_check (via
  atomic_and(&ctx->notify_me, ~1)). So the worker thread does not write to the
  eventfd to notify the main loop. When this interleaving happens, the main
  loop hangs:
  main loop                            worker thread1                          worker thread2
  -------------------------------------------------------------------------------------------------
  qemu_poll_ns                         aio_worker
                                       qemu_bh_schedule(pool->completion_bh)
  glib_pollfds_poll
  g_main_context_check
  aio_ctx_check                                                                aio_worker
  atomic_and(&ctx->notify_me, ~1)                                              qemu_bh_schedule(pool->completion_bh)
  /* do something for event */
  qemu_poll_ns
  /* hangs !!! */

  As we know, ctx->notify_me is accessed by both the worker threads and the
  main loop. I think we should add lock protection for ctx->notify_me to
  avoid this happening.
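
  A minimal sketch of the lock-protection idea suggested above (names are
  illustrative and this is not a drop-in QEMU patch): taking the same mutex
  on both sides forces the clear-and-check in the main loop and the
  check-and-notify in the worker into a total order, closing the window
  shown in the interleaving above, at the cost of serializing a hot path.

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t notify_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned notify_me;      /* stands in for ctx->notify_me    */
static bool event_pending;      /* stands in for the eventfd write */

/* main loop, around the blocking poll */
void poll_prepare(void)
{
    pthread_mutex_lock(&notify_lock);
    notify_me |= 1;
    pthread_mutex_unlock(&notify_lock);
}

void poll_check(void)
{
    pthread_mutex_lock(&notify_lock);
    notify_me &= ~1;
    pthread_mutex_unlock(&notify_lock);
}

/* worker thread, after scheduling its completion bottom half */
void notify(void)
{
    pthread_mutex_lock(&notify_lock);
    if (notify_me) {
        event_pending = true;   /* real code: event_notifier_set() */
    }
    pthread_mutex_unlock(&notify_lock);
}

int main(void)
{
    poll_prepare();
    notify();                   /* sees notify_me set, so it "wakes" */
    poll_check();
    return event_pending ? 0 : 1;
}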

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1824053/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-06-05 Thread dann frazier
** Also affects: qemu (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: qemu (Ubuntu)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed
Status in qemu package in Ubuntu:
  Confirmed

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-15 Thread dann frazier
No, sorry - this bugs still persists w/ latest upstream (@ afccfc0). I
found a report of similar symptoms:

  https://patchwork.kernel.org/patch/10047341/
  https://bugzilla.redhat.com/show_bug.cgi?id=1524770#c13

To be clear, ^ is already fixed upstream, so it is not the *same* issue
- but perhaps related.


** Bug watch added: Red Hat Bugzilla #1524770
   https://bugzilla.redhat.com/show_bug.cgi?id=1524770

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2019-04-15 Thread dann frazier
** Changed in: qemu
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  Confirmed

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system

2018-11-26 Thread dann frazier
ext4 filesystem, SATA drive:

(gdb) thread apply all bt

Thread 3 (Thread 0x9bffc9a0 (LWP 9015)):
#0  0xaaa462cc in __GI___sigtimedwait (set=<optimized out>, set@entry=0xe725c070,
    info=info@entry=0x9bffbf18, timeout=0x3ff1, timeout@entry=0x0)
    at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
#1  0xaab7dfac in __sigwait (set=set@entry=0xe725c070, sig=sig@entry=0x9bffbff4)
    at ../sysdeps/unix/sysv/linux/sigwait.c:28
#2  0xd998a628 in sigwait_compat (opaque=0xe725c070) at util/compatfd.c:36
#3  0xd998bce0 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:498
#4  0xaab73088 in start_thread (arg=0xc528531f) at pthread_create.c:463
#5  0xaaae34ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 (Thread 0xa0e779a0 (LWP 9014)):
#0  syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1  0xd998c9e8 in qemu_futex_wait (val=<optimized out>, f=<optimized out>)
    at /home/ubuntu/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0xd9a091c0 <rcu_call_ready_event>) at util/qemu-thread-posix.c:442
#3  0xd99a6834 in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:261
#4  0xd998bce0 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:498
#5  0xaab73088 in start_thread (arg=0xc528542f) at pthread_create.c:463
#6  0xaaae34ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 1 (Thread 0xa0fa8010 (LWP 9013)):
#0  0xaaada154 in __GI_ppoll (fds=0xe7291dc0, nfds=187650771816320, timeout=<optimized out>,
    timeout@entry=0x0, sigmask=0xc52852e0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0xd9987f00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3  0xd9988f80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233
#4  main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5  0xd98b7a30 in convert_do_copy (s=0xc52854e8) at qemu-img.c:1980
#6  img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7  0xd98b033c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  New

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1805256] [NEW] qemu-img hangs on high core count ARM system

2018-11-26 Thread dann frazier
Public bug reported:

On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
frequently hangs (~50% of the time) with this command:

qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
qcow2->qcow2 conversion happens to be something uvtool does every time
it fetches images.

Once hung, attaching gdb gives the following backtrace:

(gdb) bt
#0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760, 
timeout=, timeout@entry=0x0, sigmask=0xc123b950)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
__fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2  qemu_poll_ns (fds=, nfds=, 
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4  main_loop_wait (nonblocking=) at util/main-loop.c:497
#5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at qemu-img.c:1980
#6  img_convert (argc=, argv=) at qemu-img.c:2456
#7  0xbbe2333c in main (argc=7, argv=) at qemu-img.c:4975

Reproduced w/ latest QEMU git (@ 53744e0a182)

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on high core count ARM system

Status in QEMU:
  New

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760, 
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=, 
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions



[Qemu-devel] [Bug 1719196] Re: [arm64 ocata] newly created instances are unable to raise network interfaces

2017-11-01 Thread dann frazier
** Tags removed: verification-needed-zesty
** Tags added: verification-done-zesty

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1719196

Title:
  [arm64 ocata] newly created instances are unable to raise network
  interfaces

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive ocata series:
  Fix Committed
Status in libvirt:
  New
Status in QEMU:
  Fix Released
Status in libvirt package in Ubuntu:
  Invalid
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Zesty:
  Fix Committed

Bug description:
  [Impact]

   * A change in qemu 2.8 (83d768b virtio: set ISR on dataplane 
 notifications) broke virtio handling on platforms without a 
 controller. Those encounter flaky networking due to missed IRQs

   * Fix is a backport of the upstream fix b4b9862b: virtio: Fix no 
 interrupt when not creating msi controller

  [Test Case]

   * On Arm with Zesty (or Ocata) run a guest without PCI based devices

   * Example in e.g. c#23

   * Without the fix the networking does not work reliably (as it loses
 IRQs), with the fix it works fine.

  [Regression Potential]

   * Changing the IRQ handling of virtio could affect virtio in general.
 But when reviewing the patch you'll see that it is small and actually
 only changes to enable the IRQ in one more place. That could cause more
 IRQs than needed in the worst case, but those usually don't break
 anything, they only slow things down. Also, this fix has been upstream
 for quite a while, increasing confidence.

  [Other Info]
   
   * There is currently 1720397 in flight in the SRU queue, so acceptance 
 of this upload has to wait until that completes.

  ---

  arm64 Ocata ,

  I'm testing to see if I can get Ocata running on arm64 and using the
  openstack-base bundle to deploy it.  I have added the bundle to the
  log file attached to this bug.

  When I create a new instance via nova, the VM comes up and runs,
  however fails to raise its eth0 interface. This occurs on both
  internal and external networks.

  ubuntu@openstackaw:~$ nova list
  
+--+-+++-++
  | ID   | Name| Status | Task State | 
Power State | Networks   |
  
+--+-+++-++
  | dcaf6d51-f81e-4cbd-ac77-0c5d21bde57c | sfeole1 | ACTIVE | -  | 
Running | internal=10.5.5.3  |
  | aa0b8aee-5650-41f4-8fa0-aeccdc763425 | sfeole2 | ACTIVE | -  | 
Running | internal=10.5.5.13 |
  
+--+-+++-++
  ubuntu@openstackaw:~$ nova show aa0b8aee-5650-41f4-8fa0-aeccdc763425
  
+--+--+
  | Property | Value
|
  
+--+--+
  | OS-DCF:diskConfig| MANUAL   
|
  | OS-EXT-AZ:availability_zone  | nova 
|
  | OS-EXT-SRV-ATTR:host | awrep3   
|
  | OS-EXT-SRV-ATTR:hypervisor_hostname  | awrep3.maas  
|
  | OS-EXT-SRV-ATTR:instance_name| instance-0003
|
  | OS-EXT-STS:power_state   | 1
|
  | OS-EXT-STS:task_state| -
|
  | OS-EXT-STS:vm_state  | active   
|
  | OS-SRV-USG:launched_at   | 2017-09-24T14:23:08.00   
|
  | OS-SRV-USG:terminated_at | -
|
  | accessIPv4   |  
|
  | accessIPv6   |  
|
  | config_drive |  
|
  | created  | 2017-09-24T14:22:41Z 
|
  | flavor   | m1.small 
(717660ae-0440-4b19-a762-ffeb32a0575c)  |
  | hostId   | 
5612a00671c47255d2ebd6737a64ec9bd3a5866d1233ecf3e988b025 |
  | id   | aa0b8aee-5650-41f4-8fa0-aeccdc763425 
|
  | 

[Qemu-devel] [Bug 1719196] Re: [arm64 ocata] newly created instances are unable to raise network interfaces

2017-10-31 Thread dann frazier
Thanks Christian - I've now verified this. I took a stepwise approach:

1) We originally observed this issue w/ the ocata cloud archive on
xenial, so I redeployed that. I verified that I was still seeing the
problem. I then created a PPA[*] w/ an arm64 build of QEMU from the
ocata-staging PPA, which is a backport of the zesty-proposed package,
and upgraded my nova-compute nodes to this version. I rebooted my test
guests, and the problem was resolved.

2) I then updated my sources.list to point to zesty (w/ proposed
enabled), and upgraded qemu-system-arm. This way I could test the actual
build in zesty-proposed, as opposed to my backport. This continued to
work.

3) Finally, I dist-upgraded this system from xenial to zesty - so that
I'm actually testing the zesty build in a zesty environment, and
rebooted. Still worked :)

[*] https://launchpad.net/~dannf/+archive/ubuntu/lp1719196

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1719196

Title:
  [arm64 ocata] newly created instances are unable to raise network
  interfaces

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive ocata series:
  Fix Committed
Status in libvirt:
  New
Status in QEMU:
  Fix Released
Status in libvirt package in Ubuntu:
  Invalid
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Zesty:
  Fix Committed

Bug description:
  [Impact]

   * A change in qemu 2.8 (83d768b virtio: set ISR on dataplane 
 notifications) broke virtio handling on platforms without a 
 controller. Those encounter flaky networking due to missed IRQs

   * Fix is a backport of the upstream fix b4b9862b: virtio: Fix no 
 interrupt when not creating msi controller

  [Test Case]

   * On Arm with Zesty (or Ocata) run a guest without PCI based devices

   * Example in e.g. c#23

   * Without the fix the networking does not work reliably (as it loses
 IRQs), with the fix it works fine.

  [Regression Potential]

   * Changing the IRQ handling of virtio could affect virtio in general.
 But when reviewing the patch you'll see that it is small and actually
 only changes to enable the IRQ in one more place. That could cause more
 IRQs than needed in the worst case, but those usually don't break
 anything, they only slow things down. Also, this fix has been upstream
 for quite a while, increasing confidence.

  [Other Info]
   
   * There is currently 1720397 in flight in the SRU queue, so acceptance 
 of this upload has to wait until that completes.

  ---

  arm64 Ocata ,

  I'm testing to see if I can get Ocata running on arm64 and using the
  openstack-base bundle to deploy it.  I have added the bundle to the
  log file attached to this bug.

  When I create a new instance via nova, the VM comes up and runs,
  however fails to raise its eth0 interface. This occurs on both
  internal and external networks.

  ubuntu@openstackaw:~$ nova list
  
+--+-+++-++
  | ID   | Name| Status | Task State | 
Power State | Networks   |
  
+--+-+++-++
  | dcaf6d51-f81e-4cbd-ac77-0c5d21bde57c | sfeole1 | ACTIVE | -  | 
Running | internal=10.5.5.3  |
  | aa0b8aee-5650-41f4-8fa0-aeccdc763425 | sfeole2 | ACTIVE | -  | 
Running | internal=10.5.5.13 |
  
+--+-+++-++
  ubuntu@openstackaw:~$ nova show aa0b8aee-5650-41f4-8fa0-aeccdc763425
  
+--+--+
  | Property | Value
|
  
+--+--+
  | OS-DCF:diskConfig| MANUAL   
|
  | OS-EXT-AZ:availability_zone  | nova 
|
  | OS-EXT-SRV-ATTR:host | awrep3   
|
  | OS-EXT-SRV-ATTR:hypervisor_hostname  | awrep3.maas  
|
  | OS-EXT-SRV-ATTR:instance_name| instance-0003
|
  | OS-EXT-STS:power_state   | 1
|
  | OS-EXT-STS:task_state| -
|
  | OS-EXT-STS:vm_state  | active   
|
  | OS-SRV-USG:launched_at   | 2017-09-24T14:23:08.00   
|
  | 

[Qemu-devel] [Bug 1719196] Re: [arm64 ocata] newly created instances are unable to raise network interfaces

2017-10-05 Thread dann frazier
Thanks so much for doing that Sean.

Omitting expected changes (uuid, mac address, etc.), here are the
significant changes I see:

1) N uses the QEMU 'virt' model, O uses 'virt-2.8'
2) N and O both expose a PCI root, but N also exposes 2 PCI bridges that O does not.
3) N exposes an additional serial device.
4) N and O both use an apparmor seclabel. However, O also has a DAC model.

#4 is the most interesting to me. Is there a way to configure ocata nova
to not enable DAC?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1719196

Title:
  [arm64 ocata] newly created instances are unable to raise network
  interfaces

Status in libvirt:
  New
Status in QEMU:
  New

Bug description:
  arm64 Ocata ,

  I'm testing to see if I can get Ocata running on arm64 and using the
  openstack-base bundle to deploy it.  I have added the bundle to the
  log file attached to this bug.

  When I create a new instance via nova, the VM comes up and runs,
  however fails to raise its eth0 interface. This occurs on both
  internal and external networks.

  ubuntu@openstackaw:~$ nova list
  
+--+-+++-++
  | ID   | Name| Status | Task State | 
Power State | Networks   |
  
+--+-+++-++
  | dcaf6d51-f81e-4cbd-ac77-0c5d21bde57c | sfeole1 | ACTIVE | -  | 
Running | internal=10.5.5.3  |
  | aa0b8aee-5650-41f4-8fa0-aeccdc763425 | sfeole2 | ACTIVE | -  | 
Running | internal=10.5.5.13 |
  
+--+-+++-++
  ubuntu@openstackaw:~$ nova show aa0b8aee-5650-41f4-8fa0-aeccdc763425
  
+--+--+
  | Property | Value
|
  
+--+--+
  | OS-DCF:diskConfig| MANUAL   
|
  | OS-EXT-AZ:availability_zone  | nova 
|
  | OS-EXT-SRV-ATTR:host | awrep3   
|
  | OS-EXT-SRV-ATTR:hypervisor_hostname  | awrep3.maas  
|
  | OS-EXT-SRV-ATTR:instance_name| instance-0003
|
  | OS-EXT-STS:power_state   | 1
|
  | OS-EXT-STS:task_state| -
|
  | OS-EXT-STS:vm_state  | active   
|
  | OS-SRV-USG:launched_at   | 2017-09-24T14:23:08.00   
|
  | OS-SRV-USG:terminated_at | -
|
  | accessIPv4   |  
|
  | accessIPv6   |  
|
  | config_drive |  
|
  | created  | 2017-09-24T14:22:41Z 
|
  | flavor   | m1.small 
(717660ae-0440-4b19-a762-ffeb32a0575c)  |
  | hostId   | 
5612a00671c47255d2ebd6737a64ec9bd3a5866d1233ecf3e988b025 |
  | id   | aa0b8aee-5650-41f4-8fa0-aeccdc763425 
|
  | image| zestynosplash 
(e88fd1bd-f040-44d8-9e7c-c462ccf4b945) |
  | internal network | 10.5.5.13
|
  | key_name | mykey
|
  | metadata | {}   
|
  | name | sfeole2  
|
  | os-extended-volumes:volumes_attached | []   
|
  | progress | 0
|
  | security_groups  | default  
|
  | status   | ACTIVE   
|
  | tenant_id| 9f7a21c1ad264fec81abc09f3960ad1d 
 

[Qemu-devel] [PATCH] seccomp: loosen library version dependency

2015-10-23 Thread dann frazier
Drop the libseccomp required version back to 2.1.0, restoring the ability
to build w/ --enable-seccomp on Ubuntu 14.04.

Commit 4cc47f8b3cc4f32586ba2f7fce1dc267da774a69 tightened the dependency
on libseccomp from version 2.1.0 to 2.1.1. This broke building on Ubuntu
14.04, the current Ubuntu LTS release. The commit message didn't mention
any specific functional need for 2.1.1, just that it was the most recent
stable version at the time. I reviewed the changes between 2.1.0 and 2.1.1,
but it looks like that update just contained minor fixes and cleanups - no
obvious (to me) new interfaces or critical bug fixes.

Signed-off-by: dann frazier <dann.fraz...@canonical.com>
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 7d5aab2..8a9794b 100755
--- a/configure
+++ b/configure
@@ -1878,7 +1878,7 @@ fi
 if test "$seccomp" != "no" ; then
 case "$cpu" in
 i386|x86_64)
-libseccomp_minver="2.1.1"
+libseccomp_minver="2.1.0"
 ;;
 arm|aarch64)
 libseccomp_minver="2.2.3"
-- 
2.6.1
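
For context, the core libseccomp interfaces involved here (seccomp_init(), seccomp_rule_add(), seccomp_load(), seccomp_release()) have been part of the 2.x API since before 2.1.0, which is consistent with the argument above that nothing in 2.1.1 is strictly required. The sketch below is only an illustration of that API, not QEMU's actual syscall filter:

/* seccomp_sketch.c - illustration only, not QEMU's real filter.
 * Build with: gcc seccomp_sketch.c -lseccomp
 */
#include <stddef.h>
#include <seccomp.h>

int main(void)
{
    int rc = 1;
    /* allow everything by default, then forbid one syscall */
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
    if (ctx == NULL) {
        return 1;
    }
    if (seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(ptrace), 0) < 0) {
        goto out;
    }
    if (seccomp_load(ctx) < 0) {
        goto out;
    }
    rc = 0;
out:
    seccomp_release(ctx);
    return rc;
}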




[Qemu-devel] [PATCH] seccomp: loosen library version dependency

2015-10-20 Thread dann frazier
Drop the libseccomp required version back to 2.1.0, restoring the ability
to build w/ --enable-seccomp on Ubuntu 14.04.

Commit 4cc47f8b3cc4f32586ba2f7fce1dc267da774a69 tightened the dependency
on libseccomp from version 2.1.0 to 2.1.1. This broke building on Ubuntu
14.04, the current Ubuntu LTS release. The commit message didn't mention
any specific functional need for 2.1.1, just that it was the most recent
stable version at the time. I reviewed the changes between 2.1.0 and 2.1.1,
but it looks like that update just contained minor fixes and cleanups - no
obvious (to me) new interfaces or critical bug fixes.

Signed-off-by: dann frazier <dann.fraz...@canonical.com>
---
 configure | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 913ae4a..b6f4694 100755
--- a/configure
+++ b/configure
@@ -1873,13 +1873,13 @@ fi
 
 if test "$seccomp" != "no" ; then
 if test "$cpu" = "i386" || test "$cpu" = "x86_64" &&
-$pkg_config --atleast-version=2.1.1 libseccomp; then
+$pkg_config --atleast-version=2.1.0 libseccomp; then
 libs_softmmu="$libs_softmmu `$pkg_config --libs libseccomp`"
 QEMU_CFLAGS="$QEMU_CFLAGS `$pkg_config --cflags libseccomp`"
seccomp="yes"
 else
if test "$seccomp" = "yes"; then
-feature_not_found "libseccomp" "Install libseccomp devel >= 2.1.1"
+feature_not_found "libseccomp" "Install libseccomp devel >= 2.1.0"
fi
seccomp="no"
 fi
-- 
2.6.1




[Qemu-devel] [Bug 1289527] Re: qemu-aarch64-static: java dies with SIGILL

2014-08-11 Thread dann frazier
Using 2.1+dfsg-2ubuntu2 from utopic within a trusty ubuntu core:

# /usr/bin/qemu-aarch64-static -d unimp /usr/bin/java 
host mmap_min_addr=0x1
Reserved 0x12000 bytes of guest address space
Relocating guest address space from 0x0040 to 0x40
guest_base  0x0
startend  size prot
0040-00401000 1000 r-x
0041-00412000 2000 rw-
0040-00401000 1000 ---
00401000-004000801000 0080 rw-
004000801000-00400081c000 0001b000 r-x
00400081c000-00400082c000 0001 ---
00400082c000-00400082f000 3000 rw-
start_brk   0x
end_code0x00400834
start_code  0x0040
start_data  0x00410db0
end_data0x00411030
start_stack 0x0040008007b0
brk 0x00411038
entry   0x004000801f80
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1289527

Title:
  qemu-aarch64-static: java dies with SIGILL

Status in QEMU:
  New
Status in “qemu” package in Ubuntu:
  Incomplete

Bug description:
  qemu-aarch64-static from qemu-user-static 1.7.0+dfsg-3ubuntu5
  (I haven't tried reproducing w/ upstream git yet)

  In an arm64 trusty chroot on an amd64 system:

  dannf@server-75e0210e-4f99-4c86-9277-3201ab7b6afd:~$ java
  #
  # A fatal error has been detected by the Java Runtime Environment:
  #
  #  SIGILL (0x4) at pc=0x0040098e8070, pid=15034, tid=274902467056
  #
  # JRE version:  (7.0_51-b31) (build )
  # Java VM: OpenJDK 64-Bit Server VM (25.0-b59 mixed mode linux-aarch64 
compressed oops)
  # Problematic frame:
  # v  ~BufferBlob::flush_icache_stub
  #
  # Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try ulimit -c unlimited before starting Java again
  #
  # An error report file with more information is saved as:
  # /home/dannf/hs_err_pid15034.log
  #
  # If you would like to submit a bug report, please visit:
  #   http://bugreport.sun.com/bugreport/crash.jsp
  #
  qemu: uncaught target signal 6 (Aborted) - core dumped
  Aborted (core dumped)
  dannf@server-75e0210e-4f99-4c86-9277-3201ab7b6afd:~$

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1289527/+subscriptions



[Qemu-devel] [Bug 1289527] Re: qemu-aarch64-static: java dies with SIGILL

2014-08-11 Thread dann frazier
I'm also seeing a SEGV (not a SIGILL) when testing the version of QEMU
that shipped in trusty. So, we might just consider this bug fixed and
track this segfault issue separately.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1289527

Title:
  qemu-aarch64-static: java dies with SIGILL

Status in QEMU:
  New
Status in “qemu” package in Ubuntu:
  Incomplete

Bug description:
  qemu-aarch64-static from qemu-user-static 1.7.0+dfsg-3ubuntu5
  (I haven't tried reproducing w/ upstream git yet)

  In an arm64 trusty chroot on an amd64 system:

  dannf@server-75e0210e-4f99-4c86-9277-3201ab7b6afd:~$ java
  #
  # A fatal error has been detected by the Java Runtime Environment:
  #
  #  SIGILL (0x4) at pc=0x0040098e8070, pid=15034, tid=274902467056
  #
  # JRE version:  (7.0_51-b31) (build )
  # Java VM: OpenJDK 64-Bit Server VM (25.0-b59 mixed mode linux-aarch64 
compressed oops)
  # Problematic frame:
  # v  ~BufferBlob::flush_icache_stub
  #
  # Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try ulimit -c unlimited before starting Java again
  #
  # An error report file with more information is saved as:
  # /home/dannf/hs_err_pid15034.log
  #
  # If you would like to submit a bug report, please visit:
  #   http://bugreport.sun.com/bugreport/crash.jsp
  #
  qemu: uncaught target signal 6 (Aborted) - core dumped
  Aborted (core dumped)
  dannf@server-75e0210e-4f99-4c86-9277-3201ab7b6afd:~$

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1289527/+subscriptions



Re: [Qemu-devel] Call for testing QEMU aarch64-linux-user emulation

2014-03-09 Thread Dann Frazier
On Tue, Feb 25, 2014 at 1:39 AM, Alex Bennée alex.ben...@linaro.org wrote:

 Dann Frazier dann.fraz...@canonical.com writes:

 On Mon, Feb 17, 2014 at 6:40 AM, Alex Bennée alex.ben...@linaro.org wrote:
 Hi,

 Thanks to all involved for your work here!

 After a solid few months of work the QEMU master branch [1] has now reached
 instruction feature parity with the suse-1.6 [6] tree that a lot of people
 have been using to build various aarch64 binaries. In addition to the
 snip

 I've tested against the following aarch64 rootfs:
 * SUSE [2]
 * Debian [3]
 * Ubuntu Saucy [4]

 fyi, I've been doing my testing with Ubuntu Trusty.

 Good stuff, I shall see if I can set one up. Is the package coverage
 between trusty and saucy much different? I noticed for example I
 couldn't find zile and various build-deps for llvm.

Oops, must've missed this question before. I'd say in general they
should be quite similar, but there are obviously exceptions (zile is
one).
I'm not sure why it was omitted.

Also - I've found an issue with running OpenJDK in the latest upstream git:

root@server-75e0210e-4f99-4c86-9277-3201ab7b6afd:/root# java
#
[thread 274902467056 also had an error]# A fatal error has been
detected by the Java Runtime Environment:

#
#  SIGSEGV (0xb) at pc=0x, pid=9960, tid=297441608176
#
# JRE version: OpenJDK Runtime Environment (7.0_51-b31) (build 1.7.0_51-b31)
# Java VM: OpenJDK 64-Bit Server VM (25.0-b59 mixed mode linux-aarch64
compressed oops)
# Problematic frame:
# qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted (core dumped)

This is openjdk-7-jre-headless, version 7u51-2.4.5-1ubuntu1. I'm not
sure what debug info would be useful here, but let me know and I can
collect it.



[Qemu-devel] [Bug 1289527] Re: qemu-aarch64-static: java dies with SIGILL

2014-03-09 Thread dann frazier
** Also affects: qemu
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1289527

Title:
  qemu-aarch64-static: java dies with SIGILL

Status in QEMU:
  New
Status in “qemu” package in Ubuntu:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1289527/+subscriptions



[Qemu-devel] [Bug 1285363] Re: qemu-aarch64-static segfaults

2014-03-06 Thread dann frazier
@Serge: I can confirm that this is fixed in 1.7.0+dfsg-3ubuntu5sig1 from
your ppa.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1285363

Title:
  qemu-aarch64-static segfaults

Status in QEMU:
  New
Status in “qemu” package in Ubuntu:
  Confirmed

Bug description:
  I've found a couple conditions that cause qemu-user-static to core
  dump fairly reliably - same with upstream git - while a binary built
  from suse's aarch64-1.6 branch seems to consistently work fine.

  Testing suggests they are resolved by the sigprocmask wrapper patches
  included in suse's tree.

   1) dh_fixperms is a script that commonly runs at the end of a package build.
   It's basically doing a `find | xargs chmod`.
   2) debootstrap --second-stage
   This is used to configure an arm64 chroot that was built using
   debootstrap on a non-native host. It is basically invoking a bunch of
   shell scripts (postinst, etc). When it blows up, the stack consistently
   looks like this:

  Core was generated by `/usr/bin/qemu-aarch64-static /bin/sh -e
  /debootstrap/debootstrap --second-stage'.
  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x60058e55 in memcpy (__len=8, __src=0x7fff62ae34e0,
  __dest=0x400082c330) at
  /usr/include/x86_64-linux-gnu/bits/string3.h:51
  51  return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
  (gdb) bt
  #0  0x60058e55 in memcpy (__len=8, __src=0x7fff62ae34e0,
  __dest=0x400082c330) at
  /usr/include/x86_64-linux-gnu/bits/string3.h:51
  #1  stq_p (v=274886476624, ptr=0x400082c330) at
  /mnt/qemu.upstream/include/qemu/bswap.h:280
  #2  stq_le_p (v=274886476624, ptr=0x400082c330) at
  /mnt/qemu.upstream/include/qemu/bswap.h:315
  #3  target_setup_sigframe (set=0x7fff62ae3530, env=0x62d9c678,
  sf=0x400082b0d0) at /mnt/qemu.upstream/linux-user/signal.c:1167
  #4  target_setup_frame (usig=usig@entry=17, ka=ka@entry=0x604ec1e0
  <sigact_table+512>, info=info@entry=0x0, set=set@entry=0x7fff62ae3530,
  env=env@entry=0x62d9c678)
  at /mnt/qemu.upstream/linux-user/signal.c:1286
  #5  0x60059f46 in setup_frame (env=0x62d9c678,
  set=0x7fff62ae3530, ka=0x604ec1e0 <sigact_table+512>, sig=17) at
  /mnt/qemu.upstream/linux-user/signal.c:1322
  #6  process_pending_signals (cpu_env=cpu_env@entry=0x62d9c678) at
  /mnt/qemu.upstream/linux-user/signal.c:5747
  #7  0x60056e60 in cpu_loop (env=env@entry=0x62d9c678) at
  /mnt/qemu.upstream/linux-user/main.c:1082
  #8  0x60005079 in main (argc=<optimized out>, argv=<optimized
  out>, envp=<optimized out>) at
  /mnt/qemu.upstream/linux-user/main.c:4374

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1285363/+subscriptions



Re: [Qemu-devel] Call for testing QEMU aarch64-linux-user emulation

2014-02-27 Thread Dann Frazier
[Adding Alex Barcelo to the CC]

On Thu, Feb 27, 2014 at 6:20 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Wed, 26 Feb 2014, Dann Frazier wrote:

 I've narrowed down the changes that seem to prevent both types of
 segfaults to the following changes that introduce a wrapper around
 sigprocmask:

 https://github.com/susematz/qemu/commit/f1542ae9fe10d5a241fc2624ecaef5f0948e3472
 https://github.com/susematz/qemu/commit/4e5e1607758841c760cda4652b0ee7a6bc6eb79d
 https://github.com/susematz/qemu/commit/63eb8d3ea58f58d5857153b0c632def1bbd05781

 I'm not sure if this is a real fix or just papering over my issue -

 It's fixing the issue, but strictly speaking introduces a QoI problem.
 SIGSEGV must not be controllable by the guest, it needs to be always
 deliverable to qemu; that is what's fixed.

 The QoI problem introduced is that with the implementation as is, the
 fiddling with SIGSEGV is detectable by the guest.  E.g. if it installs a
 segv handler, blocks segv, then forces a segfault, checks that it didn't
 arrive, then unblocks segv and checks that it now arrives, such testcase
 would be able to detect that in fact it couldn't block SIGSEGV.

 Luckily for us, the effect of blocking SIGSEGV and then generating one in
 other ways than kill/sigqueue/raise (e.g. by writing to NULL) is
 undefined, so in practice that QoI issue doesn't matter.
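
For illustration only (not from the thread): a minimal guest-side sketch of the
detection test described above.  It uses raise(), the well-defined way to send
SIGSEGV, and simply reports whether the handler ran while the signal was
supposedly blocked:

#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t got_segv;

static void on_segv(int sig)
{
    (void)sig;
    got_segv = 1;
}

int main(void)
{
    sigset_t block, old;

    signal(SIGSEGV, on_segv);
    sigemptyset(&block);
    sigaddset(&block, SIGSEGV);
    sigprocmask(SIG_BLOCK, &block, &old);

    raise(SIGSEGV);                        /* should stay pending if blocking works */
    int seen_while_blocked = got_segv;

    sigprocmask(SIG_SETMASK, &old, NULL);  /* a pending SIGSEGV is delivered here */
    printf("while blocked: %d, after unblock: %d\n",
           seen_while_blocked, (int)got_segv);
    return 0;
}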

 To fix also that latter part it'd need a further per-thread flag
 segv_blocked_p which needs to be checked before actually delivering a
 guest-directed SIGSEGV (in comparison to a qemu-directed SEGV), and
 otherwise requeue it.  That's made a bit complicated when the SEGV was
 process-directed (not thread-directed) because in that case it needs to be
 delivered as long as there's _any_ thread which has it unblocked.  So
 given the above undefinedness for sane uses of SEGVs it didn't seem worth
 the complication of having an undetectable virtualization of SIGSEGV.

Thanks for the explanation.

 but either way, are these changes reasonable for upstream submission?

 IIRC the first two commits (from Alex Barcelo) were submitted once in the
 past, but fell through the cracks.

Alex: are you interested in resubmitting these - or would you like me
to attempt to on your behalf?

 -dann


 Ciao,
 Michael.



Re: [Qemu-devel] Call for testing QEMU aarch64-linux-user emulation

2014-02-26 Thread Dann Frazier
On Tue, Feb 25, 2014 at 1:39 AM, Alex Bennée alex.ben...@linaro.org wrote:

 Dann Frazier dann.fraz...@canonical.com writes:

 On Mon, Feb 17, 2014 at 6:40 AM, Alex Bennée alex.ben...@linaro.org wrote:
 Hi,

 Thanks to all involved for your work here!

 After a solid few months of work the QEMU master branch [1] has now reached
 instruction feature parity with the suse-1.6 [6] tree that a lot of people
 have been using to build various aarch64 binaries. In addition to the
 snip

 I've tested against the following aarch64 rootfs:
 * SUSE [2]
 * Debian [3]
 * Ubuntu Saucy [4]

 fyi, I've been doing my testing with Ubuntu Trusty.

 Good stuff, I shall see if I can set one up. Is the package coverage
 between trusty and saucy much different? I noticed for example I
 couldn't find zile and various build-deps for llvm.

 snip

 Feedback I'm interested in
 ==

 * Any instruction failure (please include the log line with the
   unsupported message)
 * Any aarch64 specific failures (i.e. not generic QEMU threading 
 flakiness).

 I'm not sure if this qualifies as generic QEMU threading flakiness or not. 
 I've
 found a couple conditions that cause master to core dump fairly
 reliably, while the aarch64-1.6 branch seems to consistently work
 fine.

  1) dh_fixperms is a script that commonly runs at the end of a package build.
  It's basically doing a `find | xargs chmod`.
  2) debootstrap --second-stage
  This is used to configure an arm64 chroot that was built using
  debootstrap on a non-native host. It is basically invoking a bunch of
  shell scripts (postinst, etc). When it blows up, the stack consistently
  looks like this:

I've narrowed down the changes that seem to prevent both types of
segfaults to the following changes that introduce a wrapper around
sigprocmask:

https://github.com/susematz/qemu/commit/f1542ae9fe10d5a241fc2624ecaef5f0948e3472
https://github.com/susematz/qemu/commit/4e5e1607758841c760cda4652b0ee7a6bc6eb79d
https://github.com/susematz/qemu/commit/63eb8d3ea58f58d5857153b0c632def1bbd05781
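
Roughly, the idea behind those commits is to funnel the guest's sigprocmask()
requests through a wrapper so that signals QEMU itself relies on (notably
SIGSEGV) can never be blocked at the host level.  A simplified sketch of that
idea, not the actual patches (the function name here is made up):

#include <signal.h>

static int wrapped_sigprocmask(int how, const sigset_t *set, sigset_t *oldset)
{
    if (set && (how == SIG_BLOCK || how == SIG_SETMASK)) {
        sigset_t host_set = *set;
        sigdelset(&host_set, SIGSEGV);  /* keep host SIGSEGV deliverable to QEMU */
        return sigprocmask(how, &host_set, oldset);
    }
    return sigprocmask(how, set, oldset);
}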

I'm not sure if this is a real fix or just papering over my issue -
but either way, are these changes reasonable for upstream submission?

 -dann


 Core was generated by `/usr/bin/qemu-aarch64-static /bin/sh -e
 /debootstrap/debootstrap --second-stage'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x60058e55 in memcpy (__len=8, __src=0x7fff62ae34e0,
 __dest=0x400082c330) at
 /usr/include/x86_64-linux-gnu/bits/string3.h:51
 51  return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
 (gdb) bt
 #0  0x60058e55 in memcpy (__len=8, __src=0x7fff62ae34e0,
 __dest=0x400082c330) at
 /usr/include/x86_64-linux-gnu/bits/string3.h:51
 #1  stq_p (v=274886476624, ptr=0x400082c330) at
 /mnt/qemu.upstream/include/qemu/bswap.h:280
 #2  stq_le_p (v=274886476624, ptr=0x400082c330) at
 /mnt/qemu.upstream/include/qemu/bswap.h:315
 #3  target_setup_sigframe (set=0x7fff62ae3530, env=0x62d9c678,
 sf=0x400082b0d0) at /mnt/qemu.upstream/linux-user/signal.c:1167
 #4  target_setup_frame (usig=usig@entry=17, ka=ka@entry=0x604ec1e0
 <sigact_table+512>, info=info@entry=0x0, set=set@entry=0x7fff62ae3530,
 env=env@entry=0x62d9c678)
 at /mnt/qemu.upstream/linux-user/signal.c:1286
 #5  0x60059f46 in setup_frame (env=0x62d9c678,
 set=0x7fff62ae3530, ka=0x604ec1e0 <sigact_table+512>, sig=17) at
 /mnt/qemu.upstream/linux-user/signal.c:1322
 #6  process_pending_signals (cpu_env=cpu_env@entry=0x62d9c678) at
 /mnt/qemu.upstream/linux-user/signal.c:5747
 #7  0x60056e60 in cpu_loop (env=env@entry=0x62d9c678) at
 /mnt/qemu.upstream/linux-user/main.c:1082
 #8  0x60005079 in main (argc=<optimized out>, argv=<optimized
 out>, envp=<optimized out>) at
 /mnt/qemu.upstream/linux-user/main.c:4374

 There are some pretty large differences between these trees with
 respect to signal syscalls - is that the likely culprit?

 Quite likely. We explicitly concentrated on the aarch64 specific
 instruction emulation leaving more generic patches to flow in from SUSE
 as they matured.

 I guess it's time to go through the remaining patches and see what's 
 up-streamable.

 Alex/Michael,

 Are any of these patches in flight now?

 Cheers,

 --
 Alex Bennée
 QEMU/KVM Hacker for Linaro





Re: [Qemu-devel] Call for testing QEMU aarch64-linux-user emulation

2014-02-24 Thread Dann Frazier
On Mon, Feb 17, 2014 at 6:40 AM, Alex Bennée alex.ben...@linaro.org wrote:
 Hi,

Thanks to all involved for your work here!

 After a solid few months of work the QEMU master branch [1] has now reached
 instruction feature parity with the suse-1.6 [6] tree that a lot of people
 have been using to build various aarch64 binaries. In addition to the
 SUSE work we have fixed numerous edge cases and finished off classes of
 instructions. All instructions have been verified with Peter's RISU
 random instruction testing tool. I have also built and run many
 packages as well as built gcc and passed most of the aarch64 specific tests.

 I've tested against the following aarch64 rootfs:
 * SUSE [2]
 * Debian [3]
 * Ubuntu Saucy [4]

fyi, I've been doing my testing with Ubuntu Trusty.

 In my tree the remaining insns that the GCC aarch64 tests need to
 implement are:
 FRECPE
 FRECPX
 CLS (2 misc variant)
 CLZ (2 misc variant)
 FSQRT
 FRINTZ
 FCVTZS

 Which I'm currently working though now. However for most build tasks I
 expect the instructions in master [1] will be enough.

 If you want the latest instructions working their way to mainline you
 are free to use my tree [5] which currently has:

 * Additional NEON/SIMD instructions
 * sendmsg syscall
 * Improved helper scripts for setting up binfmt_misc
 * The ability to set QEMU_LOG_FILENAME to /path/to/something-%d.log
   - this is useful when tests are failing N-levels deep as %d is
 replaced with the pid

 Feedback I'm interested in
 ==

 * Any instruction failure (please include the log line with the
   unsupported message)
 * Any aarch64 specific failures (i.e. not generic QEMU threading flakiness).

I'm not sure if this qualifies as generic QEMU threading flakiness or not. I've
found a couple conditions that cause master to core dump fairly
reliably, while the aarch64-1.6 branch seems to consistently work
fine.

 1) dh_fixperms is a script that commonly runs at the end of a package build.
 It's basically doing a `find | xargs chmod`.
 2) debootstrap --second-stage
 This is used to configure an arm64 chroot that was built using
 debootstrap on a non-native host. It is basically invoking a bunch of
 shell scripts (postinst, etc). When it blows up, the stack consistently
 looks like this:

Core was generated by `/usr/bin/qemu-aarch64-static /bin/sh -e
/debootstrap/debootstrap --second-stage'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x60058e55 in memcpy (__len=8, __src=0x7fff62ae34e0,
__dest=0x400082c330) at
/usr/include/x86_64-linux-gnu/bits/string3.h:51
51  return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
(gdb) bt
#0  0x60058e55 in memcpy (__len=8, __src=0x7fff62ae34e0,
__dest=0x400082c330) at
/usr/include/x86_64-linux-gnu/bits/string3.h:51
#1  stq_p (v=274886476624, ptr=0x400082c330) at
/mnt/qemu.upstream/include/qemu/bswap.h:280
#2  stq_le_p (v=274886476624, ptr=0x400082c330) at
/mnt/qemu.upstream/include/qemu/bswap.h:315
#3  target_setup_sigframe (set=0x7fff62ae3530, env=0x62d9c678,
sf=0x400082b0d0) at /mnt/qemu.upstream/linux-user/signal.c:1167
#4  target_setup_frame (usig=usig@entry=17, ka=ka@entry=0x604ec1e0
<sigact_table+512>, info=info@entry=0x0, set=set@entry=0x7fff62ae3530,
env=env@entry=0x62d9c678)
at /mnt/qemu.upstream/linux-user/signal.c:1286
#5  0x60059f46 in setup_frame (env=0x62d9c678,
set=0x7fff62ae3530, ka=0x604ec1e0 <sigact_table+512>, sig=17) at
/mnt/qemu.upstream/linux-user/signal.c:1322
#6  process_pending_signals (cpu_env=cpu_env@entry=0x62d9c678) at
/mnt/qemu.upstream/linux-user/signal.c:5747
#7  0x60056e60 in cpu_loop (env=env@entry=0x62d9c678) at
/mnt/qemu.upstream/linux-user/main.c:1082
#8  0x60005079 in main (argc=<optimized out>, argv=<optimized
out>, envp=<optimized out>) at
/mnt/qemu.upstream/linux-user/main.c:4374
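
For context on frames #1/#2 of that backtrace: stq_le_p() stores a 64-bit value
little-endian through a possibly unaligned pointer, and on this host it reduces
to a memcpy, which is why the fault surfaces in the fortified memcpy from
string3.h.  A simplified sketch, not the exact code from include/qemu/bswap.h:

#include <stdint.h>
#include <string.h>

/* Raw 64-bit store through a possibly unaligned pointer (frame #1). */
static inline void sketch_stq_p(void *ptr, uint64_t v)
{
    memcpy(ptr, &v, sizeof(v));   /* frame #0: where the SIGSEGV lands */
}

/* Store v little-endian (frame #2); assumes a little-endian host. */
static inline void sketch_stq_le_p(void *ptr, uint64_t v)
{
    sketch_stq_p(ptr, v);
}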

There are some pretty large differences between these trees with
respect to signal syscalls - is that the likely culprit?

 -dann



 If you need to catch me in real time I'm available on #qemu (stsquad)
 and #linaro-virtualization (ajb-linaro).

 Many thanks to the SUSE guys for getting the aarch64 train rolling. I
 hope you're happy with the final result ;-)

 Cheers,

 --
 Alex Bennée
 QEMU/KVM Hacker for Linaro

 [1] git://git.qemu.org/qemu.git master
 [2] 
 http://download.opensuse.org/ports/aarch64/distribution/13.1/appliances/openSUSE-13.1-ARM-JeOS.aarch64-rootfs.aarch64-1.12.1-Build32.1.tbz
 [3] 
 http://people.debian.org/~wookey/bootstrap/rootfs/debian-unstable-arm64.tar.gz
 [4] http://people.debian.org/~wookey/bootstrap/rootfs/saucy-arm64.tar.gz
 [5] https://github.com/stsquad/qemu/tree/ajb-a64-working
 [6] https://github.com/susematz/qemu/tree/aarch64-1.6




[Qemu-devel] [PATCH] e1000: Don't set the Capabilities List bit

2011-09-21 Thread dann frazier
[Originally sent to qemu-kvm list, but I was redirected here]

The Capabilities Pointer is NULL, so this bit shouldn't be set. The state of
this bit doesn't appear to change any behavior on Linux/Windows versions we've
tested, but it does cause Windows' PCI/PCI Express Compliance Test to balk.

I happen to have a physical 82540EM controller, and it also sets the
Capabilities Bit, but it actually has items on the capabilities list to go
with it :)

Signed-off-by: dann frazier dann.fraz...@canonical.com
---
 hw/e1000.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 6a3a941..ce8fc8b 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1151,8 +1151,6 @@ static int pci_e1000_init(PCIDevice *pci_dev)
 
 pci_conf = d->dev.config;
 
-/* TODO: we have no capabilities, so why is this bit set? */
-pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_CAP_LIST);
 /* TODO: RST# value should be 0, PCI spec 6.2.4 */
 pci_conf[PCI_CACHE_LINE_SIZE] = 0x10;
 
-- 
1.7.6.3
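
For readers less familiar with PCI config space, the consistency rule behind
the change can be sketched as follows, using the standard PCI register offsets
(an illustration only, not code from QEMU):

#include <stdbool.h>
#include <stdint.h>

#define PCI_STATUS           0x06   /* Status register, 16 bits */
#define PCI_STATUS_CAP_LIST  0x10   /* Capabilities List bit */
#define PCI_CAPABILITY_LIST  0x34   /* Capabilities Pointer */

/* The Capabilities List bit should be set if and only if the Capabilities
 * Pointer names a real capability chain; e1000 advertised the bit with a
 * NULL pointer, which is what the compliance test objects to. */
static bool cap_list_bit_consistent(const uint8_t *pci_conf)
{
    uint16_t status = (uint16_t)(pci_conf[PCI_STATUS] |
                                 (pci_conf[PCI_STATUS + 1] << 8));
    bool advertises_caps = (status & PCI_STATUS_CAP_LIST) != 0;
    bool has_cap_chain   = pci_conf[PCI_CAPABILITY_LIST] != 0;
    return advertises_caps == has_cap_chain;
}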