BUG: 'list_empty(&vgdev->free_vbufs)' is true!

2016-12-15 Thread Jiri Slaby
On 11/16/2016, 02:12 PM, Gerd Hoffmann wrote:
> On Fr, 2016-11-11 at 17:28 +0100, Jiri Slaby wrote:
>> On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
>>> On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
>>>> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
>>>>> Hi,
>>>>>
>>>>> I can relatively easily reproduce this bug:
>>>
>>> How?
>>
>> Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.
>> Run pps [1] without exit(0); on e.g. serial console.
>> Wait a bit. The lot of output causes the BUG.
>>
>> [1] https://github.com/jirislaby/collected_sources/blob/master/pps.c
> 
> Doesn't reproduce here.
> 
> Running "while true; do dmesg; done" on the virtio-gpu fbcon.
> Running the pps fork bomb on the serial console.
> 
> Can watch dmesg printing the kernel messages over and over, until the
> shell can't spawn dmesg any more due to the fork bomb hitting the
> process limit.  No BUG() triggered.
> 
> Tried spice, gtk and sdl.
> 
> Hmm.
> 
> Any ideas what else might be needed to reproduce it?

I can reproduce even with count = 32 :(. And without the fork bomb (i.e.
with the code from the repository).

This is how I start qemu:
/usr/bin/qemu-system-x86_64 -machine accel=kvm -k en-us -smp 4 -m 2371
-usb -device virtio-rng-pci -drive
file=/home/new/suse-fact.img,format=raw,discard=unmap,if=none,id=hd
-device virtio-scsi-pci,id=scsi -device scsi-hd,drive=hd -soundhw hda
-net
user,tftp=/home/xslaby/tftp,bootfile=/pxelinux.0,hostfwd=tcp::-:22,hostfwd=tcp::3632-:3632
-net nic,model=virtio -serial pty -balloon virtio -device
virtio-tablet-pci -vga virtio -kernel
/home/latest/my/arch/x86/boot/bzImage -append root=/dev/sda1
console=ttyS0,115200 loglevel=debug -snapshot

I do
  dmesg -w # on the console
and on serial console:
  while :; do for aa in `seq 1 10`; do ./pps & done; wait; done

Note that the latter can cause an interrupt "storm" (~700 IRQs per second)
because so much output is generated. This could expose a race condition.
Serial is on IRQ4 and virtio-gpu on IRQ10, which has lower priority AFAIK.

thanks,
-- 
js
suse labs


BUG: 'list_empty(&vgdev->free_vbufs)' is true!

2016-11-16 Thread Gerd Hoffmann
On Fr, 2016-11-11 at 17:28 +0100, Jiri Slaby wrote:
> On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
> > On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
> >> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
> >>> Hi,
> >>>
> >>> I can relatively easily reproduce this bug:
> > 
> > How?
> 
> Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.
> Run pps [1] without exit(0); on e.g. serial console.
> Wait a bit. The lot of output causes the BUG.
> 
> [1] https://github.com/jirislaby/collected_sources/blob/master/pps.c

Doesn't reproduce here.

Running "while true; do dmesg; done" on the virtio-gpu fbcon.
Running the pps fork bomb on the serial console.

Can watch dmesg printing the kernel messages over and over, until the
shell can't spawn dmesg any more due to the fork bomb hitting the
process limit.  No BUG() triggered.

Tried spice, gtk and sdl.

Hmm.

Any ideas what else might be needed to reproduce it?

cheers,
  Gerd



BUG: 'list_empty(&vgdev->free_vbufs)' is true!

2016-11-15 Thread Gerd Hoffmann
On Di, 2016-11-15 at 09:55 +0100, Jiri Slaby wrote:
> On 11/15/2016, 09:46 AM, Gerd Hoffmann wrote:
> > On Fr, 2016-11-11 at 17:28 +0100, Jiri Slaby wrote:
> >> On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
> >>> On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
> >>>> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I can relatively easily reproduce this bug:
> >>>
> >>> How?
> >>
> >> Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.
> > 
> > fbcon?  Or xorg/wayland with terminal app?
> 
> Ah, just console, so fbcon. No X server running.

Hmm, /me looks puzzled.  fbcon doesn't do cursor updates, so the cursor
queue can hardly be full and there should be enough buffers even without
allocating 16 extra bufs.  I'll try to reproduce and analyze that one.
The +16 patch is submitted nevertheless as a temporary stopgap.

cheers,
  Gerd



BUG: 'list_empty(&vgdev->free_vbufs)' is true!

2016-11-15 Thread Jiri Slaby
On 11/15/2016, 09:46 AM, Gerd Hoffmann wrote:
> On Fr, 2016-11-11 at 17:28 +0100, Jiri Slaby wrote:
>> On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
>>> On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
>>>> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
>>>>> Hi,
>>>>>
>>>>> I can relatively easily reproduce this bug:
>>>
>>> How?
>>
>> Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.
> 
> fbcon?  Or xorg/wayland with terminal app?

Ah, just console, so fbcon. No X server running.

thanks,
-- 
js
suse labs


BUG: 'list_empty(&vgdev->free_vbufs)' is true!

2016-11-15 Thread Gerd Hoffmann
On Fr, 2016-11-11 at 17:28 +0100, Jiri Slaby wrote:
> On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
> > On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
> >> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
> >>> Hi,
> >>>
> >>> I can relatively easily reproduce this bug:
> > 
> > How?
> 
> Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.

fbcon?  Or xorg/wayland with terminal app?

> Run pps [1] without exit(0); on e.g. serial console.
> Wait a bit. The lot of output causes the BUG.
> 
> [1] https://github.com/jirislaby/collected_sources/blob/master/pps.c
> 
> >>> BUG: 'list_empty(&vgdev->free_vbufs)' is true!
> > 
> >> The following might be helpful for debugging - if the kernel still
> >> does not stop panicking, we are looking at some kind of memory
> >> corruption.
> > 
> > Looking carefully through the code I think it isn't impossible to
> > trigger this, but you need for that:
> > 
> >   (1) command queue full (quite possible),
> >   (2) cursor queue full too (unlikely), and
> >   (3) multiple threads trying to submit commands and waiting for free
> >   space in the command queue (possible with virgl enabled).
> 
> I use -vga virtio with no -display option, so no virgl, I suppose:
> [drm] virgl 3d acceleration not available
> 
> > Do things improve if you allocate some extra bufs?
> > 
> >  int virtio_gpu_alloc_vbufs(struct virtio_gpu_device *vgdev)
> >  {
> >  	struct virtio_gpu_vbuffer *vbuf;
> > -	int i, size, count = 0;
> > +	int i, size, count = 16;
> 
> This seems to help.
> 
> thanks,



BUG: 'list_empty(&vgdev->free_vbufs)' is true!

2016-11-11 Thread Jiri Slaby
On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
> On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
>> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
>>> Hi,
>>>
>>> I can relatively easily reproduce this bug:
> 
> How?

Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.
Run pps [1] without exit(0); on e.g. serial console.
Wait a bit. The lot of output causes the BUG.

[1] https://github.com/jirislaby/collected_sources/blob/master/pps.c

>>> BUG: 'list_empty(&vgdev->free_vbufs)' is true!
> 
>> The following might be helpful for debugging - if the kernel still
>> does not stop panicking, we are looking at some kind of memory
>> corruption.
> 
> Looking carefully through the code I think it isn't impossible to
> trigger this, but you need for that:
> 
>   (1) command queue full (quite possible),
>   (2) cursor queue full too (unlikely), and
>   (3) multiple threads trying to submit commands and waiting for free
>   space in the command queue (possible with virgl enabled).

I use -vga virtio with no -display option, so no virgl, I suppose:
[drm] virgl 3d acceleration not available

> Do things improve if you allocate some extra bufs?
> 
>  int virtio_gpu_alloc_vbufs(struct virtio_gpu_device *vgdev)
>  {
>  	struct virtio_gpu_vbuffer *vbuf;
> -	int i, size, count = 0;
> +	int i, size, count = 16;

This seems to help.

thanks,
-- 
js
suse labs


BUG: 'list_empty(&vgdev->free_vbufs)' is true!

2016-11-11 Thread Jiri Slaby
On 11/08/2016, 09:37 PM, Michael S. Tsirkin wrote:
> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
> The following might be helpful for debugging - if the kernel still
> does not stop panicking, we are looking at some kind of memory
> corruption.
> 
> 
> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
> index 5a0f8a7..d5e1e72 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
> @@ -127,7 +127,11 @@ virtio_gpu_get_vbuf(struct virtio_gpu_device *vgdev,
>  	struct virtio_gpu_vbuffer *vbuf;
>  
>  	spin_lock(&vgdev->free_vbufs_lock);
> -	BUG_ON(list_empty(&vgdev->free_vbufs));
> +	WARN_ON(list_empty(&vgdev->free_vbufs));
> +	if (list_empty(&vgdev->free_vbufs)) {
> +		spin_unlock(&vgdev->free_vbufs_lock);
> +		return ERR_PTR(-EINVAL);
> +	}

Yeah, I already tried that, but it dies immediately after that:

WARNING: '1' is true!
[ cut here ]
WARNING: CPU: 2 PID: 5019 at
/home/latest/linux/drivers/gpu/drm/virtio/virtgpu_vq.c:130
virtio_gpu_get_vbuf+0x415/0x6a0
Modules linked in:
CPU: 2 PID: 5019 Comm: kworker/2:3 Not tainted 4.9.0-rc2-next-20161028+ #33
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Workqueue: events drm_fb_helper_dirty_work
Call Trace:
 dump_stack+0xcd/0x134
 ? _atomic_dec_and_lock+0xcc/0xcc
 ? vprintk_default+0x1f/0x30
 ? printk+0x99/0xb5
 __warn+0x19e/0x1d0
 warn_slowpath_null+0x1d/0x20
 virtio_gpu_get_vbuf+0x415/0x6a0
 ? lock_pin_lock+0x4a0/0x4a0
 ? virtio_gpu_cmd_capset_cb+0x460/0x460
 ? debug_check_no_locks_freed+0x350/0x350
 virtio_gpu_cmd_resource_flush+0x8d/0x2d0
 ? virtio_gpu_cmd_set_scanout+0x310/0x310
 virtio_gpu_surface_dirty+0x364/0x930
 ? mark_held_locks+0xff/0x290
 ? virtio_gpufb_create+0xab0/0xab0
 ? _raw_spin_unlock_irqrestore+0x53/0x70
 ? trace_hardirqs_on_caller+0x46c/0x6b0
 virtio_gpu_framebuffer_surface_dirty+0x14/0x20
 drm_fb_helper_dirty_work+0x27a/0x400
 ? drm_fb_helper_is_bound+0x300/0x300
 process_one_work+0x834/0x1c90
 ? process_one_work+0x7a5/0x1c90
 ? pwq_dec_nr_in_flight+0x3a0/0x3a0
 ? worker_thread+0x1b2/0x1540
 worker_thread+0x650/0x1540
 ? process_one_work+0x1c90/0x1c90
 ? process_one_work+0x1c90/0x1c90
 kthread+0x206/0x310
 ? kthread_create_on_node+0xa0/0xa0
 ? trace_hardirqs_on+0xd/0x10
 ? kthread_create_on_node+0xa0/0xa0
 ? kthread_create_on_node+0xa0/0xa0
 ret_from_fork+0x2a/0x40
---[ end trace c723c98d382423f4 ]---
BUG: unable to handle kernel paging request at fc00
IP: check_memory_region+0x7f/0x1a0
PGD 0

Oops:  [#1] PREEMPT SMP KASAN
Modules linked in:
CPU: 2 PID: 5019 Comm: kworker/2:3 Tainted: GW
4.9.0-rc2-next-20161028+ #33
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Workqueue: events drm_fb_helper_dirty_work
task: 8800455f4980 task.stack: 88001fd78000
RIP: 0010:check_memory_region+0x7f/0x1a0
RSP: 0018:88001fd7f938 EFLAGS: 00010282
RAX: fc00 RBX: dc01 RCX: 8260afb3
RDX: 0001 RSI: 0030 RDI: fff4
RBP: 88001fd7f948 R08: fc01 R09: dc04
R10: 0023 R11: dc05 R12: 0030
R13:  R14: 0050 R15: 0001
FS:  () GS:88007dd0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: fc00 CR3: 773a CR4: 06e0
Call Trace:
Code: 83 fb 10 7f 3f 4d 85 db 74 34 48 bb 01 00 00 00 00 fc ff df 49 01
c3 49 01 d8 80 38 00 75 13 4d 39 c3 4c 89 c0 74 17 49 83 c0 01 <41> 80
78 ff 00 74 ed 49 89 c0 4d 85 c0 0f 85 8f 00 00 00 5b 41
RIP: check_memory_region+0x7f/0x1a0 RSP: 88001fd7f938
CR2: fc00

thanks,
-- 
js
suse labs


BUG: 'list_empty(&vgdev->free_vbufs)' is true!

2016-11-09 Thread Gerd Hoffmann
On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
> > Hi,
> > 
> > I can relatively easily reproduce this bug:

How?

> > BUG: 'list_empty(&vgdev->free_vbufs)' is true!

> The following might be helpful for debugging - if the kernel still
> does not stop panicking, we are looking at some kind of memory
> corruption.

Looking carefully through the code I think it isn't impossible to
trigger this, but you need for that:

  (1) command queue full (quite possible),
  (2) cursor queue full too (unlikely), and
  (3) multiple threads trying to submit commands and waiting for free
  space in the command queue (possible with virgl enabled).

Do things improve if you allocate some extra bufs?

 int virtio_gpu_alloc_vbufs(struct virtio_gpu_device *vgdev)
 {
 	struct virtio_gpu_vbuffer *vbuf;
-	int i, size, count = 0;
+	int i, size, count = 16;
 	void *ptr;
 
 	INIT_LIST_HEAD(&vgdev->free_vbufs);

Memory corruption sounds plausible too.

Redirect the console to ttyS0 for troubleshooting; trying to dump the oops
to the display device which triggered the oops in the first place isn't
going to work very well ...

cheers,
  Gerd



BUG: 'list_empty(&vgdev->free_vbufs)' is true!

2016-11-08 Thread Michael S. Tsirkin
On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
> Hi,
> 
> I can relatively easily reproduce this bug:
> BUG: 'list_empty(&vgdev->free_vbufs)' is true!
> [ cut here ]
> kernel BUG at /home/latest/linux/drivers/gpu/drm/virtio/virtgpu_vq.c:130!
> invalid opcode:  [#1] PREEMPT SMP KASAN
> Modules linked in:
> CPU: 1 PID: 355 Comm: kworker/1:2 Not tainted 4.9.0-rc2-next-20161028+ #32
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
> Workqueue: events drm_fb_helper_dirty_work
> task: 88007b124980 task.stack: 88007b8a
> RIP: 0010:virtio_gpu_get_vbuf+0x32e/0x630
> RSP: 0018:88007b8a78c0 EFLAGS: 00010286
> RAX: 002e RBX: 11000f714f1d RCX: 
> RDX: 002e RSI: 0001 RDI: ed000f714f0e
> RBP: 88007b8a7970 R08: 0001 R09: 
> R10: 0002 R11: 0001 R12: 0030
> R13: 88007caeaba8 R14: 0018 R15: 88007cae
> FS:  () GS:88007dc8() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 00601028 CR3: 7740d000 CR4: 06e0
> Call Trace:
> Code: df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 bb 01 00 00 4c 89 69 e8
> eb 9e 48 c7 c6 e0 d2 d1 83 48 c7 c7 20 d3 d1 83 e8 6c fb 04 ff <0f> 0b
> 48 c7 c7 a0 fb b0 85 e8 09 95 86 ff 48 c7 c6 c0 d3 d1 83
> RIP: virtio_gpu_get_vbuf+0x32e/0x630 RSP: 88007b8a78c0
> 
> 
> There is no stacktrace, as the kernel starts panicking all over the place
> during its generation. Any ideas?
> 
> thanks,

CC maintainers.

The following might be helpful for debugging - if the kernel still
does not stop panicking, we are looking at some kind of memory
corruption.


diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 5a0f8a7..d5e1e72 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -127,7 +127,11 @@ virtio_gpu_get_vbuf(struct virtio_gpu_device *vgdev,
 	struct virtio_gpu_vbuffer *vbuf;
 
 	spin_lock(&vgdev->free_vbufs_lock);
-	BUG_ON(list_empty(&vgdev->free_vbufs));
+	WARN_ON(list_empty(&vgdev->free_vbufs));
+	if (list_empty(&vgdev->free_vbufs)) {
+		spin_unlock(&vgdev->free_vbufs_lock);
+		return ERR_PTR(-EINVAL);
+	}
 	vbuf = list_first_entry(&vgdev->free_vbufs,
 				struct virtio_gpu_vbuffer, list);
 	list_del(&vbuf->list);