Re: [Qemu-devel] BUG: KASAN: use-after-free in free_old_xmit_skbs

2017-10-26 Thread Jean-Philippe Menil

On 06/27/2017 04:13 AM, Jason Wang wrote:



On 06/26/2017 15:35, Jean-Philippe Menil wrote:

On 06/26/2017 04:50 AM, Jason Wang wrote:



On 06/24/2017 06:32, Cong Wang wrote:
On Fri, Jun 23, 2017 at 1:43 AM, Jason Wang wrote:


On 06/23/2017 02:53, Michael S. Tsirkin wrote:

On Thu, Jun 22, 2017 at 08:15:58AM +0200, jean-philippe menil wrote:

Hi Michael,

from what I see, the race appears when we hit virtnet_reset in
virtnet_xdp_set:
virtnet_reset
  _remove_vq_common
    virtnet_del_vqs
      virtnet_free_queues
        kfree(vi->sq)
when the xdp program is added or removed (with two instances of the
program, to trigger it faster).
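
To make the shape of the bug concrete: the transmit path can still call
free_old_xmit_skbs() on vi->sq after the reset path has freed it. A
minimal userspace analogue (pthreads; hypothetical names, not the driver
code) shows the same pattern; built with -fsanitize=address it reports a
heap-use-after-free much like the KASAN splat above:

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-ins for vi->sq and free_old_xmit_skbs(). */
struct send_queue { long pending; };
static struct send_queue *sq;

/* Analogue of the tx path: keeps touching the send queue. */
static void *xmit_path(void *arg)
{
	(void)arg;
	for (int i = 0; i < 1000000; i++)
		sq->pending++;	/* use-after-free once reset_path has run */
	return NULL;
}

/* Analogue of virtnet_reset -> ... -> kfree(vi->sq). */
static void *reset_path(void *arg)
{
	(void)arg;
	usleep(100);
	free(sq);	/* freed while xmit_path may still dereference it */
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	sq = calloc(1, sizeof(*sq));
	pthread_create(&a, NULL, xmit_path, NULL);
	pthread_create(&b, NULL, reset_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}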

It's easily repeatable with 2 cpus and 4 queues on the qemu command
line, running the xdp_ttl tool from Jesper.

For now, I'm able to continue my qualification by testing that xdp_qp
is not null, but that does not seem to be a sustainable trick:
if (xdp_qp && vi->xdp_queue_pairs != xdp_qp)

Maybe it will be clearer to you with this information.

Best regards.

Jean-Philippe


I'm pretty clear about the issue here, I was trying to figure out 
a fix.

Jason, any thoughts?



Hi Jean:

Does the following fix this issue? (I can't reproduce it locally through
xdp_ttl)

It is tricky here.

 From my understanding of the code base, the tx_lock is not sufficient
here, because in virtnet_del_vqs() all vqs are deleted and one vq
maps to one txq.

I am afraid you have to add a spinlock somewhere to serialize
free_old_xmit_skbs() vs. vring_del_virtqueue(). As you can see,
they are in different layers, so it is hard to figure out where to add
it...

Also, make sure we don't sleep inside the spinlock, I see a
synchronize_net().
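
A minimal sketch of that serialization, again as a userspace analogue
(pthread spinlock; all names hypothetical): the freeing side unpublishes
the pointer inside the critical section, and does the actual free, plus
any blocking work such as the equivalent of synchronize_net(), outside
of it:

#include <pthread.h>
#include <stdlib.h>

struct send_queue { long pending; };
static struct send_queue *sq;
static pthread_spinlock_t sq_lock;	/* serializes use vs. free of sq */

/* Analogue of free_old_xmit_skbs(): only touches sq under the lock. */
static void reap_skbs(void)
{
	pthread_spin_lock(&sq_lock);
	if (sq)			/* freed queues are NULLed under the lock */
		sq->pending++;
	pthread_spin_unlock(&sq_lock);
}

/* Analogue of virtnet_del_vqs()/vring_del_virtqueue(). */
static void del_vqs(void)
{
	struct send_queue *dead;

	pthread_spin_lock(&sq_lock);
	dead = sq;
	sq = NULL;		/* unpublish under the lock ... */
	pthread_spin_unlock(&sq_lock);
	/* ... and do the free (and anything that may sleep) outside it. */
	free(dead);
}

int main(void)
{
	pthread_spin_init(&sq_lock, PTHREAD_PROCESS_PRIVATE);
	sq = calloc(1, sizeof(*sq));
	reap_skbs();
	del_vqs();
	reap_skbs();		/* now a safe no-op */
	pthread_spin_destroy(&sq_lock);
	return 0;
}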


Looks like I missed something. I thought free_old_xmit_skbs() was 
serialized in this case, since we disable all tx queues after 
netif_tx_unlock_bh()?


Jean:

I thought this could be easily reproduced by e.g. producing some traffic 
and at the same time trying to attach an xdp program, but apparently not. 
How do you trigger this? What's your qemu command line for this?


Thanks


Hi Jason,

this is how I trigger the bug:
- on the guest, tcpdump on the interface
- on the guest, run iperf against the host
- on the guest, cat /sys/kernel/debug/tracing/trace_pipe
- on the guest, run one or two instances of xdp_ttl compiled with 
DEBUG uncommented, which I start and stop until I trigger the bug.


The qemu command line is as follows:

qemu-system-x86_64 -name ubuntu --enable-kvm -machine pc,accel=kvm 
-smp 2 -drive file=/dev/LocalDisk/ubuntu,if=virtio,format=raw -m 2048 
-rtc base=localtime,clock=host -usbdevice tablet --balloon virtio 
-netdev 
tap,id=ubuntu-0,ifname=ubuntu-0,script=/home/jenfi/WORK/jp/qemu/if-up,downscript=/home/jenfi/WORK/jp/qemu/if-down,vhost=on,queues=4 
-device 
virtio-net-pci,netdev=ubuntu-0,mac=de:ad:be:ef:01:03,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=2 
-vnc 127.0.0.1:3 -nographic -serial 
file:/home/jenfi/WORK/jp/qemu/ubuntu.out -monitor 
unix:/home/jenfi/WORK/jp/qemu/ubuntu.sock,server,nowait


Notice the -smp 2, queues=4 and vectors=2.
It seems I forgot to mention that at the beginning of this thread, 
sorry for that.


Best regards.

Jean-Philippe



Thanks Jean, I managed to reproduce the issue.

I thought netif_tx_unlock_bh() would do the tx lock, but it looks like it 
does not; that's why the previous patch doesn't work.


Could you please test this patch? (At least it doesn't trigger the warning 
after more than 20 rounds of xdp start/stop.)


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1f8c15c..a18f859 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1802,6 +1802,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)

 flush_work(&vi->config_work);

 netif_device_detach(vi->dev);
+   netif_tx_disable(vi->dev);
 cancel_delayed_work_sync(&vi->refill);

 if (netif_running(vi->dev)) {
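
For context on why netif_tx_disable() can help where the earlier tx
lock/unlock pair around netif_device_detach() did not: it takes every
queue's xmit lock and marks the queue stopped, so a start_xmit() already
in flight finishes first and new ones bail out. Roughly (paraphrased from
memory from include/linux/netdevice.h of that era; treat it as a sketch,
not the authoritative source):

/* Sketch of netif_tx_disable(), paraphrased from memory. */
static inline void netif_tx_disable(struct net_device *dev)
{
	unsigned int i;
	int cpu;

	local_bh_disable();
	cpu = smp_processor_id();
	for (i = 0; i < dev->num_tx_queues; i++) {
		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);

		/* Taking the per-queue tx lock waits out a concurrent
		 * start_xmit(); stopping the queue blocks new ones. */
		__netif_tx_lock(txq, cpu);
		netif_tx_stop_queue(txq);
		__netif_tx_unlock(txq);
	}
	local_bh_enable();
}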




Hi Jason,

It seems to do the trick!
With your patch, I'm unable to reproduce the problem anymore (running for 
more than 2h without any issue).


Best regards.

Jean-Philippe

Re: BUG: KASAN: use-after-free in free_old_xmit_skbs

2017-10-26 Thread Jean-Philippe Menil

On 06/23/2017 10:43 AM, Jason Wang wrote:



On 06/23/2017 02:53, Michael S. Tsirkin wrote:

On Thu, Jun 22, 2017 at 08:15:58AM +0200, jean-philippe menil wrote:

2017-06-06 1:52 GMT+02:00 Michael S. Tsirkin :

 On Mon, Jun 05, 2017 at 05:08:25AM +0300, Michael S. Tsirkin wrote:
 > On Mon, Jun 05, 2017 at 12:48:53AM +0200, Jean-Philippe Menil wrote:

 > > Hi,
 > >
 > > while playing with xdp and ebpf, I'm hitting the following:
 > >
 > > [  309.993136] ==
 > > [  309.994735] BUG: KASAN: use-after-free in free_old_xmit_skbs.isra.29+0x2b7/0x2e0 [virtio_net]
 > > [  309.998396] Read of size 8 at addr 88006aa64220 by task sshd/323
 > > [  310.000650]
 > > [  310.002305] CPU: 1 PID: 323 Comm: sshd Not tainted 4.12.0-rc3+ #2
 > > [  310.004018] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014

...


 >
 > Since commit 680557cf79f82623e2c4fd42733077d60a843513
 > virtio_net: rework mergeable buffer handling
 >
 > we no longer need to do the resets; we now have enough space
 > to store a bit saying whether a buffer is an xdp one or not.
 >
 > And that's probably a cleaner way to fix these issues than
 > try to find and fix the race condition.
 >
 > John?
 >
 > --
 > MST
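
The bit MST refers to is plain pointer tagging: buffer pointers handed to
the vring are at least word-aligned, so the low bit is free to mark XDP
buffers without any extra storage. A minimal userspace sketch of the
technique (hypothetical names; not the actual virtio_net code):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define XDP_FLAG 0x1UL		/* hypothetical: low bit marks an xdp buffer */

static void *tag_xdp(void *buf)		/* mark a buffer as xdp */
{
	return (void *)((uintptr_t)buf | XDP_FLAG);
}

static bool is_xdp(void *ptr)		/* test the mark */
{
	return (uintptr_t)ptr & XDP_FLAG;
}

static void *untag(void *ptr)		/* recover the real pointer */
{
	return (void *)((uintptr_t)ptr & ~XDP_FLAG);
}

int main(void)
{
	void *buf = malloc(64);		/* malloc results are word-aligned */
	void *tagged = tag_xdp(buf);

	printf("xdp=%d recovered=%d\n", is_xdp(tagged), untag(tagged) == buf);
	free(buf);
	return 0;
}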


 I think I see the source of the race. virtio net calls
 netif_device_detach and assumes no packets will be sent after
 this point. However, all it does is stop all queues so
 no new packets will be transmitted.

 Try locking with HARD_TX_LOCK?

 --
 MST
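
For reference, HARD_TX_LOCK is the macro the core transmit path
(__dev_queue_xmit()) uses to take the per-queue xmit lock. Paraphrased
from memory from net/core/dev.c of that era (a sketch, not the
authoritative text); the notable detail is that it is skipped for
NETIF_F_LLTX drivers, which do their own tx locking:

/* Paraphrased from memory; simplified. */
#define HARD_TX_LOCK(dev, txq, cpu) {			\
	if ((dev->features & NETIF_F_LLTX) == 0) {	\
		__netif_tx_lock(txq, cpu);		\
	}						\
}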


Hi Michael,

from what I see, the race appears when we hit virtnet_reset in 
virtnet_xdp_set:

virtnet_reset
   _remove_vq_common
     virtnet_del_vqs
       virtnet_free_queues
         kfree(vi->sq)

when the xdp program is added or removed (with two instances of the 
program, to trigger it faster).

It's easily repeatable with 2 cpus and 4 queues on the qemu command 
line, running the xdp_ttl tool from Jesper.

For now, I'm able to continue my qualification by testing that xdp_qp 
is not null, but that does not seem to be a sustainable trick:
if (xdp_qp && vi->xdp_queue_pairs != xdp_qp)

Maybe it will be clearer to you with this information.

Best regards.

Jean-Philippe


I'm pretty clear about the issue here, I was trying to figure out a fix.
Jason, any thoughts?




Hi Jean:

Does the following fix this issue? (I can't reproduce it locally through 
xdp_ttl)


Thanks

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1f8c15c..3e65c3f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1801,7 +1801,9 @@ static void virtnet_freeze_down(struct virtio_device *vdev)

 /* Make sure no work handler is accessing the device */
 flush_work(&vi->config_work);

+   netif_tx_lock_bh(vi->dev);
 netif_device_detach(vi->dev);
+   netif_tx_unlock_bh(vi->dev);
 cancel_delayed_work_sync(&vi->refill);



Hi Jason,

Unfortunately, same crash at the same place; the lock did not help.

[  574.522886] ==
[  574.527393] BUG: KASAN: use-after-free in free_old_xmit_skbs.isra.28+0x29b/0x2b0 [virtio_net]

[  574.531934] Read of size 8 at addr 88005d220020 by task iperf/2252
[  574.536296]
[  574.539729] CPU: 1 PID: 2252 Comm: iperf Not tainted 4.12.0-rc5+ #5
[  574.543916] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014

[  574.552046] Call Trace:
[  574.555648]  dump_stack+0xb3/0x10b
[  574.559471]  ? free_old_xmit_skbs.isra.28+0x29b/0x2b0 [virtio_net]
[  574.563578]  print_address_description+0x6a/0x280
[  574.567253]  ? free_old_xmit_skbs.isra.28+0x29b/0x2b0 [virtio_net]
[  574.571223]  kasan_report+0x22b/0x340
[  574.574698]  __asan_report_load8_noabort+0x14/0x20
[  574.578490]  free_old_xmit_skbs.isra.28+0x29b/0x2b0 [virtio_net]
[  574.582586]  ? dev_queue_xmit_nit+0x5fb/0x850
[  574.586348]  ? virtnet_del_vqs+0xf0/0xf0 [virtio_net]
[  574.590153]  ? __skb_clone+0x24a/0x7d0
[  574.593835]  start_xmit+0x15a/0x1620 [virtio_net]
[  574.597939]  dev_hard_start_xmit+0x17f/0x7e0
[  574.601832]  sch_direct_xmit+0x2a8/0x5d0
[  574.605665]  ? dev_deactivate_queue.constprop.31+0x150/0x150
[  574.609827]  __dev_queue_xmit+0x1124/0x18b0
[  574.613595]  ? selinux_ip_postroute+0x4b2/0xa90
[  574.617928]  ? netdev_pick_tx+0x2d0/0x2d0
[  574.621852]  ? mark_held_locks+0xc8/0x120
[  574.625673]  ? ip_finish_output+0x626/0x9b0
[  574.631679]  ? ip_finish_output2+0xb44/0x1160
[  574.637642]  dev_queue_xmit+0x17/0x20
[  574.641693]  ip_finish_output2+0xcd1/0x1160
[  574.645621]  ? do_add_counters+0x480/0x480
[  574.649554]  ? do_add_counters+0x403/0x480
[  574.653209]  ? ip_copy_metadata+0x630/0x630
[  574.657066]  ip_finish_output+0x626/0x9b0
[  574.

Re: BUG: KASAN: use-after-free in free_old_xmit_skbs

2017-06-05 Thread Michael S. Tsirkin
On Mon, Jun 05, 2017 at 05:08:25AM +0300, Michael S. Tsirkin wrote:
> On Mon, Jun 05, 2017 at 12:48:53AM +0200, Jean-Philippe Menil wrote:
> > Hi,
> > 
> > while playing with xdp and ebpf, I'm hitting the following:
> > 
> > [  309.993136] ==
> > [  309.994735] BUG: KASAN: use-after-free in free_old_xmit_skbs.isra.29+0x2b7/0x2e0 [virtio_net]
> > [  309.998396] Read of size 8 at addr 88006aa64220 by task sshd/323
> > [  310.000650]
> > [  310.002305] CPU: 1 PID: 323 Comm: sshd Not tainted 4.12.0-rc3+ #2
> > [  310.004018] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
> > [  310.006495] Call Trace:
> > [  310.007610]  dump_stack+0xb8/0x14c
> > [  310.008748]  ? _atomic_dec_and_lock+0x174/0x174
> > [  310.009998]  ? pm_qos_get_value.part.7+0x6/0x6
> > [  310.011203]  print_address_description+0x6f/0x280
> > [  310.012416]  kasan_report+0x27a/0x370
> > [  310.013573]  ? free_old_xmit_skbs.isra.29+0x2b7/0x2e0 [virtio_net]
> > [  310.014900]  __asan_report_load8_noabort+0x19/0x20
> > [  310.016136]  free_old_xmit_skbs.isra.29+0x2b7/0x2e0 [virtio_net]
> > [  310.017467]  ? virtnet_del_vqs+0xe0/0xe0 [virtio_net]
> > [  310.018759]  ? packet_rcv+0x20d0/0x20d0
> > [  310.019950]  ? dev_queue_xmit_nit+0x5cd/0xaf0
> > [  310.021168]  start_xmit+0x1b4/0x1b10 [virtio_net]
> > [  310.022413]  ? default_device_exit+0x2d0/0x2d0
> > [  310.023634]  ? virtnet_remove+0xf0/0xf0 [virtio_net]
> > [  310.024874]  ? update_load_avg+0x1281/0x29f0
> > [  310.026059]  dev_hard_start_xmit+0x1ea/0x7f0
> > [  310.027247]  ? validate_xmit_skb_list+0x100/0x100
> > [  310.028470]  ? validate_xmit_skb+0x7f/0xc10
> > [  310.029731]  ? netif_skb_features+0x920/0x920
> > [  310.033469]  ? __skb_tx_hash+0x2f0/0x2f0
> > [  310.035615]  ? validate_xmit_skb_list+0xa3/0x100
> > [  310.037782]  sch_direct_xmit+0x2eb/0x7a0
> > [  310.039842]  ? dev_deactivate_queue.constprop.29+0x230/0x230
> > [  310.041980]  ? netdev_pick_tx+0x212/0x2b0
> > [  310.043868]  __dev_queue_xmit+0x12fa/0x20b0
> > [  310.045564]  ? netdev_pick_tx+0x2b0/0x2b0
> > [  310.047210]  ? __account_cfs_rq_runtime+0x630/0x630
> > [  310.048301]  ? update_stack_state+0x402/0x780
> > [  310.049307]  ? account_entity_enqueue+0x730/0x730
> > [  310.050322]  ? __rb_erase_color+0x27d0/0x27d0
> > [  310.051286]  ? update_curr_fair+0x70/0x70
> > [  310.052206]  ? enqueue_entity+0x2450/0x2450
> > [  310.053124]  ? entry_SYSCALL64_slow_path+0x25/0x25
> > [  310.054082]  ? dequeue_entity+0x27a/0x1520
> > [  310.054967]  ? bpf_prog_alloc+0x320/0x320
> > [  310.055822]  ? yield_to_task_fair+0x110/0x110
> > [  310.056708]  ? set_next_entity+0x2f2/0xa90
> > [  310.057574]  ? dequeue_task_fair+0xc09/0x2ec0
> > [  310.058457]  dev_queue_xmit+0x10/0x20
> > [  310.059298]  ip_finish_output2+0xacf/0x12a0
> > [  310.060160]  ? dequeue_entity+0x1520/0x1520
> > [  310.063410]  ? ip_fragment.constprop.47+0x220/0x220
> > [  310.065078]  ? ring_buffer_set_clock+0x50/0x50
> > [  310.066677]  ? __switch_to+0x685/0xda0
> > [  310.068166]  ? load_balance+0x38f0/0x38f0
> > [  310.069544]  ? compat_start_thread+0x80/0x80
> > [  310.070989]  ? trace_find_cmdline+0x60/0x60
> > [  310.072402]  ? rt_cpu_seq_show+0x2d0/0x2d0
> > [  310.073579]  ip_finish_output+0x407/0x880
> > [  310.074441]  ? ip_finish_output+0x407/0x880
> > [  310.075255]  ? update_stack_state+0x402/0x780
> > [  310.076076]  ip_output+0x1c0/0x640
> > [  310.076843]  ? ip_mc_output+0x1350/0x1350
> > [  310.077642]  ? __sk_dst_check+0x164/0x370
> > [  310.078441]  ? complete_formation.isra.53+0xa30/0xa30
> > [  310.079313]  ? __read_once_size_nocheck.constprop.7+0x20/0x20
> > [  310.080265]  ? sock_prot_inuse_add+0xa0/0xa0
> > [  310.081097]  ? memcpy+0x45/0x50
> > [  310.081850]  ? __copy_skb_header+0x1fa/0x280
> > [  310.082676]  ip_local_out+0x70/0x90
> > [  310.083448]  ip_queue_xmit+0x8a1/0x22a0
> > [  310.084236]  ? ip_build_and_send_pkt+0xe80/0xe80
> > [  310.085079]  ? tcp_v4_md5_lookup+0x13/0x20
> > [  310.085884]  tcp_transmit_skb+0x187a/0x3e00
> > [  310.086696]  ? __tcp_select_window+0xaf0/0xaf0
> > [  310.087524]  ? sock_sendmsg+0xba/0xf0
> > [  310.088298]  ? __vfs_write+0x4e0/0x960
> > [  310.089074]  ? vfs_write+0x155/0x4b0
> > [  310.089838]  ? SyS_write+0xf7/0x240
> > [  310.090593]  ? do_syscall_64+0x235/0x5b0
> > [  310.091372]  ? entry_SYSCALL64_slow_path+0x25/0x25
> > [  310.094690]  ? sock_sendmsg+0xba/0xf0
> > [  310.096133]  ? do_syscall_64+0x235/0x5b0
> > [  310.097593]  ? entry_SYSCALL64_slow_path+0x25/0x25
> > [  310.099157]  ? tcp_init_tso_segs+0x1e0/0x1e0
> > [  310.100539]  ? radix_tree_lookup+0xd/0x10
> > [  310.101894]  ? get_work_pool+0xcd/0x150
> > [  310.103216]  ? check_flush_dependency+0x330/0x330
> > [  310.104113]  tcp_write_xmit+0x498/0x52a0
> > [  310.104905]  ? kasan_unpoison_shadow+0x35/0x50
> > [  310.105729]  ? kasan_kmalloc+0xad/0xe0
> > [  310.106505]  ? tcp_transmit_skb+