Re: [PATCH] net: ieee802154: fix net_device reference release too early
Hello.

Sorry for the late reply.

>
> Hello.
>
> On Thu, 2017-05-18 at 15:14, Stefan Schmidt wrote:
> > Hello.
> >
> > On Thu, 2017-05-18 at 15:50, linzhang wrote:
> > > This patch fixes the kernel oops when release net_device reference in
> > > advance. In function raw_sendmsg(i think the dgram_sendmsg has the same
> > > problem), there is a race condition between dev_put and dev_queue_xmit
> > > when the device is gone that maybe lead to dev_queue_xmit to see
> > > an illegal net_device pointer.
> > >
> >
> > You have a test case to reproduce this oops? I fear I have not seen
> > one.
>
> If you have a test case handy adding it to the commit would be handy. If you
> do not have one around we can do without.
>

My test kernel is 3.13.0-32. Because I do not have a real 802154 device,
I changed the lowpan_newlink function to this:

	/* find and hold real wpan device */
	real_dev = dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
	if (!real_dev)
		return -ENODEV;
//	if (real_dev->type != ARPHRD_IEEE802154) {
//		dev_put(real_dev);
//		return -EINVAL;
//	}

	lowpan_dev_info(dev)->real_dev = real_dev;
	lowpan_dev_info(dev)->fragment_tag = 0;
	mutex_init(&lowpan_dev_info(dev)->dev_list_mtx);

Also, to simulate preemption, I changed the raw_sendmsg function to this:

	skb->dev = dev;
	skb->sk = sk;
	skb->protocol = htons(ETH_P_IEEE802154);

	dev_put(dev);

	//simulate preempt
	schedule_timeout_uninterruptible(30 * HZ);

	err = dev_queue_xmit(skb);
	if (err > 0)
		err = net_xmit_errno(err);

This is my userspace test code, named test_send_data:

	/* The five header names are missing from this copy of the mail;
	 * the ones below are what the code needs. */
	#include <stdio.h>
	#include <string.h>
	#include <errno.h>
	#include <sys/types.h>
	#include <sys/socket.h>

	int main(int argc, char **argv)
	{
		char buf[127];
		int sockfd;

		sockfd = socket(AF_IEEE802154, SOCK_RAW, 0);
		if (sockfd < 0) {
			printf("create sockfd error: %s\n", strerror(errno));
			return -1;
		}

		send(sockfd, buf, sizeof(buf), 0);

		return 0;
	}

This is my test case:

root@zhanglin-x-computer:~/develop/802154# uname -a
Linux zhanglin-x-computer 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
root@zhanglin-x-computer:~/develop/802154# ip link add link eth0 name lowpan0 type lowpan
root@zhanglin-x-computer:~/develop/802154# //keep the lowpan0 device down
root@zhanglin-x-computer:~/develop/802154# ./test_send_data &
//wait a while
root@zhanglin-x-computer:~/develop/802154# ip link del link dev lowpan0
//the device is gone
//oops
[381.303307] general protection fault: [#1]SMP
[381.303407] Modules linked in: af_802154 6lowpan bnep rfcomm bluetooth nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek rts5139(C) snd_hda_intel snd_had_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi snd_req intel_rapl snd_seq_device coretemp i915 kvm_intel kvm snd_timer snd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cypted drm_kms_helper drm i2c_algo_bit soundcore video mac_hid parport_pc ppdev ip parport hid_generic usbhid hid ahci r8169 mii libahdi
[381.304286] CPU:1 PID: 2524 Commm: 1 Tainted: G C 0 3.13.0-32-generic #57-Ubuntu
[381.304409] Hardware name: Haier Haier DT Computer/Haier DT Codputer, BIOS FIBT19H02_X64 06/09/2014
[381.304546] tasks: 96965fc0 ti: B0013779c000 task.ti: B8013779c000
[381.304659] RIP: 0010:[] [] __dev_queue_ximt+0x61/0x500
[381.304798] RSP: 0018:B8013779dca0 EFLAGS: 00010202
[381.304880] RAX: 272b031d57565351 RBX: RCX: 8800968f1a00
[381.304987] RDX: RSI: RDI: 8800968f1a00
[381.305095] RBP: 8e013773dce0 R08: 0266 R09: 0004
[381.305202] R10: 0004 R11: 0005 R12: 88013902e000
[381.305310] R13: 007f R14: 007f R15: 8800968f1a00
[381.305418] FS: 7fc57f50f740() GS: 88013fc8() knlGS:
[381.305540] CS: 0010 DS: ES: CR0: 8005003b
[381.305627] CR2: 7fad0841c000 CR3: 0001368dd000 CR4: 001007e0
[361.905734] Stack:
[381.305768] 002052d0 3facb30a 88013779dcc0 880137764000
[381.305898] 88013779de70 007f 007f 88013902e000
[381.306026] 88013779dcf0 81622490 88013779dd39 a03af9f1
[381.306155] Call Trace:
[381.306202] [] dev_queue_xmit+0x10/0x20
[381.306294] [] raw_sendmsg+0x1b1/0x270 [af_802154]
[381.306396] [] ieee802154_sock_sendmsg+0x14/0x20 [af_802154]
[381.306512] [] sock_sendmsg+0x8b/0xc0
[381.306600] [] ? __d_alloc+0x25/0x180
[381.306687] [] ? kmem_cache_alloc_trace+0x1c6/0x1f0
[381.306791] [] SYSC_sendto+0x121/0x1c0
[381.306878] [] ? vtime_account_user+x54/0x60
[381.306975] [] ? syscall_trace_enter+0x145/0x250
[381.307073] [] SyS_sendto+0xe/0x10
[381.307156] [] tracesys
Re: [PATCH] net: ieee802154: fix net_device reference release too early
Hello.

On Thu, 2017-05-18 at 15:14, Stefan Schmidt wrote:
> Hello.
>
> On Thu, 2017-05-18 at 15:50, linzhang wrote:
> > This patch fixes the kernel oops when release net_device reference in
> > advance. In function raw_sendmsg(i think the dgram_sendmsg has the same
> > problem), there is a race condition between dev_put and dev_queue_xmit
> > when the device is gone that maybe lead to dev_queue_xmit to see
> > an illegal net_device pointer.
> >
>
> You have a test case to reproduce this oops? I fear I have not seen
> one.

If you have a test case handy, adding it to the commit would be helpful.
If you do not have one around, we can do without.

> > So i think that dev_put should be behind of the dev_queue_xmit.
> >
> > Also, explicit set skb->sk is needless, sock_alloc_send_skb is
> > already set it.
>
> You could have put this fixup in a different patch.

I would actually request that you split this into two patches: one for
the removal of the sk setting and one for the race condition fix.

> > Signed-off-by: linzhang
>
> This looks more like a username instead of a real name. If you have Lin
> Zhang as your English real name that would be better here. :)

This would also be appreciated.

> > ---
> >  net/ieee802154/socket.c | 10 --
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
> > index eedba76..a60658c 100644
> > --- a/net/ieee802154/socket.c
> > +++ b/net/ieee802154/socket.c
> > @@ -301,15 +301,14 @@ static int raw_sendmsg(struct sock *sk, struct msghdr
> > *msg, size_t size)
> >  		goto out_skb;
> >
> >  	skb->dev = dev;
> > -	skb->sk = sk;
> >  	skb->protocol = htons(ETH_P_IEEE802154);
> >
> > -	dev_put(dev);
> > -
> >  	err = dev_queue_xmit(skb);
> >  	if (err > 0)
> >  		err = net_xmit_errno(err);
> >
> > +	dev_put(dev);
> > +
> >  	return err ?: size;
> >
> >  out_skb:
> > @@ -690,15 +689,14 @@ static int dgram_sendmsg(struct sock *sk, struct
> > msghdr *msg, size_t size)
> >  		goto out_skb;
> >
> >  	skb->dev = dev;
> > -	skb->sk = sk;
> >  	skb->protocol = htons(ETH_P_IEEE802154);
> >
> > -	dev_put(dev);
> > -
> >  	err = dev_queue_xmit(skb);
> >  	if (err > 0)
> >  		err = net_xmit_errno(err);
> >
> > +	dev_put(dev);
> > +
> >  	return err ?: size;
>
> Going to give this a test ride here now.

I gave it a ride in my testbed and encountered no problems. While I have
never seen the race and oops myself, doing the dev_put before the xmit
can surely lead to such a race, and the fix is valid.

Once you have made the changes requested above and re-submitted your two
patches, you can add my

Acked-by: Stefan Schmidt

to both of them.

regards
Stefan Schmidt
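
For readers following the ordering argument above, here is a minimal userspace sketch of why the position of the put matters. It is only a toy model, not kernel code: struct toy_dev and every toy_* function are hypothetical names made up for illustration, the "concurrent" device removal is simulated by a plain function call placed inside the window, and the real unregister path (which waits for outstanding references) is reduced to dropping one reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct net_device (hypothetical): a reference count
 * and a flag recording whether the object has been torn down. */
struct toy_dev {
	int refcnt;
	bool freed;
};

/* Models dev_hold(): take one reference. */
static void toy_dev_hold(struct toy_dev *dev)
{
	dev->refcnt++;
}

/* Models dev_put(): drop one reference; the last put "frees" the device. */
static void toy_dev_put(struct toy_dev *dev)
{
	assert(dev->refcnt > 0);
	if (--dev->refcnt == 0)
		dev->freed = true;
}

/* Models device removal (e.g. "ip link del"): drops the reference that
 * registration held. Simplified: the real path is more involved. */
static void toy_unregister(struct toy_dev *dev)
{
	toy_dev_put(dev);
}

/* Models dev_queue_xmit(): 0 on success, -1 if it touched a freed device. */
static int toy_xmit(const struct toy_dev *dev)
{
	return dev->freed ? -1 : 0;
}

/* Old ordering: the sender drops its reference before transmitting, so a
 * removal that lands in the window leaves xmit with a dead device. */
static int send_put_first(struct toy_dev *dev)
{
	toy_dev_hold(dev);	/* reference taken when the device was looked up */
	toy_dev_put(dev);	/* released too early */
	toy_unregister(dev);	/* simulated removal inside the window */
	return toy_xmit(dev);
}

/* Patched ordering: the sender's reference pins the device across xmit. */
static int send_put_last(struct toy_dev *dev)
{
	int err;

	toy_dev_hold(dev);	/* reference taken when the device was looked up */
	toy_unregister(dev);	/* same simulated removal, same window */
	err = toy_xmit(dev);	/* device is still alive here */
	toy_dev_put(dev);	/* last reference goes away only now */
	return err;
}
```

Starting each call from a device with refcnt = 1 (the registration reference), send_put_first() reports the use of a freed device while send_put_last() transmits successfully and only then lets the device go.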
Re: [PATCH] net: ieee802154: fix net_device reference release too early
Hello.

On Thu, 2017-05-18 at 15:50, linzhang wrote:
> This patch fixes the kernel oops when release net_device reference in
> advance. In function raw_sendmsg(i think the dgram_sendmsg has the same
> problem), there is a race condition between dev_put and dev_queue_xmit
> when the device is gone that maybe lead to dev_queue_xmit to see
> an illegal net_device pointer.
>

You have a test case to reproduce this oops? I fear I have not seen one.

> So i think that dev_put should be behind of the dev_queue_xmit.
>
> Also, explicit set skb->sk is needless, sock_alloc_send_skb is
> already set it.

You could have put this fixup in a different patch.

> Signed-off-by: linzhang

This looks more like a username instead of a real name. If you have Lin
Zhang as your English real name that would be better here. :)

> ---
>  net/ieee802154/socket.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
> index eedba76..a60658c 100644
> --- a/net/ieee802154/socket.c
> +++ b/net/ieee802154/socket.c
> @@ -301,15 +301,14 @@ static int raw_sendmsg(struct sock *sk, struct msghdr
> *msg, size_t size)
>  		goto out_skb;
>
>  	skb->dev = dev;
> -	skb->sk = sk;
>  	skb->protocol = htons(ETH_P_IEEE802154);
>
> -	dev_put(dev);
> -
>  	err = dev_queue_xmit(skb);
>  	if (err > 0)
>  		err = net_xmit_errno(err);
>
> +	dev_put(dev);
> +
>  	return err ?: size;
>
>  out_skb:
> @@ -690,15 +689,14 @@ static int dgram_sendmsg(struct sock *sk, struct msghdr
> *msg, size_t size)
>  		goto out_skb;
>
>  	skb->dev = dev;
> -	skb->sk = sk;
>  	skb->protocol = htons(ETH_P_IEEE802154);
>
> -	dev_put(dev);
> -
>  	err = dev_queue_xmit(skb);
>  	if (err > 0)
>  		err = net_xmit_errno(err);
>
> +	dev_put(dev);
> +
>  	return err ?: size;

Going to give this a test ride here now.

regards
Stefan Schmidt