Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
on Sat, 15 Jun 2013 11:08:47 +0300, Ming Lei wrote:
> On Sat, Jun 15, 2013 at 2:30 PM, Guenter Roeck wrote:
>> On Sat, Jun 15, 2013 at 10:32:14AM +0800, Ming Lei wrote:
>>> On Sat, Jun 15, 2013 at 1:07 AM, nirinA raseliarison wrote:
>>>
>>> > patch applied and no longer have the bug message when i
>>> > reboot and wake up the ethernet controller.
>>>
>>> I am wondering whether Guenter's patch can really fix the race, but
>>> I'd like to see Guenter's explanation of his patch.
>>>
>>> The race should be caused by the following:
>>>
>>> - request timeout triggered by internal timer
>>>
>>> - user space aborts the request before the line in
>>>   _request_firmware_load()
>>>
>>>       fw_priv->buf = NULL
>>>
>>>   which is run in the timeout path
>>>
>>> - then the abort() called from firmware_loading_store() may use a
>>>   freed fw buf, since the timeout path will free the fw buffer.
>>>
>>> Considering that clearing 'fw_priv->buf' in _request_firmware_load()
>>> isn't protected by fw_lock now, Guenter's patch can't avoid the race
>>> entirely.
>>
>> I agree; my patch only protects one specific path, and was based on the
>> observation that access to fw_priv->buf is protected elsewhere in the
>> code. My suspicion was that fw_priv->buf was freed while waiting for
>> the mutex in firmware_loading_store().
>>
>> Your patch is more comprehensive.
>
> OK, thanks for your reply.
>
> I will post a version for merging; it moves the
> "fw_priv->buf = NULL;" into fw_load_abort() to simplify the change.

this is just to let you know that i've tested Ming Lei's
latest patch. thank you very much for the fix and the explanation.

--
nirinA
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
On Sat, Jun 15, 2013 at 2:30 PM, Guenter Roeck wrote:
> On Sat, Jun 15, 2013 at 10:32:14AM +0800, Ming Lei wrote:
>> On Sat, Jun 15, 2013 at 1:07 AM, nirinA raseliarison wrote:
>>
>> > patch applied and no longer have the bug message when i
>> > reboot and wake up the ethernet controller.
>>
>> I am wondering whether Guenter's patch can really fix the race, but
>> I'd like to see Guenter's explanation of his patch.
>>
>> The race should be caused by the following:
>>
>> - request timeout triggered by internal timer
>>
>> - user space aborts the request before the line in
>>   _request_firmware_load()
>>
>>       fw_priv->buf = NULL
>>
>>   which is run in the timeout path
>>
>> - then the abort() called from firmware_loading_store() may use a
>>   freed fw buf, since the timeout path will free the fw buffer.
>>
>> Considering that clearing 'fw_priv->buf' in _request_firmware_load()
>> isn't protected by fw_lock now, Guenter's patch can't avoid the race
>> entirely.
>
> I agree; my patch only protects one specific path, and was based on the
> observation that access to fw_priv->buf is protected elsewhere in the
> code. My suspicion was that fw_priv->buf was freed while waiting for
> the mutex in firmware_loading_store().
>
> Your patch is more comprehensive.

OK, thanks for your reply.

I will post a version for merging; it moves the
"fw_priv->buf = NULL;" into fw_load_abort() to simplify the change.

Thanks,
--
Ming Lei
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
On Sat, Jun 15, 2013 at 10:32:14AM +0800, Ming Lei wrote:
> On Sat, Jun 15, 2013 at 1:07 AM, nirinA raseliarison wrote:
>
> > patch applied and no longer have the bug message when i
> > reboot and wake up the ethernet controller.
>
> I am wondering whether Guenter's patch can really fix the race, but
> I'd like to see Guenter's explanation of his patch.
>
> The race should be caused by the following:
>
> - request timeout triggered by internal timer
>
> - user space aborts the request before the line in
>   _request_firmware_load()
>
>       fw_priv->buf = NULL
>
>   which is run in the timeout path
>
> - then the abort() called from firmware_loading_store() may use a
>   freed fw buf, since the timeout path will free the fw buffer.
>
> Considering that clearing 'fw_priv->buf' in _request_firmware_load()
> isn't protected by fw_lock now, Guenter's patch can't avoid the race
> entirely.

I agree; my patch only protects one specific path, and was based on the
observation that access to fw_priv->buf is protected elsewhere in the
code. My suspicion was that fw_priv->buf was freed while waiting for
the mutex in firmware_loading_store().

Your patch is more comprehensive.

Guenter
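Guenter's suspicion, that the pointer goes stale while the caller waits for the mutex, can be illustrated outside the kernel. The following is a minimal user-space sketch, assuming a pthread mutex in place of fw_lock. The demo_buf and demo_priv types and the demo_store() and demo_detach() helpers are hypothetical stand-ins and are not taken from firmware_class.c. The point it shows is that the shared pointer is read only after the lock is held, so a concurrent detach is always observed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's firmware buffer and its owner. */
struct demo_buf {
	unsigned long status;
};

struct demo_priv {
	struct demo_buf *buf;	/* may be cleared by another path */
};

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Set one status bit on the attached buffer.
 * Returns 1 if a buffer was still attached, 0 if it was already detached.
 */
static int demo_store(struct demo_priv *priv, unsigned int bit)
{
	struct demo_buf *buf;
	int handled = 0;

	pthread_mutex_lock(&demo_lock);
	buf = priv->buf;	/* read the pointer only while holding the lock */
	if (buf) {
		buf->status |= 1UL << bit;
		handled = 1;
	}
	pthread_mutex_unlock(&demo_lock);
	return handled;
}

/* Detach the buffer under the same lock, as a competing path would. */
static void demo_detach(struct demo_priv *priv)
{
	pthread_mutex_lock(&demo_lock);
	priv->buf = NULL;
	pthread_mutex_unlock(&demo_lock);
}
```

Had demo_store() copied priv->buf into a local before taking the lock, a detach completing in between would leave it holding a pointer that is no longer valid. Reading inside the critical section only helps if every writer of the pointer also takes the same lock.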
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
On Sat, Jun 15, 2013 at 1:07 AM, nirinA raseliarison wrote:

> patch applied and no longer have the bug message when i
> reboot and wake up the ethernet controller.

I am wondering whether Guenter's patch can really fix the race, but
I'd like to see Guenter's explanation of his patch.

The race should be caused by the following:

- request timeout triggered by internal timer

- user space aborts the request before the line in
  _request_firmware_load()

      fw_priv->buf = NULL

  which is run in the timeout path

- then the abort() called from firmware_loading_store() may use a
  freed fw buf, since the timeout path will free the fw buffer.

Considering that clearing 'fw_priv->buf' in _request_firmware_load()
isn't protected by fw_lock now, Guenter's patch can't avoid the race
entirely.

Thanks,
--
Ming Lei
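The sequence described above can be replayed in a small user-space model. This is only a sketch under stated assumptions: demo_buf, demo_priv and the demo_* helpers are hypothetical names rather than kernel code, the released flag stands in for the buffer being freed, and locking is left out because the replay is single-threaded (in the kernel the helper would have to run with fw_lock held). It illustrates the idea of clearing the pointer in the same step as the abort, so that a later abort finds nothing left to touch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the firmware buffer and the object that owns it. */
struct demo_buf {
	bool aborted;
	bool released;	/* models "the timeout path has freed the buffer" */
};

struct demo_priv {
	struct demo_buf *buf;
};

/*
 * Abort helper that also detaches the buffer.
 * Returns true if it performed the abort, false if nothing was attached.
 */
static bool demo_abort(struct demo_priv *priv)
{
	struct demo_buf *buf = priv->buf;

	if (!buf)
		return false;	/* another path already aborted */
	buf->aborted = true;
	priv->buf = NULL;	/* cleared together with the abort */
	return true;
}

/* Timeout path: abort the request, then give up the buffer. */
static void demo_timeout(struct demo_priv *priv, struct demo_buf *buf)
{
	demo_abort(priv);
	buf->released = true;
}

/* Abort requested from user space through the "loading" attribute. */
static bool demo_user_abort(struct demo_priv *priv)
{
	return demo_abort(priv);
}
```

In this model, calling demo_timeout() and then demo_user_abort() makes the second call return false without touching the released buffer. If the pointer were cleared by the caller some time after the abort instead, the second abort could still see the old pointer in that window.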
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
on Fri, 14 Jun 2013 20:02:25 +0300, Ming Lei wrote: On Fri, Jun 14, 2013 at 10:30 PM, Bjorn Helgaas wrote: [+cc Ming, Hayes, Francois, r8169 list] On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison wrote: hello there, i have this ethernet controler: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 05) that uses the r8169 module. it works fine, but sometimes after a reboot and issueing: ifconfig eth0 192.168.1.1 up i got the message below. after another reboot the message disappears. i also get the same message this 3.9.5 and 3.9.4. it seems i catch my first oops and don't know what to do with it. currently running: cat /proc/version Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1 SMP Fri Jun 14 09:14:50 EAT 2013 uname -a Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64 Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux thanks, -8<--8<--- [ 57.877560] BUG: unable to handle kernel NULL pointer dereference at 0040 [ 57.877603] IP: [] fw_load_abort.isra.5+0x4/0x20 [ 57.877634] PGD 21330a067 PUD 211a3a067 PMD 0 [ 57.877660] Oops: 0002 [#1] SMP [ 57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev r8169 microcode mii [ 57.877735] CPU 0 [ 57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1 To be filled by O.E.M. 
To be filled by O.E.M./ONDA H61V Ver:4.01 [ 57.877790] RIP: 0010:[] [] fw_load_abort.isra.5+0x4/0x20 [ 57.877824] RSP: 0018:8802119a7e80 EFLAGS: 00010246 [ 57.877844] RAX: 8802158fe250 RBX: 880211a03b40 RCX: [ 57.877869] RDX: 81c742c8 RSI: 8802158fe250 RDI: [ 57.877895] RBP: 8802119a7e80 R08: 8802119a6000 R09: 05aa [ 57.877920] R10: R11: R12: [ 57.877945] R13: 880213d34088 R14: 0003 R15: 88020eafc230 [ 57.877970] FS: 7f3c6cb2a740() GS:88021f20() knlGS: [ 57.877998] CS: 0010 DS: ES: CR0: 80050033 [ 57.878019] CR2: 0040 CR3: 000203155000 CR4: 001407f0 [ 57.878044] DR0: DR1: DR2: [ 57.878069] DR3: DR6: 0ff0 DR7: 0400 [ 57.878094] Process firmware (pid: 1950, threadinfo 8802119a6000, task 8802158fe250) [ 57.878124] Stack: [ 57.878133] 8802119a7eb0 81491917 880211a4d5a0 0003 [ 57.878168] 8802119a7f50 818765a0 8802119a7ec0 81483063 [ 57.878203] 8802119a7f08 8119bc9e 880213d34098 880211a4d5c0 [ 57.878237] Call Trace: [ 57.878251] [] firmware_loading_store+0x77/0x150 [ 57.878275] [] dev_attr_store+0x13/0x20 [ 57.878297] [] sysfs_write_file+0xce/0x140 [ 57.878320] [] vfs_write+0x9a/0x160 [ 57.878340] [] sys_write+0x44/0x90 [ 57.878360] [] system_call_fastpath+0x1a/0x1f [ 57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79 ff ff ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48 89 e5 80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f [ 57.881753] RIP [] fw_load_abort.isra.5+0x4/0x20 [ 57.882888] RSP [ 57.884019] CR2: 0040 [ 57.885166] ---[ end trace 6705f6d4ce6b6a12 ]--- Looks it is a double abort race, could you try below patch? (also attached for applying) i've also applied this patch and up to now, after reboot a few times all thing seems to work fine. 
thanks,
--
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 6ede229..a217ba8 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -550,7 +550,12 @@ static ssize_t firmware_loading_show(struct device *dev,
 				     struct device_attribute *attr, char *buf)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	int loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	int loading = 0;
+
+	mutex_lock(&fw_lock);
+	if (fw_priv->buf)
+		loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	mutex_unlock(&fw_lock);
 
 	return sprintf(buf, "%d\n", loading);
 }
@@ -592,12 +597,12 @@ static ssize_t firmware_loading_store(struct device *dev,
 				      const char *buf, size_t count)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	struct firmware_buf *fw_buf = fw_priv->buf;
+	struct firmware_buf *fw_buf;
 	int loading = simple_strtol(buf, NULL, 10);
 	int i;
 
 	mutex_lock(&fw_lock);
-
+	fw_buf = fw_priv->buf;
 	if (!fw_buf)
 		goto out;
 
@@ -636,6 +641,7 @@ static ssize_t firmware_loading_store(struct
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
on Fri, 14 Jun 2013 18:45:48 +0300, Guenter Roeck wrote: On Fri, Jun 14, 2013 at 08:30:29AM -0600, Bjorn Helgaas wrote: [+cc Ming, Hayes, Francois, r8169 list] On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison wrote: > hello there, > i have this ethernet controler: > > Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet > controller (rev 05) > > that uses the r8169 module. > it works fine, but sometimes after a reboot and issueing: > > ifconfig eth0 192.168.1.1 up > > i got the message below. after another reboot the > message disappears. i also get the same message this 3.9.5 and 3.9.4. > > it seems i catch my first oops and don't know what to do with it. > currently running: > > cat /proc/version > Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1 > SMP Fri Jun 14 09:14:50 EAT 2013 > > uname -a > Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64 > Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux > > thanks, > -8<--8<--- > > [ 57.877560] BUG: unable to handle kernel NULL pointer dereference at > 0040 > [ 57.877603] IP: [] fw_load_abort.isra.5+0x4/0x20 > [ 57.877634] PGD 21330a067 PUD 211a3a067 PMD 0 > [ 57.877660] Oops: 0002 [#1] SMP > [ 57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev r8169 > microcode mii > [ 57.877735] CPU 0 > [ 57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1 To be > filled by O.E.M. 
To be filled by O.E.M./ONDA H61V Ver:4.01 > [ 57.877790] RIP: 0010:[] [] > fw_load_abort.isra.5+0x4/0x20 > [ 57.877824] RSP: 0018:8802119a7e80 EFLAGS: 00010246 > [ 57.877844] RAX: 8802158fe250 RBX: 880211a03b40 RCX: > > [ 57.877869] RDX: 81c742c8 RSI: 8802158fe250 RDI: > > [ 57.877895] RBP: 8802119a7e80 R08: 8802119a6000 R09: > 05aa > [ 57.877920] R10: R11: R12: > > [ 57.877945] R13: 880213d34088 R14: 0003 R15: > 88020eafc230 > [ 57.877970] FS: 7f3c6cb2a740() GS:88021f20() > knlGS: > [ 57.877998] CS: 0010 DS: ES: CR0: 80050033 > [ 57.878019] CR2: 0040 CR3: 000203155000 CR4: > 001407f0 > [ 57.878044] DR0: DR1: DR2: > > [ 57.878069] DR3: DR6: 0ff0 DR7: > 0400 > [ 57.878094] Process firmware (pid: 1950, threadinfo 8802119a6000, > task 8802158fe250) > [ 57.878124] Stack: > [ 57.878133] 8802119a7eb0 81491917 880211a4d5a0 > 0003 > [ 57.878168] 8802119a7f50 818765a0 8802119a7ec0 > 81483063 > [ 57.878203] 8802119a7f08 8119bc9e 880213d34098 > 880211a4d5c0 > [ 57.878237] Call Trace: > [ 57.878251] [] firmware_loading_store+0x77/0x150 > [ 57.878275] [] dev_attr_store+0x13/0x20 > [ 57.878297] [] sysfs_write_file+0xce/0x140 > [ 57.878320] [] vfs_write+0x9a/0x160 > [ 57.878340] [] sys_write+0x44/0x90 > [ 57.878360] [] system_call_fastpath+0x1a/0x1f > [ 57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79 ff ff > ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48 89 e5 > 80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f > [ 57.881753] RIP [] fw_load_abort.isra.5+0x4/0x20 > [ 57.882888] RSP > [ 57.884019] CR2: 0040 > [ 57.885166] ---[ end trace 6705f6d4ce6b6a12 ]--- > Please try the following patch. patch applied and no longer have the bug message when i reboot and wake up the ethernet controller. 
thanks,

[ Bjorn, sorry I dropped you from the recipient list, but unfortunately
  Google still considers me to be a spammer and doesn't let me send any
  e-mail to you ]

Guenter

--
From 9feae0b1b33721573c41fbf2323db2a12c34c725 Mon Sep 17 00:00:00 2001
From: Guenter Roeck
Date: Fri, 14 Jun 2013 08:39:06 -0700
Subject: [PATCH] firmware: Fix race condition in firmware_loading_store

Fix:

BUG: unable to handle kernel NULL pointer dereference at 0040
IP: [] fw_load_abort.isra.5+0x4/0x20
...
Call Trace:
 [] firmware_loading_store+0x77/0x150
 [] dev_attr_store+0x13/0x20
 [] sysfs_write_file+0xce/0x140
 [] vfs_write+0x9a/0x160
 [] sys_write+0x44/0x90
 [] system_call_fastpath+0x1a/0x1f

Signed-off-by: Guenter Roeck
---
 drivers/base/firmware_class.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 4b1f926..f34b489 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -570,12 +570,13 @@ static ssize_t firmware_loading_store(struct device *dev,
 				      const char *buf, size_t
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
On Fri, Jun 14, 2013 at 10:30 PM, Bjorn Helgaas wrote: > [+cc Ming, Hayes, Francois, r8169 list] > > On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison > wrote: >> hello there, >> i have this ethernet controler: >> >> Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet >> controller (rev 05) >> >> that uses the r8169 module. >> it works fine, but sometimes after a reboot and issueing: >> >> ifconfig eth0 192.168.1.1 up >> >> i got the message below. after another reboot the >> message disappears. i also get the same message this 3.9.5 and 3.9.4. >> >> it seems i catch my first oops and don't know what to do with it. >> currently running: >> >> cat /proc/version >> Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1 >> SMP Fri Jun 14 09:14:50 EAT 2013 >> >> uname -a >> Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64 >> Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux >> >> thanks, >> -8<--8<--- >> >> [ 57.877560] BUG: unable to handle kernel NULL pointer dereference at >> 0040 >> [ 57.877603] IP: [] fw_load_abort.isra.5+0x4/0x20 >> [ 57.877634] PGD 21330a067 PUD 211a3a067 PMD 0 >> [ 57.877660] Oops: 0002 [#1] SMP >> [ 57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev r8169 >> microcode mii >> [ 57.877735] CPU 0 >> [ 57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1 To be >> filled by O.E.M. 
To be filled by O.E.M./ONDA H61V Ver:4.01 >> [ 57.877790] RIP: 0010:[] [] >> fw_load_abort.isra.5+0x4/0x20 >> [ 57.877824] RSP: 0018:8802119a7e80 EFLAGS: 00010246 >> [ 57.877844] RAX: 8802158fe250 RBX: 880211a03b40 RCX: >> >> [ 57.877869] RDX: 81c742c8 RSI: 8802158fe250 RDI: >> >> [ 57.877895] RBP: 8802119a7e80 R08: 8802119a6000 R09: >> 05aa >> [ 57.877920] R10: R11: R12: >> >> [ 57.877945] R13: 880213d34088 R14: 0003 R15: >> 88020eafc230 >> [ 57.877970] FS: 7f3c6cb2a740() GS:88021f20() >> knlGS: >> [ 57.877998] CS: 0010 DS: ES: CR0: 80050033 >> [ 57.878019] CR2: 0040 CR3: 000203155000 CR4: >> 001407f0 >> [ 57.878044] DR0: DR1: DR2: >> >> [ 57.878069] DR3: DR6: 0ff0 DR7: >> 0400 >> [ 57.878094] Process firmware (pid: 1950, threadinfo 8802119a6000, >> task 8802158fe250) >> [ 57.878124] Stack: >> [ 57.878133] 8802119a7eb0 81491917 880211a4d5a0 >> 0003 >> [ 57.878168] 8802119a7f50 818765a0 8802119a7ec0 >> 81483063 >> [ 57.878203] 8802119a7f08 8119bc9e 880213d34098 >> 880211a4d5c0 >> [ 57.878237] Call Trace: >> [ 57.878251] [] firmware_loading_store+0x77/0x150 >> [ 57.878275] [] dev_attr_store+0x13/0x20 >> [ 57.878297] [] sysfs_write_file+0xce/0x140 >> [ 57.878320] [] vfs_write+0x9a/0x160 >> [ 57.878340] [] sys_write+0x44/0x90 >> [ 57.878360] [] system_call_fastpath+0x1a/0x1f >> [ 57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79 ff ff >> ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48 89 e5 >> 80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f >> [ 57.881753] RIP [] fw_load_abort.isra.5+0x4/0x20 >> [ 57.882888] RSP >> [ 57.884019] CR2: 0040 >> [ 57.885166] ---[ end trace 6705f6d4ce6b6a12 ]--- Looks it is a double abort race, could you try below patch? 
(also attached for applying)
--
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 6ede229..a217ba8 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -550,7 +550,12 @@ static ssize_t firmware_loading_show(struct device *dev,
 				     struct device_attribute *attr, char *buf)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	int loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	int loading = 0;
+
+	mutex_lock(&fw_lock);
+	if (fw_priv->buf)
+		loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	mutex_unlock(&fw_lock);
 
 	return sprintf(buf, "%d\n", loading);
 }
@@ -592,12 +597,12 @@ static ssize_t firmware_loading_store(struct device *dev,
 				      const char *buf, size_t count)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	struct firmware_buf *fw_buf = fw_priv->buf;
+	struct firmware_buf *fw_buf;
 	int loading = simple_strtol(buf, NULL, 10);
 	int i;
 
 	mutex_lock(&fw_lock);
-
+	fw_buf = fw_priv->buf;
 	if (!fw_buf)
 		goto out;
 
@@ -636,6 +641,7 @@
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
On Fri, Jun 14, 2013 at 08:30:29AM -0600, Bjorn Helgaas wrote: > [+cc Ming, Hayes, Francois, r8169 list] > > On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison > wrote: > > hello there, > > i have this ethernet controler: > > > > Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet > > controller (rev 05) > > > > that uses the r8169 module. > > it works fine, but sometimes after a reboot and issueing: > > > > ifconfig eth0 192.168.1.1 up > > > > i got the message below. after another reboot the > > message disappears. i also get the same message this 3.9.5 and 3.9.4. > > > > it seems i catch my first oops and don't know what to do with it. > > currently running: > > > > cat /proc/version > > Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1 > > SMP Fri Jun 14 09:14:50 EAT 2013 > > > > uname -a > > Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64 > > Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux > > > > thanks, > > -8<--8<--- > > > > [ 57.877560] BUG: unable to handle kernel NULL pointer dereference at > > 0040 > > [ 57.877603] IP: [] fw_load_abort.isra.5+0x4/0x20 > > [ 57.877634] PGD 21330a067 PUD 211a3a067 PMD 0 > > [ 57.877660] Oops: 0002 [#1] SMP > > [ 57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev r8169 > > microcode mii > > [ 57.877735] CPU 0 > > [ 57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1 To be > > filled by O.E.M. 
To be filled by O.E.M./ONDA H61V Ver:4.01 > > [ 57.877790] RIP: 0010:[] [] > > fw_load_abort.isra.5+0x4/0x20 > > [ 57.877824] RSP: 0018:8802119a7e80 EFLAGS: 00010246 > > [ 57.877844] RAX: 8802158fe250 RBX: 880211a03b40 RCX: > > > > [ 57.877869] RDX: 81c742c8 RSI: 8802158fe250 RDI: > > > > [ 57.877895] RBP: 8802119a7e80 R08: 8802119a6000 R09: > > 05aa > > [ 57.877920] R10: R11: R12: > > > > [ 57.877945] R13: 880213d34088 R14: 0003 R15: > > 88020eafc230 > > [ 57.877970] FS: 7f3c6cb2a740() GS:88021f20() > > knlGS: > > [ 57.877998] CS: 0010 DS: ES: CR0: 80050033 > > [ 57.878019] CR2: 0040 CR3: 000203155000 CR4: > > 001407f0 > > [ 57.878044] DR0: DR1: DR2: > > > > [ 57.878069] DR3: DR6: 0ff0 DR7: > > 0400 > > [ 57.878094] Process firmware (pid: 1950, threadinfo 8802119a6000, > > task 8802158fe250) > > [ 57.878124] Stack: > > [ 57.878133] 8802119a7eb0 81491917 880211a4d5a0 > > 0003 > > [ 57.878168] 8802119a7f50 818765a0 8802119a7ec0 > > 81483063 > > [ 57.878203] 8802119a7f08 8119bc9e 880213d34098 > > 880211a4d5c0 > > [ 57.878237] Call Trace: > > [ 57.878251] [] firmware_loading_store+0x77/0x150 > > [ 57.878275] [] dev_attr_store+0x13/0x20 > > [ 57.878297] [] sysfs_write_file+0xce/0x140 > > [ 57.878320] [] vfs_write+0x9a/0x160 > > [ 57.878340] [] sys_write+0x44/0x90 > > [ 57.878360] [] system_call_fastpath+0x1a/0x1f > > [ 57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79 ff ff > > ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48 89 e5 > > 80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f > > [ 57.881753] RIP [] fw_load_abort.isra.5+0x4/0x20 > > [ 57.882888] RSP > > [ 57.884019] CR2: 0040 > > [ 57.885166] ---[ end trace 6705f6d4ce6b6a12 ]--- > > Please try the following patch. 
[ Bjorn, sorry I dropped you from the recipient list, but unfortunately
  Google still considers me to be a spammer and doesn't let me send any
  e-mail to you ]

Guenter

--
From 9feae0b1b33721573c41fbf2323db2a12c34c725 Mon Sep 17 00:00:00 2001
From: Guenter Roeck
Date: Fri, 14 Jun 2013 08:39:06 -0700
Subject: [PATCH] firmware: Fix race condition in firmware_loading_store

Fix:

BUG: unable to handle kernel NULL pointer dereference at 0040
IP: [] fw_load_abort.isra.5+0x4/0x20
...
Call Trace:
 [] firmware_loading_store+0x77/0x150
 [] dev_attr_store+0x13/0x20
 [] sysfs_write_file+0xce/0x140
 [] vfs_write+0x9a/0x160
 [] sys_write+0x44/0x90
 [] system_call_fastpath+0x1a/0x1f

Signed-off-by: Guenter Roeck
---
 drivers/base/firmware_class.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 4b1f926..f34b489 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -570,12 +570,13 @@ static ssize_t firmware_loading_store(struct device *dev,
 				      const char *buf, size_t count)
 {
 	struct
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[+cc Ming, Hayes, Francois, r8169 list] On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison wrote: > hello there, > i have this ethernet controler: > > Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet > controller (rev 05) > > that uses the r8169 module. > it works fine, but sometimes after a reboot and issueing: > > ifconfig eth0 192.168.1.1 up > > i got the message below. after another reboot the > message disappears. i also get the same message this 3.9.5 and 3.9.4. > > it seems i catch my first oops and don't know what to do with it. > currently running: > > cat /proc/version > Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1 > SMP Fri Jun 14 09:14:50 EAT 2013 > > uname -a > Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64 > Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux > > thanks, > -8<--8<--- > > [ 57.877560] BUG: unable to handle kernel NULL pointer dereference at > 0040 > [ 57.877603] IP: [] fw_load_abort.isra.5+0x4/0x20 > [ 57.877634] PGD 21330a067 PUD 211a3a067 PMD 0 > [ 57.877660] Oops: 0002 [#1] SMP > [ 57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev r8169 > microcode mii > [ 57.877735] CPU 0 > [ 57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1 To be > filled by O.E.M. 
To be filled by O.E.M./ONDA H61V Ver:4.01 > [ 57.877790] RIP: 0010:[] [] > fw_load_abort.isra.5+0x4/0x20 > [ 57.877824] RSP: 0018:8802119a7e80 EFLAGS: 00010246 > [ 57.877844] RAX: 8802158fe250 RBX: 880211a03b40 RCX: > > [ 57.877869] RDX: 81c742c8 RSI: 8802158fe250 RDI: > > [ 57.877895] RBP: 8802119a7e80 R08: 8802119a6000 R09: > 05aa > [ 57.877920] R10: R11: R12: > > [ 57.877945] R13: 880213d34088 R14: 0003 R15: > 88020eafc230 > [ 57.877970] FS: 7f3c6cb2a740() GS:88021f20() > knlGS: > [ 57.877998] CS: 0010 DS: ES: CR0: 80050033 > [ 57.878019] CR2: 0040 CR3: 000203155000 CR4: > 001407f0 > [ 57.878044] DR0: DR1: DR2: > > [ 57.878069] DR3: DR6: 0ff0 DR7: > 0400 > [ 57.878094] Process firmware (pid: 1950, threadinfo 8802119a6000, > task 8802158fe250) > [ 57.878124] Stack: > [ 57.878133] 8802119a7eb0 81491917 880211a4d5a0 > 0003 > [ 57.878168] 8802119a7f50 818765a0 8802119a7ec0 > 81483063 > [ 57.878203] 8802119a7f08 8119bc9e 880213d34098 > 880211a4d5c0 > [ 57.878237] Call Trace: > [ 57.878251] [] firmware_loading_store+0x77/0x150 > [ 57.878275] [] dev_attr_store+0x13/0x20 > [ 57.878297] [] sysfs_write_file+0xce/0x140 > [ 57.878320] [] vfs_write+0x9a/0x160 > [ 57.878340] [] sys_write+0x44/0x90 > [ 57.878360] [] system_call_fastpath+0x1a/0x1f > [ 57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79 ff ff > ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48 89 e5 > 80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f > [ 57.881753] RIP [] fw_load_abort.isra.5+0x4/0x20 > [ 57.882888] RSP > [ 57.884019] CR2: 0040 > [ 57.885166] ---[ end trace 6705f6d4ce6b6a12 ]--- > > -- > nirinA > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to 
majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[+cc Ming, Hayes, Francois, r8169 list]

On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison nirina.raseliari...@gmail.com wrote:
> hello there,
>
> i have this ethernet controller:
>
> Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 05)
>
> that uses the r8169 module. it works fine, but sometimes after a reboot and issuing:
>
> ifconfig eth0 192.168.1.1 up
>
> i got the message below. after another reboot the message disappears.
> i also get the same message with 3.9.5 and 3.9.4.
>
> it seems i caught my first oops and don't know what to do with it.
>
> currently running:
>
> cat /proc/version
> Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1 SMP Fri Jun 14 09:14:50 EAT 2013
>
> uname -a
> Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64 Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux
>
> thanks,
>
> -8--8---
> [ 57.877560] BUG: unable to handle kernel NULL pointer dereference at 0040
> [ 57.877603] IP: [81491844] fw_load_abort.isra.5+0x4/0x20
> [ 57.877634] PGD 21330a067 PUD 211a3a067 PMD 0
> [ 57.877660] Oops: 0002 [#1] SMP
> [ 57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev r8169 microcode mii
> [ 57.877735] CPU 0
> [ 57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1 To be filled by O.E.M. To be filled by O.E.M./ONDA H61V Ver:4.01
> [ 57.877790] RIP: 0010:[81491844] [81491844] fw_load_abort.isra.5+0x4/0x20
> [ 57.877824] RSP: 0018:8802119a7e80 EFLAGS: 00010246
> [ 57.877844] RAX: 8802158fe250 RBX: 880211a03b40 RCX:
> [ 57.877869] RDX: 81c742c8 RSI: 8802158fe250 RDI:
> [ 57.877895] RBP: 8802119a7e80 R08: 8802119a6000 R09: 05aa
> [ 57.877920] R10: R11: R12:
> [ 57.877945] R13: 880213d34088 R14: 0003 R15: 88020eafc230
> [ 57.877970] FS: 7f3c6cb2a740() GS:88021f20() knlGS:
> [ 57.877998] CS: 0010 DS: ES: CR0: 80050033
> [ 57.878019] CR2: 0040 CR3: 000203155000 CR4: 001407f0
> [ 57.878044] DR0: DR1: DR2:
> [ 57.878069] DR3: DR6: 0ff0 DR7: 0400
> [ 57.878094] Process firmware (pid: 1950, threadinfo 8802119a6000, task 8802158fe250)
> [ 57.878124] Stack:
> [ 57.878133] 8802119a7eb0 81491917 880211a4d5a0 0003
> [ 57.878168] 8802119a7f50 818765a0 8802119a7ec0 81483063
> [ 57.878203] 8802119a7f08 8119bc9e 880213d34098 880211a4d5c0
> [ 57.878237] Call Trace:
> [ 57.878251] [81491917] firmware_loading_store+0x77/0x150
> [ 57.878275] [81483063] dev_attr_store+0x13/0x20
> [ 57.878297] [8119bc9e] sysfs_write_file+0xce/0x140
> [ 57.878320] [81133e8a] vfs_write+0x9a/0x160
> [ 57.878340] [81134164] sys_write+0x44/0x90
> [ 57.878360] [817d70ed] system_call_fastpath+0x1a/0x1f
> [ 57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79 ff ff ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48 89 e5 f0 80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f
> [ 57.881753] RIP [81491844] fw_load_abort.isra.5+0x4/0x20
> [ 57.882888] RSP 8802119a7e80
> [ 57.884019] CR2: 0040
> [ 57.885166] ---[ end trace 6705f6d4ce6b6a12 ]---
>
> --
> nirinA
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
On Fri, Jun 14, 2013 at 08:30:29AM -0600, Bjorn Helgaas wrote:
> [+cc Ming, Hayes, Francois, r8169 list]
>
> On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison nirina.raseliari...@gmail.com wrote:
> > [report and oops trace quoted in full in the previous message; snipped]

Please try the following patch.
[ Bjorn, sorry I dropped you from the recipient list, but unfortunately Google
still considers me to be a spammer and doesn't let me send any e-mail to you ]

Guenter

--
From 9feae0b1b33721573c41fbf2323db2a12c34c725 Mon Sep 17 00:00:00 2001
From: Guenter Roeck li...@roeck-us.net
Date: Fri, 14 Jun 2013 08:39:06 -0700
Subject: [PATCH] firmware: Fix race condition in firmware_loading_store

Fix:

BUG: unable to handle kernel NULL pointer dereference at 0040
IP: [81491844] fw_load_abort.isra.5+0x4/0x20
...
Call Trace:
[81491917] firmware_loading_store+0x77/0x150
[81483063] dev_attr_store+0x13/0x20
[8119bc9e] sysfs_write_file+0xce/0x140
[81133e8a] vfs_write+0x9a/0x160
[81134164] sys_write+0x44/0x90
[817d70ed] system_call_fastpath+0x1a/0x1f

Signed-off-by: Guenter Roeck li...@roeck-us.net
---
 drivers/base/firmware_class.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 4b1f926..f34b489 100644
--- a/drivers/base/firmware_class.c
+++
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
On Fri, Jun 14, 2013 at 10:30 PM, Bjorn Helgaas bhelg...@google.com wrote:
> [+cc Ming, Hayes, Francois, r8169 list]
>
> On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison nirina.raseliari...@gmail.com wrote:
> > [report and oops trace quoted in full earlier in the thread; snipped]

Looks like it is a double abort race; could you try the patch below?
(also attached for applying)

--
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 6ede229..a217ba8 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -550,7 +550,12 @@ static ssize_t firmware_loading_show(struct device *dev,
 				     struct device_attribute *attr, char *buf)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	int loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	int loading = 0;
+
+	mutex_lock(&fw_lock);
+	if (fw_priv->buf)
+		loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	mutex_unlock(&fw_lock);
 
 	return sprintf(buf, "%d\n", loading);
 }
@@ -592,12 +597,12 @@ static ssize_t firmware_loading_store(struct device *dev,
 				      const char *buf, size_t count)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	struct firmware_buf *fw_buf = fw_priv->buf;
+	struct firmware_buf *fw_buf;
 	int loading = simple_strtol(buf, NULL, 10);
 	int i;
 
 	mutex_lock(&fw_lock);
-
+	fw_buf = fw_priv->buf;
 	if
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
on Fri, 14 Jun 2013 18:45:48 +0300, Guenter Roeck li...@roeck-us.net wrote:
> On Fri, Jun 14, 2013 at 08:30:29AM -0600, Bjorn Helgaas wrote:
> > [+cc Ming, Hayes, Francois, r8169 list]
> >
> > On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison nirina.raseliari...@gmail.com wrote:
> > > [report and oops trace quoted in full earlier in the thread; snipped]
>
> Please try the following patch.

patch applied, and i no longer get the bug message when i reboot and wake up the ethernet controller.

thanks,

> [ Bjorn, sorry I dropped you from the recipient list, but unfortunately Google
> still considers me to be a spammer and doesn't let me send any e-mail to you ]
>
> Guenter
>
> [patch "firmware: Fix race condition in firmware_loading_store" quoted in Guenter's message; snipped]
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
on Fri, 14 Jun 2013 20:02:25 +0300, Ming Lei ming@canonical.com wrote:
> On Fri, Jun 14, 2013 at 10:30 PM, Bjorn Helgaas bhelg...@google.com wrote:
> > [+cc Ming, Hayes, Francois, r8169 list]
> >
> > On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison nirina.raseliari...@gmail.com wrote:
> > > [report and oops trace quoted in full earlier in the thread; snipped]
>
> Looks it is a double abort race, could you try below patch?
> (also attached for applying)

i've also applied this patch and up to now, after rebooting a few times, everything seems to work fine.

thanks,

> [diff against drivers/base/firmware_class.c quoted in Ming Lei's message; snipped]
Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
On Sat, Jun 15, 2013 at 1:07 AM, nirinA raseliarison nirina.raseliari...@gmail.com wrote:
> patch applied and no longer have the bug message when i reboot and wake up the ethernet controller.

I am wondering whether Guenter's patch can really fix the race, but I'd like to see Guenter's explanation of his patch.

The race should be caused by the following sequence:

- the request timeout is triggered by the internal timer

- user space aborts the request before the timeout path has run this line in _request_firmware_load():

	fw_priv->buf = NULL

- the abort called from firmware_loading_store() may then use a freed fw buf, since the timeout path will free the fw buffer.

Since clearing 'fw_priv->buf' in _request_firmware_load() isn't protected by fw_lock at the moment, Guenter's patch can't avoid the race entirely.

Thanks,
--
Ming Lei
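
For readers following the analysis above, here is a minimal userspace sketch of the idea, not the kernel code itself. It assumes pthreads in place of the kernel mutex, and the struct layouts, the DEMO_STATUS_ABORT value, the aborts counter, abort_path() and simulate_double_abort() are all hypothetical names made up for illustration. It shows two abort paths (standing in for the timeout and the sysfs write) that both read fw_priv->buf only while holding fw_lock, with the pointer cleared inside the abort helper, so the second caller sees NULL instead of a freed buffer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel structures named in the thread. */
struct firmware_buf {
	unsigned long status;
};

struct firmware_priv {
	struct firmware_buf *buf;
	int aborts;		/* hypothetical counter, for the demo only */
};

#define DEMO_STATUS_ABORT 0x4UL	/* illustrative flag value, an assumption */

static pthread_mutex_t fw_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Caller must hold fw_lock. The pointer is cleared here, under the lock,
 * so whichever path runs second sees NULL and returns early.
 */
static void fw_load_abort(struct firmware_priv *fw_priv)
{
	struct firmware_buf *buf = fw_priv->buf;

	if (!buf)
		return;		/* the other path already aborted */

	buf->status |= DEMO_STATUS_ABORT;
	fw_priv->buf = NULL;
	fw_priv->aborts++;
	free(buf);		/* stands in for the buffer being released */
}

/* Both the timer-driven timeout and the sysfs write reduce to this. */
static void *abort_path(void *arg)
{
	struct firmware_priv *fw_priv = arg;

	pthread_mutex_lock(&fw_lock);
	fw_load_abort(fw_priv);	/* fw_priv->buf is only read under the lock */
	pthread_mutex_unlock(&fw_lock);
	return NULL;
}

/* Runs the two abort paths concurrently; returns how many took effect. */
int simulate_double_abort(void)
{
	struct firmware_priv fw_priv = { .buf = NULL, .aborts = 0 };
	pthread_t timeout_thread, store_thread;
	int rc;

	fw_priv.buf = calloc(1, sizeof(*fw_priv.buf));
	assert(fw_priv.buf != NULL);

	rc = pthread_create(&timeout_thread, NULL, abort_path, &fw_priv);
	assert(rc == 0);
	rc = pthread_create(&store_thread, NULL, abort_path, &fw_priv);
	assert(rc == 0);
	(void)rc;

	pthread_join(timeout_thread, NULL);
	pthread_join(store_thread, NULL);

	assert(fw_priv.buf == NULL);
	return fw_priv.aborts;
}
```

Whichever thread wins the lock performs the abort; the loser finds the pointer already NULL and does nothing, so simulate_double_abort() reports a single abort. If the pointer were instead sampled before taking the lock, or cleared outside it, the second path could still be holding a stale pointer to freed memory, which is the situation described in the list above.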