Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-15 Thread nirinA raseliarison
on Sat, 15 Jun 2013 11:08:47 +0300, Ming Lei wrote:
> On Sat, Jun 15, 2013 at 2:30 PM, Guenter Roeck wrote:
>> On Sat, Jun 15, 2013 at 10:32:14AM +0800, Ming Lei wrote:
>>> On Sat, Jun 15, 2013 at 1:07 AM, nirinA raseliarison wrote:
>>>
>>> > patch applied and no longer have the bug message when i
>>> > reboot and wake up the ethernet controller.
>>>
>>> I am wondering if Guenter's patch can fix the race really, but I'd like to
>>> see Guenter's explanation on his patch.
>>>
>>> The race should be caused by below:
>>>
>>> - request timeout triggered by internal timer
>>>
>>> - user space aborts the requests before the line in _request_firmware_load()
>>>
>>>  fw_priv->buf = NULL
>>>
>>> which is run in timeout path
>>>
>>> - then the abort() called from firmware_loading_store() may use a freed fw buf
>>> since the timeout path will free the fw buffer.
>>>
>>> Considered clearing 'fw_priv->buf' in _request_firmware_load() isn't protected
>>> by fw_lock now, so Guenter's patch can't avoid the race entirely.
>>>
>> I agree; my patch only protects one specific path, and was based on the
>> observation that access to fw_priv->buf is protected elsewhere in the code.
>> My suspicion was that fw_priv->buf was freed while waiting for the mutex in
>> firmware_loading_store().
>>
>> Your patch is more comprehensive.
>
> OK, thanks for your reply.
>
> I will post out one version for merge, and this one moves the
> "fw_priv->buf = NULL;" into fw_load_abort() for simplifying change.

this is just to let you know that i've tested Ming Lei's latest patch.
thank you very much for the fix and the explanation.

--
nirinA
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-15 Thread Ming Lei
On Sat, Jun 15, 2013 at 2:30 PM, Guenter Roeck  wrote:
> On Sat, Jun 15, 2013 at 10:32:14AM +0800, Ming Lei wrote:
>> On Sat, Jun 15, 2013 at 1:07 AM, nirinA raseliarison
>>  wrote:
>>
>> > patch applied and no longer have the bug message when i
>> > reboot and wake up the ethernet controller.
>>
>> I am wondering if Guenter's patch can fix the race really, but I'd like to
>> see Guenter's explanation on his patch.
>>
>> The race should be caused by below:
>>
>> - request timeout triggered by internal timer
>>
>> - user space aborts the requests before the line in _request_firmware_load()
>>
>>  fw_priv->buf = NULL
>>
>> which is run in timeout path
>>
>> - then the abort() called from firmware_loading_store() may use a freed fw buf
>> since the timeout path will free the fw buffer.
>>
>> Considered clearing 'fw_priv->buf' in _request_firmware_load() isn't protected
>> by fw_lock now, so Guenter's patch can't avoid the race entirely.
>>
> I agree; my patch only protects one specific path, and was based on the
> observation that access to fw_priv->buf is protected elsewhere in the code.
> My suspicion was that fw_priv->buf was freed while waiting for the mutex in
> firmware_loading_store().
>
> Your patch is more comprehensive.

OK, thanks for your reply.

I will post a version for merging; it moves the
"fw_priv->buf = NULL;" into fw_load_abort() to keep the change simple.
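The idea of clearing the pointer inside the abort helper can be sketched in
user space as below. This is only a single-threaded model, not the kernel
code: the struct fields, the toy lock (fw_lock_acquire/fw_lock_release), the
NULL check inside fw_load_abort() and the abort_request()/demo_double_abort()
helpers are all assumptions made for illustration. It shows that once the
abort and the "buf = NULL" happen in one critical section, a second abort
finds NULL and does nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct firmware_buf {
	bool aborted;
	int abort_count;	/* how many times an abort touched this buffer */
};

struct firmware_priv {
	struct firmware_buf *buf;	/* NULL once the request was aborted */
};

/* Toy lock: a single-threaded model that only checks lock discipline. */
static bool fw_lock_held;

static void fw_lock_acquire(void)
{
	assert(!fw_lock_held);
	fw_lock_held = true;
}

static void fw_lock_release(void)
{
	assert(fw_lock_held);
	fw_lock_held = false;
}

/* Abort the request and clear the pointer in the same critical section. */
static void fw_load_abort(struct firmware_priv *fw_priv)
{
	struct firmware_buf *buf = fw_priv->buf;

	assert(fw_lock_held);		/* callers must hold the lock */
	if (!buf)
		return;			/* already aborted by the other path */

	buf->aborted = true;
	buf->abort_count++;
	fw_priv->buf = NULL;		/* later callers see NULL, not a stale buffer */
}

/* In this model both the timeout path and the sysfs store path go through here. */
static void abort_request(struct firmware_priv *fw_priv)
{
	fw_lock_acquire();
	fw_load_abort(fw_priv);
	fw_lock_release();
}

/* Runs two aborts back to back and reports how often the buffer was touched. */
int demo_double_abort(void)
{
	struct firmware_buf buf = { false, 0 };
	struct firmware_priv priv = { &buf };

	abort_request(&priv);	/* first abort, e.g. the timeout fires */
	abort_request(&priv);	/* second abort, e.g. requested from user space */

	return buf.abort_count;
}
```

Calling demo_double_abort() from a small main() returns the number of times
the buffer was actually touched; in this model the second abort is a no-op.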


Thanks,
--
Ming Lei


Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-15 Thread Guenter Roeck
On Sat, Jun 15, 2013 at 10:32:14AM +0800, Ming Lei wrote:
> On Sat, Jun 15, 2013 at 1:07 AM, nirinA raseliarison
>  wrote:
> 
> > patch applied and no longer have the bug message when i
> > reboot and wake up the ethernet controller.
> 
> I am wondering if Guenter's patch can fix the race really, but I'd like to
> see Guenter's explanation on his patch.
> 
> The race should be caused by below:
> 
> - request timeout triggered by internal timer
> 
> - user space aborts the requests before the line in _request_firmware_load()
> 
>  fw_priv->buf = NULL
> 
> which is run in timeout path
> 
> - then the abort() called from firmware_loading_store() may use a freed fw buf
> since the timeout path will free the fw buffer.
> 
> Considered clearing 'fw_priv->buf' in _request_firmware_load() isn't protected
> by fw_lock now, so Guenter's patch can't avoid the race entirely.
> 
I agree; my patch only protects one specific path, and was based on the
observation that access to fw_priv->buf is protected elsewhere in the code.
My suspicion was that fw_priv->buf was freed while waiting for the mutex in
firmware_loading_store().

Your patch is more comprehensive.
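
The suspected path (the buffer going away while the store path waits for the
mutex) can be replayed in a small user-space model. Everything here is an
assumption for illustration: the one-field structs, finish_request() and
store_uses_released_buf() are hypothetical, and a "released" flag stands in
for the buffer actually being freed. The model only compares sampling
fw_priv->buf before taking the lock with sampling it afterwards; it does not
cover other interleavings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the kernel structures. */
struct firmware_buf {
	bool released;		/* stands in for "this memory was freed" */
};

struct firmware_priv {
	struct firmware_buf *buf;
};

/* Another path finishes the request while the store path waits for the lock. */
static void finish_request(struct firmware_priv *fw_priv)
{
	fw_priv->buf->released = true;
	fw_priv->buf = NULL;
}

/*
 * Replays the store path once. Returns true if it would end up using a
 * buffer that has already been released.
 */
bool store_uses_released_buf(bool read_before_lock)
{
	struct firmware_buf buf = { false };
	struct firmware_priv priv = { &buf };
	struct firmware_buf *fw_buf = NULL;

	if (read_before_lock)
		fw_buf = priv.buf;	/* pointer sampled too early */

	finish_request(&priv);		/* happens while we wait for the lock */

	/* ...the lock would be acquired here... */
	if (!read_before_lock)
		fw_buf = priv.buf;	/* pointer sampled under the lock */

	return fw_buf != NULL && fw_buf->released;
}
```

With read_before_lock set, the stale pointer survives and the released buffer
is used; with the read moved after the lock, the path sees NULL and bails out.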

Guenter


Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-14 Thread Ming Lei
On Sat, Jun 15, 2013 at 1:07 AM, nirinA raseliarison
 wrote:

> patch applied and no longer have the bug message when i
> reboot and wake up the ethernet controller.

I wonder whether Guenter's patch really fixes the race, and I'd like to
see Guenter's explanation of his patch.

The race is probably caused by the following sequence:

- the request timeout is triggered by the internal timer

- user space aborts the request before the timeout path reaches this
  line in _request_firmware_load():

 fw_priv->buf = NULL

- the abort() called from firmware_loading_store() may then use a freed
  fw buf, since the timeout path will free the fw buffer.

Since clearing 'fw_priv->buf' in _request_firmware_load() isn't protected
by fw_lock at the moment, Guenter's patch can't avoid the race entirely.
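
The sequence above can be replayed step by step in user space. This is a
rough single-threaded model under stated assumptions, not the kernel code:
the structs, timeout_abort(), timeout_clear(), user_abort_hits_released() and
replay() are hypothetical names, and a "released" flag stands in for the
buffer being freed. The point it illustrates is that when the pointer is
cleared in a separate, unprotected step, a user-space abort that lands between
the two steps still sees a non-NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the kernel structures. */
struct firmware_buf {
	bool released;		/* stands in for "the fw buffer was freed" */
};

struct firmware_priv {
	struct firmware_buf *buf;
};

/* Timeout path, step 1: abort the request and let go of the buffer. */
static void timeout_abort(struct firmware_priv *p, bool clear_in_same_step)
{
	p->buf->released = true;
	if (clear_in_same_step)
		p->buf = NULL;	/* pointer cleared together with the abort */
}

/* Timeout path, step 2 (only in the racy ordering): clear the pointer later. */
static void timeout_clear(struct firmware_priv *p)
{
	p->buf = NULL;
}

/* User-space abort: true if it would touch an already released buffer. */
static bool user_abort_hits_released(const struct firmware_priv *p)
{
	const struct firmware_buf *buf = p->buf;

	return buf != NULL && buf->released;
}

/* Replays: timeout step 1, then the user abort, then (maybe) step 2. */
bool replay(bool clear_in_same_step)
{
	struct firmware_buf buf = { false };
	struct firmware_priv priv = { &buf };
	bool hit;

	timeout_abort(&priv, clear_in_same_step);
	hit = user_abort_hits_released(&priv);	/* user space sneaks in here */
	if (!clear_in_same_step)
		timeout_clear(&priv);

	return hit;
}
```

In this model replay(false) reports that the user abort hits the released
buffer, while replay(true), where the clear is part of the abort step, does
not.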

Thanks,
--
Ming Lei


Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-14 Thread nirinA raseliarison
on Fri, 14 Jun 2013 20:02:25 +0300, Ming Lei wrote:
> On Fri, Jun 14, 2013 at 10:30 PM, Bjorn Helgaas wrote:
>> [+cc Ming, Hayes, Francois, r8169 list]
>>
>> On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison wrote:
>>> hello there,
>>> i have this ethernet controler:
>>>
>>>  Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet
>>> controller (rev 05)
>>>
>>> that uses the r8169 module.
>>> it works fine, but sometimes after a reboot and issueing:
>>>
>>>  ifconfig eth0 192.168.1.1 up
>>>
>>> i got the message below. after another reboot the
>>> message disappears. i also get the same message this 3.9.5 and 3.9.4.
>>>
>>> it seems i catch my first oops and don't know what to do with it.
>>> currently running:
>>>
>>>  cat /proc/version
>>>  Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1
>>> SMP Fri Jun 14 09:14:50 EAT 2013
>>>
>>>  uname -a
>>>  Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64
>>> Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux
>>>
>>> thanks,
>>> -8<--8<---
>>>
>>> [   57.877560] BUG: unable to handle kernel NULL pointer dereference at
>>> 0040
>>> [   57.877603] IP: [] fw_load_abort.isra.5+0x4/0x20
>>> [   57.877634] PGD 21330a067 PUD 211a3a067 PMD 0
>>> [   57.877660] Oops: 0002 [#1] SMP
>>> [   57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev r8169
>>> microcode mii
>>> [   57.877735] CPU 0
>>> [   57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1 To be
>>> filled by O.E.M. To be filled by O.E.M./ONDA H61V Ver:4.01
>>> [   57.877790] RIP: 0010:[]  []
>>> fw_load_abort.isra.5+0x4/0x20
>>> [   57.877824] RSP: 0018:8802119a7e80  EFLAGS: 00010246
>>> [   57.877844] RAX: 8802158fe250 RBX: 880211a03b40 RCX:
>>>
>>> [   57.877869] RDX: 81c742c8 RSI: 8802158fe250 RDI:
>>>
>>> [   57.877895] RBP: 8802119a7e80 R08: 8802119a6000 R09:
>>> 05aa
>>> [   57.877920] R10:  R11:  R12:
>>>
>>> [   57.877945] R13: 880213d34088 R14: 0003 R15:
>>> 88020eafc230
>>> [   57.877970] FS:  7f3c6cb2a740() GS:88021f20()
>>> knlGS:
>>> [   57.877998] CS:  0010 DS:  ES:  CR0: 80050033
>>> [   57.878019] CR2: 0040 CR3: 000203155000 CR4:
>>> 001407f0
>>> [   57.878044] DR0:  DR1:  DR2:
>>>
>>> [   57.878069] DR3:  DR6: 0ff0 DR7:
>>> 0400
>>> [   57.878094] Process firmware (pid: 1950, threadinfo 8802119a6000,
>>> task 8802158fe250)
>>> [   57.878124] Stack:
>>> [   57.878133]  8802119a7eb0 81491917 880211a4d5a0
>>> 0003
>>> [   57.878168]  8802119a7f50 818765a0 8802119a7ec0
>>> 81483063
>>> [   57.878203]  8802119a7f08 8119bc9e 880213d34098
>>> 880211a4d5c0
>>> [   57.878237] Call Trace:
>>> [   57.878251]  [] firmware_loading_store+0x77/0x150
>>> [   57.878275]  [] dev_attr_store+0x13/0x20
>>> [   57.878297]  [] sysfs_write_file+0xce/0x140
>>> [   57.878320]  [] vfs_write+0x9a/0x160
>>> [   57.878340]  [] sys_write+0x44/0x90
>>> [   57.878360]  [] system_call_fastpath+0x1a/0x1f
>>> [   57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79 ff ff
>>> ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48 89 e5
>>>  80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f
>>> [   57.881753] RIP  [] fw_load_abort.isra.5+0x4/0x20
>>> [   57.882888]  RSP
>>> [   57.884019] CR2: 0040
>>> [   57.885166] ---[ end trace 6705f6d4ce6b6a12 ]---
>
> Looks it is a double abort race, could you try below patch?
> (also attached for applying)


i've also applied this patch and up to now, after
reboot a few times all thing seems to work fine.

thanks,


> --
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index 6ede229..a217ba8 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -550,7 +550,12 @@ static ssize_t firmware_loading_show(struct device *dev,
>  				     struct device_attribute *attr, char *buf)
>  {
>  	struct firmware_priv *fw_priv = to_firmware_priv(dev);
> -	int loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
> +	int loading = 0;
> +
> +	mutex_lock(&fw_lock);
> +	if (fw_priv->buf)
> +		loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
> +	mutex_unlock(&fw_lock);
>
>  	return sprintf(buf, "%d\n", loading);
>  }
> @@ -592,12 +597,12 @@ static ssize_t firmware_loading_store(struct device *dev,
>  				      const char *buf, size_t count)
>  {
>  	struct firmware_priv *fw_priv = to_firmware_priv(dev);
> -	struct firmware_buf *fw_buf = fw_priv->buf;
> +	struct firmware_buf *fw_buf;
>  	int loading = simple_strtol(buf, NULL, 10);
>  	int i;
>
>  	mutex_lock(&fw_lock);
> -
> +	fw_buf = fw_priv->buf;
>  	if (!fw_buf)
>  		goto out;
>
> @@ -636,6 +641,7 @@ static ssize_t firmware_loading_store(struct

Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-14 Thread nirinA raseliarison
on Fri, 14 Jun 2013 18:45:48 +0300, Guenter Roeck wrote:
> On Fri, Jun 14, 2013 at 08:30:29AM -0600, Bjorn Helgaas wrote:
>> [+cc Ming, Hayes, Francois, r8169 list]
>>
>> On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison wrote:
>>> hello there,
>>> i have this ethernet controler:
>>>
>>>  Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet
>>> controller (rev 05)
>>>
>>> that uses the r8169 module.
>>> it works fine, but sometimes after a reboot and issueing:
>>>
>>>  ifconfig eth0 192.168.1.1 up
>>>
>>> i got the message below. after another reboot the
>>> message disappears. i also get the same message this 3.9.5 and 3.9.4.
>>>
>>> it seems i catch my first oops and don't know what to do with it.
>>> currently running:
>>>
>>>  cat /proc/version
>>>  Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1
>>> SMP Fri Jun 14 09:14:50 EAT 2013
>>>
>>>  uname -a
>>>  Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64
>>> Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux
>>>
>>> thanks,
>>> -8<--8<---
>>>
>>> [   57.877560] BUG: unable to handle kernel NULL pointer dereference at
>>> 0040
>>> [   57.877603] IP: [] fw_load_abort.isra.5+0x4/0x20
>>> [   57.877634] PGD 21330a067 PUD 211a3a067 PMD 0
>>> [   57.877660] Oops: 0002 [#1] SMP
>>> [   57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev r8169
>>> microcode mii
>>> [   57.877735] CPU 0
>>> [   57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1 To be
>>> filled by O.E.M. To be filled by O.E.M./ONDA H61V Ver:4.01
>>> [   57.877790] RIP: 0010:[]  []
>>> fw_load_abort.isra.5+0x4/0x20
>>> [   57.877824] RSP: 0018:8802119a7e80  EFLAGS: 00010246
>>> [   57.877844] RAX: 8802158fe250 RBX: 880211a03b40 RCX:
>>>
>>> [   57.877869] RDX: 81c742c8 RSI: 8802158fe250 RDI:
>>>
>>> [   57.877895] RBP: 8802119a7e80 R08: 8802119a6000 R09:
>>> 05aa
>>> [   57.877920] R10:  R11:  R12:
>>>
>>> [   57.877945] R13: 880213d34088 R14: 0003 R15:
>>> 88020eafc230
>>> [   57.877970] FS:  7f3c6cb2a740() GS:88021f20()
>>> knlGS:
>>> [   57.877998] CS:  0010 DS:  ES:  CR0: 80050033
>>> [   57.878019] CR2: 0040 CR3: 000203155000 CR4:
>>> 001407f0
>>> [   57.878044] DR0:  DR1:  DR2:
>>>
>>> [   57.878069] DR3:  DR6: 0ff0 DR7:
>>> 0400
>>> [   57.878094] Process firmware (pid: 1950, threadinfo 8802119a6000,
>>> task 8802158fe250)
>>> [   57.878124] Stack:
>>> [   57.878133]  8802119a7eb0 81491917 880211a4d5a0
>>> 0003
>>> [   57.878168]  8802119a7f50 818765a0 8802119a7ec0
>>> 81483063
>>> [   57.878203]  8802119a7f08 8119bc9e 880213d34098
>>> 880211a4d5c0
>>> [   57.878237] Call Trace:
>>> [   57.878251]  [] firmware_loading_store+0x77/0x150
>>> [   57.878275]  [] dev_attr_store+0x13/0x20
>>> [   57.878297]  [] sysfs_write_file+0xce/0x140
>>> [   57.878320]  [] vfs_write+0x9a/0x160
>>> [   57.878340]  [] sys_write+0x44/0x90
>>> [   57.878360]  [] system_call_fastpath+0x1a/0x1f
>>> [   57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79 ff ff
>>> ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48 89 e5
>>>  80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f
>>> [   57.881753] RIP  [] fw_load_abort.isra.5+0x4/0x20
>>> [   57.882888]  RSP
>>> [   57.884019] CR2: 0040
>>> [   57.885166] ---[ end trace 6705f6d4ce6b6a12 ]---
>>>
>
> Please try the following patch.


patch applied and no longer have the bug message when i
reboot and wake up the ethernet controller.

thanks,


> [ Bjorn, sorry I dropped you from the recipient list, but unfortunately
> Google still considers me to be a spammer and doesn't let me send any
> e-mail to you ]
>
> Guenter
>
> --
>
> From 9feae0b1b33721573c41fbf2323db2a12c34c725 Mon Sep 17 00:00:00 2001
> From: Guenter Roeck
> Date: Fri, 14 Jun 2013 08:39:06 -0700
> Subject: [PATCH] firmware: Fix race condition in firmware_loading_store
>
> Fix:
>
> BUG: unable to handle kernel NULL pointer dereference at 0040
> IP: [] fw_load_abort.isra.5+0x4/0x20
> ...
> Call Trace:
> [] firmware_loading_store+0x77/0x150
> [] dev_attr_store+0x13/0x20
> [] sysfs_write_file+0xce/0x140
> [] vfs_write+0x9a/0x160
> [] sys_write+0x44/0x90
> [] system_call_fastpath+0x1a/0x1f
>
> Signed-off-by: Guenter Roeck
> ---
>  drivers/base/firmware_class.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index 4b1f926..f34b489 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -570,12 +570,13 @@ static ssize_t firmware_loading_store(struct device *dev,
>   const char *buf, size_t

Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-14 Thread Ming Lei
On Fri, Jun 14, 2013 at 10:30 PM, Bjorn Helgaas  wrote:
> [+cc Ming, Hayes, Francois, r8169 list]
>
> On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison
>  wrote:
>> hello there,
>> i have this ethernet controler:
>>
>>  Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet
>> controller (rev 05)
>>
>> that uses the r8169 module.
>> it works fine, but sometimes after a reboot and issueing:
>>
>>  ifconfig eth0 192.168.1.1 up
>>
>> i got the message below. after another reboot the
>> message disappears. i also get the same message this 3.9.5 and 3.9.4.
>>
>> it seems i catch my first oops and don't know what to do with it.
>> currently running:
>>
>>  cat /proc/version
>>  Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1
>> SMP Fri Jun 14 09:14:50 EAT 2013
>>
>>  uname -a
>>  Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64
>> Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux
>>
>> thanks,
>> -8<--8<---
>>
>> [   57.877560] BUG: unable to handle kernel NULL pointer dereference at
>> 0040
>> [   57.877603] IP: [] fw_load_abort.isra.5+0x4/0x20
>> [   57.877634] PGD 21330a067 PUD 211a3a067 PMD 0
>> [   57.877660] Oops: 0002 [#1] SMP
>> [   57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev r8169
>> microcode mii
>> [   57.877735] CPU 0
>> [   57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1 To be
>> filled by O.E.M. To be filled by O.E.M./ONDA H61V Ver:4.01
>> [   57.877790] RIP: 0010:[]  []
>> fw_load_abort.isra.5+0x4/0x20
>> [   57.877824] RSP: 0018:8802119a7e80  EFLAGS: 00010246
>> [   57.877844] RAX: 8802158fe250 RBX: 880211a03b40 RCX:
>> 
>> [   57.877869] RDX: 81c742c8 RSI: 8802158fe250 RDI:
>> 
>> [   57.877895] RBP: 8802119a7e80 R08: 8802119a6000 R09:
>> 05aa
>> [   57.877920] R10:  R11:  R12:
>> 
>> [   57.877945] R13: 880213d34088 R14: 0003 R15:
>> 88020eafc230
>> [   57.877970] FS:  7f3c6cb2a740() GS:88021f20()
>> knlGS:
>> [   57.877998] CS:  0010 DS:  ES:  CR0: 80050033
>> [   57.878019] CR2: 0040 CR3: 000203155000 CR4:
>> 001407f0
>> [   57.878044] DR0:  DR1:  DR2:
>> 
>> [   57.878069] DR3:  DR6: 0ff0 DR7:
>> 0400
>> [   57.878094] Process firmware (pid: 1950, threadinfo 8802119a6000,
>> task 8802158fe250)
>> [   57.878124] Stack:
>> [   57.878133]  8802119a7eb0 81491917 880211a4d5a0
>> 0003
>> [   57.878168]  8802119a7f50 818765a0 8802119a7ec0
>> 81483063
>> [   57.878203]  8802119a7f08 8119bc9e 880213d34098
>> 880211a4d5c0
>> [   57.878237] Call Trace:
>> [   57.878251]  [] firmware_loading_store+0x77/0x150
>> [   57.878275]  [] dev_attr_store+0x13/0x20
>> [   57.878297]  [] sysfs_write_file+0xce/0x140
>> [   57.878320]  [] vfs_write+0x9a/0x160
>> [   57.878340]  [] sys_write+0x44/0x90
>> [   57.878360]  [] system_call_fastpath+0x1a/0x1f
>> [   57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79 ff ff
>> ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48 89 e5
>>  80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f
>> [   57.881753] RIP  [] fw_load_abort.isra.5+0x4/0x20
>> [   57.882888]  RSP 
>> [   57.884019] CR2: 0040
>> [   57.885166] ---[ end trace 6705f6d4ce6b6a12 ]---

It looks like a double abort race; could you try the patch below?
(also attached for applying)

--
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 6ede229..a217ba8 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -550,7 +550,12 @@ static ssize_t firmware_loading_show(struct device *dev,
 				     struct device_attribute *attr, char *buf)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	int loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	int loading = 0;
+
+	mutex_lock(&fw_lock);
+	if (fw_priv->buf)
+		loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	mutex_unlock(&fw_lock);

 	return sprintf(buf, "%d\n", loading);
 }
@@ -592,12 +597,12 @@ static ssize_t firmware_loading_store(struct device *dev,
 				      const char *buf, size_t count)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	struct firmware_buf *fw_buf = fw_priv->buf;
+	struct firmware_buf *fw_buf;
 	int loading = simple_strtol(buf, NULL, 10);
 	int i;

 	mutex_lock(&fw_lock);
-
+	fw_buf = fw_priv->buf;
 	if (!fw_buf)
 		goto out;

@@ -636,6 +641,7 @@ 

Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-14 Thread Guenter Roeck
On Fri, Jun 14, 2013 at 08:30:29AM -0600, Bjorn Helgaas wrote:
> [+cc Ming, Hayes, Francois, r8169 list]
> 
> On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison
>  wrote:
> > hello there,
> > i have this ethernet controler:
> >
> >  Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet
> > controller (rev 05)
> >
> > that uses the r8169 module.
> > it works fine, but sometimes after a reboot and issueing:
> >
> >  ifconfig eth0 192.168.1.1 up
> >
> > i got the message below. after another reboot the
> > message disappears. i also get the same message this 3.9.5 and 3.9.4.
> >
> > it seems i catch my first oops and don't know what to do with it.
> > currently running:
> >
> >  cat /proc/version
> >  Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1
> > SMP Fri Jun 14 09:14:50 EAT 2013
> >
> >  uname -a
> >  Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64
> > Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux
> >
> > thanks,
> > -8<--8<---
> >
> > [   57.877560] BUG: unable to handle kernel NULL pointer dereference at
> > 0040
> > [   57.877603] IP: [] fw_load_abort.isra.5+0x4/0x20
> > [   57.877634] PGD 21330a067 PUD 211a3a067 PMD 0
> > [   57.877660] Oops: 0002 [#1] SMP
> > [   57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev r8169
> > microcode mii
> > [   57.877735] CPU 0
> > [   57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1 To be
> > filled by O.E.M. To be filled by O.E.M./ONDA H61V Ver:4.01
> > [   57.877790] RIP: 0010:[]  []
> > fw_load_abort.isra.5+0x4/0x20
> > [   57.877824] RSP: 0018:8802119a7e80  EFLAGS: 00010246
> > [   57.877844] RAX: 8802158fe250 RBX: 880211a03b40 RCX:
> > 
> > [   57.877869] RDX: 81c742c8 RSI: 8802158fe250 RDI:
> > 
> > [   57.877895] RBP: 8802119a7e80 R08: 8802119a6000 R09:
> > 05aa
> > [   57.877920] R10:  R11:  R12:
> > 
> > [   57.877945] R13: 880213d34088 R14: 0003 R15:
> > 88020eafc230
> > [   57.877970] FS:  7f3c6cb2a740() GS:88021f20()
> > knlGS:
> > [   57.877998] CS:  0010 DS:  ES:  CR0: 80050033
> > [   57.878019] CR2: 0040 CR3: 000203155000 CR4:
> > 001407f0
> > [   57.878044] DR0:  DR1:  DR2:
> > 
> > [   57.878069] DR3:  DR6: 0ff0 DR7:
> > 0400
> > [   57.878094] Process firmware (pid: 1950, threadinfo 8802119a6000,
> > task 8802158fe250)
> > [   57.878124] Stack:
> > [   57.878133]  8802119a7eb0 81491917 880211a4d5a0
> > 0003
> > [   57.878168]  8802119a7f50 818765a0 8802119a7ec0
> > 81483063
> > [   57.878203]  8802119a7f08 8119bc9e 880213d34098
> > 880211a4d5c0
> > [   57.878237] Call Trace:
> > [   57.878251]  [] firmware_loading_store+0x77/0x150
> > [   57.878275]  [] dev_attr_store+0x13/0x20
> > [   57.878297]  [] sysfs_write_file+0xce/0x140
> > [   57.878320]  [] vfs_write+0x9a/0x160
> > [   57.878340]  [] sys_write+0x44/0x90
> > [   57.878360]  [] system_call_fastpath+0x1a/0x1f
> > [   57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79 ff ff
> > ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48 89 e5
> >  80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f
> > [   57.881753] RIP  [] fw_load_abort.isra.5+0x4/0x20
> > [   57.882888]  RSP 
> > [   57.884019] CR2: 0040
> > [   57.885166] ---[ end trace 6705f6d4ce6b6a12 ]---
> >

Please try the following patch.

[ Bjorn, sorry I dropped you from the recipient list, but unfortunately
Google still considers me to be a spammer and doesn't let me send any
e-mail to you ]

Guenter

--

From 9feae0b1b33721573c41fbf2323db2a12c34c725 Mon Sep 17 00:00:00 2001
From: Guenter Roeck 
Date: Fri, 14 Jun 2013 08:39:06 -0700
Subject: [PATCH] firmware: Fix race condition in firmware_loading_store

Fix:

BUG: unable to handle kernel NULL pointer dereference at 0040
IP: [] fw_load_abort.isra.5+0x4/0x20
...
Call Trace:
[] firmware_loading_store+0x77/0x150
[] dev_attr_store+0x13/0x20
[] sysfs_write_file+0xce/0x140
[] vfs_write+0x9a/0x160
[] sys_write+0x44/0x90
[] system_call_fastpath+0x1a/0x1f

Signed-off-by: Guenter Roeck 
---
 drivers/base/firmware_class.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 4b1f926..f34b489 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -570,12 +570,13 @@ static ssize_t firmware_loading_store(struct device *dev,
  const char *buf, size_t count)
 {
struct 

Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-14 Thread Bjorn Helgaas
[+cc Ming, Hayes, Francois, r8169 list]

On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison
 wrote:
> hello there,
> i have this ethernet controler:
>
>  Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet
> controller (rev 05)
>
> that uses the r8169 module.
> it works fine, but sometimes after a reboot and issueing:
>
>  ifconfig eth0 192.168.1.1 up
>
> i got the message below. after another reboot the
> message disappears. i also get the same message this 3.9.5 and 3.9.4.
>
> it seems i catch my first oops and don't know what to do with it.
> currently running:
>
>  cat /proc/version
>  Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1
> SMP Fri Jun 14 09:14:50 EAT 2013
>
>  uname -a
>  Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64
> Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux
>
> thanks,
> -8<--8<---
>
> [   57.877560] BUG: unable to handle kernel NULL pointer dereference at
> 0040
> [   57.877603] IP: [] fw_load_abort.isra.5+0x4/0x20
> [   57.877634] PGD 21330a067 PUD 211a3a067 PMD 0
> [   57.877660] Oops: 0002 [#1] SMP
> [   57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev r8169
> microcode mii
> [   57.877735] CPU 0
> [   57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1 To be
> filled by O.E.M. To be filled by O.E.M./ONDA H61V Ver:4.01
> [   57.877790] RIP: 0010:[]  []
> fw_load_abort.isra.5+0x4/0x20
> [   57.877824] RSP: 0018:8802119a7e80  EFLAGS: 00010246
> [   57.877844] RAX: 8802158fe250 RBX: 880211a03b40 RCX:
> 
> [   57.877869] RDX: 81c742c8 RSI: 8802158fe250 RDI:
> 
> [   57.877895] RBP: 8802119a7e80 R08: 8802119a6000 R09:
> 05aa
> [   57.877920] R10:  R11:  R12:
> 
> [   57.877945] R13: 880213d34088 R14: 0003 R15:
> 88020eafc230
> [   57.877970] FS:  7f3c6cb2a740() GS:88021f20()
> knlGS:
> [   57.877998] CS:  0010 DS:  ES:  CR0: 80050033
> [   57.878019] CR2: 0040 CR3: 000203155000 CR4:
> 001407f0
> [   57.878044] DR0:  DR1:  DR2:
> 
> [   57.878069] DR3:  DR6: 0ff0 DR7:
> 0400
> [   57.878094] Process firmware (pid: 1950, threadinfo 8802119a6000,
> task 8802158fe250)
> [   57.878124] Stack:
> [   57.878133]  8802119a7eb0 81491917 880211a4d5a0
> 0003
> [   57.878168]  8802119a7f50 818765a0 8802119a7ec0
> 81483063
> [   57.878203]  8802119a7f08 8119bc9e 880213d34098
> 880211a4d5c0
> [   57.878237] Call Trace:
> [   57.878251]  [<ffffffff81491917>] firmware_loading_store+0x77/0x150
> [   57.878275]  [<ffffffff81483063>] dev_attr_store+0x13/0x20
> [   57.878297]  [<ffffffff8119bc9e>] sysfs_write_file+0xce/0x140
> [   57.878320]  [<ffffffff81133e8a>] vfs_write+0x9a/0x160
> [   57.878340]  [<ffffffff81134164>] sys_write+0x44/0x90
> [   57.878360]  [<ffffffff817d70ed>] system_call_fastpath+0x1a/0x1f
> [   57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79 ff ff
> ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48 89 e5
> <f0> 80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f
> [   57.881753] RIP  [<ffffffff81491844>] fw_load_abort.isra.5+0x4/0x20
> [   57.882888]  RSP <ffff8802119a7e80>
> [   57.884019] CR2: 0000000000000040
> [   57.885166] ---[ end trace 6705f6d4ce6b6a12 ]---
>
> --
> nirinA


Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-14 Thread Bjorn Helgaas
[+cc Ming, Hayes, Francois, r8169 list]

On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison
nirina.raseliari...@gmail.com wrote:
> hello there,
> i have this ethernet controller:
>
>  Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet
> controller (rev 05)
>
> that uses the r8169 module.
> it works fine, but sometimes after a reboot and issuing:
>
>  ifconfig eth0 192.168.1.1 up
>
> i got the message below. after another reboot the
> message disappears. i also get the same message with 3.9.5 and 3.9.4.
>
> it seems i caught my first oops and don't know what to do with it.
> currently running:
>
>  cat /proc/version
>  Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1
> SMP Fri Jun 14 09:14:50 EAT 2013
>
>  uname -a
>  Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64
> Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux
>
> thanks,
> -8<--8<---


> --
> nirinA


Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-14 Thread Guenter Roeck
On Fri, Jun 14, 2013 at 08:30:29AM -0600, Bjorn Helgaas wrote:
> [+cc Ming, Hayes, Francois, r8169 list]
 
 

Please try the following patch.

[ Bjorn, sorry I dropped you from the recipient list, but unfortunately
Google still considers me to be a spammer and doesn't let me send any
e-mail to you ]

Guenter

--

From 9feae0b1b33721573c41fbf2323db2a12c34c725 Mon Sep 17 00:00:00 2001
From: Guenter Roeck li...@roeck-us.net
Date: Fri, 14 Jun 2013 08:39:06 -0700
Subject: [PATCH] firmware: Fix race condition in firmware_loading_store

Fix:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffff81491844>] fw_load_abort.isra.5+0x4/0x20
...
Call Trace:
[<ffffffff81491917>] firmware_loading_store+0x77/0x150
[<ffffffff81483063>] dev_attr_store+0x13/0x20
[<ffffffff8119bc9e>] sysfs_write_file+0xce/0x140
[<ffffffff81133e8a>] vfs_write+0x9a/0x160
[<ffffffff81134164>] sys_write+0x44/0x90
[<ffffffff817d70ed>] system_call_fastpath+0x1a/0x1f

Signed-off-by: Guenter Roeck li...@roeck-us.net
---
 drivers/base/firmware_class.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 4b1f926..f34b489 100644
--- a/drivers/base/firmware_class.c
+++ 

Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-14 Thread Ming Lei
On Fri, Jun 14, 2013 at 10:30 PM, Bjorn Helgaas bhelg...@google.com wrote:
> [+cc Ming, Hayes, Francois, r8169 list]


It looks like a double-abort race; could you try the patch below?
(also attached for applying)
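
For readers following the oops, here is a small user-space sketch of why the
fault address comes out as 0x40. It rests on a reading of the "Code:" bytes
in the report: "f0 80 4f 40 04" decodes as "lock orb $0x4,0x40(%rdi)", and the
following "48 83 c7 18" as "add $0x18,%rdi". The struct below is a hypothetical
stand-in, not the kernel's struct firmware_buf; only the two offsets are taken
from the disassembly, and the field names and sizes are assumptions. The RDI
value is blank in this archive, which is consistent with zero but not proof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the buffer passed to fw_load_abort().
 * Only the offsets 0x18 and 0x40 come from the oops; everything else
 * (names, sizes of the opaque regions) is assumed for illustration.
 */
struct fake_firmware_buf {
	unsigned char leading_fields[0x18]; /* unknown fields before offset 0x18 */
	unsigned char completion[0x28];     /* 0x18: target of "add $0x18,%rdi" */
	unsigned long status;               /* 0x40: target of "lock orb" */
};

/* Address touched by "lock orb $0x4,0x40(%rdi)" for a given %rdi value. */
static uintptr_t fault_address(uintptr_t rdi)
{
	return rdi + offsetof(struct fake_firmware_buf, status);
}

/* Bit number that an OR with a single-bit mask sets. */
static int bit_from_mask(unsigned int mask)
{
	int bit = 0;

	assert(mask != 0 && (mask & (mask - 1)) == 0); /* exactly one bit set */
	while ((mask >>= 1) != 0)
		bit++;
	return bit;
}
```

With a NULL buffer, fault_address(0) gives 0x40, which matches CR2 in the
report, and bit_from_mask(0x4) gives 2, so the faulting instruction looks like
an atomic set of bit 2 in a status word of a buffer that was already gone.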

--
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 6ede229..a217ba8 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -550,7 +550,12 @@ static ssize_t firmware_loading_show(struct device *dev,
 				     struct device_attribute *attr, char *buf)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	int loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	int loading = 0;
+
+	mutex_lock(&fw_lock);
+	if (fw_priv->buf)
+		loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	mutex_unlock(&fw_lock);
 
 	return sprintf(buf, "%d\n", loading);
 }
@@ -592,12 +597,12 @@ static ssize_t firmware_loading_store(struct device *dev,
 				      const char *buf, size_t count)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	struct firmware_buf *fw_buf = fw_priv->buf;
+	struct firmware_buf *fw_buf;
 	int loading = simple_strtol(buf, NULL, 10);
 	int i;
 
 	mutex_lock(&fw_lock);
-
+	fw_buf = fw_priv->buf;
if 

Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-14 Thread nirinA raseliarison
on Fri, 14 Jun 2013 18:45:48 +0300, Guenter Roeck li...@roeck-us.net  
wrote:





> Please try the following patch.


patch applied and no longer have the bug message when i
reboot and wake up the ethernet controller.

thanks,



Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-14 Thread nirinA raseliarison
on Fri, 14 Jun 2013 20:02:25 +0300, Ming Lei ming@canonical.com  
wrote:




> It looks like a double-abort race; could you try the patch below?
> (also attached for applying)


i've also applied this patch and so far, after rebooting
a few times, everything seems to work fine.

thanks,



Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

2013-06-14 Thread Ming Lei
On Sat, Jun 15, 2013 at 1:07 AM, nirinA raseliarison
nirina.raseliari...@gmail.com wrote:

> patch applied and no longer have the bug message when i
> reboot and wake up the ethernet controller.

I wonder whether Guenter's patch can really fix the race, but I'd like to
see Guenter's explanation of his patch.

The race should be caused by the following sequence:

- the request timeout is triggered by the internal timer

- user space aborts the request before the timeout path in
_request_firmware_load() reaches the line

 fw_priv->buf = NULL

- the abort called from firmware_loading_store() may then use a freed fw buf,
since the timeout path frees the fw buffer.

Since clearing 'fw_priv->buf' in _request_firmware_load() isn't currently
protected by fw_lock, Guenter's patch can't avoid the race entirely.
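
The sequence above can be sketched in user space as follows. This is only an
illustration of the locking idea, not the kernel code: struct fw_buf, struct
fw_priv and fw_abort_request() are simplified, hypothetical stand-ins, and a
pthread mutex plays the role of fw_lock. The point is that the NULL check, the
abort and the clearing of the buf pointer all sit inside one critical section,
so whichever path (timeout or sysfs write) arrives second sees NULL and backs
off instead of touching a freed buffer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct fw_buf {
	int aborted;
};

struct fw_priv {
	struct fw_buf *buf; /* NULL once the request has been torn down */
};

static pthread_mutex_t fw_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Abort the request if it is still live. Meant to be called from both
 * the timeout path and the user-space abort path. Returns 1 if this
 * caller performed the abort, 0 if another path already did.
 */
static int fw_abort_request(struct fw_priv *priv)
{
	struct fw_buf *buf;

	pthread_mutex_lock(&fw_lock);
	buf = priv->buf;
	if (buf) {
		buf->aborted = 1;
		priv->buf = NULL; /* cleared under the lock, never outside it */
	}
	pthread_mutex_unlock(&fw_lock);

	if (!buf)
		return 0;

	/* Only the caller that detached the buffer may free it. */
	assert(buf->aborted);
	free(buf);
	return 1;
}
```

Calling fw_abort_request() twice on the same fw_priv, in either order and from
either path, frees the buffer once; the second call returns 0. Without the
lock around the clearing of the pointer, the second caller could still see the
old value after the buffer was freed, which is the window described above.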

Thanks,
--
Ming Lei