Re: [PATCH] xsysace: Fix error handling in ace_setup

2019-02-20 Thread Guenter Roeck

On 2/19/19 11:09 PM, Michal Simek wrote:

On 19. 02. 19 17:49, Guenter Roeck wrote:

If xace hardware reports a bad version number, the error handling code
in ace_setup() calls put_disk(), followed by queue cleanup. However, since
the disk data structure has the queue pointer set, put_disk() also
cleans and releases the queue. This results in blk_cleanup_queue()
accessing an already released data structure, which in turn may result
in a crash such as the following.

[   10.681671] BUG: Kernel NULL pointer dereference at 0x0040
[   10.681826] Faulting instruction address: 0xc0431480
[   10.682072] Oops: Kernel access of bad area, sig: 11 [#1]
[   10.682251] BE PAGE_SIZE=4K PREEMPT Xilinx Virtex440
[   10.682387] Modules linked in:
[   10.682528] CPU: 0 PID: 1 Comm: swapper Tainted: GW 
5.0.0-rc6-next-20190218+ #2
[   10.682733] NIP:  c0431480 LR: c043147c CTR: c0422ad8
[   10.682863] REGS: cf82fbe0 TRAP: 0300   Tainted: GW  
(5.0.0-rc6-next-20190218+)
[   10.683065] MSR:  00029000   CR: 22000222  XER: 
[   10.683236] DEAR: 0040 ESR: 
[   10.683236] GPR00: c043147c cf82fc90 cf82ccc0    
0002 
[   10.683236] GPR08:   c04310bc  22000222  
c0002c54 
[   10.683236] GPR16:  0001 c09aa39c c09021b0 c09021dc 0007 
c0a68c08 
[   10.683236] GPR24: 0001 ced6d400 ced6dcf0 c0815d9c   
 cedf0800
[   10.684331] NIP [c0431480] blk_mq_run_hw_queue+0x28/0x114
[   10.684473] LR [c043147c] blk_mq_run_hw_queue+0x24/0x114
[   10.684602] Call Trace:
[   10.684671] [cf82fc90] [c043147c] blk_mq_run_hw_queue+0x24/0x114 (unreliable)
[   10.684854] [cf82fcc0] [c04315bc] blk_mq_run_hw_queues+0x50/0x7c
[   10.685002] [cf82fce0] [c0422b24] blk_set_queue_dying+0x30/0x68
[   10.685154] [cf82fcf0] [c0423ec0] blk_cleanup_queue+0x34/0x14c
[   10.685306] [cf82fd10] [c054d73c] ace_probe+0x3dc/0x508
[   10.685445] [cf82fd50] [c052d740] platform_drv_probe+0x4c/0xb8
[   10.685592] [cf82fd70] [c052abb0] really_probe+0x20c/0x32c
[   10.685728] [cf82fda0] [c052ae58] driver_probe_device+0x68/0x464
[   10.685877] [cf82fdc0] [c052b500] device_driver_attach+0xb4/0xe4
[   10.686024] [cf82fde0] [c052b5dc] __driver_attach+0xac/0xfc
[   10.686161] [cf82fe00] [c0528428] bus_for_each_dev+0x80/0xc0
[   10.686314] [cf82fe30] [c0529b3c] bus_add_driver+0x144/0x234
[   10.686457] [cf82fe50] [c052c46c] driver_register+0x88/0x15c
[   10.686610] [cf82fe60] [c09de288] ace_init+0x4c/0xac
[   10.686742] [cf82fe80] [c0002730] do_one_initcall+0xac/0x330
[   10.686888] [cf82fee0] [c09aafd0] kernel_init_freeable+0x34c/0x478
[   10.687043] [cf82ff30] [c0002c6c] kernel_init+0x18/0x114
[   10.687188] [cf82ff40] [c000f2f0] ret_from_kernel_thread+0x14/0x1c
[   10.687349] Instruction dump:
[   10.687435] 3863ffd4 4bfffd70 9421ffd0 7c0802a6 93c10028 7c9e2378 93e1002c 
38810008
[   10.687637] 7c7f1b78 90010034 4bfffc25 813f008c <81290040> 75290100 4182002c 
80810008
[   10.688056] ---[ end trace 13c9ff51d41b9d40 ]---

Fix the problem by setting the disk queue pointer to NULL before calling
put_disk(). A more comprehensive fix might be to rearrange the code
to check the hardware version before initializing data structures,
but I don't know if this would have undesirable side effects, and
it would increase the complexity of backporting the fix to older kernels.

Fixes: 74489a91dd43a ("Add support for Xilinx SystemACE CompactFlash interface")
Signed-off-by: Guenter Roeck 
---
  drivers/block/xsysace.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/block/xsysace.c b/drivers/block/xsysace.c
index 87ccef4bd69e..32a21b8d1d85 100644
--- a/drivers/block/xsysace.c
+++ b/drivers/block/xsysace.c
@@ -1090,6 +1090,8 @@ static int ace_setup(struct ace_device *ace)
return 0;
  
  err_read:

+   /* prevent double queue cleanup */
+   ace->gd->queue = NULL;
put_disk(ace->gd);
  err_alloc_disk:
blk_cleanup_queue(ace->queue);



This driver is quite old and we are not actively using/testing it.
Are you wiring that up with qemu?



Not specifically; it may be wired up as uninitialized device.
Presumably that is why it reports version 0, which causes the driver
to bail out. It is instantiated by the devicetree file (I think).

Guenter


Maybe it should be labeled differently in MAINTAINERS file.

Anyway whatever fix is fine for me.

Acked-by: Michal Simek 

Thanks,
Michal





Re: [PATCH] xsysace: Fix error handling in ace_setup

2019-02-19 Thread Michal Simek
On 19. 02. 19 17:49, Guenter Roeck wrote:
> If xace hardware reports a bad version number, the error handling code
> in ace_setup() calls put_disk(), followed by queue cleanup. However, since
> the disk data structure has the queue pointer set, put_disk() also
> cleans and releases the queue. This results in blk_cleanup_queue()
> accessing an already released data structure, which in turn may result
> in a crash such as the following.
> 
> [   10.681671] BUG: Kernel NULL pointer dereference at 0x0040
> [   10.681826] Faulting instruction address: 0xc0431480
> [   10.682072] Oops: Kernel access of bad area, sig: 11 [#1]
> [   10.682251] BE PAGE_SIZE=4K PREEMPT Xilinx Virtex440
> [   10.682387] Modules linked in:
> [   10.682528] CPU: 0 PID: 1 Comm: swapper Tainted: GW 
> 5.0.0-rc6-next-20190218+ #2
> [   10.682733] NIP:  c0431480 LR: c043147c CTR: c0422ad8
> [   10.682863] REGS: cf82fbe0 TRAP: 0300   Tainted: GW  
> (5.0.0-rc6-next-20190218+)
> [   10.683065] MSR:  00029000   CR: 22000222  XER: 
> [   10.683236] DEAR: 0040 ESR: 
> [   10.683236] GPR00: c043147c cf82fc90 cf82ccc0    
> 0002 
> [   10.683236] GPR08:   c04310bc  22000222  
> c0002c54 
> [   10.683236] GPR16:  0001 c09aa39c c09021b0 c09021dc 0007 
> c0a68c08 
> [   10.683236] GPR24: 0001 ced6d400 ced6dcf0 c0815d9c   
>  cedf0800
> [   10.684331] NIP [c0431480] blk_mq_run_hw_queue+0x28/0x114
> [   10.684473] LR [c043147c] blk_mq_run_hw_queue+0x24/0x114
> [   10.684602] Call Trace:
> [   10.684671] [cf82fc90] [c043147c] blk_mq_run_hw_queue+0x24/0x114 
> (unreliable)
> [   10.684854] [cf82fcc0] [c04315bc] blk_mq_run_hw_queues+0x50/0x7c
> [   10.685002] [cf82fce0] [c0422b24] blk_set_queue_dying+0x30/0x68
> [   10.685154] [cf82fcf0] [c0423ec0] blk_cleanup_queue+0x34/0x14c
> [   10.685306] [cf82fd10] [c054d73c] ace_probe+0x3dc/0x508
> [   10.685445] [cf82fd50] [c052d740] platform_drv_probe+0x4c/0xb8
> [   10.685592] [cf82fd70] [c052abb0] really_probe+0x20c/0x32c
> [   10.685728] [cf82fda0] [c052ae58] driver_probe_device+0x68/0x464
> [   10.685877] [cf82fdc0] [c052b500] device_driver_attach+0xb4/0xe4
> [   10.686024] [cf82fde0] [c052b5dc] __driver_attach+0xac/0xfc
> [   10.686161] [cf82fe00] [c0528428] bus_for_each_dev+0x80/0xc0
> [   10.686314] [cf82fe30] [c0529b3c] bus_add_driver+0x144/0x234
> [   10.686457] [cf82fe50] [c052c46c] driver_register+0x88/0x15c
> [   10.686610] [cf82fe60] [c09de288] ace_init+0x4c/0xac
> [   10.686742] [cf82fe80] [c0002730] do_one_initcall+0xac/0x330
> [   10.686888] [cf82fee0] [c09aafd0] kernel_init_freeable+0x34c/0x478
> [   10.687043] [cf82ff30] [c0002c6c] kernel_init+0x18/0x114
> [   10.687188] [cf82ff40] [c000f2f0] ret_from_kernel_thread+0x14/0x1c
> [   10.687349] Instruction dump:
> [   10.687435] 3863ffd4 4bfffd70 9421ffd0 7c0802a6 93c10028 7c9e2378 93e1002c 
> 38810008
> [   10.687637] 7c7f1b78 90010034 4bfffc25 813f008c <81290040> 75290100 
> 4182002c 80810008
> [   10.688056] ---[ end trace 13c9ff51d41b9d40 ]---
> 
> Fix the problem by setting the disk queue pointer to NULL before calling
> put_disk(). A more comprehensive fix might be to rearrange the code
> to check the hardware version before initializing data structures,
> but I don't know if this would have undesirable side effects, and
> it would increase the complexity of backporting the fix to older kernels.
> 
> Fixes: 74489a91dd43a ("Add support for Xilinx SystemACE CompactFlash 
> interface")
> Signed-off-by: Guenter Roeck 
> ---
>  drivers/block/xsysace.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/block/xsysace.c b/drivers/block/xsysace.c
> index 87ccef4bd69e..32a21b8d1d85 100644
> --- a/drivers/block/xsysace.c
> +++ b/drivers/block/xsysace.c
> @@ -1090,6 +1090,8 @@ static int ace_setup(struct ace_device *ace)
>   return 0;
>  
>  err_read:
> + /* prevent double queue cleanup */
> + ace->gd->queue = NULL;
>   put_disk(ace->gd);
>  err_alloc_disk:
>   blk_cleanup_queue(ace->queue);
> 

This driver is quite old and we are not actively using/testing it.
Are you wiring that up with qemu?

Maybe it should be labeled differently in MAINTAINERS file.

Anyway whatever fix is fine for me.

Acked-by: Michal Simek 

Thanks,
Michal


[PATCH] xsysace: Fix error handling in ace_setup

2019-02-19 Thread Guenter Roeck
If xace hardware reports a bad version number, the error handling code
in ace_setup() calls put_disk(), followed by queue cleanup. However, since
the disk data structure has the queue pointer set, put_disk() also
cleans and releases the queue. This results in blk_cleanup_queue()
accessing an already released data structure, which in turn may result
in a crash such as the following.

[   10.681671] BUG: Kernel NULL pointer dereference at 0x0040
[   10.681826] Faulting instruction address: 0xc0431480
[   10.682072] Oops: Kernel access of bad area, sig: 11 [#1]
[   10.682251] BE PAGE_SIZE=4K PREEMPT Xilinx Virtex440
[   10.682387] Modules linked in:
[   10.682528] CPU: 0 PID: 1 Comm: swapper Tainted: GW 
5.0.0-rc6-next-20190218+ #2
[   10.682733] NIP:  c0431480 LR: c043147c CTR: c0422ad8
[   10.682863] REGS: cf82fbe0 TRAP: 0300   Tainted: GW  
(5.0.0-rc6-next-20190218+)
[   10.683065] MSR:  00029000   CR: 22000222  XER: 
[   10.683236] DEAR: 0040 ESR: 
[   10.683236] GPR00: c043147c cf82fc90 cf82ccc0    
0002 
[   10.683236] GPR08:   c04310bc  22000222  
c0002c54 
[   10.683236] GPR16:  0001 c09aa39c c09021b0 c09021dc 0007 
c0a68c08 
[   10.683236] GPR24: 0001 ced6d400 ced6dcf0 c0815d9c   
 cedf0800
[   10.684331] NIP [c0431480] blk_mq_run_hw_queue+0x28/0x114
[   10.684473] LR [c043147c] blk_mq_run_hw_queue+0x24/0x114
[   10.684602] Call Trace:
[   10.684671] [cf82fc90] [c043147c] blk_mq_run_hw_queue+0x24/0x114 (unreliable)
[   10.684854] [cf82fcc0] [c04315bc] blk_mq_run_hw_queues+0x50/0x7c
[   10.685002] [cf82fce0] [c0422b24] blk_set_queue_dying+0x30/0x68
[   10.685154] [cf82fcf0] [c0423ec0] blk_cleanup_queue+0x34/0x14c
[   10.685306] [cf82fd10] [c054d73c] ace_probe+0x3dc/0x508
[   10.685445] [cf82fd50] [c052d740] platform_drv_probe+0x4c/0xb8
[   10.685592] [cf82fd70] [c052abb0] really_probe+0x20c/0x32c
[   10.685728] [cf82fda0] [c052ae58] driver_probe_device+0x68/0x464
[   10.685877] [cf82fdc0] [c052b500] device_driver_attach+0xb4/0xe4
[   10.686024] [cf82fde0] [c052b5dc] __driver_attach+0xac/0xfc
[   10.686161] [cf82fe00] [c0528428] bus_for_each_dev+0x80/0xc0
[   10.686314] [cf82fe30] [c0529b3c] bus_add_driver+0x144/0x234
[   10.686457] [cf82fe50] [c052c46c] driver_register+0x88/0x15c
[   10.686610] [cf82fe60] [c09de288] ace_init+0x4c/0xac
[   10.686742] [cf82fe80] [c0002730] do_one_initcall+0xac/0x330
[   10.686888] [cf82fee0] [c09aafd0] kernel_init_freeable+0x34c/0x478
[   10.687043] [cf82ff30] [c0002c6c] kernel_init+0x18/0x114
[   10.687188] [cf82ff40] [c000f2f0] ret_from_kernel_thread+0x14/0x1c
[   10.687349] Instruction dump:
[   10.687435] 3863ffd4 4bfffd70 9421ffd0 7c0802a6 93c10028 7c9e2378 93e1002c 
38810008
[   10.687637] 7c7f1b78 90010034 4bfffc25 813f008c <81290040> 75290100 4182002c 
80810008
[   10.688056] ---[ end trace 13c9ff51d41b9d40 ]---

Fix the problem by setting the disk queue pointer to NULL before calling
put_disk(). A more comprehensive fix might be to rearrange the code
to check the hardware version before initializing data structures,
but I don't know if this would have undesirable side effects, and
it would increase the complexity of backporting the fix to older kernels.

Fixes: 74489a91dd43a ("Add support for Xilinx SystemACE CompactFlash interface")
Signed-off-by: Guenter Roeck 
---
 drivers/block/xsysace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/block/xsysace.c b/drivers/block/xsysace.c
index 87ccef4bd69e..32a21b8d1d85 100644
--- a/drivers/block/xsysace.c
+++ b/drivers/block/xsysace.c
@@ -1090,6 +1090,8 @@ static int ace_setup(struct ace_device *ace)
return 0;
 
 err_read:
+   /* prevent double queue cleanup */
+   ace->gd->queue = NULL;
put_disk(ace->gd);
 err_alloc_disk:
blk_cleanup_queue(ace->queue);
-- 
2.7.4