[PATCH] scsi: ses: Guard against page 10 descriptors changes while processing them

2018-07-05 Thread Stefano Brivio
SAS page 10 descriptors might change once we re-read them in
ses_enclosure_data_process(), causing out-of-bound reads such as
this one found by KASAN:

[  321.349000] sd 1:2:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
[  321.357723] sd 1:2:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 04 8b 92 0f 
80 00 00 00 08 00 00
[  321.366604] blk_update_request: I/O error, dev sdb, sector 19521474432
[  321.373346] sd 1:2:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
[  321.382074] sd 1:2:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 04 8b 92 0f 
80 00 00 00 08 00 00
[  321.390963] blk_update_request: I/O error, dev sdb, sector 19521474432
[  321.397505] Buffer I/O error on dev sdb, logical block 2440184304, async 
page read
[  321.736854] 
==
[  321.744090] BUG: KASAN: slab-out-of-bounds in 
ses_enclosure_data_process+0xaa1/0xea0 [ses]
[  321.752352] Read of size 1 at addr 8807413710b9 by task systemd-udevd/708
[  321.759485]
[  321.760990] CPU: 0 PID: 708 Comm: systemd-udevd Tainted: G  I
   3.10.0-916.el7.test.x86_64 #1
[  321.771590] Hardware name: IBM  -[7147I10]-/Node 1, System Card, BIOS 
-[MLE179BUS-1.79]- 07/28/2013
[  321.780630] Call Trace:
[  321.783093]  [] dump_stack+0x19/0x1b
[  321.788236]  [] print_address_description+0xfc/0x290
[  321.802162]  [] kasan_report.part.3+0x242/0x330
[  321.808255]  [] __asan_report_load1_noabort+0x34/0x40
[  321.814869]  [] ses_enclosure_data_process+0xaa1/0xea0 
[ses]
[  321.822094]  [] ses_intf_add+0x85d/0xdde [ses]
[  321.828108]  [] class_interface_register+0x219/0x330
[  321.851979]  [] scsi_register_interface+0x38/0x50
[  321.858247]  [] ses_init+0x11/0x1000 [ses]
[  321.863911]  [] do_one_initcall+0x12a/0x370
[  321.869665]  [] load_module+0x5e3d/0x7550
[  321.928619]  [] SyS_init_module+0x253/0x350
[  321.986411]  [] system_call_fastpath+0x1c/0x21
[  321.999029]
[  322.000527] Allocated by task 708:
[  322.003931]  [] save_stack+0x43/0xe0
[  322.009100]  [] kasan_kmalloc+0xaa/0xe0
[  322.014524]  [] __kmalloc+0xee/0x270
[  322.019683]  [] ses_intf_add+0xa90/0xdde [ses]
[  322.025710]  [] class_interface_register+0x219/0x330
[  322.032259]  [] scsi_register_interface+0x38/0x50
[  322.038552]  [] ses_init+0x11/0x1000 [ses]
[  322.044234]  [] do_one_initcall+0x12a/0x370
[  322.050002]  [] load_module+0x5e3d/0x7550
[  322.055599]  [] SyS_init_module+0x253/0x350
[  322.061371]  [] system_call_fastpath+0x1c/0x21
[  322.067399]
[  322.068897] Freed by task 0:
[  322.071782] (stack is not available)
[  322.075358]
[  322.076857] The buggy address belongs to the object at 880741370f00
[  322.076857]  which belongs to the cache kmalloc-512 of size 512
[  322.089365] The buggy address is located 441 bytes inside of
[  322.089365]  512-byte region [880741370f00, 880741371100)
[  322.101089] The buggy address belongs to the page:
[  322.105885] page:ea001d04dc00 count:1 mapcount:1638426 mapping:  
(null) index:0x0
[  322.114404] page flags: 0x2f4080(slab|head)
[  322.119347] page dumped because: kasan: bad access detected
[  322.124917]
[  322.126416] Memory state around the buggy address:
[  322.131212]  880741370f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00
[  322.138431]  880741371000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00
[  322.145653] >880741371080: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc 
fc
[  322.152871] ^
[  322.157925]  880741371100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc
[  322.165144]  880741371180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  322.172360] 
==
[  322.179577] Disabling lock debugging due to kernel taint
[  322.184954] 
==
[  322.192180] BUG: KASAN: slab-out-of-bounds in 
ses_enclosure_data_process+0xaa1/0xea0 [ses]
[  322.200443] Read of size 1 at addr 8807413710bb by task systemd-udevd/708

Stop processing additional descriptors if we are already at the
end of page 10 allocated buffer.

Signed-off-by: Stefano Brivio 
---
 drivers/scsi/ses.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index 62f04c0511cf..d7fcda08a802 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -605,9 +605,15 @@ static void ses_enclosure_data_process(struct 
enclosure_device *edev,
 /* these elements are optional */
 type_ptr[0] == 
ENCLOSURE_COMPONENT_SCSI_TARGET_PORT ||
 type_ptr[0] == 
ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT ||
-type_ptr[0] == 
ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS))
-   addl_desc_ptr += addl_desc_ptr[1] 

Re: [PATCH RESEND] lpfc: Don't return internal MBXERR_ERROR code from probe function

2017-09-06 Thread Stefano Brivio
On Wed, 6 Sep 2017 11:30:34 +0200
Johannes Thumshirn  wrote:

> On Wed, Sep 06, 2017 at 11:02:56AM +0200, Stefano Brivio wrote:
> > Internal error codes happen to be positive, thus the PCI driver
> > core won't treat them as failure, but we do. This would cause a
> > crash later on as lpfc_pci_remove_one() is called (e.g. as
> > shutdown function).
> > 
> > Fixes: 6d368e532168 ("[SCSI] lpfc 8.3.24: Add resource extent support")
> > Signed-off-by: Stefano Brivio 
> > ---
> > This seems to have been ignored. Re-sending as suggested by Johannes.
> > 
> >  drivers/scsi/lpfc/lpfc_init.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
> > index 491aa95eb0f6..38cc2b5bb5a2 100644
> > --- a/drivers/scsi/lpfc/lpfc_init.c
> > +++ b/drivers/scsi/lpfc/lpfc_init.c
> > @@ -6118,6 +6118,7 @@ lpfc_sli4_driver_resource_setup(struct lpfc_hba *phba)
> > "Extents and RPI headers enabled.\n");
> > }
> > mempool_free(mboxq, phba->mbox_mem_pool);
> > +   rc = -EIO;
> > goto out_free_bsmbx;
> > }
> >  
> > -- 
> > 2.9.4  
> 
> The patch looks good, but there are lots of 
>   if (rc) {
>   mempool_free(mboxq, phba->mbox_mem_pool);
>   rc = -EIO;
>   goto out_free_bsmbx;
>   }
> 
> in lpfc_sli4_driver_resource_setup(). Shouldn't out_free_bsmbx take care of it
> all so we only have:
>   if (rc)
>   goto out_free_bsmbx;

Thanks for your feedback!

I considered doing something similar, but there are different error
coded which are set when we reach the label out_free_mbsx. I checked all
of them (and I hope I didn't miss any), but they all looked correct,
and in quite a few cases different than -EIO (e.g. -ENODEV).

So I think always returning -EIO in those cases is not what we want.

> Because as this patch shows there's always a chance to miss an 'rc = -EIO'.
> 
> Out of curiosity, do you know what's the value of rc in the failure case?

Yes, MBXERR_ERROR (mentioned in patch subject -- sorry, I could have
repeated it in the message perhaps).


--
Stefano


[PATCH RESEND] lpfc: Don't return internal MBXERR_ERROR code from probe function

2017-09-06 Thread Stefano Brivio
Internal error codes happen to be positive, thus the PCI driver
core won't treat them as failure, but we do. This would cause a
crash later on as lpfc_pci_remove_one() is called (e.g. as
shutdown function).

Fixes: 6d368e532168 ("[SCSI] lpfc 8.3.24: Add resource extent support")
Signed-off-by: Stefano Brivio 
---
This seems to have been ignored. Re-sending as suggested by Johannes.

 drivers/scsi/lpfc/lpfc_init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 491aa95eb0f6..38cc2b5bb5a2 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -6118,6 +6118,7 @@ lpfc_sli4_driver_resource_setup(struct lpfc_hba *phba)
"Extents and RPI headers enabled.\n");
}
mempool_free(mboxq, phba->mbox_mem_pool);
+   rc = -EIO;
goto out_free_bsmbx;
}
 
-- 
2.9.4


Re: [PATCH] lpfc: Don't return internal MBXERR_ERROR code from probe function

2017-09-06 Thread Stefano Brivio
On Wed, 6 Sep 2017 10:42:35 +0200
Johannes Thumshirn  wrote:

> On Wed, Sep 06, 2017 at 10:32:22AM +0200, Stefano Brivio wrote:
> > 
> > I didn't get feedback about this patch. Is there any issue with the
> > submission?
> > 
> > I think it actually fixes a quite critical issue, if initialization
> > fails we have crashes on reboot like the one reported below [1], and
> > should perhaps also be queued for -stable.  
> 
> It seems to have slipped trough the cracks, can you please re-submit?

The original submission is archived at
https://marc.info/?l=linux-scsi&m=150392554622786&w=2. Before I cause
any confusion... do you want me to re-submit this with the same
subject? As v2 with a comment?


--
Stefano


Re: [PATCH] lpfc: Don't return internal MBXERR_ERROR code from probe function

2017-09-06 Thread Stefano Brivio
Hi,

On Mon, 28 Aug 2017 15:05:23 +0200
Stefano Brivio  wrote:

> Internal error codes happen to be positive, thus the PCI driver
> core won't treat them as failure, but we do. This would cause a
> crash later on as lpfc_pci_remove_one() is called (e.g. as
> shutdown function).
> 
> Fixes: 6d368e532168 ("[SCSI] lpfc 8.3.24: Add resource extent support")
> Signed-off-by: Stefano Brivio 
> ---
>  drivers/scsi/lpfc/lpfc_init.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
> index 491aa95eb0f6..38cc2b5bb5a2 100644
> --- a/drivers/scsi/lpfc/lpfc_init.c
> +++ b/drivers/scsi/lpfc/lpfc_init.c
> @@ -6118,6 +6118,7 @@ lpfc_sli4_driver_resource_setup(struct lpfc_hba *phba)
>   "Extents and RPI headers enabled.\n");
>   }
>   mempool_free(mboxq, phba->mbox_mem_pool);
> + rc = -EIO;
>   goto out_free_bsmbx;
>   }
>  

I didn't get feedback about this patch. Is there any issue with the
submission?

I think it actually fixes a quite critical issue, if initialization
fails we have crashes on reboot like the one reported below [1], and
should perhaps also be queued for -stable.


[1]

[  568.638555] BUG: unable to handle kernel NULL pointer dereference at 
07c0 
[  568.679154] IP: lpfc_pci_remove_one+0x20/0x890 [lpfc] 
[  568.704062] PGD 0  
[  568.704063]  
[  568.721714] Oops:  [#1] SMP 
[  568.736895] Modules linked in: fuse vfat msdos fat binfmt_misc xfs fcoe 
libfcoe libfc rpcrdma ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod 
ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm 
ib_cm iw_cm ib_core intel_rapl x86_pkg_temp_thermal intel_powerclamp 
intel_cstate intel_uncore intel_rapl_perf sg ipmi_si hpilo ipmi_devintf pcspkr 
ipmi_msghandler lpc_ich ioatdma shpchp dca pcc_cpufreq acpi_cpufreq ext4 jbd2 
mbcache loop nfsv3 nfs_acl nfs lockd grace fscache sd_mod 8021q garp stp llc 
mrp mgag200 ata_generic i2c_algo_bit pata_acpi drm_kms_helper syscopyarea sfc 
sysfillrect sysimgblt fb_sys_fops ttm tg3 mtd crct10dif_pclmul lpfc 
crc32_pclmul ata_piix drm ptp hpsa serio_raw scsi_transport_fc be2net 
ghash_clmulni_intel libata i2c_core scsi_transport_sas pps_core 
[  569.082589]  mdio wmi sunrpc xts lrw gf128mul mcryptd dm_crypt 
dm_round_robin dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash 
dm_log dm_zero dm_mod linear raid10 raid456 async_raid6_recov async_memcpy 
libcrc32c crc32c_intel async_pq async_xor xor async_tx raid6_pq raid1 raid0 
iscsi_ibft iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi squashfs cramfs edd ctr(E) 
[  569.254181] CPU: 9 PID: 46361 Comm: reboot Tainted: GE  
   4.11.0-22.el7a.x86_64 #1 
[  569.301048] Hardware name: HP ProLiant DL388p Gen8, BIOS P70 12/14/2012 
[  569.332878] task: 880801161680 task.stack: c9000514 
[  569.362693] RIP: 0010:lpfc_pci_remove_one+0x20/0x890 [lpfc] 
[  569.390275] RSP: 0018:c90005143d18 EFLAGS: 00010296 
[  569.416040] RAX:  RBX: 8804bd9cf000 RCX: 
 
[  569.449874] RDX:  RSI: 0246 RDI: 
8804bd9cf000 
[  569.484402] RBP: c90005143d78 R08: 8804bd9cf0b8 R09: 
0006 
[  569.519146] R10: 0020 R11: ea0020975b80 R12: 
8804bd9cf000 
[  569.553408] R13: a057a2a0 R14: 8804bd9cf100 R15: 
fee1dead 
[  569.588474] FS:  7f99fe258880() GS:88083f4c() 
knlGS: 
[  569.628288] CS:  0010 DS:  ES:  CR0: 80050033 
[  569.656916] CR2: 07c0 CR3: 000825d91000 CR4: 
000406e0 
[  569.693185] Call Trace: 
[  569.705044]  pci_device_shutdown+0x36/0x70 
[  569.725702]  device_shutdown+0xdf/0x190 
[  569.744657]  kernel_restart_prepare+0x36/0x40 
[  569.767099]  kernel_restart+0x12/0x60 
[  569.784916]  SYSC_reboot+0x1f3/0x220 
[  569.802649]  ? __alloc_fd+0x46/0x170 
[  569.819539]  ? vfs_writev+0x3c/0x50 
[  569.836206]  ? do_writev+0x61/0xf0 
[  569.852730]  SyS_reboot+0xe/0x10 
[  569.868375]  entry_SYSCALL_64_fastpath+0x1a/0xa9 
[  569.890588] RIP: 0033:0x7f99fd065a56 
[  569.907894] RSP: 002b:7ffc8c5e2b58 EFLAGS: 0202 ORIG_RAX: 
00a9 
[  569.944700] RAX: ffda RBX: ff91 RCX: 
7f99fd065a56 
[  569.978659] RDX: 01234567 RSI: 28121969 RDI: 
fee1dead 
[  570.013272] RBP:  R08: 5583f9de32a0 R09: 
7ffc8c5e2220 
[  570.048039] R10: 0024 R11: 0202 R12: 
5583f9d6ef13 
[  570.082565] R13: 7ffc8c5e2e20 R14:  R15: 
 
[  570.116914] Code: 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 
e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 ec 38 48 8b 87 38 01 00 00 <4

[PATCH] lpfc: Don't return internal MBXERR_ERROR code from probe function

2017-08-28 Thread Stefano Brivio
Internal error codes happen to be positive, thus the PCI driver
core won't treat them as failure, but we do. This would cause a
crash later on as lpfc_pci_remove_one() is called (e.g. as
shutdown function).

Fixes: 6d368e532168 ("[SCSI] lpfc 8.3.24: Add resource extent support")
Signed-off-by: Stefano Brivio 
---
 drivers/scsi/lpfc/lpfc_init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 491aa95eb0f6..38cc2b5bb5a2 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -6118,6 +6118,7 @@ lpfc_sli4_driver_resource_setup(struct lpfc_hba *phba)
"Extents and RPI headers enabled.\n");
}
mempool_free(mboxq, phba->mbox_mem_pool);
+   rc = -EIO;
goto out_free_bsmbx;
}
 
-- 
2.9.4