Re: status of drivers/scsi/scsi_tgt_lib.c
On Tue, Jan 28, 2014 at 12:58:36PM +0100, Hannes Reinecke wrote: I actually know of a customer using this (with a semi-proprietary target mode driver). I'll be asking him what his plans are. They are using the IBM pseries vscsi driver for that? It they actually adding out of tree kernel code it doesn't really matter at all for this discussion.. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: status of drivers/scsi/scsi_tgt_lib.c
On 01/30/2014 09:44 AM, Christoph Hellwig wrote: On Tue, Jan 28, 2014 at 12:58:36PM +0100, Hannes Reinecke wrote: I actually know of a customer using this (with a semi-proprietary target mode driver). I'll be asking him what his plans are. They are using the IBM pseries vscsi driver for that? It they actually adding out of tree kernel code it doesn't really matter at all for this discussion.. No, not the vscsi driver. But the tgt hooks. But yeah, I know. As said, I'll be in touch with them. I just wanted mention that _someone_ uses this code Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes
On Wed, Jan 29, 2014 at 09:52:46PM -0700, Matthew Wilcox wrote: On Fri, Jan 24, 2014 at 10:57:48AM +, Mel Gorman wrote: So far on the table is 1. major filesystem overhawl 2. major vm overhawl 3. use compound pages as they are today and hope it does not go completely to hell, reboot when it does Is the below paragraph an exposition of option 2, or is it an option 4, change the VM unit of allocation? Changing the VM unit of allocation is a major VM overhawl Other than the names you're using, this is basically what I said to Kirill in an earlier thread; either scrap the difference between PAGE_SIZE and PAGE_CACHE_SIZE, or start making use of it. No. The PAGE_CACHE_SIZE would depend on the underlying address space and vary. The large block patchset would have to have done this but I did not go back and review the patches due to lack of time. With that it starts hitting into fragmentation problems that have to be addressed somehow and cannot just be waved away. The fact that EVERYBODY in this thread has been using PAGE_SIZE when they should have been using PAGE_CACHE_SIZE makes me wonder if part of the problem is that the split in naming went the wrong way. ie use PTE_SIZE for 'the amount of memory pointed to by a pte_t' and use PAGE_SIZE for 'the amount of memory described by a struct page'. (we need to remove the current users of PTE_SIZE; sparc32 and powerpc32, but that's just a detail) And we need to fix all the places that are currently getting the distinction wrong. SMOP ... ;-) What would help is correct typing of variables, possibly with sparse support to help us out. Big Job. That's taking the approach of the large block patchset (as I understand it, not reviewed, not working on this etc) without dealing with potential fragmentation problems. Of course they could be remapped virtually if necessary but that will be very constrained on 32-bit, the final transfer to hardware will require scatter/gather and there is a setup/teardown cost with virtual mappings such as faulting (setup) and IPIs to flush TLBs (teardown) that would add overhead. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 4/9] ipr: Use pci_enable_msi_range() and pci_enable_msix_range()
On Wed, Jan 29, 2014 at 02:26:52PM +0100, Alexander Gordeev wrote: Do you want me to rediff your patches on top of this one, or do you want to keep the entire MSI series together and do the rediff? Otherwise the patches seem fine. I would prefer the former. Hi Brian, I am sending the refreshed patches on top of ipr: Handle early EEH. -- Regards, Alexander Gordeev agord...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/2] ipr: Get rid of superfluous call to pci_disable_msi/msix()
There is no need to call pci_disable_msi() or pci_disable_msix() in case the call to pci_enable_msi() or pci_enable_msix() failed. Signed-off-by: Alexander Gordeev agord...@redhat.com --- drivers/scsi/ipr.c |2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c index 96c10ce..48d0cfc 100644 --- a/drivers/scsi/ipr.c +++ b/drivers/scsi/ipr.c @@ -9329,7 +9329,6 @@ static int ipr_enable_msix(struct ipr_ioa_cfg *ioa_cfg) if (err 0) { ipr_wait_for_pci_err_recovery(ioa_cfg); - pci_disable_msix(ioa_cfg-pdev); return err; } @@ -9353,7 +9352,6 @@ static int ipr_enable_msi(struct ipr_ioa_cfg *ioa_cfg) if (err 0) { ipr_wait_for_pci_err_recovery(ioa_cfg); - pci_disable_msi(ioa_cfg-pdev); return err; } -- 1.7.7.6 -- Regards, Alexander Gordeev agord...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/2] ipr: Use pci_enable_msi_range() and pci_enable_msix_range()
As result deprecation of MSI-X/MSI enablement functions pci_enable_msix() and pci_enable_msi_block() all drivers using these two interfaces need to be updated to use the new pci_enable_msi_range() and pci_enable_msix_range() interfaces. Signed-off-by: Alexander Gordeev agord...@redhat.com --- drivers/scsi/ipr.c | 47 ++- 1 files changed, 18 insertions(+), 29 deletions(-) diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c index 48d0cfc..70f40ca 100644 --- a/drivers/scsi/ipr.c +++ b/drivers/scsi/ipr.c @@ -9317,51 +9317,40 @@ static void ipr_wait_for_pci_err_recovery(struct ipr_ioa_cfg *ioa_cfg) static int ipr_enable_msix(struct ipr_ioa_cfg *ioa_cfg) { struct msix_entry entries[IPR_MAX_MSIX_VECTORS]; - int i, err, vectors; + int i, vectors; for (i = 0; i ARRAY_SIZE(entries); ++i) entries[i].entry = i; - vectors = ipr_number_of_msix; - - while ((err = pci_enable_msix(ioa_cfg-pdev, entries, vectors)) 0) - vectors = err; - - if (err 0) { + vectors = pci_enable_msix_range(ioa_cfg-pdev, + entries, 1, ipr_number_of_msix); + if (vectors 0) { ipr_wait_for_pci_err_recovery(ioa_cfg); - return err; + return vectors; } - if (!err) { - for (i = 0; i vectors; i++) - ioa_cfg-vectors_info[i].vec = entries[i].vector; - ioa_cfg-nvectors = vectors; - } + for (i = 0; i vectors; i++) + ioa_cfg-vectors_info[i].vec = entries[i].vector; + ioa_cfg-nvectors = vectors; - return err; + return 0; } static int ipr_enable_msi(struct ipr_ioa_cfg *ioa_cfg) { - int i, err, vectors; + int i, vectors; - vectors = ipr_number_of_msix; - - while ((err = pci_enable_msi_block(ioa_cfg-pdev, vectors)) 0) - vectors = err; - - if (err 0) { + vectors = pci_enable_msi_range(ioa_cfg-pdev, 1, ipr_number_of_msix); + if (vectors 0) { ipr_wait_for_pci_err_recovery(ioa_cfg); - return err; + return vectors; } - if (!err) { - for (i = 0; i vectors; i++) - ioa_cfg-vectors_info[i].vec = ioa_cfg-pdev-irq + i; - ioa_cfg-nvectors = vectors; - } + for (i = 0; i vectors; i++) + ioa_cfg-vectors_info[i].vec = ioa_cfg-pdev-irq + i; + ioa_cfg-nvectors = vectors; - return err; + return 0; } static void name_msi_vectors(struct ipr_ioa_cfg *ioa_cfg) @@ -9426,7 +9415,7 @@ static irqreturn_t ipr_test_intr(int irq, void *devp) * ipr_test_msi - Test for Message Signaled Interrupt (MSI) support. * @pdev: PCI device struct * - * Description: The return value from pci_enable_msi() can not always be + * Description: The return value from pci_enable_msi_range() can not always be * trusted. This routine sets up and initiates a test interrupt to determine * if the interrupt is received via the ipr_test_intr() service routine. * If the tests fails, the driver will fall back to LSI. -- 1.7.7.6 -- Regards, Alexander Gordeev agord...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] qla2xxx: eliminate dead code in qla24xx_process_bidir_cmd
Coverity reports that the test of req_data_len vs. rsp_data_len is dead code. This appears to be because the test occurs before any real assignment to either variable. Assuming that the sole in-tree execution path (QL_VND_DIAG_IO_CMD submitted via FC pass-through on a host /dev/bsg/X) does not require the response to have the same length as the request, all code related to the faulty test can be removed. If this is not the case, the test should be moved much earlier in the function since it does not depend on any resource acquitision. Signed-off-by: Steven J. Magnani st...@digidescorp.com --- --- linux-3.13/drivers/scsi/qla2xxx/qla_bsg.c 2014-01-29 13:50:02.050802907 -0600 +++ b/drivers/scsi/qla2xxx/qla_bsg.c2014-01-29 13:53:15.856549874 -0600 @@ -1732,8 +1732,6 @@ qla24xx_process_bidir_cmd(struct fc_bsg_ uint16_t nextlid = 0; uint32_t tot_dsds; srb_t *sp = NULL; - uint32_t req_data_len = 0; - uint32_t rsp_data_len = 0; /* Check the type of the adapter */ if (!IS_BIDI_CAPABLE(ha)) { @@ -1840,17 +1838,6 @@ qla24xx_process_bidir_cmd(struct fc_bsg_ goto done_unmap_sg; } - if (req_data_len != rsp_data_len) { - rval = EXT_STATUS_BUSY; - ql_log(ql_log_warn, vha, 0x70aa, - req_data_len != rsp_data_len\n); - goto done_unmap_sg; - } - - req_data_len = bsg_job-request_payload.payload_len; - rsp_data_len = bsg_job-reply_payload.payload_len; - - /* Alloc SRB structure */ sp = qla2x00_get_sp(vha, (vha-bidir_fcport), GFP_KERNEL); if (!sp) { --- linux-3.13/drivers/scsi/qla2xxx/qla_dbg.c 2014-01-29 13:50:49.435230824 -0600 +++ b/drivers/scsi/qla2xxx/qla_dbg.c2014-01-29 13:53:43.960820829 -0600 @@ -38,7 +38,8 @@ * | || 0x7073-0x7075, | * | || 0x707b,0x708c, | * | || 0x70a5,0x70a6, | - * | || 0x70a8,0x70ab, | + * | || 0x70a8,| + * | || 0x70aa-0x70ab, | * | || 0x70ad-0x70ae, | * | || 0x70d1-0x70db, | * | || 0x7047,0x703b | -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
libsrp: remove unreachable kfree
srp_iu_pool_alloc implements what seems like a standard common exit path with gotos but there's only one way to reach it. Additionally the kfree(q-items) is unreachable. fold the error path back into the if. Signed-off-by: Dave Jones da...@fedoraproject.org diff --git a/drivers/scsi/libsrp.c b/drivers/scsi/libsrp.c index 0707ecdbaa32..f45758f15381 100644 --- a/drivers/scsi/libsrp.c +++ b/drivers/scsi/libsrp.c @@ -55,9 +55,12 @@ static int srp_iu_pool_alloc(struct srp_queue *q, size_t max, q-pool = kcalloc(max, sizeof(struct iu_entry *), GFP_KERNEL); if (!q-pool) return -ENOMEM; + q-items = kcalloc(max, sizeof(struct iu_entry), GFP_KERNEL); - if (!q-items) - goto free_pool; + if (!q-items) { + kfree(q-pool); + return -ENOMEM; + } spin_lock_init(q-lock); kfifo_init(q-queue, (void *) q-pool, max * sizeof(void *)); @@ -68,11 +71,6 @@ static int srp_iu_pool_alloc(struct srp_queue *q, size_t max, iue++; } return 0; - - kfree(q-items); -free_pool: - kfree(q-pool); - return -ENOMEM; } static void srp_iu_pool_free(struct srp_queue *q) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] target updates for v3.14-rc1
Hi Linus! Here are the target-pending updates for the v3.14-rc1 merge window. Please pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next The highlights this round include: - Add support for SCSI Referrals (Hannes) - Add support for T10 DIF into target core (nab + mkp) - Add support for T10 DIF emulation in FILEIO + RAMDISK backends (Sagi + nab) - Add support for T10 DIF - bio_integrity passthrough in IBLOCK backend (nab) - Prep changes to iser-target for = v3.15 T10 DIF support (Sagi) - Add support for qla2xxx N_Port ID Virtualization - NPIV (Saurav + Quinn) - Allow percpu_ida_alloc() to receive task state bitmask (Kent) - Fix = v3.12 iscsi-target session reset hung task regression (nab) - Fix = v3.13 percpu_ref se_lun-lun_ref_active race (nab) - Fix a long-standing network portal creation race (Andy) Just a heads up that you'll encounter two merge conflicts in target_core_tpg.c + qla_target.c due to upstream v3.13 bugfixes. The resolutions are straightforward enough, but please yell if something doesn't look right. Also, Stephen is carrying one extra target_core_iblock.c patch to address the immutable bio_vec changes that are currently outstanding in Jens block PULL request. That patch is here: linux-next: build failure after merge of the target-updates tree http://marc.info/?l=linux-nextm=139019553616605w=2 Thank you, --nab Andy Grover (7): target: Remove unused ua_dev_list member in struct se_ua target: Allocate more room for port default groups target: Fix sizeof in kmalloc for some default_groups arrays target: Refer to u32 luns as unpacked_lun target: Rename core_tpg_{pre,post}_addlun for clarity target: Don't use void* when passing dev in core_tpg_add_lun target/iscsi: Fix network portal creation race Hannes Reinecke (9): target_core_alua: validate ALUA state transition target_core_alua: Allocate ALUA metadata on demand target_core_alua: store old and pending ALUA state target_core_alua: Use workqueue for ALUA transitioning target_core: simplify scsi_name_len calculation target_core_spc: Include target device descriptor in VPD page 83 target_core_alua: Referrals infrastructure target_core_alua: Referrals configfs integration target_core_alua: check for buffer overflow Kent Overstreet (1): percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask Nicholas Bellinger (24): target: Convert inquiry temporary buffer to heap memory target: Add DIF related base definitions target: Add DIF CHECK_CONDITION ASC/ASCQ exception cases target/sbc: Add DIF setup in sbc_check_prot + sbc_parse_cdb target/sbc: Add DIF TYPE1+TYPE3 read/write verify emulation target/spc: Add protection bit to standard INQUIRY output target/spc: Add protection related bits to INQUIRY EVPD=0x86 target/sbc: Add P_TYPE + PROT_EN bits to READ_CAPACITY_16 target/spc: Expose ATO bit in control mode page target/configfs: Expose protection device attributes target: Add protection SGLs to target_submit_cmd_map_sgls target/iblock: Add blk_integrity + BIP passthrough support target/file: Add DIF protection init/format support target/file: Add DIF protection support to fd_execute_rw target/rd: Refactor rd_build_device_space + rd_release_device_space target/rd: Add support for protection SGL setup + release target/rd: Add DIF protection into rd_execute_rw tcm_loop: Enable DIF/DIX modes in SCSI host LLD qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure qla2xxx: Configure NPIV fc_vport via tcm_qla2xxx_npiv_make_lport iscsi-target: Pre-allocate more tags to avoid ack starvation iscsi-target: Fix connection reset hang with percpu_ida_alloc iscsi-target: Convert gfp_t parameter to task state bitmask target: Fix percpu_ref_put race in transport_lun_remove_cmd Rashika Kheria (4): drivers: target: Move prototype declaration of function to header file target_core_pr.h drivers: target: Mark function as static in target_core_iblock.c drivers: target: Mark functions as static in tcm_loop.c drivers: target: Mark functions and structures as static in tfc_conf.c Sagi Grimberg (5): IB/isert: seperate connection protection domains and dma MRs IB/isert: Avoid frwr notation, user fastreg IB/isert: Move fastreg descriptor creation to a function IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine target: Report bad sector in sense data for DIF errors Saurav Kashyap (1): qla2xxx: Enhancements to enable NPIV support for QLOGIC ISPs with TCM/LIO. block/blk-mq-tag.c |6 +- drivers/infiniband/ulp/isert/ib_isert.c | 222 +- drivers/infiniband/ulp/isert/ib_isert.h | 10 +- drivers/scsi/qla2xxx/qla_attr.c |2 + drivers/scsi/qla2xxx/qla_def.h | 12 +- drivers/scsi/qla2xxx/qla_target.c| 170 drivers/scsi/qla2xxx/qla_target.h|4
Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
On 06/25/13 18:13, Michael Christie wrote: On Jun 25, 2013, at 10:31 AM, Bart Van Assche bvanass...@acm.org wrote: On 06/25/13 15:45, James Bottomley wrote: On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote: There is a difference though between moving the EH kthread_stop() call and the patch at the start of this thread: moving the EH kthread_stop() call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_* callback after scsi_remove_host() has finished. However, the scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can cause an eh_* callback to be invoked after scsi_remove_device() finished. OK, but this doesn't tell me what you're trying to achieve. An eh function is allowable as long as the host hadn't had the release callback executed. That means you must have to have a reference to the device/host to execute the eh function, which is currently guaranteed for all invocations. That raises a new question: how is an LLD expected to clean up resources without triggering a race condition ? What you wrote means that it's not safe for an LLD to start cleaning up the resources needed by the eh_* callbacks immediately after scsi_remove_device() returns since it it not guaranteed that at that time all references to the device have already been dropped. A callback in the device/target/host (whatever is needed) release function would do this right? If I understand James right, I think he suggested something like this in another mail. (replying to an e-mail of seven months ago - see also http://thread.gmane.org/gmane.linux.scsi/82572/focus=82822) Hello Mike, Sorry but I'm afraid that making the SCSI core invoke a callback function from a device, target or host release function would create a new challenge, namely making sure that all these callback functions have finished before the module is unloaded that contains the SCSI host template and the code implementing such a callback function. That challenge is not specific to the SCSI infrastructure but is a general question that has not yet been solved in the Linux kernel (see e.g. [PATCH] kobject: provide kobject_put_wait to fix module unload race for a more general discussion about how to ensure that kobject release functions are invoked before the module is unloaded that owns the release function - http://thread.gmane.org/gmane.linux.kernel/1622885). Or maybe this just means that I misunderstood you ? In case it is not clear why I'm reviving this discussion: now that the improved eh timeout handler patch is upstream (commit e494f6a728394ab0df194342549ee20e6f0752df) there is an additional way in which the SCSI core can invoke an EH function concurrently with or after scsi_remove_host() has finished, namely from the TMF work queue (tmf_work_q). Thanks, Bart. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v10 0/4] ata: Add APM X-Gene SoC SATA host controller support
Hi Tejun, As it had being awhile, any issue with this version of the SATA drivers before I post the follow on errata patches? -Loc On Thu, Jan 16, 2014 at 8:11 AM, Loc Ho l...@apm.com wrote: This patch adds support for the APM X-Gene SoC SATA host controller. In order for the host controller to work, the corresponding PHY driver musts also be available. v10: * Update binding documentation v9: * Remove ACPI/EFI include files * Remove the IO flush support, interrupt routine, and DTS resources * Remove function xgene_rd, xgene_wr, and xgene_wr_flush * Remove PMP support (function xgene_ahci_qc_issue, xgene_ahci_qc_prep, xgene_ahci_qc_fill_rtf, xgene_ahci_softreset, and xgene_ahci_do_softreset) * Rename function xgene_ahci_enable_phy to xgene_ahci_force_phy_rdy * Clean up hardreset functions * Require v7 of the PHY driver v8: * Remove _ADDR from defines * Remove define MSTAWAUX_COHERENT_BYPASS_SET and STARAUX_COHERENT_BYPASS_SET and use direct coding * Remove the un-necessary check for DTS boot with built in ACPI table * Switch to use dma_set_mask_and_coherent for setting DMA mask * Remove ACPI table matching code * Update clock-names for sata01clk, sata23clk, and sata45clk v7: * Update the clock code by toggle the clock * Update the DTS clock mask values due to the clock spilt between host and v5 of the PHY drivers v6: * Update binding documentation * Change select PHY_XGENE_SATA to PHY_XGENE * Add ULL to constants * Change indentation and comments * Clean up the probe functions a bit more * Remove xgene_ahci_remove function * Add the flush register to DTS * Remove the interrupt-parent from DTS v5: * Sync up to v3 of the PHY driver * Remove MSLIM wrapper functions * Change the memory shutdown loop to use usleep_range * Use devm_ioremap_resource instead devm_ioremap * Remove suspend/resume functions as not needed v4: * Remove the ID property in DT * Remove the temporary PHY direct function call and use PHY function * Change printk to pr_debug * Move the IOB flush addresses into the DT * Remove the parameters retrieval function as no longer needed * Remove the header file as no longer needed * Require v2 patch of the SATA PHY driver. Require slightly modification in the Kconfig as it is moved to folder driver/phy and use Kconfig PHY_XGENE_SATA instead SATA_XGENE_PHY. v3: * Move out the SATA PHY to another driver * Remove the clock-cells entry from DTS * Remove debug wrapper * Remove delay functions wrapper * Clean up resource and IRQ query * Remove query clock name * Switch to use dma_set_mask/dma_coherent_mask * Remove un-necessary devm_kfree * Update GPL license header to v2 * Spilt up function xgene_ahci_hardreset * Spilt up function xgene_ahci_probe * Remove all reference of CONFIG_ARCH_MSLIM * Clean up chip revision code v2: * Clean up file sata_xgene.c with Lindent and etc * Clean up file sata_xgene_serdes.c with Lindent and etc * Add description to each patch v1: * inital version Signed-off-by: Loc Ho l...@apm.com Signed-off-by: Tuan Phan tp...@apm.com Signed-off-by: Suman Tripathi stripa...@apm.com --- Loc Ho (4): ata: Export required functions by APM X-Gene SATA driver Documentation: Add documentation for APM X-Gene SoC SATA host controller DTS binding ata: Add APM X-Gene SoC SATA host controller driver arm64: Add APM X-Gene SoC SATA host controller DTS entries .../devicetree/bindings/ata/apm-xgene.txt | 70 +++ arch/arm64/boot/dts/apm-storm.dtsi | 75 +++ drivers/ata/Kconfig|8 + drivers/ata/Makefile |1 + drivers/ata/ahci.h |9 + drivers/ata/libahci.c | 16 +- drivers/ata/sata_xgene.c | 630 7 files changed, 803 insertions(+), 6 deletions(-) create mode 100644 Documentation/devicetree/bindings/ata/apm-xgene.txt create mode 100644 drivers/ata/sata_xgene.c -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] block devices: validate block device capacity
When running the LVM2 testsuite on 32-bit kernel, there are unkillable processes stuck in the kernel consuming 100% CPU: blkid R running 0 2005 1409 0x0004 ce009d00 0082 ffcf c11280ba 0060 560b5dfd 3111 00fe41cb ce009d00 d51cfeb0 001e 0002 0002 c10748c1 0002 c106cca4 Call Trace: [c11280ba] ? radix_tree_next_chunk+0xda/0x2c0 [c10748c1] ? release_pages+0x61/0x160 [c106cca4] ? find_get_pages+0x84/0x100 [c1251fbe] ? _cond_resched+0x1e/0x40 [c10758cb] ? truncate_inode_pages_range+0x12b/0x440 [c1075cb7] ? truncate_inode_pages+0x17/0x20 [c10cf2ba] ? __blkdev_put+0x3a/0x140 [c10d02db] ? blkdev_close+0x1b/0x40 [c10a60b2] ? __fput+0x72/0x1c0 [c1039461] ? task_work_run+0x61/0xa0 [c1253b6f] ? work_notifysig+0x24/0x35 This is caused by the fact that the LVM2 testsuite creates 64TB device. The kernel uses unsigned long to index pages in files and block devices, on 64TB device unsigned long overflows (it can address up to 16TB with 4k pages), causing the infinite loop. On 32-bit architectures, we must limit block device size to PAGE_SIZE*(2^32-1). The bug with untested device size is pervasive across the whole kernel, some drivers test that the device size fits in sector_t, but this test is not sufficient on 32-bit architectures. This patch introduces a new function validate_disk_capacity that tests if the disk capacity is OK for the current kernel and modifies the drivers brd, ide-gd, dm, sd to use it. Signed-off-by: Mikulas Patocka mpato...@redhat.com --- block/genhd.c | 23 +++ drivers/block/brd.c | 15 +++ drivers/ide/ide-gd.c |8 drivers/md/dm-ioctl.c |3 +-- drivers/md/dm-table.c | 14 +- drivers/scsi/sd.c | 20 +++- include/linux/device-mapper.h |2 +- include/linux/genhd.h |2 ++ 8 files changed, 70 insertions(+), 17 deletions(-) Index: linux-2.6-compile/block/genhd.c === --- linux-2.6-compile.orig/block/genhd.c2014-01-30 17:23:15.0 +0100 +++ linux-2.6-compile/block/genhd.c 2014-01-30 19:28:42.0 +0100 @@ -1835,3 +1835,26 @@ static void disk_release_events(struct g WARN_ON_ONCE(disk-ev disk-ev-block != 1); kfree(disk-ev); } + +int validate_disk_capacity(u64 n_sectors, const char **reason) +{ + u64 n_pages; + if (n_sectors 9 9 != n_sectors) { + if (reason) + *reason = The number of bytes is greater than 2^64.; + return -EOVERFLOW; + } + n_pages = (n_sectors + (1 (PAGE_SHIFT - 9)) - 1) (PAGE_SHIFT - 9); + if (n_pages ULONG_MAX) { + if (reason) + *reason = Use 64-bit kernel.; + return -EFBIG; + } + if (n_sectors != (sector_t)n_sectors) { + if (reason) + *reason = Use a kernel compiled with support for large block devices.; + return -ENOSPC; + } + return 0; +} +EXPORT_SYMBOL(validate_disk_capacity); Index: linux-2.6-compile/drivers/block/brd.c === --- linux-2.6-compile.orig/drivers/block/brd.c 2014-01-30 17:23:15.0 +0100 +++ linux-2.6-compile/drivers/block/brd.c 2014-01-30 19:26:51.0 +0100 @@ -429,12 +429,12 @@ static const struct block_device_operati * And now the modules code and kernel interface. */ static int rd_nr; -int rd_size = CONFIG_BLK_DEV_RAM_SIZE; +static unsigned rd_size = CONFIG_BLK_DEV_RAM_SIZE; static int max_part; static int part_shift; module_param(rd_nr, int, S_IRUGO); MODULE_PARM_DESC(rd_nr, Maximum number of brd devices); -module_param(rd_size, int, S_IRUGO); +module_param(rd_size, uint, S_IRUGO); MODULE_PARM_DESC(rd_size, Size of each RAM disk in kbytes.); module_param(max_part, int, S_IRUGO); MODULE_PARM_DESC(max_part, Maximum number of partitions per RAM disk); @@ -446,7 +446,7 @@ MODULE_ALIAS(rd); /* Legacy boot options - nonmodular */ static int __init ramdisk_size(char *str) { - rd_size = simple_strtol(str, NULL, 0); + rd_size = simple_strtoul(str, NULL, 0); return 1; } __setup(ramdisk_size=, ramdisk_size); @@ -463,6 +463,13 @@ static struct brd_device *brd_alloc(int { struct brd_device *brd; struct gendisk *disk; + u64 capacity = (u64)rd_size * 2; + const char *reason; + + if (validate_disk_capacity(capacity, reason)) { + printk(KERN_ERR brd: disk is too big: %s\n, reason); + goto out; + } brd = kzalloc(sizeof(*brd), GFP_KERNEL); if (!brd) @@ -493,7 +500,7 @@ static struct brd_device *brd_alloc(int disk-queue = brd-brd_queue; disk-flags |=
Re: [PATCH] block devices: validate block device capacity
On Thu, 2014-01-30 at 15:40 -0500, Mikulas Patocka wrote: When running the LVM2 testsuite on 32-bit kernel, there are unkillable processes stuck in the kernel consuming 100% CPU: blkid R running 0 2005 1409 0x0004 ce009d00 0082 ffcf c11280ba 0060 560b5dfd 3111 00fe41cb ce009d00 d51cfeb0 001e 0002 0002 c10748c1 0002 c106cca4 Call Trace: [c11280ba] ? radix_tree_next_chunk+0xda/0x2c0 [c10748c1] ? release_pages+0x61/0x160 [c106cca4] ? find_get_pages+0x84/0x100 [c1251fbe] ? _cond_resched+0x1e/0x40 [c10758cb] ? truncate_inode_pages_range+0x12b/0x440 [c1075cb7] ? truncate_inode_pages+0x17/0x20 [c10cf2ba] ? __blkdev_put+0x3a/0x140 [c10d02db] ? blkdev_close+0x1b/0x40 [c10a60b2] ? __fput+0x72/0x1c0 [c1039461] ? task_work_run+0x61/0xa0 [c1253b6f] ? work_notifysig+0x24/0x35 This is caused by the fact that the LVM2 testsuite creates 64TB device. The kernel uses unsigned long to index pages in files and block devices, on 64TB device unsigned long overflows (it can address up to 16TB with 4k pages), causing the infinite loop. Why is this? the whole reason for CONFIG_LBDAF is supposed to be to allow 64 bit offsets for block devices on 32 bit. It sounds like there's somewhere not using sector_t ... or using it wrongly which needs fixing. On 32-bit architectures, we must limit block device size to PAGE_SIZE*(2^32-1). So you're saying CONFIG_LBDAF can never work, why? James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] block devices: validate block device capacity
On Thu, 30 Jan 2014, James Bottomley wrote: Why is this? the whole reason for CONFIG_LBDAF is supposed to be to allow 64 bit offsets for block devices on 32 bit. It sounds like there's somewhere not using sector_t ... or using it wrongly which needs fixing. The page cache uses unsigned long as a page index. Therefore, if unsigned long is 32-bit, the block device may have at most 2^32-1 pages. On 32-bit architectures, we must limit block device size to PAGE_SIZE*(2^32-1). So you're saying CONFIG_LBDAF can never work, why? James CONFIG_LBDAF works, but it doesn't allow unlimited capacity: on x86, without CONFIG_LBDAF, the limit is 2TiB. With CONFIG_LBDAF, the limit is 16TiB (4096*2^32). Mikulas -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] block devices: validate block device capacity
On Thu, 2014-01-30 at 18:10 -0500, Mikulas Patocka wrote: On Thu, 30 Jan 2014, James Bottomley wrote: Why is this? the whole reason for CONFIG_LBDAF is supposed to be to allow 64 bit offsets for block devices on 32 bit. It sounds like there's somewhere not using sector_t ... or using it wrongly which needs fixing. The page cache uses unsigned long as a page index. Therefore, if unsigned long is 32-bit, the block device may have at most 2^32-1 pages. Um, that's the index into the mapping, not the device; a device can have multiple mappings and each mapping has a radix tree of pages. For most filesystems a mapping is equivalent to a file, so we can have large filesystems, but they can't have files over actually 4GB on 32 bits otherwise mmap fails. Are we running into a problems with struct address_space where we've assumed the inode belongs to the file and lvm is doing something where it's the whole device? On 32-bit architectures, we must limit block device size to PAGE_SIZE*(2^32-1). So you're saying CONFIG_LBDAF can never work, why? James CONFIG_LBDAF works, but it doesn't allow unlimited capacity: on x86, without CONFIG_LBDAF, the limit is 2TiB. With CONFIG_LBDAF, the limit is 16TiB (4096*2^32). I don't think the people who did the large block device work expected to gain only 3 bits for all their pain. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] block devices: validate block device capacity
On Thu, 30 Jan 2014, James Bottomley wrote: On Thu, 2014-01-30 at 18:10 -0500, Mikulas Patocka wrote: On Thu, 30 Jan 2014, James Bottomley wrote: Why is this? the whole reason for CONFIG_LBDAF is supposed to be to allow 64 bit offsets for block devices on 32 bit. It sounds like there's somewhere not using sector_t ... or using it wrongly which needs fixing. The page cache uses unsigned long as a page index. Therefore, if unsigned long is 32-bit, the block device may have at most 2^32-1 pages. Um, that's the index into the mapping, not the device; a device can have multiple mappings and each mapping has a radix tree of pages. For most filesystems a mapping is equivalent to a file, so we can have large filesystems, but they can't have files over actually 4GB on 32 bits otherwise mmap fails. A device may be accessed direcly (by opening /dev/sdX) and it creates a mapping too - thus, the size of a mapping limits the size of a block device. The main problem is that pgoff_t has 4 bytes - chaning it to 8 bytes may fix it - but there may be some hidden places where pgoff is converted to unsigned long - who knows, if they exist or not? Are we running into a problems with struct address_space where we've assumed the inode belongs to the file and lvm is doing something where it's the whole device? lvm creates a 64TiB device, udev runs blkid on that device and blkid opens the device and gets stuck because of unsigned long overflow. On 32-bit architectures, we must limit block device size to PAGE_SIZE*(2^32-1). So you're saying CONFIG_LBDAF can never work, why? James CONFIG_LBDAF works, but it doesn't allow unlimited capacity: on x86, without CONFIG_LBDAF, the limit is 2TiB. With CONFIG_LBDAF, the limit is 16TiB (4096*2^32). I don't think the people who did the large block device work expected to gain only 3 bits for all their pain. James One could change it to have three choices: 2TiB limit - 32-bit sector_t and 32-bit pgoff_t 16TiB limit - 64-bit sector_t and 32-bit pgoff_t 32PiB limit - 64-bit sector_t and 64-bit pgoff_t Though, we need to know if the people who designed memory management agree with changing pgoff_t to 64 bits. Mikulas -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] block devices: validate block device capacity
On Thu, 2014-01-30 at 19:20 -0500, Mikulas Patocka wrote: On Thu, 30 Jan 2014, James Bottomley wrote: On Thu, 2014-01-30 at 18:10 -0500, Mikulas Patocka wrote: On Thu, 30 Jan 2014, James Bottomley wrote: Why is this? the whole reason for CONFIG_LBDAF is supposed to be to allow 64 bit offsets for block devices on 32 bit. It sounds like there's somewhere not using sector_t ... or using it wrongly which needs fixing. The page cache uses unsigned long as a page index. Therefore, if unsigned long is 32-bit, the block device may have at most 2^32-1 pages. Um, that's the index into the mapping, not the device; a device can have multiple mappings and each mapping has a radix tree of pages. For most filesystems a mapping is equivalent to a file, so we can have large filesystems, but they can't have files over actually 4GB on 32 bits otherwise mmap fails. A device may be accessed direcly (by opening /dev/sdX) and it creates a mapping too - thus, the size of a mapping limits the size of a block device. Right, that's what I suspected below. We can't damage large block support on filesystems just because of this corner case. The main problem is that pgoff_t has 4 bytes - chaning it to 8 bytes may fix it - but there may be some hidden places where pgoff is converted to unsigned long - who knows, if they exist or not? I don't think we want to do that ... it will make struct page fatter and have knock on impacts in the radix tree code. To fix this, we need to make the corner case (i.e. opening large block devices without a filesystem) bear the pain. It sort of looks like we want to do a linear array of mappings of 64TB for the device so the page cache calculations don't overflow. Are we running into a problems with struct address_space where we've assumed the inode belongs to the file and lvm is doing something where it's the whole device? lvm creates a 64TiB device, udev runs blkid on that device and blkid opens the device and gets stuck because of unsigned long overflow. well a simple open won't cause this ... it must be trying to read the end of the device for some reason. But anyway, the way to fix this is to fix the large block open as a corner case. On 32-bit architectures, we must limit block device size to PAGE_SIZE*(2^32-1). So you're saying CONFIG_LBDAF can never work, why? James CONFIG_LBDAF works, but it doesn't allow unlimited capacity: on x86, without CONFIG_LBDAF, the limit is 2TiB. With CONFIG_LBDAF, the limit is 16TiB (4096*2^32). I don't think the people who did the large block device work expected to gain only 3 bits for all their pain. James One could change it to have three choices: 2TiB limit - 32-bit sector_t and 32-bit pgoff_t 16TiB limit - 64-bit sector_t and 32-bit pgoff_t 32PiB limit - 64-bit sector_t and 64-bit pgoff_t Though, we need to know if the people who designed memory management agree with changing pgoff_t to 64 bits. I don't think we can change the size of pgoff_t ... because it won't just be that, it will be other problems like the radix tree. However, you also have to bear in mind that truncating large block device support to 64TB on 32 bits is a technical ABI break. Hopefully it is only technical because I don't know of any current consumer block device that is 64TB yet, but anyone who'd created a filesystem 64TB would find it no-longer mounted on 32 bits. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] block devices: validate block device capacity
On Thu, 30 Jan 2014, James Bottomley wrote: A device may be accessed direcly (by opening /dev/sdX) and it creates a mapping too - thus, the size of a mapping limits the size of a block device. Right, that's what I suspected below. We can't damage large block support on filesystems just because of this corner case. Devices larger than 16TiB never worked on 32-bit kernel, so this patch isn't damaging anything. Note that if you attach a 16TiB block device, don't open it and mount it, it still won't work, because the buffer cache uses the page cache (see the function __find_get_block_slow and the variable pgoff_t index - that variable would overflow if the filesystem accessed a buffer beyond 16TiB). The main problem is that pgoff_t has 4 bytes - chaning it to 8 bytes may fix it - but there may be some hidden places where pgoff is converted to unsigned long - who knows, if they exist or not? I don't think we want to do that ... it will make struct page fatter and have knock on impacts in the radix tree code. To fix this, we need to make the corner case (i.e. opening large block devices without a filesystem) bear the pain. It sort of looks like we want to do a linear array of mappings of 64TB for the device so the page cache calculations don't overflow. The code that reads and writes data to block devices and files is shared - the functions in mm/filemap.c work for both files and block devices. So, if you want 64-bit page offsets, you need to increase pgoff_t size, and that will increase the limit for both files and block devices. You shouldn't have separate functions for managing pages on files and separate functions for managing pages on block devices - that would increase code size and cause maintenance problems. Though, we need to know if the people who designed memory management agree with changing pgoff_t to 64 bits. I don't think we can change the size of pgoff_t ... because it won't just be that, it will be other problems like the radix tree. If we can't change it, then we must stay with the current 16TiB limit. There's no other way. However, you also have to bear in mind that truncating large block device support to 64TB on 32 bits is a technical ABI break. Hopefully it is only technical because I don't know of any current consumer block device that is 64TB yet, but anyone who'd created a filesystem 64TB would find it no-longer mounted on 32 bits. James It is not ABI break, because block devices larger than 16TiB never worked on 32-bit architectures. So it's better to refuse them outright, than to cause subtle lockups or data corruption. Mikulas -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
On Thu, 2014-01-30 at 20:46 +0100, Bart Van Assche wrote: On 06/25/13 18:13, Michael Christie wrote: On Jun 25, 2013, at 10:31 AM, Bart Van Assche bvanass...@acm.org wrote: On 06/25/13 15:45, James Bottomley wrote: On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote: There is a difference though between moving the EH kthread_stop() call and the patch at the start of this thread: moving the EH kthread_stop() call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_* callback after scsi_remove_host() has finished. However, the scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can cause an eh_* callback to be invoked after scsi_remove_device() finished. OK, but this doesn't tell me what you're trying to achieve. An eh function is allowable as long as the host hadn't had the release callback executed. That means you must have to have a reference to the device/host to execute the eh function, which is currently guaranteed for all invocations. That raises a new question: how is an LLD expected to clean up resources without triggering a race condition ? What you wrote means that it's not safe for an LLD to start cleaning up the resources needed by the eh_* callbacks immediately after scsi_remove_device() returns since it it not guaranteed that at that time all references to the device have already been dropped. A callback in the device/target/host (whatever is needed) release function would do this right? If I understand James right, I think he suggested something like this in another mail. (replying to an e-mail of seven months ago - see also http://thread.gmane.org/gmane.linux.scsi/82572/focus=82822) Hello Mike, Sorry but I'm afraid that making the SCSI core invoke a callback function from a device, target or host release function would create a new challenge, namely making sure that all these callback functions have finished before the module is unloaded that contains the SCSI host template and the code implementing such a callback function. That challenge is not specific to the SCSI infrastructure but is a general question that has not yet been solved in the Linux kernel (see e.g. [PATCH] kobject: provide kobject_put_wait to fix module unload race for a more general discussion about how to ensure that kobject release functions are invoked before the module is unloaded that owns the release function - http://thread.gmane.org/gmane.linux.kernel/1622885). For callbacks, that's easy: it's module_get/module_put Or maybe this just means that I misunderstood you ? In case it is not clear why I'm reviving this discussion: now that the improved eh timeout handler patch is upstream (commit e494f6a728394ab0df194342549ee20e6f0752df) there is an additional way in which the SCSI core can invoke an EH function concurrently with or after scsi_remove_host() has finished, namely from the TMF work queue (tmf_work_q). But the fundamental guarantee is that the eh thread for the host (the eh context if you will) has to be dead before the host can be removed and the module unloaded. The thread doesn't die until all the work is done. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
On 01/31/14 06:58, James Bottomley wrote: On Thu, 2014-01-30 at 20:46 +0100, Bart Van Assche wrote: On 06/25/13 18:13, Michael Christie wrote: Sorry but I'm afraid that making the SCSI core invoke a callback function from a device, target or host release function would create a new challenge, namely making sure that all these callback functions have finished before the module is unloaded that contains the SCSI host template and the code implementing such a callback function. That challenge is not specific to the SCSI infrastructure but is a general question that has not yet been solved in the Linux kernel (see e.g. [PATCH] kobject: provide kobject_put_wait to fix module unload race for a more general discussion about how to ensure that kobject release functions are invoked before the module is unloaded that owns the release function - http://thread.gmane.org/gmane.linux.kernel/1622885). For callbacks, that's easy: it's module_get/module_put Adding a module_get() call in scsi_add_host() and a module_put() call in scsi_host_dev_release() would cause a behavior change that would be reported by end users as system hangs during shutdown. Today it is possible to unload a kernel module via rmmod that created one or more SCSI hosts. Since user space does not remove SCSI hosts during system shutdown, trying to unload a SCSI LLD kernel module that had created one or more SCSI hosts would cause the module unload to fail because of a non-zero module reference count. In case it is not clear why I'm reviving this discussion: now that the improved eh timeout handler patch is upstream (commit e494f6a728394ab0df194342549ee20e6f0752df) there is an additional way in which the SCSI core can invoke an EH function concurrently with or after scsi_remove_host() has finished, namely from the TMF work queue (tmf_work_q). But the fundamental guarantee is that the eh thread for the host (the eh context if you will) has to be dead before the host can be removed and the module unloaded. The thread doesn't die until all the work is done. Maybe I'm misunderstanding something, but as far as I can see kthread_stop(shost-ehandler) is invoked from scsi_host_dev_release(). That last function is called by the last scsi_host_put(). And LLD's invoke scsi_host_put() after scsi_remove_host(). In other words, the SCSI error handler thread and TMF work queue are still active when scsi_remove_host() returns. Should I repost the patch at the start of this thread ? Thanks, Bart. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html