Re: status of drivers/scsi/scsi_tgt_lib.c

2014-01-30 Thread Christoph Hellwig
On Tue, Jan 28, 2014 at 12:58:36PM +0100, Hannes Reinecke wrote:
 I actually know of a customer using this (with a semi-proprietary
 target mode driver). I'll be asking him what his plans are.

They are using the IBM pseries vscsi driver for that?  It they actually
adding out of tree kernel code it doesn't really matter at all for this
discussion..

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: status of drivers/scsi/scsi_tgt_lib.c

2014-01-30 Thread Hannes Reinecke
On 01/30/2014 09:44 AM, Christoph Hellwig wrote:
 On Tue, Jan 28, 2014 at 12:58:36PM +0100, Hannes Reinecke wrote:
 I actually know of a customer using this (with a semi-proprietary
 target mode driver). I'll be asking him what his plans are.
 
 They are using the IBM pseries vscsi driver for that?  It they actually
 adding out of tree kernel code it doesn't really matter at all for this
 discussion..
 
No, not the vscsi driver. But the tgt hooks.

But yeah, I know. As said, I'll be in touch with them.
I just wanted mention that _someone_ uses this code

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes

2014-01-30 Thread Mel Gorman
On Wed, Jan 29, 2014 at 09:52:46PM -0700, Matthew Wilcox wrote:
 On Fri, Jan 24, 2014 at 10:57:48AM +, Mel Gorman wrote:
  So far on the table is
  
  1. major filesystem overhawl
  2. major vm overhawl
  3. use compound pages as they are today and hope it does not go
 completely to hell, reboot when it does
 
 Is the below paragraph an exposition of option 2, or is it an option 4,
 change the VM unit of allocation?

Changing the VM unit of allocation is a major VM overhawl

 Other than the names you're using,
 this is basically what I said to Kirill in an earlier thread; either
 scrap the difference between PAGE_SIZE and PAGE_CACHE_SIZE, or start
 making use of it.
 

No. The PAGE_CACHE_SIZE would depend on the underlying address space and
vary. The large block patchset would have to have done this but I did not
go back and review the patches due to lack of time. With that it starts
hitting into fragmentation problems that have to be addressed somehow and
cannot just be waved away.

 The fact that EVERYBODY in this thread has been using PAGE_SIZE when they
 should have been using PAGE_CACHE_SIZE makes me wonder if part of the
 problem is that the split in naming went the wrong way.  ie use PTE_SIZE
 for 'the amount of memory pointed to by a pte_t' and use PAGE_SIZE for
 'the amount of memory described by a struct page'.
 
 (we need to remove the current users of PTE_SIZE; sparc32 and powerpc32,
 but that's just a detail)
 
 And we need to fix all the places that are currently getting the
 distinction wrong.  SMOP ... ;-)  What would help is correct typing of
 variables, possibly with sparse support to help us out.  Big Job.
 

That's taking the approach of the large block patchset (as I understand
it, not reviewed, not working on this etc) without dealing with potential
fragmentation problems. Of course they could be remapped virtually if
necessary but that will be very constrained on 32-bit, the final transfer
to hardware will require scatter/gather and there is a setup/teardown
cost with virtual mappings such as faulting (setup) and IPIs to flush TLBs
(teardown) that would add overhead.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/9] ipr: Use pci_enable_msi_range() and pci_enable_msix_range()

2014-01-30 Thread Alexander Gordeev
On Wed, Jan 29, 2014 at 02:26:52PM +0100, Alexander Gordeev wrote:
  Do you want me to rediff your patches on top of this one,
  or do you want to keep the entire MSI series together
  and do the rediff? Otherwise the patches seem fine.
 
 I would prefer the former.

Hi Brian,

I am sending the refreshed patches on top of ipr: Handle early EEH.


-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/2] ipr: Get rid of superfluous call to pci_disable_msi/msix()

2014-01-30 Thread Alexander Gordeev
There is no need to call pci_disable_msi() or pci_disable_msix()
in case the call to pci_enable_msi() or pci_enable_msix() failed.

Signed-off-by: Alexander Gordeev agord...@redhat.com
---
 drivers/scsi/ipr.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 96c10ce..48d0cfc 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -9329,7 +9329,6 @@ static int ipr_enable_msix(struct ipr_ioa_cfg *ioa_cfg)
 
if (err  0) {
ipr_wait_for_pci_err_recovery(ioa_cfg);
-   pci_disable_msix(ioa_cfg-pdev);
return err;
}
 
@@ -9353,7 +9352,6 @@ static int ipr_enable_msi(struct ipr_ioa_cfg *ioa_cfg)
 
if (err  0) {
ipr_wait_for_pci_err_recovery(ioa_cfg);
-   pci_disable_msi(ioa_cfg-pdev);
return err;
}
 
-- 
1.7.7.6

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/2] ipr: Use pci_enable_msi_range() and pci_enable_msix_range()

2014-01-30 Thread Alexander Gordeev
As result deprecation of MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block() all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() and pci_enable_msix_range()
interfaces.

Signed-off-by: Alexander Gordeev agord...@redhat.com
---
 drivers/scsi/ipr.c |   47 ++-
 1 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 48d0cfc..70f40ca 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -9317,51 +9317,40 @@ static void ipr_wait_for_pci_err_recovery(struct 
ipr_ioa_cfg *ioa_cfg)
 static int ipr_enable_msix(struct ipr_ioa_cfg *ioa_cfg)
 {
struct msix_entry entries[IPR_MAX_MSIX_VECTORS];
-   int i, err, vectors;
+   int i, vectors;
 
for (i = 0; i  ARRAY_SIZE(entries); ++i)
entries[i].entry = i;
 
-   vectors = ipr_number_of_msix;
-
-   while ((err = pci_enable_msix(ioa_cfg-pdev, entries, vectors))  0)
-   vectors = err;
-
-   if (err  0) {
+   vectors = pci_enable_msix_range(ioa_cfg-pdev,
+   entries, 1, ipr_number_of_msix);
+   if (vectors  0) {
ipr_wait_for_pci_err_recovery(ioa_cfg);
-   return err;
+   return vectors;
}
 
-   if (!err) {
-   for (i = 0; i  vectors; i++)
-   ioa_cfg-vectors_info[i].vec = entries[i].vector;
-   ioa_cfg-nvectors = vectors;
-   }
+   for (i = 0; i  vectors; i++)
+   ioa_cfg-vectors_info[i].vec = entries[i].vector;
+   ioa_cfg-nvectors = vectors;
 
-   return err;
+   return 0;
 }
 
 static int ipr_enable_msi(struct ipr_ioa_cfg *ioa_cfg)
 {
-   int i, err, vectors;
+   int i, vectors;
 
-   vectors = ipr_number_of_msix;
-
-   while ((err = pci_enable_msi_block(ioa_cfg-pdev, vectors))  0)
-   vectors = err;
-
-   if (err  0) {
+   vectors = pci_enable_msi_range(ioa_cfg-pdev, 1, ipr_number_of_msix);
+   if (vectors  0) {
ipr_wait_for_pci_err_recovery(ioa_cfg);
-   return err;
+   return vectors;
}
 
-   if (!err) {
-   for (i = 0; i  vectors; i++)
-   ioa_cfg-vectors_info[i].vec = ioa_cfg-pdev-irq + i;
-   ioa_cfg-nvectors = vectors;
-   }
+   for (i = 0; i  vectors; i++)
+   ioa_cfg-vectors_info[i].vec = ioa_cfg-pdev-irq + i;
+   ioa_cfg-nvectors = vectors;
 
-   return err;
+   return 0;
 }
 
 static void name_msi_vectors(struct ipr_ioa_cfg *ioa_cfg)
@@ -9426,7 +9415,7 @@ static irqreturn_t ipr_test_intr(int irq, void *devp)
  * ipr_test_msi - Test for Message Signaled Interrupt (MSI) support.
  * @pdev:  PCI device struct
  *
- * Description: The return value from pci_enable_msi() can not always be
+ * Description: The return value from pci_enable_msi_range() can not always be
  * trusted.  This routine sets up and initiates a test interrupt to determine
  * if the interrupt is received via the ipr_test_intr() service routine.
  * If the tests fails, the driver will fall back to LSI.
-- 
1.7.7.6

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] qla2xxx: eliminate dead code in qla24xx_process_bidir_cmd

2014-01-30 Thread Steven J. Magnani
Coverity reports that the test of req_data_len vs. rsp_data_len is dead code.
This appears to be because the test occurs before any real assignment to
either variable.

Assuming that the sole in-tree execution path (QL_VND_DIAG_IO_CMD submitted
via FC pass-through on a host /dev/bsg/X) does not require the response to
have the same length as the request, all code related to the faulty test
can be removed. If this is not the case, the test should be moved much earlier
in the function since it does not depend on any resource acquitision.

Signed-off-by: Steven J. Magnani st...@digidescorp.com
---
--- linux-3.13/drivers/scsi/qla2xxx/qla_bsg.c   2014-01-29 13:50:02.050802907 
-0600
+++ b/drivers/scsi/qla2xxx/qla_bsg.c2014-01-29 13:53:15.856549874 -0600
@@ -1732,8 +1732,6 @@ qla24xx_process_bidir_cmd(struct fc_bsg_
uint16_t nextlid = 0;
uint32_t tot_dsds;
srb_t *sp = NULL;
-   uint32_t req_data_len = 0;
-   uint32_t rsp_data_len = 0;
 
/* Check the type of the adapter */
if (!IS_BIDI_CAPABLE(ha)) {
@@ -1840,17 +1838,6 @@ qla24xx_process_bidir_cmd(struct fc_bsg_
goto done_unmap_sg;
}
 
-   if (req_data_len != rsp_data_len) {
-   rval = EXT_STATUS_BUSY;
-   ql_log(ql_log_warn, vha, 0x70aa,
-   req_data_len != rsp_data_len\n);
-   goto done_unmap_sg;
-   }
-
-   req_data_len = bsg_job-request_payload.payload_len;
-   rsp_data_len = bsg_job-reply_payload.payload_len;
-
-
/* Alloc SRB structure */
sp = qla2x00_get_sp(vha, (vha-bidir_fcport), GFP_KERNEL);
if (!sp) {
--- linux-3.13/drivers/scsi/qla2xxx/qla_dbg.c   2014-01-29 13:50:49.435230824 
-0600
+++ b/drivers/scsi/qla2xxx/qla_dbg.c2014-01-29 13:53:43.960820829 -0600
@@ -38,7 +38,8 @@
  * |  || 0x7073-0x7075, |
  * |  || 0x707b,0x708c, |
  * |  || 0x70a5,0x70a6, |
- * |  || 0x70a8,0x70ab, |
+ * |  || 0x70a8,|
+ * |  || 0x70aa-0x70ab, |
  * |  || 0x70ad-0x70ae, |
  * |  || 0x70d1-0x70db, |
  * |  || 0x7047,0x703b |
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


libsrp: remove unreachable kfree

2014-01-30 Thread Dave Jones
srp_iu_pool_alloc implements what seems like a standard common exit path with 
gotos
but there's only one way to reach it. Additionally the kfree(q-items) is 
unreachable.

fold the error path back into the if.

Signed-off-by: Dave Jones da...@fedoraproject.org

diff --git a/drivers/scsi/libsrp.c b/drivers/scsi/libsrp.c
index 0707ecdbaa32..f45758f15381 100644
--- a/drivers/scsi/libsrp.c
+++ b/drivers/scsi/libsrp.c
@@ -55,9 +55,12 @@ static int srp_iu_pool_alloc(struct srp_queue *q, size_t max,
q-pool = kcalloc(max, sizeof(struct iu_entry *), GFP_KERNEL);
if (!q-pool)
return -ENOMEM;
+
q-items = kcalloc(max, sizeof(struct iu_entry), GFP_KERNEL);
-   if (!q-items)
-   goto free_pool;
+   if (!q-items) {
+   kfree(q-pool);
+   return -ENOMEM;
+   }
 
spin_lock_init(q-lock);
kfifo_init(q-queue, (void *) q-pool, max * sizeof(void *));
@@ -68,11 +71,6 @@ static int srp_iu_pool_alloc(struct srp_queue *q, size_t max,
iue++;
}
return 0;
-
-   kfree(q-items);
-free_pool:
-   kfree(q-pool);
-   return -ENOMEM;
 }
 
 static void srp_iu_pool_free(struct srp_queue *q)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] target updates for v3.14-rc1

2014-01-30 Thread Nicholas A. Bellinger
Hi Linus!

Here are the target-pending updates for the v3.14-rc1 merge window.

Please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next

The highlights this round include:

  - Add support for SCSI Referrals (Hannes)
  - Add support for T10 DIF into target core (nab + mkp)
  - Add support for T10 DIF emulation in FILEIO + RAMDISK backends (Sagi + nab)
  - Add support for T10 DIF - bio_integrity passthrough in IBLOCK backend (nab)
  - Prep changes to iser-target for = v3.15 T10 DIF support (Sagi)
  - Add support for qla2xxx N_Port ID Virtualization - NPIV (Saurav + Quinn)
  - Allow percpu_ida_alloc() to receive task state bitmask (Kent)
  - Fix = v3.12 iscsi-target session reset hung task regression (nab)
  - Fix = v3.13 percpu_ref se_lun-lun_ref_active race (nab)
  - Fix a long-standing network portal creation race (Andy)

Just a heads up that you'll encounter two merge conflicts in
target_core_tpg.c + qla_target.c due to upstream v3.13 bugfixes.  The
resolutions are straightforward enough, but please yell if something
doesn't look right.

Also, Stephen is carrying one extra target_core_iblock.c patch to
address the immutable bio_vec changes that are currently outstanding in
Jens block PULL request.  That patch is here:

linux-next: build failure after merge of the target-updates tree
http://marc.info/?l=linux-nextm=139019553616605w=2

Thank you,

--nab

Andy Grover (7):
  target: Remove unused ua_dev_list member in struct se_ua
  target: Allocate more room for port default groups
  target: Fix sizeof in kmalloc for some default_groups arrays
  target: Refer to u32 luns as unpacked_lun
  target: Rename core_tpg_{pre,post}_addlun for clarity
  target: Don't use void* when passing dev in core_tpg_add_lun
  target/iscsi: Fix network portal creation race

Hannes Reinecke (9):
  target_core_alua: validate ALUA state transition
  target_core_alua: Allocate ALUA metadata on demand
  target_core_alua: store old and pending ALUA state
  target_core_alua: Use workqueue for ALUA transitioning
  target_core: simplify scsi_name_len calculation
  target_core_spc: Include target device descriptor in VPD page 83
  target_core_alua: Referrals infrastructure
  target_core_alua: Referrals configfs integration
  target_core_alua: check for buffer overflow

Kent Overstreet (1):
  percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask

Nicholas Bellinger (24):
  target: Convert inquiry temporary buffer to heap memory
  target: Add DIF related base definitions
  target: Add DIF CHECK_CONDITION ASC/ASCQ exception cases
  target/sbc: Add DIF setup in sbc_check_prot + sbc_parse_cdb
  target/sbc: Add DIF TYPE1+TYPE3 read/write verify emulation
  target/spc: Add protection bit to standard INQUIRY output
  target/spc: Add protection related bits to INQUIRY EVPD=0x86
  target/sbc: Add P_TYPE + PROT_EN bits to READ_CAPACITY_16
  target/spc: Expose ATO bit in control mode page
  target/configfs: Expose protection device attributes
  target: Add protection SGLs to target_submit_cmd_map_sgls
  target/iblock: Add blk_integrity + BIP passthrough support
  target/file: Add DIF protection init/format support
  target/file: Add DIF protection support to fd_execute_rw
  target/rd: Refactor rd_build_device_space + rd_release_device_space
  target/rd: Add support for protection SGL setup + release
  target/rd: Add DIF protection into rd_execute_rw
  tcm_loop: Enable DIF/DIX modes in SCSI host LLD
  qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure
  qla2xxx: Configure NPIV fc_vport via tcm_qla2xxx_npiv_make_lport
  iscsi-target: Pre-allocate more tags to avoid ack starvation
  iscsi-target: Fix connection reset hang with percpu_ida_alloc
  iscsi-target: Convert gfp_t parameter to task state bitmask
  target: Fix percpu_ref_put race in transport_lun_remove_cmd

Rashika Kheria (4):
  drivers: target: Move prototype declaration of function to header
file target_core_pr.h
  drivers: target: Mark function as static in target_core_iblock.c
  drivers: target: Mark functions as static in tcm_loop.c
  drivers: target: Mark functions and structures as static in
tfc_conf.c

Sagi Grimberg (5):
  IB/isert: seperate connection protection domains and dma MRs
  IB/isert: Avoid frwr notation, user fastreg
  IB/isert: Move fastreg descriptor creation to a function
  IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
  target: Report bad sector in sense data for DIF errors

Saurav Kashyap (1):
  qla2xxx: Enhancements to enable NPIV support for QLOGIC ISPs with
TCM/LIO.

 block/blk-mq-tag.c   |6 +-
 drivers/infiniband/ulp/isert/ib_isert.c  |  222 +-
 drivers/infiniband/ulp/isert/ib_isert.h  |   10 +-
 drivers/scsi/qla2xxx/qla_attr.c  |2 +
 drivers/scsi/qla2xxx/qla_def.h   |   12 +-
 drivers/scsi/qla2xxx/qla_target.c|  170 
 drivers/scsi/qla2xxx/qla_target.h|4 

Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

2014-01-30 Thread Bart Van Assche
On 06/25/13 18:13, Michael Christie wrote:
 On Jun 25, 2013, at 10:31 AM, Bart Van Assche bvanass...@acm.org wrote:
 On 06/25/13 15:45, James Bottomley wrote:
 On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote:
 There is a difference though between moving the EH kthread_stop() call
 and the patch at the start of this thread: moving the EH kthread_stop()
 call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_*
 callback after scsi_remove_host() has finished. However, the
 scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can
 cause an eh_* callback to be invoked after scsi_remove_device() finished.

 OK, but this doesn't tell me what you're trying to achieve.

 An eh function is allowable as long as the host hadn't had the release
 callback executed.  That means you must have to have a reference to the
 device/host to execute the eh function, which is currently guaranteed
 for all invocations.

 That raises a new question: how is an LLD expected to clean up resources
 without triggering a race condition ? What you wrote means that it's not
 safe for an LLD to start cleaning up the resources needed by the eh_*
 callbacks immediately after scsi_remove_device() returns since it it not
 guaranteed that at that time all references to the device have already
 been dropped.

 A callback in the device/target/host (whatever is needed) release function
 would do this right? If I understand James right, I think he suggested
 something like this in another mail.

(replying to an e-mail of seven months ago - see also
http://thread.gmane.org/gmane.linux.scsi/82572/focus=82822)

Hello Mike,

Sorry but I'm afraid that making the SCSI core invoke a callback
function from a device, target or host release function would create a
new challenge, namely making sure that all these callback functions have
finished before the module is unloaded that contains the SCSI host
template and the code implementing such a callback function. That
challenge is not specific to the SCSI infrastructure but is a general
question that has not yet been solved in the Linux kernel (see e.g.
[PATCH] kobject: provide kobject_put_wait to fix module unload race
for a more general discussion about how to ensure that kobject release
functions are invoked before the module is unloaded that owns the
release function - http://thread.gmane.org/gmane.linux.kernel/1622885).

Or maybe this just means that I misunderstood you ?

In case it is not clear why I'm reviving this discussion: now that the
improved eh timeout handler patch is upstream (commit
e494f6a728394ab0df194342549ee20e6f0752df) there is an additional way in
which the SCSI core can invoke an EH function concurrently with or after
scsi_remove_host() has finished, namely from the TMF work queue
(tmf_work_q).

Thanks,

Bart.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v10 0/4] ata: Add APM X-Gene SoC SATA host controller support

2014-01-30 Thread Loc Ho
Hi Tejun,

As it had being awhile, any issue with this version of the SATA
drivers before I post the follow on errata patches?

-Loc

On Thu, Jan 16, 2014 at 8:11 AM, Loc Ho l...@apm.com wrote:
 This patch adds support for the APM X-Gene SoC SATA host controller. In
 order for the host controller to work, the corresponding PHY driver
 musts also be available.

 v10:
  * Update binding documentation

 v9:
  * Remove ACPI/EFI include files
  * Remove the IO flush support, interrupt routine, and DTS resources
  * Remove function xgene_rd, xgene_wr, and xgene_wr_flush
  * Remove PMP support (function xgene_ahci_qc_issue, xgene_ahci_qc_prep,
xgene_ahci_qc_fill_rtf, xgene_ahci_softreset, and xgene_ahci_do_softreset)
  * Rename function xgene_ahci_enable_phy to xgene_ahci_force_phy_rdy
  * Clean up hardreset functions
  * Require v7 of the PHY driver

 v8:
  * Remove _ADDR from defines
  * Remove define MSTAWAUX_COHERENT_BYPASS_SET and
STARAUX_COHERENT_BYPASS_SET and use direct coding
  * Remove the un-necessary check for DTS boot with built in ACPI table
  * Switch to use dma_set_mask_and_coherent for setting DMA mask
  * Remove ACPI table matching code
  * Update clock-names for sata01clk, sata23clk, and sata45clk

 v7:
  * Update the clock code by toggle the clock
  * Update the DTS clock mask values due to the clock spilt between host and
v5 of the PHY drivers

 v6:
  * Update binding documentation
  * Change select PHY_XGENE_SATA to PHY_XGENE
  * Add ULL to constants
  * Change indentation and comments
  * Clean up the probe functions a bit more
  * Remove xgene_ahci_remove function
  * Add the flush register to DTS
  * Remove the interrupt-parent from DTS

 v5:
  * Sync up to v3 of the PHY driver
  * Remove MSLIM wrapper functions
  * Change the memory shutdown loop to use usleep_range
  * Use devm_ioremap_resource instead devm_ioremap
  * Remove suspend/resume functions as not needed

 v4:
  * Remove the ID property in DT
  * Remove the temporary PHY direct function call and use PHY function
  * Change printk to pr_debug
  * Move the IOB flush addresses into the DT
  * Remove the parameters retrieval function as no longer needed
  * Remove the header file as no longer needed
  * Require v2 patch of the SATA PHY driver. Require slightly modification
in the Kconfig as it is moved to folder driver/phy and use Kconfig
PHY_XGENE_SATA instead SATA_XGENE_PHY.

 v3:
  * Move out the SATA PHY to another driver
  * Remove the clock-cells entry from DTS
  * Remove debug wrapper
  * Remove delay functions wrapper
  * Clean up resource and IRQ query
  * Remove query clock name
  * Switch to use dma_set_mask/dma_coherent_mask
  * Remove un-necessary devm_kfree
  * Update GPL license header to v2
  * Spilt up function xgene_ahci_hardreset
  * Spilt up function xgene_ahci_probe
  * Remove all reference of CONFIG_ARCH_MSLIM
  * Clean up chip revision code

 v2:
  * Clean up file sata_xgene.c with Lindent and etc
  * Clean up file sata_xgene_serdes.c with Lindent and etc
  * Add description to each patch

 v1:
  * inital version

 Signed-off-by: Loc Ho l...@apm.com
 Signed-off-by: Tuan Phan tp...@apm.com
 Signed-off-by: Suman Tripathi stripa...@apm.com
 ---
 Loc Ho (4):
   ata: Export required functions by APM X-Gene SATA driver
   Documentation: Add documentation for APM X-Gene SoC SATA host
 controller DTS binding
   ata: Add APM X-Gene SoC SATA host controller driver
   arm64: Add APM X-Gene SoC SATA host controller DTS entries

  .../devicetree/bindings/ata/apm-xgene.txt  |   70 +++
  arch/arm64/boot/dts/apm-storm.dtsi |   75 +++
  drivers/ata/Kconfig|8 +
  drivers/ata/Makefile   |1 +
  drivers/ata/ahci.h |9 +
  drivers/ata/libahci.c  |   16 +-
  drivers/ata/sata_xgene.c   |  630 
 
  7 files changed, 803 insertions(+), 6 deletions(-)
  create mode 100644 Documentation/devicetree/bindings/ata/apm-xgene.txt
  create mode 100644 drivers/ata/sata_xgene.c

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] block devices: validate block device capacity

2014-01-30 Thread Mikulas Patocka
When running the LVM2 testsuite on 32-bit kernel, there are unkillable
processes stuck in the kernel consuming 100% CPU:
blkid   R running  0  2005   1409 0x0004
ce009d00 0082 ffcf c11280ba 0060 560b5dfd 3111 00fe41cb
 ce009d00  d51cfeb0  001e 0002 
0002 c10748c1 0002 c106cca4    
Call Trace:
[c11280ba] ? radix_tree_next_chunk+0xda/0x2c0
[c10748c1] ? release_pages+0x61/0x160
[c106cca4] ? find_get_pages+0x84/0x100
[c1251fbe] ? _cond_resched+0x1e/0x40
[c10758cb] ? truncate_inode_pages_range+0x12b/0x440
[c1075cb7] ? truncate_inode_pages+0x17/0x20
[c10cf2ba] ? __blkdev_put+0x3a/0x140
[c10d02db] ? blkdev_close+0x1b/0x40
[c10a60b2] ? __fput+0x72/0x1c0
[c1039461] ? task_work_run+0x61/0xa0
[c1253b6f] ? work_notifysig+0x24/0x35

This is caused by the fact that the LVM2 testsuite creates 64TB device.
The kernel uses unsigned long to index pages in files and block devices,
on 64TB device unsigned long overflows (it can address up to 16TB with
4k pages), causing the infinite loop.

On 32-bit architectures, we must limit block device size to
PAGE_SIZE*(2^32-1).

The bug with untested device size is pervasive across the whole kernel, 
some drivers test that the device size fits in sector_t, but this test is 
not sufficient on 32-bit architectures. This patch introduces a new 
function validate_disk_capacity that tests if the disk capacity is OK for 
the current kernel and modifies the drivers brd, ide-gd, dm, sd to use it.

Signed-off-by: Mikulas Patocka mpato...@redhat.com

---
 block/genhd.c |   23 +++
 drivers/block/brd.c   |   15 +++
 drivers/ide/ide-gd.c  |8 
 drivers/md/dm-ioctl.c |3 +--
 drivers/md/dm-table.c |   14 +-
 drivers/scsi/sd.c |   20 +++-
 include/linux/device-mapper.h |2 +-
 include/linux/genhd.h |2 ++
 8 files changed, 70 insertions(+), 17 deletions(-)

Index: linux-2.6-compile/block/genhd.c
===
--- linux-2.6-compile.orig/block/genhd.c2014-01-30 17:23:15.0 
+0100
+++ linux-2.6-compile/block/genhd.c 2014-01-30 19:28:42.0 +0100
@@ -1835,3 +1835,26 @@ static void disk_release_events(struct g
WARN_ON_ONCE(disk-ev  disk-ev-block != 1);
kfree(disk-ev);
 }
+
+int validate_disk_capacity(u64 n_sectors, const char **reason)
+{
+   u64 n_pages;
+   if (n_sectors  9  9 != n_sectors) {
+   if (reason)
+   *reason = The number of bytes is greater than 2^64.;
+   return -EOVERFLOW;
+   }
+   n_pages = (n_sectors + (1  (PAGE_SHIFT - 9)) - 1)  (PAGE_SHIFT - 9);
+   if (n_pages  ULONG_MAX) {
+   if (reason)
+   *reason = Use 64-bit kernel.;
+   return -EFBIG;
+   }
+   if (n_sectors != (sector_t)n_sectors) {
+   if (reason)
+   *reason = Use a kernel compiled with support for large 
block devices.;
+   return -ENOSPC;
+   }
+   return 0;
+}
+EXPORT_SYMBOL(validate_disk_capacity);
Index: linux-2.6-compile/drivers/block/brd.c
===
--- linux-2.6-compile.orig/drivers/block/brd.c  2014-01-30 17:23:15.0 
+0100
+++ linux-2.6-compile/drivers/block/brd.c   2014-01-30 19:26:51.0 
+0100
@@ -429,12 +429,12 @@ static const struct block_device_operati
  * And now the modules code and kernel interface.
  */
 static int rd_nr;
-int rd_size = CONFIG_BLK_DEV_RAM_SIZE;
+static unsigned rd_size = CONFIG_BLK_DEV_RAM_SIZE;
 static int max_part;
 static int part_shift;
 module_param(rd_nr, int, S_IRUGO);
 MODULE_PARM_DESC(rd_nr, Maximum number of brd devices);
-module_param(rd_size, int, S_IRUGO);
+module_param(rd_size, uint, S_IRUGO);
 MODULE_PARM_DESC(rd_size, Size of each RAM disk in kbytes.);
 module_param(max_part, int, S_IRUGO);
 MODULE_PARM_DESC(max_part, Maximum number of partitions per RAM disk);
@@ -446,7 +446,7 @@ MODULE_ALIAS(rd);
 /* Legacy boot options - nonmodular */
 static int __init ramdisk_size(char *str)
 {
-   rd_size = simple_strtol(str, NULL, 0);
+   rd_size = simple_strtoul(str, NULL, 0);
return 1;
 }
 __setup(ramdisk_size=, ramdisk_size);
@@ -463,6 +463,13 @@ static struct brd_device *brd_alloc(int
 {
struct brd_device *brd;
struct gendisk *disk;
+   u64 capacity = (u64)rd_size * 2;
+   const char *reason;
+
+   if (validate_disk_capacity(capacity, reason)) {
+   printk(KERN_ERR brd: disk is too big: %s\n, reason);
+   goto out;
+   }
 
brd = kzalloc(sizeof(*brd), GFP_KERNEL);
if (!brd)
@@ -493,7 +500,7 @@ static struct brd_device *brd_alloc(int
disk-queue = brd-brd_queue;
disk-flags |= 

Re: [PATCH] block devices: validate block device capacity

2014-01-30 Thread James Bottomley
On Thu, 2014-01-30 at 15:40 -0500, Mikulas Patocka wrote:
 When running the LVM2 testsuite on 32-bit kernel, there are unkillable
 processes stuck in the kernel consuming 100% CPU:
 blkid   R running  0  2005   1409 0x0004
 ce009d00 0082 ffcf c11280ba 0060 560b5dfd 3111 00fe41cb
  ce009d00  d51cfeb0  001e 0002 
 0002 c10748c1 0002 c106cca4    
 Call Trace:
 [c11280ba] ? radix_tree_next_chunk+0xda/0x2c0
 [c10748c1] ? release_pages+0x61/0x160
 [c106cca4] ? find_get_pages+0x84/0x100
 [c1251fbe] ? _cond_resched+0x1e/0x40
 [c10758cb] ? truncate_inode_pages_range+0x12b/0x440
 [c1075cb7] ? truncate_inode_pages+0x17/0x20
 [c10cf2ba] ? __blkdev_put+0x3a/0x140
 [c10d02db] ? blkdev_close+0x1b/0x40
 [c10a60b2] ? __fput+0x72/0x1c0
 [c1039461] ? task_work_run+0x61/0xa0
 [c1253b6f] ? work_notifysig+0x24/0x35
 
 This is caused by the fact that the LVM2 testsuite creates 64TB device.
 The kernel uses unsigned long to index pages in files and block devices,
 on 64TB device unsigned long overflows (it can address up to 16TB with
 4k pages), causing the infinite loop.

Why is this?  the whole reason for CONFIG_LBDAF is supposed to be to
allow 64 bit offsets for block devices on 32 bit.  It sounds like
there's somewhere not using sector_t ... or using it wrongly which needs
fixing.

 On 32-bit architectures, we must limit block device size to
 PAGE_SIZE*(2^32-1).

So you're saying CONFIG_LBDAF can never work, why?

James



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block devices: validate block device capacity

2014-01-30 Thread Mikulas Patocka


On Thu, 30 Jan 2014, James Bottomley wrote:

 Why is this?  the whole reason for CONFIG_LBDAF is supposed to be to
 allow 64 bit offsets for block devices on 32 bit.  It sounds like
 there's somewhere not using sector_t ... or using it wrongly which needs
 fixing.

The page cache uses unsigned long as a page index. Therefore, if unsigned 
long is 32-bit, the block device may have at most 2^32-1 pages.

  On 32-bit architectures, we must limit block device size to
  PAGE_SIZE*(2^32-1).
 
 So you're saying CONFIG_LBDAF can never work, why?
 
 James

CONFIG_LBDAF works, but it doesn't allow unlimited capacity: on x86, 
without CONFIG_LBDAF, the limit is 2TiB. With CONFIG_LBDAF, the limit is 
16TiB (4096*2^32).

Mikulas
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block devices: validate block device capacity

2014-01-30 Thread James Bottomley
On Thu, 2014-01-30 at 18:10 -0500, Mikulas Patocka wrote:
 
 On Thu, 30 Jan 2014, James Bottomley wrote:
 
  Why is this?  the whole reason for CONFIG_LBDAF is supposed to be to
  allow 64 bit offsets for block devices on 32 bit.  It sounds like
  there's somewhere not using sector_t ... or using it wrongly which needs
  fixing.
 
 The page cache uses unsigned long as a page index. Therefore, if unsigned 
 long is 32-bit, the block device may have at most 2^32-1 pages.

Um, that's the index into the mapping, not the device; a device can have
multiple mappings and each mapping has a radix tree of pages.  For most
filesystems a mapping is equivalent to a file, so we can have large
filesystems, but they can't have files over actually 4GB on 32 bits
otherwise mmap fails.

Are we running into a problems with struct address_space where we've
assumed the inode belongs to the file and lvm is doing something where
it's the whole device?

   On 32-bit architectures, we must limit block device size to
   PAGE_SIZE*(2^32-1).
  
  So you're saying CONFIG_LBDAF can never work, why?
  
  James
 
 CONFIG_LBDAF works, but it doesn't allow unlimited capacity: on x86, 
 without CONFIG_LBDAF, the limit is 2TiB. With CONFIG_LBDAF, the limit is 
 16TiB (4096*2^32).

I don't think the people who did the large block device work expected to
gain only 3 bits for all their pain.

James



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block devices: validate block device capacity

2014-01-30 Thread Mikulas Patocka


On Thu, 30 Jan 2014, James Bottomley wrote:

 On Thu, 2014-01-30 at 18:10 -0500, Mikulas Patocka wrote:
  
  On Thu, 30 Jan 2014, James Bottomley wrote:
  
   Why is this?  the whole reason for CONFIG_LBDAF is supposed to be to
   allow 64 bit offsets for block devices on 32 bit.  It sounds like
   there's somewhere not using sector_t ... or using it wrongly which needs
   fixing.
  
  The page cache uses unsigned long as a page index. Therefore, if unsigned 
  long is 32-bit, the block device may have at most 2^32-1 pages.
 
 Um, that's the index into the mapping, not the device; a device can have
 multiple mappings and each mapping has a radix tree of pages.  For most
 filesystems a mapping is equivalent to a file, so we can have large
 filesystems, but they can't have files over actually 4GB on 32 bits
 otherwise mmap fails.

A device may be accessed direcly (by opening /dev/sdX) and it creates a 
mapping too - thus, the size of a mapping limits the size of a block 
device.

The main problem is that pgoff_t has 4 bytes - chaning it to 8 bytes may 
fix it - but there may be some hidden places where pgoff is converted to 
unsigned long - who knows, if they exist or not?

 Are we running into a problems with struct address_space where we've
 assumed the inode belongs to the file and lvm is doing something where
 it's the whole device?

lvm creates a 64TiB device, udev runs blkid on that device and blkid opens 
the device and gets stuck because of unsigned long overflow.

On 32-bit architectures, we must limit block device size to
PAGE_SIZE*(2^32-1).
   
   So you're saying CONFIG_LBDAF can never work, why?
   
   James
  
  CONFIG_LBDAF works, but it doesn't allow unlimited capacity: on x86, 
  without CONFIG_LBDAF, the limit is 2TiB. With CONFIG_LBDAF, the limit is 
  16TiB (4096*2^32).
 
 I don't think the people who did the large block device work expected to
 gain only 3 bits for all their pain.
 
 James

One could change it to have three choices:
2TiB limit - 32-bit sector_t and 32-bit pgoff_t
16TiB limit - 64-bit sector_t and 32-bit pgoff_t
32PiB limit - 64-bit sector_t and 64-bit pgoff_t

Though, we need to know if the people who designed memory management agree 
with changing pgoff_t to 64 bits.

Mikulas
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block devices: validate block device capacity

2014-01-30 Thread James Bottomley
On Thu, 2014-01-30 at 19:20 -0500, Mikulas Patocka wrote:
 
 On Thu, 30 Jan 2014, James Bottomley wrote:
 
  On Thu, 2014-01-30 at 18:10 -0500, Mikulas Patocka wrote:
   
   On Thu, 30 Jan 2014, James Bottomley wrote:
   
Why is this?  the whole reason for CONFIG_LBDAF is supposed to be to
allow 64 bit offsets for block devices on 32 bit.  It sounds like
there's somewhere not using sector_t ... or using it wrongly which needs
fixing.
   
   The page cache uses unsigned long as a page index. Therefore, if unsigned 
   long is 32-bit, the block device may have at most 2^32-1 pages.
  
  Um, that's the index into the mapping, not the device; a device can have
  multiple mappings and each mapping has a radix tree of pages.  For most
  filesystems a mapping is equivalent to a file, so we can have large
  filesystems, but they can't have files over actually 4GB on 32 bits
  otherwise mmap fails.
 
 A device may be accessed direcly (by opening /dev/sdX) and it creates a 
 mapping too - thus, the size of a mapping limits the size of a block 
 device.

Right, that's what I suspected below.  We can't damage large block
support on filesystems just because of this corner case.

 The main problem is that pgoff_t has 4 bytes - chaning it to 8 bytes may 
 fix it - but there may be some hidden places where pgoff is converted to 
 unsigned long - who knows, if they exist or not?

I don't think we want to do that ... it will make struct page fatter and
have knock on impacts in the radix tree code.  To fix this, we need to
make the corner case (i.e. opening large block devices without a
filesystem) bear the pain.  It sort of looks like we want to do a linear
array of mappings of 64TB for the device so the page cache calculations
don't overflow.

  Are we running into a problems with struct address_space where we've
  assumed the inode belongs to the file and lvm is doing something where
  it's the whole device?
 
 lvm creates a 64TiB device, udev runs blkid on that device and blkid opens 
 the device and gets stuck because of unsigned long overflow.

well a simple open won't cause this ... it must be trying to read the
end of the device for some reason.  But anyway, the way to fix this is
to fix the large block open as a corner case.

 On 32-bit architectures, we must limit block device size to
 PAGE_SIZE*(2^32-1).

So you're saying CONFIG_LBDAF can never work, why?

James
   
   CONFIG_LBDAF works, but it doesn't allow unlimited capacity: on x86, 
   without CONFIG_LBDAF, the limit is 2TiB. With CONFIG_LBDAF, the limit is 
   16TiB (4096*2^32).
  
  I don't think the people who did the large block device work expected to
  gain only 3 bits for all their pain.
  
  James
 
 One could change it to have three choices:
 2TiB limit - 32-bit sector_t and 32-bit pgoff_t
 16TiB limit - 64-bit sector_t and 32-bit pgoff_t
 32PiB limit - 64-bit sector_t and 64-bit pgoff_t
 
 Though, we need to know if the people who designed memory management agree 
 with changing pgoff_t to 64 bits.

I don't think we can change the size of pgoff_t ... because it won't
just be that, it will be other problems like the radix tree.

However, you also have to bear in mind that truncating large block
device support to 64TB on 32 bits is a technical ABI break.  Hopefully
it is only technical because I don't know of any current consumer block
device that is 64TB yet, but anyone who'd created a filesystem 64TB
would find it no-longer mounted on 32 bits.
James

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block devices: validate block device capacity

2014-01-30 Thread Mikulas Patocka


On Thu, 30 Jan 2014, James Bottomley wrote:

  A device may be accessed direcly (by opening /dev/sdX) and it creates a 
  mapping too - thus, the size of a mapping limits the size of a block 
  device.
 
 Right, that's what I suspected below.  We can't damage large block
 support on filesystems just because of this corner case.

Devices larger than 16TiB never worked on 32-bit kernel, so this patch 
isn't damaging anything.

Note that if you attach a 16TiB block device, don't open it and mount it, 
it still won't work, because the buffer cache uses the page cache (see the 
function __find_get_block_slow and the variable pgoff_t index - that 
variable would overflow if the filesystem accessed a buffer beyond 16TiB).

  The main problem is that pgoff_t has 4 bytes - chaning it to 8 bytes may 
  fix it - but there may be some hidden places where pgoff is converted to 
  unsigned long - who knows, if they exist or not?
 
 I don't think we want to do that ... it will make struct page fatter and
 have knock on impacts in the radix tree code.  To fix this, we need to
 make the corner case (i.e. opening large block devices without a
 filesystem) bear the pain.  It sort of looks like we want to do a linear
 array of mappings of 64TB for the device so the page cache calculations
 don't overflow.

The code that reads and writes data to block devices and files is shared - 
the functions in mm/filemap.c work for both files and block devices.

So, if you want 64-bit page offsets, you need to increase pgoff_t size, 
and that will increase the limit for both files and block devices.

You shouldn't have separate functions for managing pages on files and 
separate functions for managing pages on block devices - that would 
increase code size and cause maintenance problems.

  Though, we need to know if the people who designed memory management agree 
  with changing pgoff_t to 64 bits.
 
 I don't think we can change the size of pgoff_t ... because it won't
 just be that, it will be other problems like the radix tree.

If we can't change it, then we must stay with the current 16TiB limit. 
There's no other way.

 However, you also have to bear in mind that truncating large block
 device support to 64TB on 32 bits is a technical ABI break.  Hopefully
 it is only technical because I don't know of any current consumer block
 device that is 64TB yet, but anyone who'd created a filesystem 64TB
 would find it no-longer mounted on 32 bits.
 James

It is not ABI break, because block devices larger than 16TiB never worked 
on 32-bit architectures. So it's better to refuse them outright, than to 
cause subtle lockups or data corruption.

Mikulas
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

2014-01-30 Thread James Bottomley
On Thu, 2014-01-30 at 20:46 +0100, Bart Van Assche wrote:
 On 06/25/13 18:13, Michael Christie wrote:
  On Jun 25, 2013, at 10:31 AM, Bart Van Assche bvanass...@acm.org wrote:
  On 06/25/13 15:45, James Bottomley wrote:
  On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote:
  There is a difference though between moving the EH kthread_stop() call
  and the patch at the start of this thread: moving the EH kthread_stop()
  call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_*
  callback after scsi_remove_host() has finished. However, the
  scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can
  cause an eh_* callback to be invoked after scsi_remove_device() finished.
 
  OK, but this doesn't tell me what you're trying to achieve.
 
  An eh function is allowable as long as the host hadn't had the release
  callback executed.  That means you must have to have a reference to the
  device/host to execute the eh function, which is currently guaranteed
  for all invocations.
 
  That raises a new question: how is an LLD expected to clean up resources
  without triggering a race condition ? What you wrote means that it's not
  safe for an LLD to start cleaning up the resources needed by the eh_*
  callbacks immediately after scsi_remove_device() returns since it it not
  guaranteed that at that time all references to the device have already
  been dropped.
 
  A callback in the device/target/host (whatever is needed) release function
  would do this right? If I understand James right, I think he suggested
  something like this in another mail.
 
 (replying to an e-mail of seven months ago - see also
 http://thread.gmane.org/gmane.linux.scsi/82572/focus=82822)
 
 Hello Mike,
 
 Sorry but I'm afraid that making the SCSI core invoke a callback
 function from a device, target or host release function would create a
 new challenge, namely making sure that all these callback functions have
 finished before the module is unloaded that contains the SCSI host
 template and the code implementing such a callback function. That
 challenge is not specific to the SCSI infrastructure but is a general
 question that has not yet been solved in the Linux kernel (see e.g.
 [PATCH] kobject: provide kobject_put_wait to fix module unload race
 for a more general discussion about how to ensure that kobject release
 functions are invoked before the module is unloaded that owns the
 release function - http://thread.gmane.org/gmane.linux.kernel/1622885).

For callbacks, that's easy: it's module_get/module_put

 Or maybe this just means that I misunderstood you ?
 
 In case it is not clear why I'm reviving this discussion: now that the
 improved eh timeout handler patch is upstream (commit
 e494f6a728394ab0df194342549ee20e6f0752df) there is an additional way in
 which the SCSI core can invoke an EH function concurrently with or after
 scsi_remove_host() has finished, namely from the TMF work queue
 (tmf_work_q).

But the fundamental guarantee is that the eh thread for the host (the eh
context if you will) has to be dead before the host can be removed and
the module unloaded.  The thread doesn't die until all the work is done.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

2014-01-30 Thread Bart Van Assche
On 01/31/14 06:58, James Bottomley wrote:
 On Thu, 2014-01-30 at 20:46 +0100, Bart Van Assche wrote:
 On 06/25/13 18:13, Michael Christie wrote:
 Sorry but I'm afraid that making the SCSI core invoke a callback
 function from a device, target or host release function would create a
 new challenge, namely making sure that all these callback functions have
 finished before the module is unloaded that contains the SCSI host
 template and the code implementing such a callback function. That
 challenge is not specific to the SCSI infrastructure but is a general
 question that has not yet been solved in the Linux kernel (see e.g.
 [PATCH] kobject: provide kobject_put_wait to fix module unload race
 for a more general discussion about how to ensure that kobject release
 functions are invoked before the module is unloaded that owns the
 release function - http://thread.gmane.org/gmane.linux.kernel/1622885).
 
 For callbacks, that's easy: it's module_get/module_put

Adding a module_get() call in scsi_add_host() and a module_put() call in
scsi_host_dev_release() would cause a behavior change that would be
reported by end users as system hangs during shutdown. Today it is
possible to unload a kernel module via rmmod that created one or more
SCSI hosts. Since user space does not remove SCSI hosts during system
shutdown, trying to unload a SCSI LLD kernel module that had created one
or more SCSI hosts would cause the module unload to fail because of a
non-zero module reference count.

 In case it is not clear why I'm reviving this discussion: now that the
 improved eh timeout handler patch is upstream (commit
 e494f6a728394ab0df194342549ee20e6f0752df) there is an additional way in
 which the SCSI core can invoke an EH function concurrently with or after
 scsi_remove_host() has finished, namely from the TMF work queue
 (tmf_work_q).
 
 But the fundamental guarantee is that the eh thread for the host (the eh
 context if you will) has to be dead before the host can be removed and
 the module unloaded.  The thread doesn't die until all the work is done.

Maybe I'm misunderstanding something, but as far as I can see
kthread_stop(shost-ehandler) is invoked from scsi_host_dev_release().
That last function is called by the last scsi_host_put(). And LLD's
invoke scsi_host_put() after scsi_remove_host(). In other words, the
SCSI error handler thread and TMF work queue are still active when
scsi_remove_host() returns.

Should I repost the patch at the start of this thread ?

Thanks,

Bart.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html