[PATCH] x86/Kconfig: compile 32bit drivers for COMPILE_TEST

2017-08-28 Thread Hannes Reinecke
When COMPILE_TEST is set we should be compiling 32-bit only
drivers and features (like ISA support etc), too.
After all, it's a compile test.

Signed-off-by: Hannes Reinecke 
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 781521b7cf9e..df61bc37b1a2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2548,7 +2548,7 @@ config ISA_DMA_API
  Enables ISA-style DMA support for devices requiring such controllers.
  If unsure, say Y.
 
-if X86_32
+if X86_32 || COMPILE_TEST
 
 config ISA
bool "ISA support"
-- 
2.12.3



Re: [PATCH v2 00/13] mpt3sas driver NVMe support:

2017-08-02 Thread Hannes Reinecke
On 08/02/2017 10:14 AM, Hannes Reinecke wrote:
> On 07/14/2017 03:22 PM, Suganath Prabu S wrote:
>> Ventura Series controller are Tri-mode. The controller and
>> firmware are capable of supporting NVMe devices and
>> PCIe switches to be connected with the controller. This
>> patch set adds driver level support for NVMe devices and
>> PCIe switches.
>>
>> Suganath Prabu S (13):
>>   mpt3sas: Update MPI Header
>>   mpt3sas: Add nvme device support in slave alloc, target alloc and
>> probe
>>   mpt3sas: SGL to PRP Translation for I/Os to NVMe  devices
>>   mpt3sas: Added support for nvme encapsulated request message.
>>   mpt3sas: API 's to support NVMe drive addition to SML
>>   mpt3sas: API's to remove nvme drive from sml
>>   mpt3sas: Handle NVMe PCIe device related events generated
>>from firmware.
>>   mpt3sas: Set NVMe device queue depth as 128
>>   mpt3sas: scan and add nvme device after controller reset
>>   mpt3as: Add-Task-management-debug-info-for-NVMe-drives.
>>   mpt3sas: NVMe drive support for BTDHMAPPING ioctl command and log
>> info
>>   mpt3sas: Fix nvme drives checking for tlr.
>>   mpt3sas: Update mpt3sas driver version.
>>
>>  drivers/scsi/mpt3sas/mpi/mpi2.h  |   43 +-
>>  drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h |  647 ++-
>>  drivers/scsi/mpt3sas/mpi/mpi2_init.h |   11 +-
>>  drivers/scsi/mpt3sas/mpi/mpi2_ioc.h  |  331 ++-
>>  drivers/scsi/mpt3sas/mpi/mpi2_pci.h  |  142 +++
>>  drivers/scsi/mpt3sas/mpi/mpi2_tool.h |   14 +-
>>  drivers/scsi/mpt3sas/mpt3sas_base.c  |  710 +++-
>>  drivers/scsi/mpt3sas/mpt3sas_base.h  |  171 +++-
>>  drivers/scsi/mpt3sas/mpt3sas_config.c|  100 ++
>>  drivers/scsi/mpt3sas/mpt3sas_ctl.c   |  158 ++-
>>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 1874 
>> --
>>  drivers/scsi/mpt3sas/mpt3sas_warpdrive.c |2 +-
>>  12 files changed, 4063 insertions(+), 140 deletions(-)
>>  create mode 100644 drivers/scsi/mpt3sas/mpi/mpi2_pci.h
>>
> I'm not happy with this approach.
> NVMe devices should _not_ appear as SCSI devices; this will just confuse
> matters _and_ will be incompatible with 'normal' NVMe devices.
> 
> Rather I would like to see the driver to hook into the existing NVMe
> framework (which essentially means to treat the mpt3sas as a weird
> NVMe-over-Fabrics HBA), and expose the NVMe devices like any other NVMe HBA.
> 
> I'm sorry that you'll have to redo your patchset (again), but this is
> the only way I see how this patchset can be brought forward.
> 
> I'd be happy to discuss implementation details with you.
> 
After discussion with Broadcom it turns out that the NVMe passthrough
functionality of the current firmware is rather limited and wouldn't be
suitable for full NVMe device support.
And one of the goals is to have intermixed NVMe and SCSI support,
possibly with RAID thrown in for good measure.
So my idea of exposing the NVMe devices directly won't work, and I'll
have to retract my above comment.

But I'll continue to bug Broadcom; maybe we'll be getting NVMe device
support eventually :-)

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 13/13] mpt3sas: Update mpt3sas driver version.

2017-08-02 Thread Hannes Reinecke
On 07/14/2017 03:22 PM, Suganath Prabu S wrote:
> Updated mpt3sas driver version to 15.101.00.00
> 
> Signed-off-by: Chaitra P B 
> Signed-off-by: Suganath Prabu S 
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.h |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h 
> b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index b7855c8..b705199 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -74,9 +74,9 @@
>  #define MPT3SAS_DRIVER_NAME  "mpt3sas"
>  #define MPT3SAS_AUTHOR "Avago Technologies 
> "
>  #define MPT3SAS_DESCRIPTION  "LSI MPT Fusion SAS 3.0 Device Driver"
> -#define MPT3SAS_DRIVER_VERSION   "15.100.00.00"
> +#define MPT3SAS_DRIVER_VERSION   "15.101.00.00"
>  #define MPT3SAS_MAJOR_VERSION15
> -#define MPT3SAS_MINOR_VERSION100
> +#define MPT3SAS_MINOR_VERSION    101
>  #define MPT3SAS_BUILD_VERSION    0
>  #define MPT3SAS_RELEASE_VERSION  00
>  
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 12/13] mpt3sas: Fix nvme drives checking for tlr.

2017-08-02 Thread Hannes Reinecke
On 07/14/2017 03:22 PM, Suganath Prabu S wrote:
> Check for NVMe drives before enabling or checking tlr.
> 
> Signed-off-by: Chaitra P B 
> Signed-off-by: Suganath Prabu S 
> ---
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c |   22 --
>  1 files changed, 16 insertions(+), 6 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 11/13] mpt3sas: NVMe drive support for BTDHMAPPING ioctl command and log info

2017-08-02 Thread Hannes Reinecke
On 07/14/2017 03:22 PM, Suganath Prabu S wrote:
> * Added debug prints for pcie devices in ioctl debug path. Which
> will be helpful for debugging.
> * Added PCIe device support for ioctl BTDHMAPPING ioctl.
> 
> Signed-off-by: Chaitra P B 
> Signed-off-by: Suganath Prabu S 
> ---
>  drivers/scsi/mpt3sas/mpt3sas_ctl.c |   88 +++
>  1 files changed, 58 insertions(+), 30 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 10/13] mpt3as: Add-Task-management-debug-info-for-NVMe-drives.

2017-08-02 Thread Hannes Reinecke
On 07/14/2017 03:22 PM, Suganath Prabu S wrote:
> Added debug information for NVMe/PCIe drives in target rest path.
> 
> Signed-off-by: Chaitra P B 
> Signed-off-by: Suganath Prabu S 
> ---
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c |   83 -
>  1 files changed, 70 insertions(+), 13 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 08/13] mpt3sas: Set NVMe device queue depth as 128

2017-08-02 Thread Hannes Reinecke
On 07/14/2017 03:22 PM, Suganath Prabu S wrote:
> Sets nvme device queue depth, name and displays device capabilities
> 
> Signed-off-by: Chaitra P B 
> Signed-off-by: Suganath Prabu S 
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.h  |2 +-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c |   40 
> ++
>  2 files changed, 41 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h 
> b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index 0a8187e..b7855c8 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -115,7 +115,7 @@
>  
>  #define MPT3SAS_RAID_MAX_SECTORS 8192
>  #define MPT3SAS_HOST_PAGE_SIZE_4K12
> -
> +#define MPT3SAS_NVME_QUEUE_DEPTH 128
>  #define MPT_NAME_LENGTH  32  /* generic length of 
> strings */
>  #define MPT_STRING_LENGTH64
>  
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
> b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 1dd9674..c5a131f 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -2290,6 +2290,7 @@ scsih_slave_configure(struct scsi_device *sdev)
>   struct MPT3SAS_DEVICE *sas_device_priv_data;
>   struct MPT3SAS_TARGET *sas_target_priv_data;
>   struct _sas_device *sas_device;
> + struct _pcie_device *pcie_device;
>   struct _raid_device *raid_device;
>   unsigned long flags;
>   int qdepth;
> @@ -2420,6 +2421,45 @@ scsih_slave_configure(struct scsi_device *sdev)
>   }
>   }
>  
> + /* PCIe handling */
> + if (sas_target_priv_data->flags & MPT_TARGET_FLAGS_PCIE_DEVICE) {
> + spin_lock_irqsave(&ioc->pcie_device_lock, flags);
> + pcie_device = __mpt3sas_get_pdev_by_wwid(ioc,
> + sas_device_priv_data->sas_target->sas_address);
> + if (!pcie_device) {
> + spin_unlock_irqrestore(&ioc->pcie_device_lock, flags);
> + dfailprintk(ioc, pr_warn(MPT3SAS_FMT
> + "failure at %s:%d/%s()!\n", ioc->name, __FILE__,
> + __LINE__, __func__));
> + return 1;
> + }
> +
> + /*TODO-right Queue Depth?*/
> + qdepth = MPT3SAS_NVME_QUEUE_DEPTH;
> + ds = "NVMe";
> + /*TODO-Add device name when defined*/
> + sdev_printk(KERN_INFO, sdev,
> + "%s: handle(0x%04x), wwid(0x%016llx), port(%d)\n",
> + ds, handle, (unsigned long long)pcie_device->wwid,
> + pcie_device->port_num);
> + if (pcie_device->enclosure_handle != 0)
> + sdev_printk(KERN_INFO, sdev,
> + "%s: enclosure logical id(0x%016llx), slot(%d)\n",
> + ds,
> + (unsigned long long)pcie_device->enclosure_logical_id,
> + pcie_device->slot);
> + if (pcie_device->connector_name[0] != '\0')
> + sdev_printk(KERN_INFO, sdev,
> + "%s: enclosure level(0x%04x),"
> + "connector name( %s)\n", ds,
> + pcie_device->enclosure_level,
> + pcie_device->connector_name);
> + pcie_device_put(pcie_device);
> + spin_unlock_irqrestore(&ioc->pcie_device_lock, flags);
> + scsih_change_queue_depth(sdev, qdepth);
> + return 0;
> + }
> +
>   spin_lock_irqsave(&ioc->sas_device_lock, flags);
>   sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>  sas_device_priv_data->sas_target->sas_address);
> 
Well; what are these TODOs doing here?
If you know things are not correct, why not doing them correctly?
If you cannot do them correctly, why?

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 09/13] mpt3sas: scan and add nvme device after controller reset

2017-08-02 Thread Hannes Reinecke
o_cpu(pcie_device_pg0.Flags);
> + pcie_device_pg0.DevHandle = handle;
> + _scsih_mark_responding_pcie_device(ioc, &pcie_device_pg0);
> + }
> +out:
> + pr_info(MPT3SAS_FMT "search for PCIe end-devices: complete\n",
> + ioc->name);
> +}
> +
> +/**
>   * _scsih_mark_responding_raid_device - mark a raid_device as responding
>   * @ioc: per adapter object
>   * @wwid: world wide identifier for raid volume
> @@ -8863,6 +9009,7 @@ _scsih_scan_for_devices_after_reset(struct 
> MPT3SAS_ADAPTER *ioc)
>  {
>   Mpi2ExpanderPage0_t expander_pg0;
>   Mpi2SasDevicePage0_t sas_device_pg0;
> + Mpi26PCIeDevicePage0_t pcie_device_pg0;
>   Mpi2RaidVolPage1_t volume_pg1;
>   Mpi2RaidVolPage0_t volume_pg0;
>   Mpi2RaidPhysDiskPage0_t pd_pg0;
> @@ -8873,6 +9020,7 @@ _scsih_scan_for_devices_after_reset(struct 
> MPT3SAS_ADAPTER *ioc)
>   u16 handle, parent_handle;
>   u64 sas_address;
>   struct _sas_device *sas_device;
> + struct _pcie_device *pcie_device;
>   struct _sas_node *expander_device;
>   static struct _raid_device *raid_device;
>   u8 retry_count;
> @@ -9098,7 +9246,44 @@ _scsih_scan_for_devices_after_reset(struct 
> MPT3SAS_ADAPTER *ioc)
>   }
>   pr_info(MPT3SAS_FMT "\tscan devices: end devices complete\n",
>   ioc->name);
> + pr_info(MPT3SAS_FMT "\tscan devices: pcie end devices start\n",
> + ioc->name);
>  
> + /* pcie devices */
> + handle = 0x;
> + while (!(mpt3sas_config_get_pcie_device_pg0(ioc, &mpi_reply,
> + &pcie_device_pg0, MPI26_PCIE_DEVICE_PGAD_FORM_GET_NEXT_HANDLE,
> + handle))) {
> + ioc_status = le16_to_cpu(mpi_reply.IOCStatus)
> + & MPI2_IOCSTATUS_MASK;
> + if (ioc_status != MPI2_IOCSTATUS_SUCCESS) {
> + pr_info(MPT3SAS_FMT "\tbreak from pcie end device"
> + " scan: ioc_status(0x%04x), loginfo(0x%08x)\n",
> + ioc->name, ioc_status,
> + le32_to_cpu(mpi_reply.IOCLogInfo));
> + break;
> + }
> + handle = le16_to_cpu(pcie_device_pg0.DevHandle);
> + if (!(_scsih_is_nvme_device(
> + le32_to_cpu(pcie_device_pg0.DeviceInfo
> + continue;
> + pcie_device = mpt3sas_get_pdev_by_wwid(ioc,
> + le64_to_cpu(pcie_device_pg0.WWID));
> + if (pcie_device) {
> + pcie_device_put(pcie_device);
> + continue;
> + }
> + retry_count = 0;
> + parent_handle = le16_to_cpu(pcie_device_pg0.ParentDevHandle);
> + _scsih_pcie_add_device(ioc, handle);
> +
> + pr_info(MPT3SAS_FMT "\tAFTER adding pcie end device: "
> + "handle (0x%04x), wwid(0x%016llx)\n", ioc->name,
> + handle,
> + (unsigned long long) le64_to_cpu(pcie_device_pg0.WWID));
> + }
> + pr_info(MPT3SAS_FMT "\tpcie devices: pcie end devices complete\n",
> + ioc->name);
>   pr_info(MPT3SAS_FMT "scan devices: complete\n", ioc->name);
>  }
>  /**
> @@ -9148,6 +9333,7 @@ mpt3sas_scsih_reset_handler(struct MPT3SAS_ADAPTER 
> *ioc, int reset_phase)
>   !ioc->sas_hba.num_phys)) {
>   _scsih_prep_device_scan(ioc);
>   _scsih_search_responding_sas_devices(ioc);
> + _scsih_search_responding_pcie_devices(ioc);
>   _scsih_search_responding_raid_devices(ioc);
>   _scsih_search_responding_expanders(ioc);
>   _scsih_error_recovery_delete_devices(ioc);
> 
Again; when using 'scan_start()/scan_finished()' callbacks this will
probably look a bit differently...

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 07/13] mpt3sas: Handle NVMe PCIe device related events generated from firmware.

2017-08-02 Thread Hannes Reinecke
On 07/14/2017 03:22 PM, Suganath Prabu S wrote:
> * The controller firmware sends separate events for NVMe devices and
> PCIe switches similar to existing SAS events.
> 
> * NVMe device detection, addition and removal are reported by the
> firmware through PCIe Topology Change list events.
> 
> * The PCIe device state change events are sent when the firmware
> detects any abnormal conditions with a NVMe device or switch.
> 
> * The enumeration event are sent when the firmware starts PCIe device
> enumeration and stops.
> 
> * This patch has the code change to handle the events and add/remove
> NVMe devices in driver's inventory.
> 
> Signed-off-by: Chaitra P B 
> Signed-off-by: Suganath Prabu S 
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.c  |   30 ++-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c |  471 
> +-
>  2 files changed, 495 insertions(+), 6 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 06/13] mpt3sas: API's to remove nvme drive from sml

2017-08-02 Thread Hannes Reinecke
On 07/14/2017 03:22 PM, Suganath Prabu S wrote:
> Below API's are included in nvme drive remove path.
> _scsih_pcie_device_remove_by_handle
> _scsih_pcie_device_remove_from_sml
> 
> Signed-off-by: Chaitra P B 
> Signed-off-by: Suganath Prabu S 
> ---
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c |  148 
> +-
>  1 files changed, 145 insertions(+), 3 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 05/13] mpt3sas: API 's to support NVMe drive addition to SML

2017-08-02 Thread Hannes Reinecke
On 07/14/2017 03:22 PM, Suganath Prabu S wrote:
> Below Functions are added in various paths to support NVMe
> drive addition.
> 
> _scsih_pcie_add_device
> _scsih_pcie_device_add
> _scsih_pcie_device_init_add
> _scsih_check_pcie_access_status
> _scsih_pcie_check_device
> 
> mpt3sas_get_pdev_by_handle
> 
> mpt3sas_config_get_pcie_device_pg0
> mpt3sas_config_get_pcie_device_pg2
> 
> Signed-off-by: Chaitra P B 
> Signed-off-by: Suganath Prabu S 
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.h   |   10 +
>  drivers/scsi/mpt3sas/mpt3sas_config.c |  100 +++
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c  |  473 
> -
>  3 files changed, 575 insertions(+), 8 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 04/13] mpt3sas: Added support for nvme encapsulated request message.

2017-08-02 Thread Hannes Reinecke
On 07/14/2017 03:22 PM, Suganath Prabu S wrote:
> * Mpt3sas driver uses the NVMe Encapsulated Request message to
> send an NVMe command to an NVMe device attached to the IOC.
> 
> * Normal I/O commands like reads and writes are passed to the
> controller as SCSI commands and the controller has the ability
> to translate the commands to NVMe equivalent.
> 
> * This encapsulated NVMe command is used by applications to send
> direct NVMe commands to NVMe drives or for handling unmap where
> the translation at controller/firmware level is having
> performance issues.
> 
> Signed-off-by: Chaitra P B 
> Signed-off-by: Suganath Prabu S 
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.c |   56 -
>  drivers/scsi/mpt3sas/mpt3sas_base.h |1 +
>  drivers/scsi/mpt3sas/mpt3sas_ctl.c  |   69 
> ---
>  3 files changed, 119 insertions(+), 7 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 02/13] mpt3sas: Add nvme device support in slave alloc, target alloc and probe

2017-08-02 Thread Hannes Reinecke
On 07/14/2017 03:22 PM, Suganath Prabu S wrote:
> 1) Added support for probing pcie device and adding NVMe drives to
> SML and driver's internal list pcie_device_list.
> 
> 2) Added support for determing NVMe as boot device.
> 
> 3) Added nvme device support for call back functions scan_finished
> target_alloc,slave_alloc,target destroy and slave destroy.
> 
>  a) During scan, pcie devices are probed and added to SML to drivers
> internal list.
> 
>  b) target_alloc & slave alloc API's allocates resources for
> (MPT3SAS_TARGET & MPT3SAS_DEVICE) private datas and holds
> information like handle, target_id etc.
> 
>  c) slave_destroy & target_destroy are called when driver unregisters
> or removes device. Also frees allocated resources and info.
> 
> Signed-off-by: Chaitra P B 
> Signed-off-by: Suganath Prabu S 
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.h  |  110 -
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c |  431 
> +++---
>  2 files changed, 507 insertions(+), 34 deletions(-)
> 
Have you considered using 'scan_start()/scan_finished()' SCSI midlayer
callbacks here?
Seeing that you are enumerating the devices internally already that
should give you better control about the scanning process.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 01/13] mpt3sas: Update MPI Header

2017-08-02 Thread Hannes Reinecke
On 07/14/2017 03:22 PM, Suganath Prabu S wrote:
> Update MPI Files for NVMe support
> 
> Signed-off-by: Chaitra P B 
> Signed-off-by: Suganath Prabu S 
> ---
>  drivers/scsi/mpt3sas/mpi/mpi2.h  |   43 +++-
>  drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h |  647 
> +-
>  drivers/scsi/mpt3sas/mpi/mpi2_init.h |   11 +-
>  drivers/scsi/mpt3sas/mpi/mpi2_ioc.h  |  331 +-
>  drivers/scsi/mpt3sas/mpi/mpi2_pci.h  |  142 
>  drivers/scsi/mpt3sas/mpi/mpi2_tool.h |   14 +-
>  6 files changed, 1177 insertions(+), 11 deletions(-)
>  create mode 100644 drivers/scsi/mpt3sas/mpi/mpi2_pci.h
> 
> diff --git a/drivers/scsi/mpt3sas/mpi/mpi2.h b/drivers/scsi/mpt3sas/mpi/mpi2.h
> index a9a659f..bc59058 100644
> --- a/drivers/scsi/mpt3sas/mpi/mpi2.h
> +++ b/drivers/scsi/mpt3sas/mpi/mpi2.h
> @@ -8,7 +8,7 @@
>   * scatter/gather formats.
>   * Creation Date:  June 21, 2006
>   *
> - * mpi2.h Version:  02.00.42
> + * mpi2.h Version:  02.00.48
>   *
>   * NOTE: Names (typedefs, defines, etc.) beginning with an MPI25 or Mpi25
>   *   prefix are for use only on MPI v2.5 products, and must not be used
> @@ -103,6 +103,16 @@
>   * 08-25-15  02.00.40  Bumped MPI2_HEADER_VERSION_UNIT.
>   * 12-15-15  02.00.41  Bumped MPI_HEADER_VERSION_UNIT
>   * 01-01-16  02.00.42  Bumped MPI_HEADER_VERSION_UNIT
> + * 04-05-16  02.00.43  Modified  MPI26_DIAG_BOOT_DEVICE_SELECT defines
> + * to be unique within first 32 characters.
> + * Removed AHCI support.
> + * Removed SOP support.
> + * Bumped MPI2_HEADER_VERSION_UNIT.
> + * 04-10-16  02.00.44  Bumped MPI2_HEADER_VERSION_UNIT.
> + * 07-06-16  02.00.45  Bumped MPI2_HEADER_VERSION_UNIT.
> + * 09-02-16  02.00.46  Bumped MPI2_HEADER_VERSION_UNIT.
> + * 11-23-16  02.00.47  Bumped MPI2_HEADER_VERSION_UNIT.
> + * 02-03-17  02.00.48  Bumped MPI2_HEADER_VERSION_UNIT.
>   * --
>   */
>  
> @@ -142,7 +152,7 @@
>  #define MPI2_VERSION_02_06   (0x0206)
>  
>  /*Unit and Dev versioning for this MPI header set */
> -#define MPI2_HEADER_VERSION_UNIT(0x2A)
> +#define MPI2_HEADER_VERSION_UNIT(0x30)
>  #define MPI2_HEADER_VERSION_DEV (0x00)
>  #define MPI2_HEADER_VERSION_UNIT_MASK   (0xFF00)
>  #define MPI2_HEADER_VERSION_UNIT_SHIFT  (8)
> @@ -249,6 +259,12 @@ typedef volatile struct _MPI2_SYSTEM_INTERFACE_REGS {
>  #define MPI2_DIAG_BOOT_DEVICE_SELECT_DEFAULT(0x)
>  #define MPI2_DIAG_BOOT_DEVICE_SELECT_HCDW   (0x0800)
>  
> +/* Defines for V7A/V7R HostDiagnostic Register */
> +#define MPI26_DIAG_BOOT_DEVICE_SEL_64FLASH  (0x)
> +#define MPI26_DIAG_BOOT_DEVICE_SEL_64HCDW   (0x0800)
> +#define MPI26_DIAG_BOOT_DEVICE_SEL_32FLASH  (0x1000)
> +#define MPI26_DIAG_BOOT_DEVICE_SEL_32HCDW   (0x1800)
> +
>  #define MPI2_DIAG_CLEAR_FLASH_BAD_SIG   (0x0400)
>  #define MPI2_DIAG_FORCE_HCB_ON_RESET(0x0200)
>  #define MPI2_DIAG_HCB_MODE  (0x0100)
> @@ -367,6 +383,7 @@ typedef struct _MPI2_DEFAULT_REQUEST_DESCRIPTOR {
>  #define MPI2_REQ_DESCRIPT_FLAGS_DEFAULT_TYPE(0x08)
>  #define MPI2_REQ_DESCRIPT_FLAGS_RAID_ACCELERATOR(0x0A)
>  #define MPI25_REQ_DESCRIPT_FLAGS_FAST_PATH_SCSI_IO  (0x0C)
> +#define MPI26_REQ_DESCRIPT_FLAGS_PCIE_ENCAPSULATED  (0x10)
>  
>  #define MPI2_REQ_DESCRIPT_FLAGS_IOC_FIFO_MARKER (0x01)
>  
> @@ -425,6 +442,13 @@ typedef MPI2_SCSI_IO_REQUEST_DESCRIPTOR
>   Mpi25FastPathSCSIIORequestDescriptor_t,
>   *pMpi25FastPathSCSIIORequestDescriptor_t;
>  
> +/*PCIe Encapsulated Request Descriptor */
> +typedef MPI2_SCSI_IO_REQUEST_DESCRIPTOR
> + MPI26_PCIE_ENCAPSULATED_REQUEST_DESCRIPTOR,
> + *PTR_MPI26_PCIE_ENCAPSULATED_REQUEST_DESCRIPTOR,
> + Mpi26PCIeEncapsulatedRequestDescriptor_t,
> + *pMpi26PCIeEncapsulatedRequestDescriptor_t;
> +
>  /*union of Request Descriptors */
>  typedef union _MPI2_REQUEST_DESCRIPTOR_UNION {
>   MPI2_DEFAULT_REQUEST_DESCRIPTOR Default;
> @@ -433,6 +457,7 @@ typedef union _MPI2_REQUEST_DESCRIPTOR_UNION {
>   MPI2_SCSI_TARGET_REQUEST_DESCRIPTOR SCSITarget;
>   MPI2_RAID_ACCEL_REQUEST_DESCRIPTOR RAIDAccelerator;
>   MPI25_FP_SCSI_IO_REQUEST_DESCRIPTOR FastPathSCSIIO;
> + MPI26_PCIE_ENCAPSULATED_REQUEST_DESCRIPTOR PCIeEncapsulated;
>   U64 Words;
>  } MPI2_REQUEST_DESCRIPTOR_UNION,
>   *PTR_MPI2_REQUEST_DESCRIPTOR_UNION,
> @@ -450,6 +475,7 @@ typedef union _MPI2_REQUEST_DESCRIPTOR_UNION {
>   *  Atomic SCSI Target Request Descriptor
>   *  Atomic RAID Accelerator Request Descriptor
>   *  Atomic Fast Path SCSI IO Request Descriptor
> + *  Atomic PCIe Encapsulated Request Descriptor
>   */
>  
>  /*Atomic Request Descriptor */
> @@ -487,6 +513,7 @@ typedef struct _MPI2_DEFAULT_REPLY_DESCRIPTOR {
>  #define MPI2_RPY_

Re: [PATCH v2 00/13] mpt3sas driver NVMe support:

2017-08-02 Thread Hannes Reinecke
On 07/14/2017 03:22 PM, Suganath Prabu S wrote:
> Ventura Series controller are Tri-mode. The controller and
> firmware are capable of supporting NVMe devices and
> PCIe switches to be connected with the controller. This
> patch set adds driver level support for NVMe devices and
> PCIe switches.
> 
> Suganath Prabu S (13):
>   mpt3sas: Update MPI Header
>   mpt3sas: Add nvme device support in slave alloc, target alloc and
> probe
>   mpt3sas: SGL to PRP Translation for I/Os to NVMe  devices
>   mpt3sas: Added support for nvme encapsulated request message.
>   mpt3sas: API 's to support NVMe drive addition to SML
>   mpt3sas: API's to remove nvme drive from sml
>   mpt3sas: Handle NVMe PCIe device related events generated
>from firmware.
>   mpt3sas: Set NVMe device queue depth as 128
>   mpt3sas: scan and add nvme device after controller reset
>   mpt3as: Add-Task-management-debug-info-for-NVMe-drives.
>   mpt3sas: NVMe drive support for BTDHMAPPING ioctl command and log
> info
>   mpt3sas: Fix nvme drives checking for tlr.
>   mpt3sas: Update mpt3sas driver version.
> 
>  drivers/scsi/mpt3sas/mpi/mpi2.h  |   43 +-
>  drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h |  647 ++-
>  drivers/scsi/mpt3sas/mpi/mpi2_init.h |   11 +-
>  drivers/scsi/mpt3sas/mpi/mpi2_ioc.h  |  331 ++-
>  drivers/scsi/mpt3sas/mpi/mpi2_pci.h  |  142 +++
>  drivers/scsi/mpt3sas/mpi/mpi2_tool.h |   14 +-
>  drivers/scsi/mpt3sas/mpt3sas_base.c  |  710 +++-
>  drivers/scsi/mpt3sas/mpt3sas_base.h  |  171 +++-
>  drivers/scsi/mpt3sas/mpt3sas_config.c|  100 ++
>  drivers/scsi/mpt3sas/mpt3sas_ctl.c   |  158 ++-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 1874 
> --
>  drivers/scsi/mpt3sas/mpt3sas_warpdrive.c |2 +-
>  12 files changed, 4063 insertions(+), 140 deletions(-)
>  create mode 100644 drivers/scsi/mpt3sas/mpi/mpi2_pci.h
> 
I'm not happy with this approach.
NVMe devices should _not_ appear as SCSI devices; this will just confuse
matters _and_ will be incompatible with 'normal' NVMe devices.

Rather I would like to see the driver to hook into the existing NVMe
framework (which essentially means to treat the mpt3sas as a weird
NVMe-over-Fabrics HBA), and expose the NVMe devices like any other NVMe HBA.

I'm sorry that you'll have to redo your patchset (again), but this is
the only way I see how this patchset can be brought forward.

I'd be happy to discuss implementation details with you.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] scsi: sg: only check for dxfer_len greater than 256M

2017-07-27 Thread Hannes Reinecke
On 07/27/2017 09:11 AM, Johannes Thumshirn wrote:
> Don't make any assumptions on the sg_io_hdr_t::dxfer_direction or the
> sg_io_hdr_t::dxferp in order to determine if it is a valid request. The
> only way we can check for bad requests is by checking if the length exceeds
> 256M.
> 
> Signed-off-by: Johannes Thumshirn 
> Fixes: 28676d869bbb (scsi: sg: check for valid direction before starting the
> request)
> Reported-by: Jason L Tibbitts III 
> Tested-by: Jason L Tibbitts III 
> Suggested-by: Doug Gilbert 
> Cc: Doug Gilbert 
> Cc: 
> ---
>  drivers/scsi/sg.c | 31 +--
>  1 file changed, 1 insertion(+), 30 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v3 1/3] nvmet: prefix version configfs file with attr

2017-07-14 Thread Hannes Reinecke
On 07/14/2017 02:53 PM, Johannes Thumshirn wrote:
> The NVMe target's attribute files need an attr prefix in order to have
> nvmetcli recognize them. Add this attribute.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/nvme/target/configfs.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
> index a358ecd93e11..ceee57bb0c24 100644
> --- a/drivers/nvme/target/configfs.c
> +++ b/drivers/nvme/target/configfs.c
> @@ -650,7 +650,7 @@ static ssize_t 
> nvmet_subsys_attr_allow_any_host_store(struct config_item *item,
>  
>  CONFIGFS_ATTR(nvmet_subsys_, attr_allow_any_host);
>  
> -static ssize_t nvmet_subsys_version_show(struct config_item *item,
> +static ssize_t nvmet_subsys_attr_version_show(struct config_item *item,
> char *page)
>  {
>   struct nvmet_subsys *subsys = to_subsys(item);
> @@ -666,7 +666,7 @@ static ssize_t nvmet_subsys_version_show(struct 
> config_item *item,
>   (int)NVME_MINOR(subsys->ver));
>  }
>  
> -static ssize_t nvmet_subsys_version_store(struct config_item *item,
> +static ssize_t nvmet_subsys_attr_version_store(struct config_item *item,
>  const char *page, size_t count)
>  {
>   struct nvmet_subsys *subsys = to_subsys(item);
> @@ -684,11 +684,11 @@ static ssize_t nvmet_subsys_version_store(struct 
> config_item *item,
>  
>   return count;
>  }
> -CONFIGFS_ATTR(nvmet_subsys_, version);
> +CONFIGFS_ATTR(nvmet_subsys_, attr_version);
>  
>  static struct configfs_attribute *nvmet_subsys_attrs[] = {
>   &nvmet_subsys_attr_attr_allow_any_host,
> - &nvmet_subsys_attr_version,
> + &nvmet_subsys_attr_attr_version,
>   NULL,
>  };
>  
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v3 7/7] libsas: release disco mutex during waiting in sas_ex_discover_end_dev

2017-07-13 Thread Hannes Reinecke
On 07/10/2017 09:06 AM, Yijing Wang wrote:
> Disco mutex was introudced to prevent domain rediscovery competing
> with ata error handling(87c8331). If we have already hold the lock
> in sas_revalidate_domain and sync executing probe, deadlock caused,
> because, sas_probe_sata() also need hold disco_mutex. Since disco mutex
> use to prevent revalidata domain happen during ata error handler,
> it should be safe to release disco mutex when sync probe, because
> no new revalidate domain event would be process until the sync return,
> and the current sas revalidate domain finish.
> 
> Signed-off-by: Yijing Wang 
> CC: John Garry 
> CC: Johannes Thumshirn 
> CC: Ewan Milne 
> CC: Christoph Hellwig 
> CC: Tomas Henzl 
> CC: Dan Williams 
> ---
>  drivers/scsi/libsas/sas_expander.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/scsi/libsas/sas_expander.c 
> b/drivers/scsi/libsas/sas_expander.c
> index 9d26c28..077024e 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -776,6 +776,7 @@ static struct domain_device *sas_ex_discover_end_dev(
>   struct ex_phy *phy = &parent_ex->ex_phy[phy_id];
>   struct domain_device *child = NULL;
>   struct sas_rphy *rphy;
> + bool prev_lock;
>   int res;
>  
>   if (phy->attached_sata_host || phy->attached_sata_ps)
> @@ -803,6 +804,7 @@ static struct domain_device *sas_ex_discover_end_dev(
>   sas_ex_get_linkrate(parent, child, phy);
>   sas_device_set_phy(child, phy->port);
>  
> + prev_lock = mutex_is_locked(&child->port->ha->disco_mutex);
>  #ifdef CONFIG_SCSI_SAS_ATA
>   if ((phy->attached_tproto & SAS_PROTOCOL_STP) || 
> phy->attached_sata_dev) {
>   res = sas_get_ata_info(child, phy);
> @@ -832,7 +834,11 @@ static struct domain_device *sas_ex_discover_end_dev(
>   SAS_ADDR(parent->sas_addr), phy_id, res);
>   goto out_list_del;
>   }
> + if (prev_lock)
> + mutex_unlock(&child->port->ha->disco_mutex);
>   sas_disc_wait_completion(child->port, DISCE_PROBE);
> + if (prev_lock)
> + mutex_lock(&child->port->ha->disco_mutex);
>  
>   } else
>  #endif
> @@ -861,7 +867,11 @@ static struct domain_device *sas_ex_discover_end_dev(
>   SAS_ADDR(parent->sas_addr), phy_id, res);
>   goto out_list_del;
>   }
> + if (prev_lock)
> + mutex_unlock(&child->port->ha->disco_mutex);
>   sas_disc_wait_completion(child->port, DISCE_PROBE);
> + if (prev_lock)
> + mutex_lock(&child->port->ha->disco_mutex);
>   } else {
>       SAS_DPRINTK("target proto 0x%x at %016llx:0x%x not handled\n",
>   phy->attached_tproto, SAS_ADDR(parent->sas_addr),
> 
I would rather have an analysis if this really cannot happen; 'should
not' is rather vague. But seeing that it _is_ quite complex:

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v3 6/7] libsas: add wait-complete support to sync discovery event

2017-07-13 Thread Hannes Reinecke
pe == 
> SAS_FANOUT_EXPANDER_DEVICE)
>   sas_unregister_ex_tree(parent->port, 
> child);
> - else
> + else {
> + sas_disc_wait_init(parent->port, 
> DISCE_DESTRUCT);
>   sas_unregister_dev(parent->port, child);
> + sas_disc_wait_completion(parent->port, 
> DISCE_DESTRUCT);
> + }
>   found = child;
>   break;
>   }
> diff --git a/drivers/scsi/libsas/sas_internal.h 
> b/drivers/scsi/libsas/sas_internal.h
> index 890b5d26..09a9b10 100644
> --- a/drivers/scsi/libsas/sas_internal.h
> +++ b/drivers/scsi/libsas/sas_internal.h
> @@ -134,6 +134,33 @@ static inline void sas_port_get(struct asd_sas_port 
> *port)
>   kref_get(&port->ref);
>  }
>  
> +static inline void sas_disc_cancel_sync(struct sas_discovery_event *event)
> +{
> + event->is_sync = false;
> +}
> +
> +static inline void sas_disc_wakeup(struct sas_discovery_event *event)
> +{
> + if (event->is_sync)
> + complete(&event->completion);
> +}
> +
> +static inline void sas_disc_wait_init(struct asd_sas_port *port,
> + enum discover_event event)
> +{
> + port->disc.disc_work[event].is_sync = true;
> + init_completion(&port->disc.disc_work[event].completion);
> +}
> +
> +static inline void sas_disc_wait_completion(struct asd_sas_port *port,
> + enum discover_event event)
> +{
> + if (port->disc.disc_work[event].is_sync) {
> + wait_for_completion(&port->disc.disc_work[event].completion);
> + port->disc.disc_work[event].is_sync = false;
> + }
> +}
> +
>  #ifdef CONFIG_SCSI_SAS_HOST_SMP
>  extern int sas_smp_host_handler(struct Scsi_Host *shost, struct request *req,
>   struct request *rsp);
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index 4bcb9fe..21e9fb140 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -243,6 +243,8 @@ struct sas_discovery_event {
>   struct sas_work work;
>   struct asd_sas_port *port;
>   enum discover_event type;
> + bool is_sync;
> + struct completion completion;
>  };
>  
>  static inline struct sas_discovery_event *to_sas_discovery_event(struct 
> work_struct *work)
> 
Some comments as the earlier patch: please use DECLARE_COMPETION_ONSTACK
here.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v3 5/7] libsas: add a new workqueue to run probe/destruct discovery event

2017-07-13 Thread Hannes Reinecke
On 07/10/2017 09:06 AM, Yijing Wang wrote:
> Sometimes, we want sync libsas probe or destruct in sas discovery work,
> like when libsas revalidate domain. We need to split probe and destruct
> work from the scsi host workqueue.
> 
> Signed-off-by: Yijing Wang 
> CC: John Garry 
> CC: Johannes Thumshirn 
> CC: Ewan Milne 
> CC: Christoph Hellwig 
> CC: Tomas Henzl 
> CC: Dan Williams 
> ---
>  drivers/scsi/libsas/sas_discover.c | 13 -
>  drivers/scsi/libsas/sas_init.c |  8 
>  include/scsi/libsas.h  |  1 +
>  3 files changed, 21 insertions(+), 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v3 4/7] libsas: add sas event wait-complete support

2017-07-13 Thread Hannes Reinecke
> +}
> +
> +static inline void sas_port_get(struct asd_sas_port *port)
> +{
> + if (port && port->is_sync)
> + kref_get(&port->ref);
> +}
> +
>  #ifdef CONFIG_SCSI_SAS_HOST_SMP
>  extern int sas_smp_host_handler(struct Scsi_Host *shost, struct request *req,
>   struct request *rsp);
> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
> index 9326628..d589adb 100644
> --- a/drivers/scsi/libsas/sas_port.c
> +++ b/drivers/scsi/libsas/sas_port.c
> @@ -191,7 +191,9 @@ static void sas_form_port(struct asd_sas_phy *phy)
>   if (si->dft->lldd_port_formed)
>   si->dft->lldd_port_formed(phy);
>  
> + sas_port_wait_init(port);
>   sas_discover_event(phy->port, DISCE_DISCOVER_DOMAIN);
> + sas_port_wait_completion(port);
>  }
>  
>  /**
> @@ -218,7 +220,9 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
>   dev->pathways--;
>  
>   if (port->num_phys == 1) {
> + sas_port_wait_init(port);
>   sas_unregister_domain_devices(port, gone);
> + sas_port_wait_completion(port);
>   sas_port_delete(port->port);
>   port->port = NULL;
>   } else {

I would rather use the standard on-stack completion here;
like this:

DECLARE_COMPLETION_ONSTACK(complete);
port->completion = &complete;
sas_unregister_domain_devices(port, gone);
wait_for_completion(&complete);
sas_port_delete(port->port);

which would simplify the above helpers to:

static inline void sas_port_put(struct asd_sas_port *port)
{
if (port->completion)
kref_put(&port->ref, sas_complete_event);
}

and you could do away with the 'is_sync' helper.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v3 3/7] libsas: Use new workqueue to run sas event

2017-07-13 Thread Hannes Reinecke
On 07/10/2017 09:06 AM, Yijing Wang wrote:
> Now all libsas works are queued to scsi host workqueue,
> include sas event work post by LLDD and sas discovery
> work, and a sas hotplug flow may be divided into several
> works, e.g libsas receive a PORTE_BYTES_DMAED event,
> now we process it as following steps:
> sas_form_port  --- run in work in shot workq
>   sas_discover_domain  --- run in another work in shost workq
>   ...
>   sas_probe_devices  --- run in new work in shost workq
> We found during hot-add a device, libsas may need run several
> works in same workqueue to add device in system, the process is
> not atomic, it may interrupt by other sas event works, like
> PHYE_LOSS_OF_SIGNAL. Finally, we would found lots unexpected
> errors. This patch is preparation of execute libsas sas event
> in sync.
> 
> Signed-off-by: Yijing Wang 
> CC: John Garry 
> CC: Johannes Thumshirn 
> CC: Ewan Milne 
> CC: Christoph Hellwig 
> CC: Tomas Henzl 
> CC: Dan Williams 
> ---
>  drivers/scsi/libsas/sas_event.c | 4 ++--
>  drivers/scsi/libsas/sas_init.c  | 7 +++
>  include/scsi/libsas.h   | 1 +
>  3 files changed, 10 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v3 2/7] libsas: remove unused port_gone_completion

2017-07-13 Thread Hannes Reinecke
On 07/10/2017 09:06 AM, Yijing Wang wrote:
> No one uses the port_gone_completion in struct asd_sas_port,
> clean it out.
> 
> Signed-off-by: Yijing Wang 
> ---
>  include/scsi/libsas.h | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index c41328d..628f48b 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -263,8 +263,6 @@ struct sas_discovery {
>  /* The port struct is Class:RW, driver:RO */
>  struct asd_sas_port {
>  /* private: */
> - struct completion port_gone_completion;
> -
>   struct sas_discovery disc;
>   struct domain_device *port_dev;
>   spinlock_t dev_list_lock;
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v3 1/7] libsas: Use static sas event pool to appease sas event lost

2017-07-13 Thread Hannes Reinecke
t; - clear_bit(PORTE_TIMER_EVENT, &phy->port_events_pending);
> -
>   sas_deform_port(phy, 1);
>  }
>  
> @@ -308,8 +300,6 @@ void sas_porte_hard_reset(struct work_struct *work)
>   struct asd_sas_event *ev = to_asd_sas_event(work);
>   struct asd_sas_phy *phy = ev->phy;
>  
> - clear_bit(PORTE_HARD_RESET, &phy->port_events_pending);
> -
>   sas_deform_port(phy, 1);
>  }
>  
> @@ -353,3 +343,11 @@ void sas_unregister_ports(struct sas_ha_struct *sas_ha)
>   sas_deform_port(sas_ha->sas_phy[i], 0);
>  
>  }
> +
> +const work_func_t sas_port_event_fns[PORT_NUM_EVENTS] = {
> + [PORTE_BYTES_DMAED] = sas_porte_bytes_dmaed,
> + [PORTE_BROADCAST_RCVD] = sas_porte_broadcast_rcvd,
> + [PORTE_LINK_RESET_ERR] = sas_porte_link_reset_err,
> + [PORTE_TIMER_EVENT] = sas_porte_timer_event,
> + [PORTE_HARD_RESET] = sas_porte_hard_reset,
> +};
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index cfaeed2..c41328d 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -229,6 +229,8 @@ struct domain_device {
>  struct sas_work {
>   struct list_head drain_node;
>   struct work_struct work;
> + struct list_head list;
> + bool used;
>  };
>  
>  static inline void INIT_SAS_WORK(struct sas_work *sw, void (*fn)(struct 
> work_struct *))
> @@ -300,6 +302,7 @@ struct asd_sas_port {
>  struct asd_sas_event {
>   struct sas_work work;
>   struct asd_sas_phy *phy;
> + int type;
>  };
>  
>  static inline struct asd_sas_event *to_asd_sas_event(struct work_struct 
> *work)
> @@ -309,16 +312,16 @@ static inline struct asd_sas_event 
> *to_asd_sas_event(struct work_struct *work)
>   return ev;
>  }
>  
> +#define  PORT_POOL_SIZE  (PORT_NUM_EVENTS * 5)
> +#define  PHY_POOL_SIZE   (PHY_NUM_EVENTS * 5)
> +
>  /* The phy pretty much is controlled by the LLDD.
>   * The class only reads those fields.
>   */
>  struct asd_sas_phy {
>  /* private: */
> - struct asd_sas_event   port_events[PORT_NUM_EVENTS];
> - struct asd_sas_event   phy_events[PHY_NUM_EVENTS];
> -
> - unsigned long port_events_pending;
> - unsigned long phy_events_pending;
> + struct asd_sas_event   port_events[PORT_POOL_SIZE];
> + struct asd_sas_event   phy_events[PHY_POOL_SIZE];
>  
>   int error;
>   int suspended;
> @@ -365,6 +368,7 @@ struct scsi_core {
>  struct sas_ha_event {
>   struct sas_work work;
>   struct sas_ha_struct *ha;
> + int type;
>  };
>  
>  static inline struct sas_ha_event *to_sas_ha_event(struct work_struct *work)
> @@ -384,8 +388,6 @@ enum sas_ha_state {
>  struct sas_ha_struct {
>  /* private: */
>   struct sas_ha_event ha_events[HA_NUM_EVENTS];
> - unsigned longpending;
> -
>   struct list_head  defer_q; /* work queued while draining */
>   struct mutex  drain_mutex;
>   unsigned long state;
> 
I would make 'PORT_NUM_EVENTS' and 'PHY_NUM_EVENTS' configurable (module
parameter?); after all, there _might_ be an event burst or even a large
fabric causing the driver to overflow the event table again.

Otherwise looks good to me.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2 1/4] scsi: scsi_dh_alua: allow I/O in target port unavailable and standby states

2017-07-11 Thread Hannes Reinecke
057,6 +1077,8 @@ static void alua_check(struct scsi_device *sdev, bool 
> force)
>   *
>   * Fail I/O to all paths not in state
>   * active/optimized or active/non-optimized.
> + * Allow I/O to paths in state unavailable/standby
> + * so path checkers can actually check them.
>   */
>  static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
>  {
> @@ -1072,6 +1094,9 @@ static int alua_prep_fn(struct scsi_device *sdev, 
> struct request *req)
>   rcu_read_unlock();
>   if (state == SCSI_ACCESS_STATE_TRANSITIONING)
>   ret = BLKPREP_DEFER;
> + else if (state == SCSI_ACCESS_STATE_UNAVAILABLE ||
> +  state == SCSI_ACCESS_STATE_STANDBY)
> + req->rq_flags |= RQF_QUIET;
>   else if (state != SCSI_ACCESS_STATE_OPTIMAL &&
>state != SCSI_ACCESS_STATE_ACTIVE &&
>state != SCSI_ACCESS_STATE_LBA) {
> 
NACK.

The whole _point_ of having device handlers is to _avoid_ I/O errors
during booting.

And the ALUA checker is prepared to handle this situation properly.
The directio checker of course doesn't know about this, but then no-one
expected the directio checker to work with ALUA.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: device support in hpasa, was: Re: OOPS from cciss_ioctl in 4.12+git

2017-07-07 Thread Hannes Reinecke
On 07/07/2017 05:03 PM, Jens Axboe wrote:
> On 07/07/2017 09:00 AM, Christoph Hellwig wrote:
>> On Thu, Jul 06, 2017 at 12:55:04PM +0300, Meelis Roos wrote:
>>>> Also we're trying to move people away from the cciss driver, can you
>>>> check if the hpsa SCSI driver works for you as well?
>>>
>>> I have older adapter:
>>>
>>> Compaq Computer Corporation Smart Array 64xx [0e11:0046] (rev 01)
>>>
>>> That does not seem to be supported by hpsa AFAICS.
>>
>> Looks like.  Although hpsa has support for various SA5 controllers
>> it seems like it decided to skip all Compaq branded controllers.
>>
>> As far as I can tell we could simply add support for those to
>> hpsa.  Ccing hpsa folks to figure out if that's the case.
> 
> Pretty sure Hannes had a patch he tested for that, he talked about
> that back at LSFMM earlier this year. Hannes?
> 
Oh, I do.
hpsa is working happily on SLES for _all_ SmartArray controllers.
You need to enable 'hpsa_allow_any=1', though.
But I'm perfectly happy with making that the default and drop cciss
completely.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] scsi: sg: fix SG_DXFER_FROM_DEV transfers

2017-07-06 Thread Hannes Reinecke
On 07/05/2017 03:49 PM, Johannes Thumshirn wrote:
> SG_DXFER_FROM_DEV transfers do not have a dxferp as we set it to NULL,
> but must have a length bigger than 0. This fixes a regression introduced
> by commit 28676d869bbb ("scsi: sg: check for valid direction before
> starting the request")
> 
> Signed-off-by: Johannes Thumshirn 
> Fixes: 28676d869bbb ("scsi: sg: check for valid direction before starting the 
> request")
> Reported-by: Chris Clayton 
> Tested-by: Chris Clayton 
> Cc: Doug Gilbert 
> ---
>  drivers/scsi/sg.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 21225d62b0c1..3c91593260aa 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -758,8 +758,11 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
>   if (hp->dxferp || hp->dxfer_len > 0)
>   return false;
>   return true;
> - case SG_DXFER_TO_DEV:
>   case SG_DXFER_FROM_DEV:
> + if (hp->dxferp || hp->dxfer_len < 0)
> + return false;
> + return true;
> + case SG_DXFER_TO_DEV:
>   case SG_DXFER_TO_FROM_DEV:
>   if (!hp->dxferp || hp->dxfer_len == 0)
>   return false;
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] qla2xxx: Protect access to qpair members with qpair->qp_lock

2017-06-22 Thread Hannes Reinecke
On 06/22/2017 03:43 PM, Johannes Thumshirn wrote:
> In qla2xx_start_scsi_mq() and qla2xx_dif_start_scsi_mq() we grab the
> qpair->qp_lock but do access members of the qpair before having the lock.
> Re-order the locking sequence to have all read and write access to qpair
> members under the qpair->qp_lock.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/scsi/qla2xxx/qla_iocb.c | 20 +---
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
> index 8404f17f3c6c..425ca1646a9a 100644
> --- a/drivers/scsi/qla2xxx/qla_iocb.c
> +++ b/drivers/scsi/qla2xxx/qla_iocb.c
> @@ -1770,10 +1770,6 @@ qla2xxx_start_scsi_mq(srb_t *sp)
>   struct qla_hw_data *ha = vha->hw;
>   struct qla_qpair *qpair = sp->qpair;
>  
> - /* Setup qpair pointers */
> - rsp = qpair->rsp;
> - req = qpair->req;
> -
>   /* So we know we haven't pci_map'ed anything yet */
>   tot_dsds = 0;
>  
> @@ -1788,6 +1784,10 @@ qla2xxx_start_scsi_mq(srb_t *sp)
>   /* Acquire qpair specific lock */
>   spin_lock_irqsave(&qpair->qp_lock, flags);
>  
> + /* Setup qpair pointers */
> + rsp = qpair->rsp;
> + req = qpair->req;
> +
>   /* Check for room in outstanding command list. */
>   handle = req->current_outstanding_cmd;
>   for (index = 1; index < req->num_outstanding_cmds; index++) {
> @@ -1924,24 +1924,33 @@ qla2xxx_dif_start_scsi_mq(srb_t *sp)
>  
>  #define QDSS_GOT_Q_SPACE BIT_0
>  
> + /* Acquire ring specific lock */
> + spin_lock_irqsave(&qpair->qp_lock, flags);
> +
>   /* Check for host side state */
>   if (!qpair->online) {
>   cmd->result = DID_NO_CONNECT << 16;
> + spin_unlock_irqrestore(&qpair->qp_lock, flags);
>   return QLA_INTERFACE_ERROR;
>   }
>  
>   if (!qpair->difdix_supported &&
>   scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL) {
>   cmd->result = DID_NO_CONNECT << 16;
> + spin_unlock_irqrestore(&qpair->qp_lock, flags);
>   return QLA_INTERFACE_ERROR;
>   }
>  
> + spin_unlock_irqrestore(&qpair->qp_lock, flags);
> +
>   /* Only process protection or >16 cdb in this routine */
>   if (scsi_get_prot_op(cmd) == SCSI_PROT_NORMAL) {
>   if (cmd->cmd_len <= 16)
>       return qla2xxx_start_scsi_mq(sp);
>   }
>  
Not sure if that is required; at worst we're missing a device being
online; difdix support is probably not changing that often to be
warranted being protected with a lock here.

So I'd rather omit spinlocks here.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] loop: fix error handling regression

2017-06-10 Thread Hannes Reinecke
On 06/09/2017 12:19 PM, Arnd Bergmann wrote:
> gcc points out an unusual indentation:
> 
> drivers/block/loop.c: In function 'loop_set_status':
> drivers/block/loop.c:1149:3: error: this 'if' clause does not guard... 
> [-Werror=misleading-indentation]
>if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit,
>^~
> drivers/block/loop.c:1152:4: note: ...this statement, but the latter is 
> misleadingly indented as if it were guarded by the 'if'
> goto exit;
> 
> This was introduced by a new feature that accidentally moved the opening
> braces from one condition to another. Adding a second pair of braces
> makes it work correctly again and also more readable.
> 
> Fixes: f2c6df7dbf9a ("loop: support 4k physical blocksize")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/block/loop.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 4d376c10a97a..e288fb30100f 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1147,10 +1147,11 @@ loop_set_status(struct loop_device *lo, const struct 
> loop_info64 *info)
>   ((lo->lo_flags & LO_FLAGS_BLOCKSIZE) &&
>lo->lo_logical_blocksize != LO_INFO_BLOCKSIZE(info))) {
>   if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit,
> -  LO_INFO_BLOCKSIZE(info)))
> +  LO_INFO_BLOCKSIZE(info))) {
>   err = -EFBIG;
>   goto exit;
>   }
> + }
>  
>   loop_config_discard(lo);
>  
> 
Oops. You are correct, of course.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH 0/7] Enable iSCSI offload drivers to use information from iface.

2017-06-07 Thread Hannes Reinecke
On 06/06/2017 08:07 PM, Robert LeBlanc wrote:
> This patchset enables iSCSI offload drivers to have access to the iface
> information provided by iscsid. This allows users to have more control
> of how the driver connects to the iSCSI target. iSER is updated to use
> iface.ipaddress to set the source IP address if configured. This allows
> iSER to use multiple ports on the same network or in more complicated
> routed configurations.
> 
> Since there is already a change to the function parameters, dst_addr
> is upgraded to sockaddr_storage so that it is more future proof and makes
> the size of the struct static and not dependent on checking the SA_FAMILY.
> 
> This is dependent on updates to Open-iSCSI.
> 
> Robert LeBlanc (7):
>   scsi/scsi_transport_iscsi: Add iface struct to kernel.
>   scsi/scsi_transport_iscsi: Update ep_connect to include iface.
>   ib/iSER: Add binding to source IP address.
>   scsi/be2iscsi: Update beiscsi_ep_connect to accept iface and
> sockaddr_storage.
>   scsi/bnx2i: Update bnx2i_ep_connect to accept iface and
> sockaddr_storage.
>   scsi/cxgbi: Update cxgbi_ep_connect to accept iface and
> sockaddr_storage.
>   scsi/qla4xxx: Update qla4xxx_ep_connect to accept iface and
> sockaddr_storage.
> 
>  drivers/infiniband/ulp/iser/iscsi_iser.c |  33 +++--
>  drivers/infiniband/ulp/iser/iscsi_iser.h |   4 +-
>  drivers/infiniband/ulp/iser/iser_initiator.c |   1 +
>  drivers/infiniband/ulp/iser/iser_memory.c|   1 +
>  drivers/infiniband/ulp/iser/iser_verbs.c |   8 ++-
>  drivers/scsi/be2iscsi/be_cmds.c  |   1 +
>  drivers/scsi/be2iscsi/be_iscsi.c |   8 ++-
>  drivers/scsi/be2iscsi/be_iscsi.h |   5 +-
>  drivers/scsi/be2iscsi/be_main.c  |   1 +
>  drivers/scsi/be2iscsi/be_mgmt.c  |   1 +
>  drivers/scsi/bnx2i/bnx2i_hwi.c   |   1 +
>  drivers/scsi/bnx2i/bnx2i_iscsi.c |  13 ++--
>  drivers/scsi/cxgbi/libcxgbi.c|  15 ++--
>  drivers/scsi/cxgbi/libcxgbi.h|   2 +-
>  drivers/scsi/qla4xxx/ql4_os.c|  15 ++--
>  drivers/scsi/scsi_transport_iscsi.c  |   9 ++-
>  include/scsi/scsi_transport_iscsi.h  | 100 
> ++-
>  17 files changed, 179 insertions(+), 39 deletions(-)
> 
Hmm.

That it rather large, just for passing information from userspace into
the kernel.
What is the actual benefit here?

Personally, I would rather see iscsid creating tap interfaces associated
with each iSCSI offload interface, and having the kernel accessing the
information from _that_.
Then each connection would have a network interface attached to it, and
could retrieve the information from there.
Plus we could use 'normal' userspace tools like 'ip' to configure the
iSCSI offload network.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v4 8/8] nvmet: use NVME_IDENTIFY_DATA_SIZE

2017-06-05 Thread Hannes Reinecke
On 06/04/2017 12:36 PM, Johannes Thumshirn wrote:
> Use NVME_IDENTIFY_DATA_SIZE define instead of hard coding the magic
> 4096 value.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/nvme/target/admin-cmd.c | 4 ++--
>  drivers/nvme/target/discovery.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v4 7/8] nvmet: allow overriding the NVMe VS via configfs

2017-06-05 Thread Hannes Reinecke
On 06/04/2017 12:36 PM, Johannes Thumshirn wrote:
> Allow overriding the announced NVMe Version of a via configfs.
> 
> This is particularly helpful when debugging new features for the host
> or target side without bumping the hard coded version (as the target
> might not be fully compliant to the announced version yet).
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/nvme/target/configfs.c | 34 ++
>  include/linux/nvme.h   |  4 
>  2 files changed, 38 insertions(+)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v4 5/8] nvme: get list of namespace descriptors

2017-06-05 Thread Hannes Reinecke
On 06/04/2017 12:36 PM, Johannes Thumshirn wrote:
> If a target identifies itself as NVMe 1.3 compliant, try to get the
> list of Namespace Identification Descriptors and populate the UUID,
> NGUID and EUI64 fileds in the NVMe namespace structure with these
> values.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/nvme/host/core.c | 87 
> 
>  drivers/nvme/host/nvme.h |  1 +
>  2 files changed, 88 insertions(+)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v4 3/8] nvmet: implement namespace identify descriptor list

2017-06-05 Thread Hannes Reinecke
On 06/04/2017 12:36 PM, Johannes Thumshirn wrote:
> A NVMe Identify NS command with a CNS value of '3' is expecting a list
> of Namespace Identification Descriptor structures to be returned to
> the host for the namespace requested in the namespace identify
> command.
> 
> This Namespace Identification Descriptor structure consists of the
> type of the namespace identifier, the length of the identifier and the
> actual identifier.
> 
> Valid types are NGUID and UUID which we have saved in our nvme_ns
> structure if they have been configured via configfs. If no value has
> been assigened to one of these we return an "invalid opcode" back to
> the host to maintain backward compatibiliy with older implementations
> without Namespace Identify Descriptor list support.
> 
> Also as the Namespace Identify Descriptor list is the only mandatory
> feature change between 1.2.1 and 1.3.0 we can bump the advertised
> version as well.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/nvme/target/admin-cmd.c | 53 
> +
>  drivers/nvme/target/core.c  |  3 ++-
>  drivers/nvme/target/nvmet.h |  1 +
>  3 files changed, 56 insertions(+), 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v4 4/8] nvmet: add uuid field to nvme_ns and populate via configfs

2017-06-05 Thread Hannes Reinecke
On 06/04/2017 12:36 PM, Johannes Thumshirn wrote:
> Add the UUID field from the NVMe Namespace Identification Descriptor
> to the nvmet_ns structure and allow it's population via configfs.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/nvme/target/configfs.c | 31 +++
>  1 file changed, 31 insertions(+)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v4 1/8] nvme: introduce NVMe Namespace Identification Descriptor structures

2017-06-05 Thread Hannes Reinecke
On 06/04/2017 12:36 PM, Johannes Thumshirn wrote:
> Signed-off-by: Johannes Thumshirn 
> Reviewed-by: Max Gurtovoy 
> ---
>  include/linux/nvme.h | 19 +++
>  1 file changed, 19 insertions(+)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v1 7/7] nvmet: allow overriding the NVMe VS via configfs

2017-06-01 Thread Hannes Reinecke
On 06/01/2017 08:52 AM, Christoph Hellwig wrote:
> On Wed, May 31, 2017 at 03:32:17PM +0200, Johannes Thumshirn wrote:
>> Allow overriding the announced NVMe Version of a via configfs.
>>
>> This is particularly helpful when debugging new features for the host
>> or target side without bumping the hard coded version (as the target
>> might not be fully compliant to the announced version yet).
> 
> It is.  Just bump to 1.3, please as Identify 3h is the only mandatory
> change: http://nvmexpress.org/changes/
> 
Actually, I still do like this.
With it it'll be very easy to test version compliance; and validate our
initiator behaviour against older versions.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 7/7] nvme: provide UUID value to userspace

2017-05-30 Thread Hannes Reinecke
On 05/30/2017 10:08 AM, Johannes Thumshirn wrote:
> Now that we have a way for getting the UUID from a target, provide it
> to userspace as well.
> 
> Unfortunately there is already a sysfs attribute called UUID which is
> a misnomer as it holds the NGUID value. So instead of creating yet
> another wrong name, create a new 'nguid' sysfs attribute for the
> NGUID. For the UUID attribute add a check wheter the namespace has a
> UUID assigned to it and return this or return the NGUID to maintain
> backwards compatibility. This should give userspace a chance to catch
> up.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/nvme/host/core.c | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 6/7] nvme: get list of namespace descriptors

2017-05-30 Thread Hannes Reinecke
On 05/30/2017 10:08 AM, Johannes Thumshirn wrote:
> If a target identifies itself as NVMe 1.3 compliant, try to get the
> list of Namespace Identification Descriptors and populate the UUID,
> NGUID and EUI64 fileds in the NVMe namespace structure with these
> values.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/nvme/host/core.c | 88 
> 
>  drivers/nvme/host/nvme.h |  1 +
>  2 files changed, 89 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 7b254be16887..d2365f7ea612 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -643,6 +643,71 @@ int nvme_identify_ctrl(struct nvme_ctrl *dev, struct 
> nvme_id_ctrl **id)
>   return error;
>  }
>  
> +static void nvme_parse_ns_descs(struct nvme_ns *ns, void *ns_nid)
> +{
> + struct nvme_ns_nid *cur;
> + const u8 *p;
> + int pos = 0;
> + int len;
> +
> + p = (u8 *)ns_nid;
> +
> + for (;;) {
> + cur = (struct nvme_ns_nid *)p;
> +
> + switch (cur->nidl) {
> + case 0:
> + return;
> + case 8:
> + case 16:
> + break;
> + default:
> + dev_warn(ns->ctrl->dev,
> +  "Target returned bogus Namespace 
> Identification Descriptor length: %d\n",
> +  cur->nidl);
> + return;
> +
> + }
> +
> + switch (cur->nidt) {
> + case NVME_NIDT_EUI64:
> + len = 8;
> + memcpy(ns->eui, cur->nid, len);
> + break;
> + case NVME_NIDT_NGUID:
> + len = 16;
> + memcpy(ns->nguid, cur->nid, len);
> + break;
> + case NVME_NIDT_UUID:
> + len = 16;
> + memcpy(ns->uuid, cur->nid, len);
> + break;
> + default:
> + dev_warn(ns->ctrl->dev,
> +  "Invalid Namespace Identification Descriptor 
> Type: %d\n",
> +  cur->nidt);
> + return;
> + }
> +
> + pos += sizeof(struct nvme_ns_nid) + len;
> + if (pos >= 4096)
> + return;
> + p += pos;
> + }
> +}
> +
> +static int nvme_identify_ns_descs(struct nvme_ctrl *dev, unsigned nsid,
> +   void *ns_nid)
> +{
> + struct nvme_command c = { };
> +
> + c.identify.opcode = nvme_admin_identify;
> + c.identify.nsid = cpu_to_le32(nsid);
> + c.identify.cns = NVME_ID_CNS_NS_DESC_LIST;
> +
> + return nvme_submit_sync_cmd(dev->admin_q, &c, ns_nid, 4096);
> +}
> +
>  static int nvme_identify_ns_list(struct nvme_ctrl *dev, unsigned nsid, 
> __le32 *ns_list)
>  {
>   struct nvme_command c = { };
> @@ -1017,6 +1082,29 @@ static int nvme_revalidate_ns(struct nvme_ns *ns, 
> struct nvme_id_ns **id)
>   memcpy(ns->eui, (*id)->eui64, sizeof(ns->eui));
>   if (ns->ctrl->vs >= NVME_VS(1, 2, 0))
>   memcpy(ns->nguid, (*id)->nguid, sizeof(ns->nguid));
> + if (ns->ctrl->vs >= NVME_VS(1, 3, 0)) {
> + void *ns_nid;
> + int ret;
> +
> +
> + ns_nid = kzalloc(4096, GFP_KERNEL);
> + if (!ns_nid) {
> + dev_warn(ns->ctrl->dev,
> +  "%s: Identify Descriptors failed\n", __func__);
> + return 0;
> + }
> +
> + ret = nvme_identify_ns_descs(ns->ctrl, ns->ns_id, ns_nid);
> + if (ret) {
> + dev_warn(ns->ctrl->dev,
> +  "%s: Identify Descriptors failed\n", __func__);
> +  /* Don't treat error as fatal we potentially
> +   * already have a NGUID or EUI-64 */
> + return 0;
> + }
> + nvme_parse_ns_descs(ns, ns_nid);
> + kfree(ns_nid);
> + }
>  
>   return 0;
>  }
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 5004f0c41397..7007521e8194 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -190,6 +190,7 @@ struct nvme_ns {
>  
>   u8 eui[8];
>   u8 nguid[16];
> + 

Re: [PATCH 5/7] nvmet: implement namespace identify descriptor list

2017-05-30 Thread Hannes Reinecke
On 05/30/2017 10:08 AM, Johannes Thumshirn wrote:
> A NVMe Identify NS command with a CNS value of '3' is expecting a list
> of Namespace Identification Descriptor structures to be returned to
> the host for the namespace requested in the namespace identify
> command.
> 
> This Namespace Identification Descriptor structure consists of the
> type of the namespace identifier, the length of the identifier and the
> actual identifier.
> 
> Valid types are EUI-64, NGUID and UUID which we have saved in our
> nvme_ns structure if they have been configured via configfs. If no
> value has been assigened to one of these we return an "invalid opcode"
> back to the host to maintain backward compatibiliy with older
> implementations without Namespace Identify Descriptor list support.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/nvme/target/admin-cmd.c | 75 
> +
>  include/linux/nvme.h    | 14 
>  2 files changed, 89 insertions(+)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 4/7] nvme: also report include the EUI-64 in identify NS report

2017-05-30 Thread Hannes Reinecke
On 05/30/2017 10:08 AM, Johannes Thumshirn wrote:
> Now that we can configure a namespace's EUI-64, report it back to the
> host in an 'Identify Namespace' command reply.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/nvme/target/admin-cmd.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index ff1f97006322..60c9eec57986 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -322,6 +322,7 @@ static void nvmet_execute_identify_ns(struct nvmet_req 
> *req)
>   id->nmic = (1 << 0);
>  
>   memcpy(&id->nguid, &ns->nguid, sizeof(uuid_le));
> + memcpy(&id->eui64, &ns->eui64, sizeof(ns->eui64));
>  
>   id->lbaf[0].ds = ns->blksize_shift;
>  
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 3/7] nvmet: add eui64 field to nvme_ns and populate via configfs

2017-05-30 Thread Hannes Reinecke
On 05/30/2017 10:08 AM, Johannes Thumshirn wrote:
> Add the EUI-64 field from the NVMe Namespace Identification Descriptor
> to the nvmet_ns structure and allow it's population via configfs.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/nvme/target/configfs.c | 48 
> ++
>  drivers/nvme/target/nvmet.h|  1 +
>  2 files changed, 49 insertions(+)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 2/7] nvmet: add uuid field to nvme_ns and populate via configfs

2017-05-30 Thread Hannes Reinecke
On 05/30/2017 10:08 AM, Johannes Thumshirn wrote:
> Add the UUID field from the NVMe Namespace Identification Descriptor
> to the nvmet_ns structure and allow it's population via configfs.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/nvme/target/configfs.c | 48 
> ++
>  drivers/nvme/target/nvmet.h|  1 +
>  2 files changed, 49 insertions(+)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 1/7] nvme: rename uuid to nguid in nvme_ns

2017-05-30 Thread Hannes Reinecke
On 05/30/2017 10:08 AM, Johannes Thumshirn wrote:
> The uuid field in the nvme_ns structure represents the nguid field
> from the identify namespace command. And as NVMe 1.3 introduced an
> UUID in the NVMe Namespace Identification Descriptor this will
> collide.
> 
> So rename the uuid to nguid to prevent any further
> confusion. Unfortunately we export the nguid to sysfs in the uuid
> sysfs attribute, but this can't be changed anymore without possibly
> breaking existing userspace.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/nvme/host/core.c | 10 +-
>  drivers/nvme/host/nvme.h |  2 +-
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2] scsi: sg: don't return bogus Sg_requests

2017-05-10 Thread Hannes Reinecke
On 05/10/2017 09:53 AM, Johannes Thumshirn wrote:
> If the list search in sg_get_rq_mark() fails to find a valid request, we
> return a bogus element. This then can later lead to a GPF in sg_remove_scat().
> 
> So don't return bogus Sg_requests in sg_get_rq_mark() but NULL in case the
> list search doesn't find a valid request.
> 
> Signed-off-by: Johannes Thumshirn 
> Reported-by: Andrey Konovalov 
> Cc: Hannes Reinecke 
> Cc: Christoph Hellwig 
> Cc: Doug Gilbert 
> ---
> Changes to v1:
> * Directly return found element within the loop (Hannes)
> 
>  drivers/scsi/sg.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 0a38ba01b7b4..82c33a6edbea 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -2074,11 +2074,12 @@ sg_get_rq_mark(Sg_fd * sfp, int pack_id)
>   if ((1 == resp->done) && (!resp->sg_io_owned) &&
>   ((-1 == pack_id) || (resp->header.pack_id == pack_id))) {
>   resp->done = 2; /* guard against other readers */
> - break;
> + write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
> + return resp;
>   }
>   }
>   write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
> - return resp;
> + return NULL;
>  }
>  
>  /* always adds to end of list */
> 
Better.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] scsi: sg: don't return bogus Sg_requests

2017-05-10 Thread Hannes Reinecke
On 05/10/2017 09:41 AM, Johannes Thumshirn wrote:
> If the list search in sg_get_rq_mark() fails to find a valid request, we
> return a bogus element. This then can later lead to a GPF in sg_remove_scat().
> 
> So don't return bogus Sg_requests in sg_get_rq_mark() but NULL in case the
> list search doesn't find a valid request.
> 
> Signed-off-by: Johannes Thumshirn 
> Reported-by: Andrey Konovalov 
> Cc: Hannes Reinecke 
> Cc: Christoph Hellwig 
> Cc: Doug Gilbert 
> ---
>  drivers/scsi/sg.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 0a38ba01b7b4..abfde23fa186 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -2078,7 +2078,7 @@ sg_get_rq_mark(Sg_fd * sfp, int pack_id)
>   }
>   }
>   write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
> - return resp;
> + return (resp->done == 2) ? resp : NULL;
>  }
>  
>  /* always adds to end of list */
> 
Not quite.
Please return 'resp' directly from within the loop.
With this fix we run into the risk that by chance the uninitialized
'resp' pointer contains a '2' at the 'done' location, triggering this
issue again.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] scsi: sas: move scsi_remove_host call into sas_remove_host

2017-04-24 Thread Hannes Reinecke
On 04/21/2017 02:11 PM, Johannes Thumshirn wrote:
> Move scsi_remove_host call into sas_remove_host and remove it from SAS HBA
> drivers, so we don't mess up the ordering. This solves an issue with double
> deleting sysfs entries that was introduced by the change of sysfs behaviour
> from commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive").
> 
> Signed-off-by: Johannes Thumshirn 
> Suggested-by: Christoph Hellwig 
> Cc: Hannes Reinecke 
> Cc: James Bottomley 
> Cc: Jinpu Wang 
> Cc: John Garry 
> ---
>  drivers/scsi/aic94xx/aic94xx_init.c   | 1 -
>  drivers/scsi/hisi_sas/hisi_sas_main.c | 1 -
>  drivers/scsi/isci/init.c  | 1 -
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c  | 1 -
>  drivers/scsi/mvsas/mv_init.c  | 1 -
>  drivers/scsi/pm8001/pm8001_init.c | 1 -
>  drivers/scsi/scsi_transport_sas.c | 8 ++--
>  7 files changed, 6 insertions(+), 8 deletions(-)
> 
Sadly, you've left out mptsas; that's still in use for VMWare installations.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v2] scsi: return correct blkprep status code in case scsi_init_io() fails.

2017-04-12 Thread Hannes Reinecke
On 04/12/2017 09:21 AM, Johannes Thumshirn wrote:
> When instrumenting the SCSI layer to run into the
> !blk_rq_nr_phys_segments(rq) case the following  warning emitted from the
> block layer:
> 
> blk_peek_request: bad return=-22
> 
> This happens because since commit fd3fc0b4d730 ('scsi: don't BUG_ON()
> empty DMA transfers') we return the wrong error value from scsi_prep_fn()
> back to the block layer.
> 
> Signed-off-by: Johannes Thumshirn 
> Fixes: fd3fc0b4d730 scsi: don't BUG_ON() empty DMA transfers
> Cc: 
> Reviewed-by: Christoph Hellwig 
> ---
> Changes to v1:
> * s/iscsi_prep_fn()/scsi_prep_fn()
> * Add Cc stable
> 
>  drivers/scsi/scsi_lib.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 19125d72f322..5558e212368b 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1061,10 +1061,10 @@ int scsi_init_io(struct scsi_cmnd *cmd)
>   struct scsi_device *sdev = cmd->device;
>   struct request *rq = cmd->request;
>   bool is_mq = (rq->mq_ctx != NULL);
> - int error;
> + int error = BLKPREP_KILL;
>  
>   if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq)))
> - return -EINVAL;
> + goto err_exit;
>  
>   error = scsi_init_sgtable(rq, &cmd->sdb);
>   if (error)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 9/9] bio-integrity: Restore original iterator on verify stage

2017-04-04 Thread Hannes Reinecke
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> Currently ->verify_fn not woks at all because at the moment it is called
> bio->bi_iter.bi_size == 0, so we do not iterate integrity bvecs at all.
> 
> In order to perform verification we need to know original data vector,
> with new bvec rewind API this is trivial.
> 
> testcase: 
> https://github.com/dmonakhov/xfstests/commit/3c6509eaa83b9c17cd0bc95d73fcdd76e1c54a85
> Signed-off-by: Dmitry Monakhov 
> ---
>  block/bio-integrity.c | 22 +++---
>  1 file changed, 15 insertions(+), 7 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 8/9] bio: add bvec_iter rewind API

2017-04-04 Thread Hannes Reinecke
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> Some ->bi_end_io handlers (for example: pi_verify or decrypt handlers)
> need to know original data vector, but after bio traverse io-stack it may
> be advanced, splited and relocated many times so it is hard to guess
> original iterator. Let's add 'bi_done' conter which accounts number
> of bytes iterator was advanced during it's evolution. Later end_io handler
> may easily restore original iterator by rewinding iterator to
> iter->bi_done.
> 
> Note: this change makes sizeof (struct bvec_iter) multiple to 8
> Signed-off-by: Dmitry Monakhov 
> ---
>  include/linux/bio.h  | 21 +++--
>  include/linux/bvec.h | 26 ++
>  2 files changed, 45 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 7/9] Guard bvec iteration logic v3

2017-04-04 Thread Hannes Reinecke
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> Currently if some one try to advance bvec beyond it's size we simply
> dump WARN_ONCE and continue to iterate beyond bvec array boundaries.
> This simply means that we endup dereferencing/corrupting random memory
> region.
> 
> Sane reaction would be to propagate error back to calling context
> But bvec_iter_advance's calling context is not always good for error
> handling. For safity reason let truncate iterator size to zero which
> will break external iteration loop which prevent us from unpredictable
> memory range corruption. And even it caller ignores an error, it will
> corrupt it's own bvecs, not others.
> 
> This patch does:
> - Return error back to caller with hope that it will react on this
> - Truncate iterator size
> 
> Code was added long time ago here 4550dd6c, luckily no one hit it
> in real life :)
> 
> changes since V1:
>  - Replace  BUG_ON with error logic.
> 
> Signed-off-by: Dmitry Monakhov 
> ---
>  drivers/nvdimm/blk.c |  4 +++-
>  drivers/nvdimm/btt.c |  4 +++-
>  include/linux/bio.h  |  8 ++--
>  include/linux/bvec.h | 11 ---
>  4 files changed, 20 insertions(+), 7 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 6/9] T10: Move opencoded contants to common header

2017-04-04 Thread Hannes Reinecke
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> Signed-off-by: Dmitry Monakhov 
> ---
>  block/t10-pi.c   | 9 +++--
>  drivers/scsi/lpfc/lpfc_scsi.c| 5 +++--
>  drivers/scsi/qla2xxx/qla_isr.c   | 8 
>  drivers/target/target_core_sbc.c | 2 +-
>  include/linux/t10-pi.h   | 2 ++
>  5 files changed, 13 insertions(+), 13 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 5/9] bio-integrity: fold bio_integrity_enabled to bio_integrity_prep

2017-04-04 Thread Hannes Reinecke
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> Currently all integrity prep hooks are open-coded, and if prepare fails
> we ignore it's code and fail bio with EIO. Let's return real error to
> upper layer, so later caller may react accordingly.
> 
> In fact no one want to use bio_integrity_prep() w/o bio_integrity_enabled,
> so it is reasonable to fold it in to one function.
> 
> Signed-off-by: Dmitry Monakhov 
> ---
>  Documentation/block/data-integrity.txt |  3 --
>  block/bio-integrity.c  | 88 
> ++
>  block/blk-core.c   |  5 +-
>  block/blk-mq.c |  8 +---
>  drivers/nvdimm/blk.c   | 13 +
>  drivers/nvdimm/btt.c   | 13 +
>  include/linux/bio.h|  6 ---
>  7 files changed, 55 insertions(+), 81 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 4/9] bio-integrity: fix interface for bio_integrity_trim v2

2017-04-04 Thread Hannes Reinecke
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> bio_integrity_trim inherent it's interface from bio_trim and accept
> offset and size, but this API is error prone because data offset
> must always be insync with bio's data offset. That is why we have
> integrity update hook in bio_advance()
> 
> So only meaningful values are: offset == 0, sectors == bio_sectors(bio)
> Let's just remove them completely.
> 
> changes from v1:
>  - remove 'sectors' arguments
> 
> Signed-off-by: Dmitry Monakhov 
> ---
>  block/bio-integrity.c | 11 ++-
>  block/bio.c   |  4 ++--
>  drivers/md/dm.c   |  2 +-
>  include/linux/bio.h   |  5 ++---
>  4 files changed, 7 insertions(+), 15 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 3/9] bio-integrity: bio_integrity_advance must update integrity seed

2017-04-04 Thread Hannes Reinecke
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> SCSI drivers do care about bip_seed so we must update it accordingly.
> 
> Signed-off-by: Dmitry Monakhov 
> ---
>  block/bio-integrity.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index b5009a8..82a6ffb 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -425,6 +425,7 @@ void bio_integrity_advance(struct bio *bio, unsigned int 
> bytes_done)
>   struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
>   unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
>  
> + bip->bip_iter.bi_sector += bytes_done >> 9;
>   bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
>  }
>  EXPORT_SYMBOL(bio_integrity_advance);
> 
Odd that we've missed that one...

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 2/9] bio-integrity: bio_trim should truncate integrity vector accordingly

2017-04-04 Thread Hannes Reinecke
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> Reviewed-by: Christoph Hellwig 
> Signed-off-by: Dmitry Monakhov 
> ---
>  block/bio.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/block/bio.c b/block/bio.c
> index e75878f..fa84323 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1907,6 +1907,10 @@ void bio_trim(struct bio *bio, int offset, int size)
>   bio_advance(bio, offset << 9);
>  
>   bio->bi_iter.bi_size = size;
> +
> + if (bio_integrity(bio))
> + bio_integrity_trim(bio, 0, size);
> +
>  }
>  EXPORT_SYMBOL_GPL(bio_trim);
>  
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 1/9] bio-integrity: Do not allocate integrity context for bio w/o data

2017-04-04 Thread Hannes Reinecke
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> If bio has no data, such as ones from blkdev_issue_flush(),
> then we have nothing to protect.
> 
> This patch prevent bugon like follows:
> 
> kfree_debugcheck: out of range ptr ac1fa1d106742a5ah
> kernel BUG at mm/slab.c:2773!
> invalid opcode:  [#1] SMP
> Modules linked in: bcache
> CPU: 0 PID: 4428 Comm: xfs_io Tainted: GW   
> 4.11.0-rc4-ext4-00041-g2ef0043-dirty #43
> Hardware name: Virtuozzo KVM, BIOS seabios-1.7.5-11.vz7.4 04/01/2014
> task: 880137786440 task.stack: c9ba8000
> RIP: 0010:kfree_debugcheck+0x25/0x2a
> RSP: 0018:c9babde0 EFLAGS: 00010082
> RAX: 0034 RBX: ac1fa1d106742a5a RCX: 0007
> RDX:  RSI:  RDI: 88013f3ccb40
> RBP: c9babde8 R08:  R09: 
> R10: fcb76420 R11: 725172ed R12: 0282
> R13: 8150e766 R14: 88013a145e00 R15: 0001
> FS:  7fb09384bf40() GS:88013f20() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7fd0172f9e40 CR3: 000137fa9000 CR4: 06f0
> Call Trace:
>  kfree+0xc8/0x1b3
>  bio_integrity_free+0xc3/0x16b
>  bio_free+0x25/0x66
>  bio_put+0x14/0x26
>  blkdev_issue_flush+0x7a/0x85
>  blkdev_fsync+0x35/0x42
>  vfs_fsync_range+0x8e/0x9f
>  vfs_fsync+0x1c/0x1e
>  do_fsync+0x31/0x4a
>  SyS_fsync+0x10/0x14
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> 
> Reviewed-by: Christoph Hellwig 
> Signed-off-by: Dmitry Monakhov 
> ---
>  block/bio-integrity.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index 5384713..b5009a8 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -175,6 +175,9 @@ bool bio_integrity_enabled(struct bio *bio)
>   if (bio_op(bio) != REQ_OP_READ && bio_op(bio) != REQ_OP_WRITE)
>   return false;
>  
> + if (!bio_sectors(bio))
> + return false;
> +
>   /* Already protected? */
>   if (bio_integrity(bio))
>   return false;
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] sas: remove sas_domain_release_transport

2017-04-04 Thread Hannes Reinecke
On 04/03/2017 04:32 PM, Johannes Thumshirn wrote:
> sas_domain_release_transport is unused since at least v3.13, remove it.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/scsi/libsas/sas_init.c | 7 ---
>  include/scsi/libsas.h  | 1 -
>  2 files changed, 8 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 6/6] scsi: pm8001: remove the SAS host after the SCSI host

2017-03-29 Thread Hannes Reinecke
On 03/29/2017 12:27 PM, Jinpu Wang wrote:
> On Wed, Mar 29, 2017 at 11:41 AM, Johannes Thumshirn  
> wrote:
>> After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed 
>> the
>> removal path of kernfs to make it recursive we have to remove the SAS host
>> before the SCSI host or we will see sysfs warnings on not found sysfs groups 
>> for
>> kobjects.
>>
>> Signed-off-by: Johannes Thumshirn 
>> ---
>>  drivers/scsi/pm8001/pm8001_init.c | 14 --
>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/scsi/pm8001/pm8001_init.c 
>> b/drivers/scsi/pm8001/pm8001_init.c
>> index 417368c..9116e9c 100644
>> --- a/drivers/scsi/pm8001/pm8001_init.c
>> +++ b/drivers/scsi/pm8001/pm8001_init.c
>> @@ -1086,11 +1086,21 @@ static void pm8001_pci_remove(struct pci_dev *pdev)
>>  {
>> struct sas_ha_struct *sha = pci_get_drvdata(pdev);
>> struct pm8001_hba_info *pm8001_ha;
>> +   unsigned long flags;
>> +   struct Scsi_Host *shost;
>> int i, j;
>> pm8001_ha = sha->lldd_ha;
>> -   scsi_remove_host(pm8001_ha->shost);
>> +   shost = pm8001_ha->shost;
>> +
>> +   spin_lock_irqsave(shost->host_lock, flags);
>> +   if (scsi_host_set_state(shost, SHOST_CANCEL))
>> +   WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
>> +   spin_unlock_irqrestore(shost->host_lock, flags);
>> +
>> sas_unregister_ha(sha);
>> -   sas_remove_host(pm8001_ha->shost);
>> +   sas_remove_host(shost);
>> +   scsi_remove_host(shost);
>> +
>> list_del(&pm8001_ha->list);
>> PM8001_CHIP_DISP->interrupt_disable(pm8001_ha, 0xFF);
>> PM8001_CHIP_DISP->chip_soft_rst(pm8001_ha);
>> --
>> 1.8.5.6
>>
> Thanks Johannes for taking care of this. Looks good to me,
> 
> I have a question regarding the scsi_host_set_state change, why do we need 
> that?
> Can't we simply change the order of sas_remove_host and scsi_remove_host?
> 
If we don't do that I/O might still be coming in for the attached ports
while we're trying to remove the same. Which might cause all sorts of
race conditions.
So it's better to set the host state to CANCEL to stop I/O before
removing the ports.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 5/6] mvsas: remove the SAS host after the SCSI host

2017-03-29 Thread Hannes Reinecke
On 03/29/2017 11:41 AM, Johannes Thumshirn wrote:
> After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed 
> the
> removal path of kernfs to make it recursive we have to remove the SAS host
> before the SCSI host or we will see sysfs warnings on not found sysfs groups 
> for
> kobjects.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/scsi/mvsas/mv_init.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 4/6] scsi: hisi_sas: remove the SAS host after the SCSI host

2017-03-29 Thread Hannes Reinecke
On 03/29/2017 11:41 AM, Johannes Thumshirn wrote:
> After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed 
> the
> removal path of kernfs to make it recursive we have to remove the SAS host
> before the SCSI host or we will see sysfs warnings on not found sysfs groups 
> for
> kobjects.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/scsi/hisi_sas/hisi_sas_main.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 6/6] scsi: pm8001: remove the SAS host after the SCSI host

2017-03-29 Thread Hannes Reinecke
On 03/29/2017 11:41 AM, Johannes Thumshirn wrote:
> After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed 
> the
> removal path of kernfs to make it recursive we have to remove the SAS host
> before the SCSI host or we will see sysfs warnings on not found sysfs groups 
> for
> kobjects.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/scsi/pm8001/pm8001_init.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 3/6] aic94xx: remove the SAS host after the SCSI host

2017-03-29 Thread Hannes Reinecke
On 03/29/2017 11:41 AM, Johannes Thumshirn wrote:
> After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed 
> the
> removal path of kernfs to make it recursive we have to remove the SAS host
> before the SCSI host or we will see sysfs warnings on not found sysfs groups 
> for
> kobjects.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/scsi/aic94xx/aic94xx_init.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 2/2] scsi: isci: remove the SAS host after the SCSI host

2017-03-29 Thread Hannes Reinecke
On 03/29/2017 11:41 AM, Johannes Thumshirn wrote:
> After commit bcdde7e ("sysfs: make __sysfs_remove_dir() recursive") changed 
> the
> removal path of kernfs to make it recursive we have to remove the SAS host
> before the SCSI host or we will see sysfs warnings like the below when
> triggering the the removal of the SAS HBA PCI device like with writing to the
> sysfs pci remove file:
> 
> echo 1 > /sys/module/isci/drivers/pci:isci//remove
> 
> WARNING: CPU: 2 PID: 5 at fs/sysfs/group.c:241 sysfs_remove_group+0xc3/0xd0
> sysfs group 'power' not found for kobject 'end_device-6:0'
> CPU: 16 PID: 5884 Comm: echo Not tainted 4.11.0-rc3-libsas+ #504
> Call Trace:
>  dump_stack+0x85/0xc2
>  __warn+0xc6/0xe0
>  warn_slowpath_fmt+0x4a/0x50
>  sysfs_remove_group+0xc3/0xd0
>  dpm_sysfs_remove+0x52/0x60
>  device_del+0x13c/0x360
>  ? device_remove_file+0x14/0x20
>  attribute_container_class_device_del+0x15/0x20
>  transport_remove_classdev+0x4c/0x60
>  ? transport_add_class_device+0x40/0x40
>  attribute_container_device_trigger+0xb3/0xc0
>  transport_remove_device+0x10/0x20
>  sas_port_delete+0x12d/0x160 [scsi_transport_sas]
>  sas_deform_port+0x1bf/0x1d0 [libsas]
>  sas_unregister_ports+0x36/0x50 [libsas]
>  sas_unregister_ha+0x1b/0x40 [libsas]
>  isci_unregister+0x2a/0x40 [isci]
>  isci_pci_remove+0x52/0xb0 [isci]
>  ? __pm_runtime_resume+0x56/0x80
>  pci_device_remove+0x34/0xb0
>  device_release_driver_internal+0x158/0x210
>  device_release_driver+0xd/0x10
>  pci_stop_bus_device+0x85/0x90
>  pci_stop_and_remove_bus_device_locked+0x15/0x30
>  remove_store+0x59/0x70
>  dev_attr_store+0x13/0x20
>  sysfs_kf_write+0x40/0x50
>  kernfs_fop_write+0x130/0x1b0
>  __vfs_write+0x23/0x130
>  ? rcu_read_lock_sched_held+0x6d/0x80
>  ? rcu_sync_lockdep_assert+0x2a/0x50
>  ? __sb_start_write+0xd7/0x1e0
>  ? vfs_write+0x1a4/0x1f0
>  vfs_write+0xc6/0x1f0
>  SyS_write+0x44/0xa0
>  entry_SYSCALL_64_fastpath+0x23/0xc6
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/scsi/isci/init.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


[PATCH] blk-mq: add statistics for tag allocation failures

2017-03-21 Thread Hannes Reinecke
Signed-off-by: Hannes Reinecke 
---
 block/blk-mq-debugfs.c | 15 +++
 block/blk-mq-tag.c |  6 --
 block/blk-mq-tag.h |  3 +++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index f6d9179..51622f6 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -161,6 +161,8 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
 {
seq_printf(m, "nr_tags=%u\n", tags->nr_tags);
seq_printf(m, "nr_reserved_tags=%u\n", tags->nr_reserved_tags);
+   seq_printf(m, "starved_count=%u\n", tags->starved_count);
+   seq_printf(m, "failed_count=%u\n", tags->failed_count);
seq_printf(m, "active_queues=%d\n",
   atomic_read(&tags->active_queues));
 
@@ -190,6 +192,18 @@ static int hctx_tags_show(struct seq_file *m, void *v)
return res;
 }
 
+static ssize_t hctx_tags_write(struct file *file, const char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   struct seq_file *m = file->private_data;
+   struct blk_mq_hw_ctx *hctx = m->private;
+   struct blk_mq_tags *tags = hctx->tags;
+
+   tags->starved_count = 0;
+   tags->failed_count = 0;
+   return count;
+}
+
 static int hctx_tags_open(struct inode *inode, struct file *file)
 {
return single_open(file, hctx_tags_show, inode->i_private);
@@ -198,6 +212,7 @@ static int hctx_tags_open(struct inode *inode, struct file 
*file)
 static const struct file_operations hctx_tags_fops = {
.open   = hctx_tags_open,
.read   = seq_read,
+   .write  = hctx_tags_write,
.llseek = seq_lseek,
.release= single_release,
 };
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 54c8436..692ca28 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -125,9 +125,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
if (tag != -1)
goto found_tag;
 
-   if (data->flags & BLK_MQ_REQ_NOWAIT)
+   if (data->flags & BLK_MQ_REQ_NOWAIT) {
+   tags->failed_count++;
return BLK_MQ_TAG_FAIL;
-
+   }
ws = bt_wait_ptr(bt, data->hctx);
drop_ctx = data->ctx == NULL;
do {
@@ -152,6 +153,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
if (tag != -1)
break;
 
+   tags->starved_count++;
if (data->ctx)
blk_mq_put_ctx(data->ctx);
 
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 6349742..f4f618f 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -10,6 +10,9 @@ struct blk_mq_tags {
unsigned int nr_tags;
unsigned int nr_reserved_tags;
 
+   unsigned int starved_count;
+   unsigned int failed_count;
+
atomic_t active_queues;
 
struct sbitmap_queue bitmap_tags;
-- 
1.8.5.6



Re: [PATCH v1 3/3] blk-mq: start to freeze queue just after setting dying

2017-03-18 Thread Hannes Reinecke
On 03/17/2017 10:57 AM, Ming Lei wrote:
> Before commit 780db2071a(blk-mq: decouble blk-mq freezing
> from generic bypassing), the dying flag is checked before
> entering queue, and Tejun converts the checking into .mq_freeze_depth,
> and assumes the counter is increased just after dying flag
> is set. Unfortunately we doesn't do that in blk_set_queue_dying().
> 
> This patch calls blk_mq_freeze_queue_start() for blk-mq in
> blk_set_queue_dying(), so that we can block new I/O coming
> once the queue is set as dying.
> 
> Given blk_set_queue_dying() is always called in remove path
> of block device, and queue will be cleaned up later, we don't
> need to worry about undoing the counter.
> 
> Cc: Bart Van Assche 
> Cc: Tejun Heo 
> Signed-off-by: Ming Lei 
> ---
>  block/blk-core.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH v1 2/3] blk-mq: comment on races related with timeout handler

2017-03-18 Thread Hannes Reinecke
On 03/17/2017 10:57 AM, Ming Lei wrote:
> This patch adds comment on two races related with
> timeout handler:
> 
> - requeue from queue busy vs. timeout
> - rq free & reallocation vs. timeout
> 
> Both the races themselves and current solution aren't
> explicit enough, so add comments on them.
> 
> Cc: Bart Van Assche 
> Signed-off-by: Ming Lei 
> ---
>  block/blk-mq.c | 22 ++
>  1 file changed, 22 insertions(+)
> 

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH v1 1/3] blk-mq: don't complete un-started request in timeout handler

2017-03-18 Thread Hannes Reinecke
BIOS 2.2.5 
> 09/06/2016
> [ 6984.857427] Workqueue: kblockd blk_mq_run_work_fn
> [ 6984.857429] task: 880476e3da00 task.stack: c90002e9
> [ 6984.857432] RIP: 0010:nvme_queue_rq+0x6e6/0x8cd [nvme]
> [ 6984.857433] RSP: 0018:c90002e93c50 EFLAGS: 00010246
> [ 6984.857434] RAX:  RBX: 880275646600 RCX: 
> 1000
> [ 6984.857435] RDX: 0fff RSI: 0002fba2a000 RDI: 
> 8804734e6950
> [ 6984.857436] RBP: c90002e93d30 R08: 2000 R09: 
> 1000
> [ 6984.857437] R10: 1000 R11:  R12: 
> 8804741d8000
> [ 6984.857438] R13: 0040 R14: 880475649f80 R15: 
> 8804734e6780
> [ 6984.857439] FS:  () GS:88047fcc() 
> knlGS:
> [ 6984.857440] CS:  0010 DS:  ES:  CR0: 80050033
> [ 6984.857442] CR2: 0010 CR3: 01c09000 CR4: 
> 001406e0
> [ 6984.857443] Call Trace:
> [ 6984.857451]  ? mempool_free+0x2b/0x80
> [ 6984.857455]  ? bio_free+0x4e/0x60
> [ 6984.857459]  blk_mq_dispatch_rq_list+0xf5/0x230
> [ 6984.857462]  blk_mq_process_rq_list+0x133/0x170
> [ 6984.857465]  __blk_mq_run_hw_queue+0x8c/0xa0
> [ 6984.857467]  blk_mq_run_work_fn+0x12/0x20
> [ 6984.857473]  process_one_work+0x165/0x410
> [ 6984.857475]  worker_thread+0x137/0x4c0
> [ 6984.857478]  kthread+0x101/0x140
> [ 6984.857480]  ? rescuer_thread+0x3b0/0x3b0
> [ 6984.857481]  ? kthread_park+0x90/0x90
> [ 6984.857489]  ret_from_fork+0x2c/0x40
> [ 6984.857490] Code: 8b bd 70 ff ff ff 89 95 50 ff ff ff 89 8d 58 ff ff ff 44
> 89 95 60 ff ff ff e8 b7 dd 12 e1 8b 95 50 ff ff ff 48 89 85 68 ff ff ff <4c>
> 8b 48 10 44 8b 58 18 8b 8d 58 ff ff ff 44 8b 95 60 ff ff ff
> [ 6984.857511] RIP: nvme_queue_rq+0x6e6/0x8cd [nvme] RSP: c90002e93c50
> [ 6984.857512] CR2: 0010
> [ 6984.895359] ---[ end trace 2d7ceb528432bf83 ]---
> 
> Cc: sta...@vger.kernel.org
> Reported-by: Yi Zhang 
> Tested-by: Yi Zhang 
> Reviewed-by: Bart Van Assche 
> Signed-off-by: Ming Lei 
> ---
>  block/blk-mq.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH] storvsc: workaround for virtual DVD SCSI version

2017-03-08 Thread Hannes Reinecke
On 03/07/2017 06:15 PM, Stephen Hemminger wrote:
> Hyper-V host emulation of SCSI for virtual DVD device reports SCSI
> version 0 (UNKNOWN) but is still capable of supporting REPORTLUN. 
> 
> Without this patch, a GEN2 Linux guest on Hyper-V will not boot 4.11
> successfully with virtual DVD ROM device. What happens is that the
> SCSI scan process falls back to doing sequential probing by INQUIRY.
> But the storvsc driver has a previous workaround that masks/blocks all
> errors reports from INQUIRY (or MODE_SENSE) commands.  This workaround
> causes the scan to then populate a full set of bogus LUN's on the
> target and then sends kernel spinning off into a death spiral doing
> block reads on the non-existent LUNs.
> 
> By setting the correct blacklist flags, the target with the
> DVD device is scanned with REPORTLUN and that works correctly.
> 
> Patch needs to go in current 4.11, it is safe but not necessary
> in older kernels.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/scsi/storvsc_drv.c | 27 +--
>  1 file changed, 17 insertions(+), 10 deletions(-)
> 
> PS: The error handling does need to be fixed (have patches pending)
> but that is interrelated with hotplug and can wait.
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses

2017-03-07 Thread Hannes Reinecke
On 03/07/2017 08:23 AM, Minchan Kim wrote:
> Hi Hannes,
> 
> On Tue, Mar 7, 2017 at 4:00 PM, Hannes Reinecke  wrote:
>> On 03/07/2017 06:22 AM, Minchan Kim wrote:
>>> Hello Johannes,
>>>
>>> On Mon, Mar 06, 2017 at 11:23:35AM +0100, Johannes Thumshirn wrote:
>>>> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When 
>>>> using
>>>> the NVMe over Fabrics loopback target which potentially sends a huge bulk 
>>>> of
>>>> pages attached to the bio's bvec this results in a kernel panic because of
>>>> array out of bounds accesses in zram_decompress_page().
>>>
>>> First of all, thanks for the report and fix up!
>>> Unfortunately, I'm not familiar with that interface of block layer.
>>>
>>> It seems this is a material for stable so I want to understand it clear.
>>> Could you say more specific things to educate me?
>>>
>>> What scenario/When/How it is problem?  It will help for me to understand!
>>>
> 
> Thanks for the quick response!
> 
>> The problem is that zram as it currently stands can only handle bios
>> where each bvec contains a single page (or, to be precise, a chunk of
>> data with a length of a page).
> 
> Right.
> 
>>
>> This is not an automatic guarantee from the block layer (who is free to
>> send us bios with arbitrary-sized bvecs), so we need to set the queue
>> limits to ensure that.
> 
> What does it mean "bios with arbitrary-sized bvecs"?
> What kinds of scenario is it used/useful?
> 
Each bio contains a list of bvecs, each of which points to a specific
memory area:

struct bio_vec {
struct page *bv_page;
unsigned intbv_len;
unsigned intbv_offset;
};

The trick now is that while 'bv_page' does point to a page, the memory
area pointed to might in fact be contiguous (if several pages are
adjacent). Hence we might be getting a bio_vec where bv_len is _larger_
than a page.

Hence the check for 'is_partial_io' in zram_drv.c (which just does a
test 'if bv_len != PAGE_SIZE) is in fact wrong, as it would trigger for
partial I/O (ie if the overall length of the bio_vec is _smaller_ than a
page), but also for multipage bvecs (where the length of the bio_vec is
_larger_ than a page).

So rather than fixing the bio scanning loop in zram it's easier to set
the queue limits correctly so that 'is_partial_io' does the correct
thing and the overall logic in zram doesn't need to be altered.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses

2017-03-06 Thread Hannes Reinecke
On 03/07/2017 06:22 AM, Minchan Kim wrote:
> Hello Johannes,
> 
> On Mon, Mar 06, 2017 at 11:23:35AM +0100, Johannes Thumshirn wrote:
>> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
>> the NVMe over Fabrics loopback target which potentially sends a huge bulk of
>> pages attached to the bio's bvec this results in a kernel panic because of
>> array out of bounds accesses in zram_decompress_page().
> 
> First of all, thanks for the report and fix up!
> Unfortunately, I'm not familiar with that interface of block layer.
> 
> It seems this is a material for stable so I want to understand it clear.
> Could you say more specific things to educate me?
> 
> What scenario/When/How it is problem?  It will help for me to understand!
> 
The problem is that zram as it currently stands can only handle bios
where each bvec contains a single page (or, to be precise, a chunk of
data with a length of a page).

This is not an automatic guarantee from the block layer (who is free to
send us bios with arbitrary-sized bvecs), so we need to set the queue
limits to ensure that.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses

2017-03-06 Thread Hannes Reinecke
On 03/06/2017 11:23 AM, Johannes Thumshirn wrote:
> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
> the NVMe over Fabrics loopback target which potentially sends a huge bulk of
> pages attached to the bio's bvec this results in a kernel panic because of
> array out of bounds accesses in zram_decompress_page().
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/block/zram/zram_drv.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH BUGFIX] block: make elevator_get robust against cross blk/blk-mq choice

2017-02-13 Thread Hannes Reinecke
On 02/14/2017 08:07 AM, Omar Sandoval wrote:
> On Tue, Feb 14, 2017 at 07:58:22AM +0100, Hannes Reinecke wrote:
>> While we're at the topic:
>>
>> Can't we use the same names for legacy and mq scheduler?
>> It's quite an unnecessary complication to have
>> 'noop', 'deadline', and 'cfq' for legacy, but 'none' and 'mq-deadline'
>> for mq. If we could use 'noop' and 'deadline' for mq, too, the existing
>> settings or udev rules will continue to work and we wouldn't get any
>> annoying and pointless warnings here...
> 
> I mentioned this to Jens a little while ago but I didn't feel strongly
> enough to push the issue. I also like this idea -- it makes the
> transition to blk-mq a little more transparent.
> 
And saves us _a lot_ of support cases :-)

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH BUGFIX] block: make elevator_get robust against cross blk/blk-mq choice

2017-02-13 Thread Hannes Reinecke
On 02/13/2017 11:28 PM, Jens Axboe wrote:
> On 02/13/2017 03:09 PM, Omar Sandoval wrote:
>> On Mon, Feb 13, 2017 at 10:01:07PM +0100, Paolo Valente wrote:
>>> If, at boot, a legacy I/O scheduler is chosen for a device using blk-mq,
>>> or, viceversa, a blk-mq scheduler is chosen for a device using blk, then
>>> that scheduler is set and initialized without any check, driving the
>>> system into an inconsistent state. This commit addresses this issue by
>>> letting elevator_get fail for these wrong cross choices.
>>>
>>> Signed-off-by: Paolo Valente 
>>> ---
>>>  block/elevator.c | 26 ++
>>>  1 file changed, 18 insertions(+), 8 deletions(-)
>>
>> Hey, Paolo,
>>
>> How exactly are you triggering this? In __elevator_change(), we do check
>> for mq or not mq:
>>
>>  if (!e->uses_mq && q->mq_ops) {
>>  elevator_put(e);
>>  return -EINVAL;
>>  }
>>  if (e->uses_mq && !q->mq_ops) {
>>  elevator_put(e);
>>  return -EINVAL;
>>  }
>>
>> We don't ever appear to call elevator_init() with a specific scheduler
>> name, and for the default we switch off of q->mq_ops and use the
>> defaults from Kconfig:
>>
>>  if (q->mq_ops && q->nr_hw_queues == 1)
>>  e = elevator_get(CONFIG_DEFAULT_SQ_IOSCHED, false);
>>  else if (q->mq_ops)
>>  e = elevator_get(CONFIG_DEFAULT_MQ_IOSCHED, false);
>>  else
>>  e = elevator_get(CONFIG_DEFAULT_IOSCHED, false);
>>
>>  if (!e) {
>>  printk(KERN_ERR
>>  "Default I/O scheduler not found. " \
>>  "Using noop/none.\n");
>>  e = elevator_get("noop", false);
>>  }
>>
>> So I guess this could happen if someone manually changed those Kconfig
>> options, but I don't see what other case would make this happen, could
>> you please explain?
> 
> Was wondering the same - is it using the 'elevator=' boot parameter?
> Didn't look at that path just now, but that's the only one I could
> think of. If it is, I'd much prefer only using 'chosen_elevator' for
> the non-mq stuff, and the fix should be just that instead.
> 
[ .. ]
While we're at the topic:

Can't we use the same names for legacy and mq scheduler?
It's quite an unnecessary complication to have
'noop', 'deadline', and 'cfq' for legacy, but 'none' and 'mq-deadline'
for mq. If we could use 'noop' and 'deadline' for mq, too, the existing
settings or udev rules will continue to work and we wouldn't get any
annoying and pointless warnings here...

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


[PATCHv5 1/2] loop: Remove unused 'bdev' argument from loop_set_capacity

2017-02-10 Thread Hannes Reinecke
Signed-off-by: Hannes Reinecke 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Ming Lei 
---
 drivers/block/loop.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index f347285..8dae865 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1298,7 +1298,7 @@ static int loop_clr_fd(struct loop_device *lo)
return err;
 }
 
-static int loop_set_capacity(struct loop_device *lo, struct block_device *bdev)
+static int loop_set_capacity(struct loop_device *lo)
 {
if (unlikely(lo->lo_state != Lo_bound))
return -ENXIO;
@@ -1361,7 +1361,7 @@ static int lo_ioctl(struct block_device *bdev, fmode_t 
mode,
case LOOP_SET_CAPACITY:
err = -EPERM;
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN))
-   err = loop_set_capacity(lo, bdev);
+   err = loop_set_capacity(lo);
break;
case LOOP_SET_DIRECT_IO:
err = -EPERM;
-- 
1.8.5.6



[PATCHv5 2/2] loop: support 4k physical blocksize

2017-02-10 Thread Hannes Reinecke
When generating bootable VM images certain systems (most notably
s390x) require devices with 4k blocksize. This patch implements
a new flag 'LO_FLAGS_BLOCKSIZE' which will set the physical
blocksize to that of the underlying device, and allow to change
the logical blocksize for up to the physical blocksize.

Signed-off-by: Hannes Reinecke 
---
 drivers/block/loop.c  | 36 +++-
 drivers/block/loop.h  |  1 +
 include/uapi/linux/loop.h |  3 +++
 3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 8dae865..776b3c6 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -221,7 +221,8 @@ static void __loop_update_dio(struct loop_device *lo, bool 
dio)
 }
 
 static int
-figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit)
+figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit,
+loff_t logical_blocksize)
 {
loff_t size = get_size(offset, sizelimit, lo->lo_backing_file);
sector_t x = (sector_t)size;
@@ -233,6 +234,12 @@ static void __loop_update_dio(struct loop_device *lo, bool 
dio)
lo->lo_offset = offset;
if (lo->lo_sizelimit != sizelimit)
lo->lo_sizelimit = sizelimit;
+   if (lo->lo_flags & LO_FLAGS_BLOCKSIZE) {
+   lo->lo_logical_blocksize = logical_blocksize;
+   blk_queue_physical_block_size(lo->lo_queue, lo->lo_blocksize);
+   blk_queue_logical_block_size(lo->lo_queue,
+lo->lo_logical_blocksize);
+   }
set_capacity(lo->lo_disk, x);
bd_set_size(bdev, (loff_t)get_capacity(bdev->bd_disk) << 9);
/* let user-space know about the new size */
@@ -814,6 +821,7 @@ static void loop_config_discard(struct loop_device *lo)
struct file *file = lo->lo_backing_file;
struct inode *inode = file->f_mapping->host;
struct request_queue *q = lo->lo_queue;
+   int lo_bits = blksize_bits(lo->lo_logical_blocksize);
 
/*
 * We use punch hole to reclaim the free space used by the
@@ -833,7 +841,7 @@ static void loop_config_discard(struct loop_device *lo)
 
q->limits.discard_granularity = inode->i_sb->s_blocksize;
q->limits.discard_alignment = 0;
-   blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
+   blk_queue_max_discard_sectors(q, UINT_MAX >> lo_bits);
q->limits.discard_zeroes_data = 1;
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
 }
@@ -922,6 +930,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 
lo->use_dio = false;
lo->lo_blocksize = lo_blocksize;
+   lo->lo_logical_blocksize = 512;
lo->lo_device = bdev;
lo->lo_flags = lo_flags;
lo->lo_backing_file = file;
@@ -1087,6 +1096,7 @@ static int loop_clr_fd(struct loop_device *lo)
int err;
struct loop_func_table *xfer;
kuid_t uid = current_uid();
+   int lo_flags = lo->lo_flags;
 
if (lo->lo_encrypt_key_size &&
!uid_eq(lo->lo_key_owner, uid) &&
@@ -1116,9 +1126,24 @@ static int loop_clr_fd(struct loop_device *lo)
if (err)
return err;
 
+   if (info->lo_flags & LO_FLAGS_BLOCKSIZE) {
+   lo->lo_flags |= LO_FLAGS_BLOCKSIZE;
+   if (LO_INFO_BLOCKSIZE(info) != 512 &&
+   LO_INFO_BLOCKSIZE(info) != 1024 &&
+   LO_INFO_BLOCKSIZE(info) != 2048 &&
+   LO_INFO_BLOCKSIZE(info) != 4096)
+   return -EINVAL;
+   if (LO_INFO_BLOCKSIZE(info) > lo->lo_blocksize)
+   return -EINVAL;
+   }
+
if (lo->lo_offset != info->lo_offset ||
-   lo->lo_sizelimit != info->lo_sizelimit)
-   if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit))
+   lo->lo_sizelimit != info->lo_sizelimit ||
+   lo->lo_flags != lo_flags ||
+   ((lo->lo_flags & LO_FLAGS_BLOCKSIZE) &&
+lo->lo_logical_blocksize != LO_INFO_BLOCKSIZE(info)))
+   if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit,
+info->lo_init[0]))
return -EFBIG;
 
loop_config_discard(lo);
@@ -1303,7 +1328,8 @@ static int loop_set_capacity(struct loop_device *lo)
if (unlikely(lo->lo_state != Lo_bound))
return -ENXIO;
 
-   return figure_loop_size(lo, lo->lo_offset, lo->lo_sizelimit);
+   return figure_loop_size(lo, lo->lo_offset, lo->lo_sizelimit,
+   lo->lo_logical_blocksize);
 }
 
 static int loop_set_dio(struct loop_device *lo, u

[PATCHv5 0/2] loop: enable different logical blocksizes

2017-02-10 Thread Hannes Reinecke
Currently the loop driver just simulates 512-byte blocks. When
creating bootable images for virtual machines it might be required
to use a different physical blocksize (eg 4k for S/390 DASD), as
the some bootloaders (like lilo or zipl for S/390) need to know
the physical block addresses of the kernel and initrd.

With this patchset the loop driver will export the logical and
physical blocksize and the current LOOP_SET_STATUS64 ioctl is
extended to set the logical blocksize by re-using the existing
'init' fields, which are currently unused.

As usual, comments and reviews are welcome.

Changes to v1:
- Move LO_FLAGS_BLOCKSIZE definition
- Reshuffle patches
Changes to v2:
- Include reviews from Ming Lei
Changes to v3:
- Include reviews from Christoph
- Merge patches
Changes to v4:
- Add LO_INFO_BLOCKSIZE definition

Hannes Reinecke (2):
  loop: Remove unused 'bdev' argument from loop_set_capacity
  loop: support 4k physical blocksize

 drivers/block/loop.c  | 40 +---
 drivers/block/loop.h  |  1 +
 include/uapi/linux/loop.h |  3 +++
 3 files changed, 37 insertions(+), 7 deletions(-)

-- 
1.8.5.6



Re: remove the cmd_type field from struct request

2017-01-31 Thread Hannes Reinecke

On 01/31/2017 07:02 PM, Jens Axboe wrote:

On 01/31/2017 07:57 AM, Christoph Hellwig wrote:

[1] which were a pain in the ass to untangle and debug during development,
it's really time for it to die..


Outside of the patch series in question, how to we expedite the
euthanasia of IDE? What explicit features/support are we missing through
libata that would need to be added, before we can git rm drivers/ide/ ?

There is only a single driver (sgi_ide) which hasn't been moved over to 
libata. But this is for an old Itanium-based server only, and even SGI 
didn't press us to have this ported.
(And we have disabled the IDE drivers since SLES11, where we still 
support Itanium.)

So they can be safely assumed defunct at this time.

I'm all for removing them.

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements

2017-01-31 Thread Hannes Reinecke
On 01/31/2017 01:31 AM, Bart Van Assche wrote:
> On Wed, 2017-01-18 at 10:48 +0100, Hannes Reinecke wrote:
>> @@ -1488,26 +1487,13 @@ static unsigned long disk_events_poll_jiffies(struct 
>> gendisk *disk)
>>  void disk_block_events(struct gendisk *disk)
>>  {
>> struct disk_events *ev = disk->ev;
>> -   unsigned long flags;
>> -   bool cancel;
>>  
>> if (!ev)
>> return;
>>  
>> -   /*
>> -* Outer mutex ensures that the first blocker completes canceling
>> -* the event work before further blockers are allowed to finish.
>> -*/
>> -   mutex_lock(&ev->block_mutex);
>> -
>> -   spin_lock_irqsave(&ev->lock, flags);
>> -   cancel = !ev->block++;
>> -   spin_unlock_irqrestore(&ev->lock, flags);
>> -
>> -   if (cancel)
>> +   if (atomic_inc_return(&ev->block) == 1)
>> cancel_delayed_work_sync(&disk->ev->dwork);
>>  
>> -   mutex_unlock(&ev->block_mutex);
>>  }
> 
> Hello Hannes,
> 
> I have already encountered a few times a deadlock that was caused by the
> event checking code so I agree with you that it would be a big step forward
> if such deadlocks wouldn't occur anymore. However, this patch realizes a
> change that has not been described in the patch description, namely that
> disk_block_events() calls are no longer serialized. Are you sure it is safe
> to drop the serialization of disk_block_events() calls?
> 
Well, this whole synchronization stuff it a bit weird; I so totally fail
to see the rationale for it.
But anyway, once we've converted ev->block to atomics I _think_ the
mutex_lock can remain; will be checking.

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.com  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [RFC PATCH] scsi, block: fix duplicate bdi name registration crashes

2017-01-29 Thread Hannes Reinecke
On 01/29/2017 05:58 AM, Dan Williams wrote:
> Warnings of the following form occur because scsi reuses a devt number
> while the block layer still has it referenced as the name of the bdi
> [1]:
> 
>  WARNING: CPU: 1 PID: 93 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
>  sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:192'
>  [..]
>  Call Trace:
>   dump_stack+0x86/0xc3
>   __warn+0xcb/0xf0
>   warn_slowpath_fmt+0x5f/0x80
>   ? kernfs_path_from_node+0x4f/0x60
>   sysfs_warn_dup+0x62/0x80
>   sysfs_create_dir_ns+0x77/0x90
>   kobject_add_internal+0xb2/0x350
>   kobject_add+0x75/0xd0
>   device_add+0x15a/0x650
>   device_create_groups_vargs+0xe0/0xf0
>   device_create_vargs+0x1c/0x20
>   bdi_register+0x90/0x240
>   ? lockdep_init_map+0x57/0x200
>   bdi_register_owner+0x36/0x60
>   device_add_disk+0x1bb/0x4e0
>   ? __pm_runtime_use_autosuspend+0x5c/0x70
>   sd_probe_async+0x10d/0x1c0
>   async_run_entry_fn+0x39/0x170
> 
> This is a brute-force fix to pass the devt release information from
> sd_probe() to the locations where we register the bdi,
> device_add_disk(), and unregister the bdi, blk_cleanup_queue().
> 
> Thanks to Omar for the quick reproducer script [2]. This patch survives
> where an unmodified kernel fails in a few seconds.
> 
> [1]: https://marc.info/?l=linux-scsi&m=147116857810716&w=4
> [2]: http://marc.info/?l=linux-block&m=148554717109098&w=2
> 
> Cc: James Bottomley 
> Cc: Bart Van Assche 
> Cc: "Martin K. Petersen" 
> Cc: Christoph Hellwig 
> Cc: Jens Axboe 
> Reported-by: Omar Sandoval 
> Signed-off-by: Dan Williams 
> ---
>  block/blk-core.c   |1 +
>  block/genhd.c  |7 +++
>  drivers/scsi/sd.c  |   41 +
>  include/linux/blkdev.h |1 +
>  include/linux/genhd.h  |   17 +
>  5 files changed, 59 insertions(+), 8 deletions(-)
> 
Please check the patchset from Jan Kara (cf 'BDI lifetime fix' on
linux-block), which attempts to solve the same problem.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


[PATCH] genhd: Do not hold event lock when scheduling workqueue elements

2017-01-18 Thread Hannes Reinecke
When scheduling workqueue elements the callback function might be called
directly, so holding the event lock is potentially dangerous as it might
lead to a deadlock:

[  989.542827] INFO: task systemd-udevd:459 blocked for more than 480 seconds.
[  989.609721]   Not tainted 4.10.0-rc4+ #546
[  989.648545] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  989.716429] systemd-udevd   D13368   459  1 0x0004
[  989.716435] Call Trace:
[  989.716444]  __schedule+0x2f2/0xb10
[  989.716447]  schedule+0x3d/0x90
[  989.716449]  schedule_timeout+0x2fc/0x600
[  989.716451]  ? wait_for_completion+0xac/0x110
[  989.716456]  ? mark_held_locks+0x66/0x90
[  989.716458]  ? _raw_spin_unlock_irq+0x2c/0x40
[  989.716460]  ? trace_hardirqs_on_caller+0x111/0x1e0
[  989.716461]  wait_for_completion+0xb4/0x110
[  989.716464]  ? wake_up_q+0x80/0x80
[  989.716469]  flush_work+0x1ea/0x2a0
[  989.716470]  ? flush_work+0x24e/0x2a0
[  989.716472]  ? destroy_worker+0xd0/0xd0
[  989.716474]  __cancel_work_timer+0x11a/0x1e0
[  989.716476]  ? trace_hardirqs_on_caller+0x111/0x1e0
[  989.716477]  cancel_delayed_work_sync+0x13/0x20
[  989.716482]  disk_block_events+0x82/0x90
[  989.716487]  __blkdev_get+0x58/0x450
[  989.716488]  blkdev_get+0x1ce/0x340
[  989.716490]  ? _raw_spin_unlock+0x27/0x40
[  989.716492]  blkdev_open+0x5b/0x70
[  989.716501]  do_dentry_open+0x213/0x310
[  989.716505]  ? blkdev_get_by_dev+0x50/0x50
[  989.716507]  vfs_open+0x4f/0x80
[  989.716518]  ? may_open+0x9b/0x100
[  989.716521]  path_openat+0x48a/0xdc0
[  989.716527]  ? _crng_backtrack_protect+0x30/0x80
[  989.716530]  do_filp_open+0x7e/0xd0
[  989.716533]  ? _raw_spin_unlock+0x27/0x40
[  989.716537]  ? __alloc_fd+0xf7/0x210
[  989.716539]  do_sys_open+0x115/0x1f0
[  989.716542]  SyS_open+0x1e/0x20
[  989.716546]  entry_SYSCALL_64_fastpath+0x23/0xc6

Signed-off-by: Hannes Reinecke 

diff --git a/block/genhd.c b/block/genhd.c
index fcd6d4f..ae46caa 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1426,8 +1426,7 @@ struct disk_events {
struct gendisk  *disk;  /* the associated disk */
spinlock_t  lock;
 
-   struct mutexblock_mutex;/* protects blocking */
-   int block;  /* event blocking depth */
+   atomic_tblock;  /* event blocking depth */
unsigned intpending;/* events already sent out */
unsigned intclearing;   /* events being cleared */
 
@@ -1488,26 +1487,13 @@ static unsigned long disk_events_poll_jiffies(struct 
gendisk *disk)
 void disk_block_events(struct gendisk *disk)
 {
struct disk_events *ev = disk->ev;
-   unsigned long flags;
-   bool cancel;
 
if (!ev)
return;
 
-   /*
-* Outer mutex ensures that the first blocker completes canceling
-* the event work before further blockers are allowed to finish.
-*/
-   mutex_lock(&ev->block_mutex);
-
-   spin_lock_irqsave(&ev->lock, flags);
-   cancel = !ev->block++;
-   spin_unlock_irqrestore(&ev->lock, flags);
-
-   if (cancel)
+   if (atomic_inc_return(&ev->block) == 1)
cancel_delayed_work_sync(&disk->ev->dwork);
 
-   mutex_unlock(&ev->block_mutex);
 }
 
 static void __disk_unblock_events(struct gendisk *disk, bool check_now)
@@ -1516,23 +1502,18 @@ static void __disk_unblock_events(struct gendisk *disk, 
bool check_now)
unsigned long intv;
unsigned long flags;
 
-   spin_lock_irqsave(&ev->lock, flags);
-
-   if (WARN_ON_ONCE(ev->block <= 0))
-   goto out_unlock;
-
-   if (--ev->block)
-   goto out_unlock;
+   if (atomic_dec_return(&ev->block) > 0)
+   return;
 
+   spin_lock_irqsave(&ev->lock, flags);
intv = disk_events_poll_jiffies(disk);
+   spin_unlock_irqrestore(&ev->lock, flags);
if (check_now)
queue_delayed_work(system_freezable_power_efficient_wq,
&ev->dwork, 0);
else if (intv)
queue_delayed_work(system_freezable_power_efficient_wq,
&ev->dwork, intv);
-out_unlock:
-   spin_unlock_irqrestore(&ev->lock, flags);
 }
 
 /**
@@ -1572,10 +1553,10 @@ void disk_flush_events(struct gendisk *disk, unsigned 
int mask)
 
spin_lock_irq(&ev->lock);
ev->clearing |= mask;
-   if (!ev->block)
+   spin_unlock_irq(&ev->lock);
+   if (!atomic_read(&ev->block))
mod_delayed_work(system_freezable_power_efficient_wq,
&ev->dwork, 0);
-   spin_unlock_irq(&ev->lock);
 }
 
 /**
@@ -1666,12 +1647,11 @@ static void disk_check_events(struct disk_events *ev,
*cl

Re: [PATCH 0/2] virtio-scsi: Implement FC_HOST feature

2017-01-16 Thread Hannes Reinecke
On 01/16/2017 05:04 PM, Fam Zheng wrote:
> This series implements the proposed fc_host feature of virtio-scsi.
> 
> The first patch updates the data structure changes according to the spec
> proposal; the second patch actually implements the operations.
> 
> Fam Zheng (2):
>   virtio_scsi: Add fc_host definitions
>   virtio_scsi: Implement fc_host
> 
>  drivers/scsi/virtio_scsi.c   | 55 
> +++-
>  include/uapi/linux/virtio_scsi.h |  6 +
>  2 files changed, 60 insertions(+), 1 deletion(-)
> 
Hmm.

How it this supposed to work?
You do export the WWPN/WWNN of the associated host to the guest (nb:
will get interesting for non NPIV setups ...), but virtio scsi will
still do a LUN remapping.
IE the LUNs you see on the host will be different from the LUNs
presented to the guest.

This makes reliable operations very tricky.
Plus you don't _actually_ expose the FC host, but rather the WWPN of the
host presenting the LUN.
So how do you handle LUNs from different FC hosts on the guest?
>From what I've seen virtio will either presenting you with with one host
per LUN (so you'll end up with different SCSI hosts on the guest each
having the _same_ WWPN/WWNN), or a single host presenting all LUNs,
making the WWPN setting ... interesting.

Overall, I'm not overly happy with this approach.
You already added WWPN ids to the virtio transport, so why didn't you
update the LUN field, too, to avoid this ominous LUN remapping?
And we really should make sure to have a single FC host in the guest
presenting all LUNs.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-16 Thread Hannes Reinecke
On 01/13/2017 05:02 PM, Jens Axboe wrote:
> On 01/13/2017 09:00 AM, Jens Axboe wrote:
>> On 01/13/2017 08:59 AM, Hannes Reinecke wrote:
>>> On 01/13/2017 04:34 PM, Jens Axboe wrote:
>>>> On 01/13/2017 08:33 AM, Hannes Reinecke wrote:
>>> [ .. ]
>>>>> Ah, indeed.
>>>>> There is an ominous udev rule here, trying to switch to 'deadline'.
>>>>>
>>>>> # cat 60-ssd-scheduler.rules
>>>>> # do not edit this file, it will be overwritten on update
>>>>>
>>>>> ACTION!="add", GOTO="ssd_scheduler_end"
>>>>> SUBSYSTEM!="block", GOTO="ssd_scheduler_end"
>>>>>
>>>>> IMPORT{cmdline}="elevator"
>>>>> ENV{elevator}=="*?", GOTO="ssd_scheduler_end"
>>>>>
>>>>> KERNEL=="sd*[!0-9]", ATTR{queue/rotational}=="0",
>>>>> ATTR{queue/scheduler}="deadline"
>>>>>
>>>>> LABEL="ssd_scheduler_end"
>>>>>
>>>>> Still shouldn't crash the kernel, though ...
>>>>
>>>> Of course not, and it's not a given that it does, it could just be
>>>> triggering after the device load and failing like expected. But just in
>>>> case, can you try and disable that rule and see if it still crashes with
>>>> MQ_DEADLINE set as the default?
>>>>
>>> Yes, it does.
>>> Same stacktrace as before.
>>
>> Alright, that's as expected. I've tried with your rule and making
>> everything modular, but it still boots fine for me. Very odd. Can you
>> send me your .config? And are all the SCSI disks hanging off ahci? Or
>> sdb specifically, is that ahci or something else?
> 
> Also, would be great if you could pull:
> 
> git://git.kernel.dk/linux-block blk-mq-sched
> 
> into current 'master' and see if it still reproduces. I expect that it
> will, but just want to ensure that it's a problem in the current code
> base as well.
> 
Actually, it doesn't. Seems to have resolved itself with the latest drop.

However, not I've got a lockdep splat:

Jan 16 09:05:02 lammermuir kernel: [ cut here ]
Jan 16 09:05:02 lammermuir kernel: WARNING: CPU: 29 PID: 5860 at
kernel/locking/lockdep.c:3514 lock_release+0x2a7/0x490
Jan 16 09:05:02 lammermuir kernel: DEBUG_LOCKS_WARN_ON(depth <= 0)
Jan 16 09:05:02 lammermuir kernel: Modules linked in: raid0 mpt3sas
raid_class rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache e
Jan 16 09:05:02 lammermuir kernel:  fb_sys_fops ahci uhci_hcd ttm
ehci_pci libahci ehci_hcd serio_raw crc32c_intel drm libata usbcore hpsa
Jan 16 09:05:02 lammermuir kernel: CPU: 29 PID: 5860 Comm: fio Not
tainted 4.10.0-rc3+ #540
Jan 16 09:05:02 lammermuir kernel: Hardware name: HP ProLiant ML350p
Gen8, BIOS P72 09/08/2013
Jan 16 09:05:02 lammermuir kernel: Call Trace:
Jan 16 09:05:02 lammermuir kernel:  dump_stack+0x85/0xc9
Jan 16 09:05:02 lammermuir kernel:  __warn+0xd1/0xf0
Jan 16 09:05:02 lammermuir kernel:  ? aio_write+0x118/0x170
Jan 16 09:05:02 lammermuir kernel:  warn_slowpath_fmt+0x4f/0x60
Jan 16 09:05:02 lammermuir kernel:  lock_release+0x2a7/0x490
Jan 16 09:05:02 lammermuir kernel:  ? blkdev_write_iter+0x89/0xd0
Jan 16 09:05:02 lammermuir kernel:  aio_write+0x138/0x170
Jan 16 09:05:02 lammermuir kernel:  do_io_submit+0x4d2/0x8f0
Jan 16 09:05:02 lammermuir kernel:  ? do_io_submit+0x413/0x8f0
Jan 16 09:05:02 lammermuir kernel:  SyS_io_submit+0x10/0x20
Jan 16 09:05:02 lammermuir kernel:  entry_SYSCALL_64_fastpath+0x23/0xc6

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 08/10] blk-mq-sched: add framework for MQ capable IO schedulers

2017-01-13 Thread Hannes Reinecke
On 01/13/2017 05:41 PM, Omar Sandoval wrote:
> On Fri, Jan 13, 2017 at 12:15:17PM +0100, Hannes Reinecke wrote:
>> On 01/11/2017 10:40 PM, Jens Axboe wrote:
>>> This adds a set of hooks that intercepts the blk-mq path of
>>> allocating/inserting/issuing/completing requests, allowing
>>> us to develop a scheduler within that framework.
>>>
>>> We reuse the existing elevator scheduler API on the registration
>>> side, but augment that with the scheduler flagging support for
>>> the blk-mq interfce, and with a separate set of ops hooks for MQ
>>> devices.
>>>
>>> We split driver and scheduler tags, so we can run the scheduling
>>> independent of device queue depth.
>>>
>>> Signed-off-by: Jens Axboe 
>> [ .. ]
>>> @@ -823,6 +847,35 @@ static inline unsigned int queued_to_index(unsigned 
>>> int queued)
>>> return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
>>>  }
>>>  
>>> +static bool blk_mq_get_driver_tag(struct request *rq,
>>> + struct blk_mq_hw_ctx **hctx, bool wait)
>>> +{
>>> +   struct blk_mq_alloc_data data = {
>>> +   .q = rq->q,
>>> +   .ctx = rq->mq_ctx,
>>> +   .hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu),
>>> +   .flags = wait ? 0 : BLK_MQ_REQ_NOWAIT,
>>> +   };
>>> +
>>> +   if (blk_mq_hctx_stopped(data.hctx))
>>> +   return false;
>>> +
>>> +   if (rq->tag != -1) {
>>> +done:
>>> +   if (hctx)
>>> +   *hctx = data.hctx;
>>> +   return true;
>>> +   }
>>> +
>>> +   rq->tag = blk_mq_get_tag(&data);
>>> +   if (rq->tag >= 0) {
>>> +   data.hctx->tags->rqs[rq->tag] = rq;
>>> +   goto done;
>>> +   }
>>> +
>>> +   return false;
>>> +}
>>> +
>> What happens with the existing request at 'rqs[rq->tag]' ?
>> Surely there is one already, right?
>> Things like '->init_request' assume a fully populated array, so moving
>> one entry to another location is ... interesting.
>>
>> I would have thought we need to do a request cloning here,
>> otherwise this would introduce a memory leak, right?
>> (Not to mention a potential double completion, as the request is now at
>> two positions in the array)
>>
>> Cheers,
>>
>> Hannes
> 
> The entries in tags->rqs aren't slab objects, they're pointers into
> pages allocated separately and tracked on tags->page_list. See
> blk_mq_alloc_rqs(). In blk_mq_free_rqs(), we free all of the pages on
> tags->page_list, so there shouldn't be a memory leak.
> 
> As for hctx->tags->rqs, entries are only overwritten when a scheduler is
> enabled. In that case, the rqs array is storing pointers to requests
> actually from hctx->sched_tags, so overwriting/leaking isn't an issue.

Ah. Thanks.
That explains it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Hannes Reinecke
On 01/13/2017 04:34 PM, Jens Axboe wrote:
> On 01/13/2017 08:33 AM, Hannes Reinecke wrote:
[ .. ]
>> Ah, indeed.
>> There is an ominous udev rule here, trying to switch to 'deadline'.
>>
>> # cat 60-ssd-scheduler.rules
>> # do not edit this file, it will be overwritten on update
>>
>> ACTION!="add", GOTO="ssd_scheduler_end"
>> SUBSYSTEM!="block", GOTO="ssd_scheduler_end"
>>
>> IMPORT{cmdline}="elevator"
>> ENV{elevator}=="*?", GOTO="ssd_scheduler_end"
>>
>> KERNEL=="sd*[!0-9]", ATTR{queue/rotational}=="0",
>> ATTR{queue/scheduler}="deadline"
>>
>> LABEL="ssd_scheduler_end"
>>
>> Still shouldn't crash the kernel, though ...
> 
> Of course not, and it's not a given that it does, it could just be
> triggering after the device load and failing like expected. But just in
> case, can you try and disable that rule and see if it still crashes with
> MQ_DEADLINE set as the default?
> 
Yes, it does.
Same stacktrace as before.

Cheers

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Hannes Reinecke
On 01/13/2017 04:23 PM, Jens Axboe wrote:
> On 01/13/2017 04:04 AM, Hannes Reinecke wrote:
>> On 01/13/2017 09:15 AM, Hannes Reinecke wrote:
>>> On 01/11/2017 10:39 PM, Jens Axboe wrote:
>>>> Another year, another posting of this patchset. The previous posting
>>>> was here:
>>>>
>>>> https://www.spinics.net/lists/kernel/msg2406106.html
>>>>
>>>> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
>>>>
>>>> I've reworked bits of this to get rid of the shadow requests, thanks
>>>> to Bart for the inspiration. The missing piece, for me, was the fact
>>>> that we have the tags->rqs[] indirection array already. I've done this
>>>> somewhat differently, though, by having the internal scheduler tag
>>>> map be allocated/torn down when an IO scheduler is attached or
>>>> detached. This also means that when we run without a scheduler, we
>>>> don't have to do double tag allocations, it'll work like before.
>>>>
>>>> The patchset applies on top of 4.10-rc3, or can be pulled here:
>>>>
>>>> git://git.kernel.dk/linux-block blk-mq-sched.6
>>>>
>>> Well ... something's wrong here on my machine:
>>>
>>> [   39.886886] [ cut here ]
>>> [   39.886895] WARNING: CPU: 9 PID: 62 at block/blk-mq.c:342
>>> __blk_mq_finish_request+0x124/0x140
>>> [   39.886895] Modules linked in: sd_mod ahci uhci_hcd ehci_pci
>>> mpt3sas(+) libahci ehci_hcd serio_raw crc32c_intel raid_class drm libata
>>> usbcore hpsa usb_common scsi_transport_sas sg dm_multipath dm_mod
>>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
>>> [   39.886910] CPU: 9 PID: 62 Comm: kworker/u130:0 Not tainted
>>> 4.10.0-rc3+ #528
>>> [   39.886911] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
>>> [   39.886917] Workqueue: events_unbound async_run_entry_fn
>>> [   39.886918] Call Trace:
>>> [   39.886923]  dump_stack+0x85/0xc9
>>> [   39.886927]  __warn+0xd1/0xf0
>>> [   39.886928]  warn_slowpath_null+0x1d/0x20
>>> [   39.886930]  __blk_mq_finish_request+0x124/0x140
>>> [   39.886932]  blk_mq_finish_request+0x55/0x60
>>> [   39.886934]  blk_mq_sched_put_request+0x78/0x80
>>> [   39.886936]  blk_mq_free_request+0xe/0x10
>>> [   39.886938]  blk_put_request+0x25/0x60
>>> [   39.886944]  __scsi_execute.isra.24+0x104/0x160
>>> [   39.886946]  scsi_execute_req_flags+0x94/0x100
>>> [   39.886948]  scsi_report_opcode+0xab/0x100
>>>
>>> checking ...
>>>
>> Ah.
>> Seems like the elevator switch races with device setup:
>>
>>  1188.490326] [ cut here ]
>> [ 1188.490334] WARNING: CPU: 9 PID: 30155 at block/blk-mq.c:342
>> __blk_mq_finish_request+0x172/0x180
>> [ 1188.490335] Modules linked in: mpt3sas(+) raid_class rpcsec_gss_krb5
>> auth_rpcgss nfsv4 nfs lockd grace fscache ebtable_filt
>> er ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables
>> af_packet br_netfilter bridge stp llc iscsi_ibft iscs
>> i_boot_sysfs sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp
>> coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_p
>> clmul tg3 ixgbe ghash_clmulni_intel pcbc ptp aesni_intel pps_core
>> aes_x86_64 ipmi_ssif hpilo hpwdt mdio libphy pcc_cpufreq cry
>> pto_simd glue_helper iTCO_wdt iTCO_vendor_support acpi_cpufreq tpm_tis
>> ipmi_si ipmi_devintf cryptd lpc_ich pcspkr ioatdma tpm_
>> tis_core thermal wmi shpchp dca ipmi_msghandler tpm fjes button sunrpc
>> btrfs xor sr_mod raid6_pq cdrom ehci_pci mgag200 i2c_al
>> go_bit drm_kms_helper syscopyarea sysfillrect uhci_hcd
>> [ 1188.490399]  sysimgblt fb_sys_fops sd_mod ahci ehci_hcd ttm libahci
>> crc32c_intel serio_raw drm libata usbcore usb_common hp
>> sa scsi_transport_sas sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc
>> scsi_dh_alua autofs4
>> [ 1188.490411] CPU: 9 PID: 30155 Comm: kworker/u130:6 Not tainted
>> 4.10.0-rc3+ #535
>> [ 1188.490411] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
>> [ 1188.490425] Workqueue: events_unbound async_run_entry_fn
>> [ 1188.490427] Call Trace:
>> [ 1188.490433]  dump_stack+0x85/0xc9
>> [ 1188.490436]  __warn+0xd1/0xf0
>> [ 1188.490438]  warn_slowpath_null+0x1d/0x20
>> [ 1188.490440]  __blk_mq_finish_request+0x172/0x180
>> [ 1188.490442]  blk_mq_finish_request+0x55/0x60
>> [ 1188.490443]  blk_mq_sched_put_request+0x78/0x80
>> [ 1188.490445

Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Hannes Reinecke
On 01/13/2017 12:04 PM, Hannes Reinecke wrote:
> On 01/13/2017 09:15 AM, Hannes Reinecke wrote:
>> On 01/11/2017 10:39 PM, Jens Axboe wrote:
>>> Another year, another posting of this patchset. The previous posting
>>> was here:
>>>
>>> https://www.spinics.net/lists/kernel/msg2406106.html
>>>
>>> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
>>>
>>> I've reworked bits of this to get rid of the shadow requests, thanks
>>> to Bart for the inspiration. The missing piece, for me, was the fact
>>> that we have the tags->rqs[] indirection array already. I've done this
>>> somewhat differently, though, by having the internal scheduler tag
>>> map be allocated/torn down when an IO scheduler is attached or
>>> detached. This also means that when we run without a scheduler, we
>>> don't have to do double tag allocations, it'll work like before.
>>>
>>> The patchset applies on top of 4.10-rc3, or can be pulled here:
>>>
>>> git://git.kernel.dk/linux-block blk-mq-sched.6
>>>
>> Well ... something's wrong here on my machine:
>>
[ .. ]

Turns out that selecting CONFIG_DEFAULT_MQ_DEADLINE is the culprit;
switching to CONFIG_DEFAULT_MQ_NONE and selecting mq-deadline after
booting manually makes the problem go away.

So there is a race condition during device init and switching the I/O
scheduler.

But the results from using mq-deadline are promising; the performance
drop I've seen on older hardware seems to be resolved:

mq iosched:
 seq read : io=13383MB, bw=228349KB/s, iops=57087
 rand read : io=12876MB, bw=219709KB/s, iops=54927
 seq write: io=14532MB, bw=247987KB/s, iops=61996
 rand write: io=13779MB, bw=235127KB/s, iops=58781
mq default:
 seq read : io=13056MB, bw=222588KB/s, iops=55647
 rand read : io=12908MB, bw=220069KB/s, iops=55017
 seq write: io=13986MB, bw=238444KB/s, iops=59611
 rand write: io=13733MB, bw=234128KB/s, iops=58532
sq default:
 seq read : io=10240MB, bw=194787KB/s, iops=48696
 rand read : io=10240MB, bw=191374KB/s, iops=47843
 seq write: io=10240MB, bw=245333KB/s, iops=61333
 rand write: io=10240MB, bw=228239KB/s, iops=57059

measured on mpt2sas with SSD devices.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 08/10] blk-mq-sched: add framework for MQ capable IO schedulers

2017-01-13 Thread Hannes Reinecke
On 01/11/2017 10:40 PM, Jens Axboe wrote:
> This adds a set of hooks that intercepts the blk-mq path of
> allocating/inserting/issuing/completing requests, allowing
> us to develop a scheduler within that framework.
> 
> We reuse the existing elevator scheduler API on the registration
> side, but augment that with the scheduler flagging support for
> the blk-mq interfce, and with a separate set of ops hooks for MQ
> devices.
> 
> We split driver and scheduler tags, so we can run the scheduling
> independent of device queue depth.
> 
> Signed-off-by: Jens Axboe 
[ .. ]
> @@ -823,6 +847,35 @@ static inline unsigned int queued_to_index(unsigned int 
> queued)
>   return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
>  }
>  
> +static bool blk_mq_get_driver_tag(struct request *rq,
> +   struct blk_mq_hw_ctx **hctx, bool wait)
> +{
> + struct blk_mq_alloc_data data = {
> + .q = rq->q,
> + .ctx = rq->mq_ctx,
> + .hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu),
> + .flags = wait ? 0 : BLK_MQ_REQ_NOWAIT,
> + };
> +
> + if (blk_mq_hctx_stopped(data.hctx))
> + return false;
> +
> + if (rq->tag != -1) {
> +done:
> + if (hctx)
> + *hctx = data.hctx;
> + return true;
> + }
> +
> + rq->tag = blk_mq_get_tag(&data);
> + if (rq->tag >= 0) {
> + data.hctx->tags->rqs[rq->tag] = rq;
> + goto done;
> + }
> +
> + return false;
> +}
> +
What happens with the existing request at 'rqs[rq->tag]' ?
Surely there is one already, right?
Things like '->init_request' assume a fully populated array, so moving
one entry to another location is ... interesting.

I would have thought we need to do a request cloning here,
otherwise this would introduce a memory leak, right?
(Not to mention a potential double completion, as the request is now at
two positions in the array)

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Hannes Reinecke
On 01/13/2017 09:15 AM, Hannes Reinecke wrote:
> On 01/11/2017 10:39 PM, Jens Axboe wrote:
>> Another year, another posting of this patchset. The previous posting
>> was here:
>>
>> https://www.spinics.net/lists/kernel/msg2406106.html
>>
>> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
>>
>> I've reworked bits of this to get rid of the shadow requests, thanks
>> to Bart for the inspiration. The missing piece, for me, was the fact
>> that we have the tags->rqs[] indirection array already. I've done this
>> somewhat differently, though, by having the internal scheduler tag
>> map be allocated/torn down when an IO scheduler is attached or
>> detached. This also means that when we run without a scheduler, we
>> don't have to do double tag allocations, it'll work like before.
>>
>> The patchset applies on top of 4.10-rc3, or can be pulled here:
>>
>> git://git.kernel.dk/linux-block blk-mq-sched.6
>>
> Well ... something's wrong here on my machine:
> 
> [   39.886886] [ cut here ]
> [   39.886895] WARNING: CPU: 9 PID: 62 at block/blk-mq.c:342
> __blk_mq_finish_request+0x124/0x140
> [   39.886895] Modules linked in: sd_mod ahci uhci_hcd ehci_pci
> mpt3sas(+) libahci ehci_hcd serio_raw crc32c_intel raid_class drm libata
> usbcore hpsa usb_common scsi_transport_sas sg dm_multipath dm_mod
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
> [   39.886910] CPU: 9 PID: 62 Comm: kworker/u130:0 Not tainted
> 4.10.0-rc3+ #528
> [   39.886911] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
> [   39.886917] Workqueue: events_unbound async_run_entry_fn
> [   39.886918] Call Trace:
> [   39.886923]  dump_stack+0x85/0xc9
> [   39.886927]  __warn+0xd1/0xf0
> [   39.886928]  warn_slowpath_null+0x1d/0x20
> [   39.886930]  __blk_mq_finish_request+0x124/0x140
> [   39.886932]  blk_mq_finish_request+0x55/0x60
> [   39.886934]  blk_mq_sched_put_request+0x78/0x80
> [   39.886936]  blk_mq_free_request+0xe/0x10
> [   39.886938]  blk_put_request+0x25/0x60
> [   39.886944]  __scsi_execute.isra.24+0x104/0x160
> [   39.886946]  scsi_execute_req_flags+0x94/0x100
> [   39.886948]  scsi_report_opcode+0xab/0x100
> 
> checking ...
> 
Ah.
Seems like the elevator switch races with device setup:

 1188.490326] [ cut here ]
[ 1188.490334] WARNING: CPU: 9 PID: 30155 at block/blk-mq.c:342
__blk_mq_finish_request+0x172/0x180
[ 1188.490335] Modules linked in: mpt3sas(+) raid_class rpcsec_gss_krb5
auth_rpcgss nfsv4 nfs lockd grace fscache ebtable_filt
er ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables
af_packet br_netfilter bridge stp llc iscsi_ibft iscs
i_boot_sysfs sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp
coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_p
clmul tg3 ixgbe ghash_clmulni_intel pcbc ptp aesni_intel pps_core
aes_x86_64 ipmi_ssif hpilo hpwdt mdio libphy pcc_cpufreq cry
pto_simd glue_helper iTCO_wdt iTCO_vendor_support acpi_cpufreq tpm_tis
ipmi_si ipmi_devintf cryptd lpc_ich pcspkr ioatdma tpm_
tis_core thermal wmi shpchp dca ipmi_msghandler tpm fjes button sunrpc
btrfs xor sr_mod raid6_pq cdrom ehci_pci mgag200 i2c_al
go_bit drm_kms_helper syscopyarea sysfillrect uhci_hcd
[ 1188.490399]  sysimgblt fb_sys_fops sd_mod ahci ehci_hcd ttm libahci
crc32c_intel serio_raw drm libata usbcore usb_common hp
sa scsi_transport_sas sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc
scsi_dh_alua autofs4
[ 1188.490411] CPU: 9 PID: 30155 Comm: kworker/u130:6 Not tainted
4.10.0-rc3+ #535
[ 1188.490411] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
[ 1188.490425] Workqueue: events_unbound async_run_entry_fn
[ 1188.490427] Call Trace:
[ 1188.490433]  dump_stack+0x85/0xc9
[ 1188.490436]  __warn+0xd1/0xf0
[ 1188.490438]  warn_slowpath_null+0x1d/0x20
[ 1188.490440]  __blk_mq_finish_request+0x172/0x180
[ 1188.490442]  blk_mq_finish_request+0x55/0x60
[ 1188.490443]  blk_mq_sched_put_request+0x78/0x80
[ 1188.490445]  blk_mq_free_request+0xe/0x10
[ 1188.490448]  blk_put_request+0x25/0x60
[ 1188.490453]  __scsi_execute.isra.24+0x104/0x160
[ 1188.490455]  scsi_execute_req_flags+0x94/0x100
[ 1188.490457]  scsi_report_opcode+0xab/0x100
[ 1188.490461]  sd_revalidate_disk+0xaef/0x1450 [sd_mod]
[ 1188.490464]  sd_probe_async+0xd1/0x1d0 [sd_mod]
[ 1188.490466]  async_run_entry_fn+0x37/0x150
[ 1188.490470]  process_one_work+0x1d0/0x660
[ 1188.490472]  ? process_one_work+0x151/0x660
[ 1188.490474]  worker_thread+0x12b/0x4a0
[ 1188.490475]  kthread+0x10c/0x140
[ 1188.490477]  ? process_one_work+0x660/0x660
[ 1188.490478]  ? kthread_create_on_node+0x40/0x40
[ 1188.490483]  ret_from_fork+0x2a/0x40
[ 1188.490484] ---[ end trace d5e3a32ac269fc2a ]---
[ 1188.490485] rq (487/52) rqs (-1/-1)
[ 1188.523518] sd 

Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Hannes Reinecke
On 01/11/2017 10:39 PM, Jens Axboe wrote:
> Another year, another posting of this patchset. The previous posting
> was here:
> 
> https://www.spinics.net/lists/kernel/msg2406106.html
> 
> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
> 
> I've reworked bits of this to get rid of the shadow requests, thanks
> to Bart for the inspiration. The missing piece, for me, was the fact
> that we have the tags->rqs[] indirection array already. I've done this
> somewhat differently, though, by having the internal scheduler tag
> map be allocated/torn down when an IO scheduler is attached or
> detached. This also means that when we run without a scheduler, we
> don't have to do double tag allocations, it'll work like before.
> 
> The patchset applies on top of 4.10-rc3, or can be pulled here:
> 
> git://git.kernel.dk/linux-block blk-mq-sched.6
> 
Fun continues:

[   28.976708] ata3.00: configured for UDMA/100
[   28.987625] BUG: unable to handle kernel NULL pointer dereference at
0048
[   28.987632] IP: deadline_add_request+0x15/0x70
[   28.987633] PGD 0
[   28.987634]
[   28.987636] Oops:  [#1] SMP
[   28.987638] Modules linked in: ahci libahci libata uhci_hcd(+)
mgag200(+) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops ttm drm tg3 libphy ehci_pci ehci_hcd usbcore usb_common
ixgbe mdio hpsa(+) dca ptp pps_core scsi_transport_sas fjes(+) sg
dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
[   28.987654] CPU: 0 PID: 268 Comm: kworker/u2:2 Not tainted
4.10.0-rc3+ #535
[   28.987655] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
[   28.987660] Workqueue: events_unbound async_run_entry_fn
[   28.987661] task: 880029391600 task.stack: c938c000
[   28.987663] RIP: 0010:deadline_add_request+0x15/0x70
[   28.987664] RSP: 0018:c938fb00 EFLAGS: 00010286
[   28.987665] RAX: 88003260c400 RBX:  RCX:

[   28.987666] RDX: c938fb68 RSI:  RDI:
8800293b9040
[   28.987666] RBP: c938fb18 R08: 0087668] R13:
88003260c400 R14:  R15: 
[   28.987670] FS:  () GS:880035c0()
knlGS:
[   28.987670] CS:  0010 DS:  ES:  CR0: 80050033
[   28.987671] CR2: 0048 CR3: 32b64000 CR4:
000406f0
[   28.987672] Call Trace:
[   28.987677]  blk_mq_sched_get_request+0x12e/0x310
[   28.987678]  ? blk_mq_sched_get_request+0x5/0x310
[   28.987681]  blk_mq_alloc_request+0x40/0x90
[   28.987684]  blk_get_request+0x35/0x110
[   28.987689]  __scsi_execute.isra.24+0x3c/0x160
[   28.987691]  scsi_execute_req_flags+0x94/0x100
[   28.987694]  scsi_probe_and_add_lun+0x207/0xd60
[   28.987699]  ? __pm_rme_resume+0x5c/0x80
[   28.987701]  __scsi_add_device+0x103/0x120
[   28.987709]  ata_scsi_scan_host+0xa3/0x1d0 [libata]
[   28.987716]  async_port_probe+0x43/0x60 [libata]
[   28.987718]  async_run_entry_fn+0x37/0x150
[   28.987722]  process_one_work+0x1d0/0x660

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Hannes Reinecke
On 01/11/2017 10:39 PM, Jens Axboe wrote:
> Another year, another posting of this patchset. The previous posting
> was here:
> 
> https://www.spinics.net/lists/kernel/msg2406106.html
> 
> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
> 
> I've reworked bits of this to get rid of the shadow requests, thanks
> to Bart for the inspiration. The missing piece, for me, was the fact
> that we have the tags->rqs[] indirection array already. I've done this
> somewhat differently, though, by having the internal scheduler tag
> map be allocated/torn down when an IO scheduler is attached or
> detached. This also means that when we run without a scheduler, we
> don't have to do double tag allocations, it'll work like before.
> 
> The patchset applies on top of 4.10-rc3, or can be pulled here:
> 
> git://git.kernel.dk/linux-block blk-mq-sched.6
> 
Well ... something's wrong here on my machine:

[   39.886886] [ cut here ]
[   39.886895] WARNING: CPU: 9 PID: 62 at block/blk-mq.c:342
__blk_mq_finish_request+0x124/0x140
[   39.886895] Modules linked in: sd_mod ahci uhci_hcd ehci_pci
mpt3sas(+) libahci ehci_hcd serio_raw crc32c_intel raid_class drm libata
usbcore hpsa usb_common scsi_transport_sas sg dm_multipath dm_mod
scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
[   39.886910] CPU: 9 PID: 62 Comm: kworker/u130:0 Not tainted
4.10.0-rc3+ #528
[   39.886911] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
[   39.886917] Workqueue: events_unbound async_run_entry_fn
[   39.886918] Call Trace:
[   39.886923]  dump_stack+0x85/0xc9
[   39.886927]  __warn+0xd1/0xf0
[   39.886928]  warn_slowpath_null+0x1d/0x20
[   39.886930]  __blk_mq_finish_request+0x124/0x140
[   39.886932]  blk_mq_finish_request+0x55/0x60
[   39.886934]  blk_mq_sched_put_request+0x78/0x80
[   39.886936]  blk_mq_free_request+0xe/0x10
[   39.886938]  blk_put_request+0x25/0x60
[   39.886944]  __scsi_execute.isra.24+0x104/0x160
[   39.886946]  scsi_execute_req_flags+0x94/0x100
[   39.886948]  scsi_report_opcode+0xab/0x100

checking ...

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-24 Thread Hannes Reinecke

On 12/24/2016 11:07 AM, Christoph Hellwig wrote:

On Fri, Dec 23, 2016 at 11:42:45AM -0800, Linus Torvalds wrote:

Ugh. This patch is nasty.


It's the same SCSI has done for ages - except that is uses a separate
kmalloc for the sense buffer.


I think we should just fix blk_execute_rq() instead.


As you found out below it's not just blk_execute_rq, it's the whole
architecture of the BLOCK_PC code, which expects a caller provided
sense buffer.  But with the way blk-mq allocates request structures
we can actually fix it, but I first need to extent the way it allows
drivers to allocate private data to the old request code.  I've
actually already implemented that for SCSI long time ago, and have
started to life it to the block layer.


Would be cool to have a generic sense buffer.
I always found it slightly odd, pretending that 'struct request' is 
protocol-agnostic and refusing to add a sense data pointer, but at the 
same time having a field 'sense_len' (which gives the length of what 
exactly?).


Christoph, do you have a pointer to your patchset?
Not that I'll be able to do any meaningful work until next year, but 
having a look would be nice. Just to get a feeling where you want to 
head to; I might be able to work on this start of January.


Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH] scsi: do not requeue requests unaligned with device sector size

2016-12-21 Thread Hannes Reinecke
On 12/21/2016 08:50 AM, Christoph Hellwig wrote:
> On Tue, Dec 20, 2016 at 12:02:27AM -0200, Mauricio Faria de Oliveira wrote:
>> When a SCSI command (e.g., read operation) is partially completed
>> with good status and residual bytes (i.e., not all the bytes from
>> the specified transfer length were transferred) the SCSI midlayer
>> will update the request/bios with the completed bytes and requeue
>> the request in order to complete the remainder/pending bytes.
>>
>> However, when the device sector size is greater than the 512-byte
>> default/kernel sector size, alignment restrictions and validation
>> apply (both to the starting logical block address, and the number
>> of logical blocks to transfer) -- values must be multiples of the
>> device sector size, otherwise the kernel fails the request in the
>> preparation stage (e.g., sd_setup_read_write_cmnd() at sd.c file):
> 
> How do you even get an unaligned residual count?  Except for SES
> processor devices (which will only issue BLOCK_PC commands) this is
> not allowed by SPC:
> 
> "The residual count shall be reported in bytes if the peripheral device
>  type in the destination target descriptor is 03h (i.e., processor device),
>  and in destination device blocks for all other device type codes."

Which actually would be pretty much my objection, too.

This would only be applicable for 512e drives, where we _might_ end up
with a residual smaller than the physical sector size.
But that should be handled by firmware; after all, that's what the 'e'
implies, right?

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 4/7] blk-flush: run the queue when inserting blk-mq flush

2016-12-08 Thread Hannes Reinecke

On 12/08/2016 09:13 PM, Jens Axboe wrote:

Currently we pass in to run the queue async, but don't flag the
queue to be run. We don't need to run it async here, but we should
run it. So fixup the parameters.

Signed-off-by: Jens Axboe 
---
 block/blk-flush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 1bdbb3d3e5f5..27a42dab5a36 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -426,7 +426,7 @@ void blk_insert_flush(struct request *rq)
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
if (q->mq_ops) {
-   blk_mq_insert_request(rq, false, false, true);
+   blk_mq_insert_request(rq, false, true, false);
} else
list_add_tail(&rq->queuelist, &q->queue_head);
    return;


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 2/7] blk-mq: abstract out blk_mq_dispatch_rq_list() helper

2016-12-08 Thread Hannes Reinecke

On 12/08/2016 09:13 PM, Jens Axboe wrote:

Takes a list of requests, and dispatches it. Moves any residual
requests to the dispatch list.

Signed-off-by: Jens Axboe 
---
 block/blk-mq.c | 85 --
 block/blk-mq.h |  1 +
 2 files changed, 48 insertions(+), 38 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 3/7] elevator: make the rqhash helpers exported

2016-12-08 Thread Hannes Reinecke

On 12/08/2016 09:13 PM, Jens Axboe wrote:

Signed-off-by: Jens Axboe 
---
 block/elevator.c | 8 
 include/linux/elevator.h | 5 +
 2 files changed, 9 insertions(+), 4 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: Regarding "scsi_lib: Decode T10 vendor IDs" d230823a1c4c3e97afd4c934b86b3975d5e20249

2016-11-25 Thread Hannes Reinecke
On 11/24/2016 04:48 PM, Sylvain Munaut wrote:
> Hi,
> 
> 
> Regarding this commit :
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d230823a1c4c3e97afd4c934b86b3975d5e20249
> 
> Looking at both the code and the comment, it seems to me you wanted
> 'T10' id to be a last resort if nothing else was available.
> 
> If that was the intent, it's not working ...
> 
> I have a Samsung 850 Pro SSD that has the NAA identifier after the T10
> one in the vpd_pg83 and so it's returning the T10 now because it's
> longer.
> 
Hmm.

> Did I misunderstand the intent ? Or is it a bug ? (for the latter, I
> can write a patch myself, I just didn't want to loose my time if I
> misunderstood).
> 
The T10 identifier should _always_ be the last resort; anything is
better than that :-)

But for NAA we have to check the length, as I've come across an array
which will (errorneously) report _two_ NAA identifiers, and only the
longer one of them will be providing the correct identification.
The other one is in fact the NAA for the target port, and thus identical
for all LUNs from that port :-(

So yeah, the intention was not to prefer a longer T10 to a NAA identifier.

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.com  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] scsi: hpsa: fix uninitialized variable access

2016-11-22 Thread Hannes Reinecke
On 11/22/2016 03:32 PM, Arnd Bergmann wrote:
> A bugfix has left the 'sd' variable uninitialized:
> 
> drivers/scsi/hpsa.c: In function 'hpsa_slave_alloc':
> drivers/scsi/hpsa.c:2033:5: error: 'sd' may be used uninitialized in this 
> function [-Werror=maybe-uninitialized]
> 
> This reverts back to calling lookup_hpsa_scsi_dev() for the
> HPSA_PHYSICAL_DEVICE_BUS case, but also keeps doing that when
> hpsa_find_device_by_sas_rphy() returns NULL, as is currently
> done.
> 
> The patch that caused this is marked for stable backports,
> so this one has to be backported on top as well.
> 
> Fixes: 4eb307f7b18d ("scsi: hpsa: use bus '3' for legacy HBA devices")
> Signed-off-by: Arnd Bergmann 
> ---
> I did not try hard to figure out what the correct behavior
> should be, so please treat this as a bugreport that might contain
> the right fix.
> ---
>  drivers/scsi/hpsa.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index ea64c01f3d42..d17ee63045c3 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -2029,7 +2029,10 @@ static int hpsa_slave_alloc(struct scsi_device *sdev)
>   sd->target = sdev_id(sdev);
>   sd->lun = sdev->lun;
>   }
> + } else {
> + sd = NULL;
>   }
> +
>   if (!sd)
>   sd = lookup_hpsa_scsi_dev(h, sdev_channel(sdev),
>   sdev_id(sdev), sdev->lun);
> 
Hmm.

I'd prefer this:

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 05f7782..ee6f852 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -2031,7 +2031,7 @@ static struct hpsa_scsi_dev_t
*lookup_hpsa_scsi_dev(struct ctlr_info *h,

 static int hpsa_slave_alloc(struct scsi_device *sdev)
 {
-   struct hpsa_scsi_dev_t *sd;
+   struct hpsa_scsi_dev_t *sd = NULL;
unsigned long flags;
struct ctlr_info *h;


Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.com  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH/RFC] add "failfast" support for raid1/raid10.

2016-11-17 Thread Hannes Reinecke
(Seeing that it was me who initiated those patches I guess I should
speak up here)

On 11/18/2016 06:16 AM, NeilBrown wrote:
> Hi,
> 
>  I've been sitting on these patches for a while because although they
>  solve a real problem, it is a fairly limited use-case, and I don't
>  really like some of the details.
> 
>  So I'm posting them as RFC in the hope that a different perspective
>  might help me like them better, or find a better approach.
> 
[ .. ]
>  My two main concerns are:
>   - does this functionality have any use-case outside of mirrored
> storage arrays, and are there other storage arrays which
> occasionally inserted excessive latency (seems like a serious
> misfeature to me, but I know few of the details)?
Yes, there are.
I've come across some storage arrays which really take some liberty when
doing internal error recovery; some even take up to 20 minutes
before sending a command completion (the response was "there's nothing
in the SCSI spec which forbids us to do so")

>   - would it be at all possible to have "real" failfast functionality
> in the block layer?  I.e. something that is based on time rather
> than retry count.  Maybe in some cases a retry would be
> appropriate if the first failure was very fast.
> I.e. it would reduce timeouts and decide on retries based on
> elapsed time rather than number of attempts.
> With this would come the question of "how fast is fast" and I
> don't have a really good answer.  Maybe md would need to set a
> timeout, which it would double whenever it got failures on all
> drives.  Otherwise the timeout would drift towards (say) 10 times
> the typical response time.
> 
The current 'failfast' is rather a 'do not attempt error recovery' flag;
ie the SCSI stack should _not_ start error recovery but rather pass the
request upwards in case of failure.
Problem is that there is no real upper limit on the time error recovery
could take, and it's virtually impossible to give an I/O response time
guarantees once error recovery had been invoked.
And to make matters worse, in most cases error recovery won't work
_anyway_ if the transport is severed.
So this is more to do with error recovery, and not so much on the time
each request can/should spend on the fly.

The S/390 DASD case is even worse, as the DASD driver _by design_ will
always have to wait for an answer from the storage array. So if the link
to the array is severed you are in deep trouble, as you'll never get a
completion (or any status, for that matter) until the array is reconnected.

So while the FAILFAST flag is a mere convenience for SCSI, it's a
positive must for S/390 if you want to have a functional RAID.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH v4 07/15] scsi: fc: implement kref backed reference counting

2016-11-17 Thread Hannes Reinecke
On 11/17/2016 10:31 AM, Johannes Thumshirn wrote:
> Implement kref backed reference counting instead of rolling our own. This
> elimnates the need of the following fields in 'struct fc_bsg_job':
> * ref_cnt
> * state_flags
> * job_lock
> bringing us close to unification of 'struct fc_bsg_job' and 'struct bsg_job'.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/scsi/scsi_transport_fc.c | 40 
> +++-
>  include/scsi/scsi_transport_fc.h |  4 +---
>  2 files changed, 12 insertions(+), 32 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


<    1   2   3   4   5   6   7   8   9   10   >