date:20161019

Re: [PATCH 04/10] mm: replace get_user_pages_locked() write/force parameters with gup_flags

2016-10-19 Thread Jan Kara

On Tue 18-10-16 14:56:09, Lorenzo Stoakes wrote:
> On Tue, Oct 18, 2016 at 02:54:25PM +0200, Jan Kara wrote:
> > > @@ -1282,7 +1282,7 @@ long get_user_pages(unsigned long start, unsigned 
> > > long nr_pages,
> > >   int write, int force, struct page **pages,
> > >   struct vm_area_struct **vmas);
> > >  long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
> > > - int write, int force, struct page **pages, int *locked);
> > > + unsigned int gup_flags, struct page **pages, int *locked);
> >
> > Hum, the prototype is inconsistent with e.g. __get_user_pages_unlocked()
> > where gup_flags come after **pages argument. Actually it makes more sense
> > to have it before **pages so that input arguments come first and output
> > arguments second but I don't care that much. But it definitely should be
> > consistent...
> 
> It was difficult to decide quite how to arrange parameters as there was
> inconsitency with regards to parameter ordering already - for example
> __get_user_pages() places its flags argument before pages whereas, as you 
> note,
> __get_user_pages_unlocked() puts them afterwards.
> 
> I ended up compromising by trying to match the existing ordering of the 
> function
> as much as I could by replacing write, force pairs with gup_flags in the same
> location (with the exception of get_user_pages_unlocked() which I felt should
> match __get_user_pages_unlocked() in signature) or if there was already a
> gup_flags parameter as in the case of __get_user_pages_unlocked() I simply
> removed the write, force pair and left the flags as the last parameter.
> 
> I am happy to rearrange parameters as needed, however I am not sure if it'd be
> worthwhile for me to do so (I am keen to try to avoid adding too much noise 
> here
> :)
> 
> If we were to rearrange parameters for consistency I'd suggest adjusting
> __get_user_pages_unlocked() to put gup_flags before pages and do the same with
> get_user_pages_unlocked(), let me know what you think.

Yeah, ok. If the inconsistency is already there, just leave it for now.

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 4/6] qedi: Add LL2 iSCSI interface for offload iSCSI.

2016-10-19 Thread Hannes Reinecke

On 10/19/2016 07:01 AM, manish.rangan...@cavium.com wrote:
> From: Manish Rangankar 
> 
> This patch adds support for iscsiuio interface using Light L2 (LL2) qed
> interface.
> 
> Signed-off-by: Nilesh Javali 
> Signed-off-by: Adheer Chandravanshi 
> Signed-off-by: Chad Dupuis 
> Signed-off-by: Saurav Kashyap 
> Signed-off-by: Arun Easi 
> Signed-off-by: Manish Rangankar 
> ---
>  drivers/scsi/qedi/qedi.h  |  73 +
>  drivers/scsi/qedi/qedi_main.c | 357 
> ++
>  2 files changed, 430 insertions(+)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 06/10] mm: replace get_user_pages() write/force parameters with gup_flags

2016-10-19 Thread Jan Kara

On Thu 13-10-16 01:20:16, Lorenzo Stoakes wrote:
> This patch removes the write and force parameters from get_user_pages() and
> replaces them with a gup_flags parameter to make the use of FOLL_FORCE 
> explicit
> in callers as use of this flag can result in surprising behaviour (and hence
> bugs) within the mm subsystem.
> 
> Signed-off-by: Lorenzo Stoakes 

The patch looks good. You can add:

Reviewed-by: Jan Kara 

Honza

> ---
>  arch/cris/arch-v32/drivers/cryptocop.c |  4 +---
>  arch/ia64/kernel/err_inject.c  |  2 +-
>  arch/x86/mm/mpx.c  |  5 ++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  7 +--
>  drivers/gpu/drm/radeon/radeon_ttm.c|  3 ++-
>  drivers/gpu/drm/via/via_dmablit.c  |  4 ++--
>  drivers/infiniband/core/umem.c |  6 +-
>  drivers/infiniband/hw/mthca/mthca_memfree.c|  2 +-
>  drivers/infiniband/hw/qib/qib_user_pages.c |  3 ++-
>  drivers/infiniband/hw/usnic/usnic_uiom.c   |  5 -
>  drivers/media/v4l2-core/videobuf-dma-sg.c  |  7 +--
>  drivers/misc/mic/scif/scif_rma.c   |  3 +--
>  drivers/misc/sgi-gru/grufault.c|  2 +-
>  drivers/platform/goldfish/goldfish_pipe.c  |  3 ++-
>  drivers/rapidio/devices/rio_mport_cdev.c   |  3 ++-
>  .../vc04_services/interface/vchiq_arm/vchiq_2835_arm.c |  3 +--
>  .../vc04_services/interface/vchiq_arm/vchiq_arm.c  |  3 +--
>  drivers/virt/fsl_hypervisor.c  |  4 ++--
>  include/linux/mm.h |  2 +-
>  mm/gup.c   | 12 +++-
>  mm/mempolicy.c |  2 +-
>  mm/nommu.c | 18 
> --
>  22 files changed, 49 insertions(+), 54 deletions(-)
> 
> diff --git a/arch/cris/arch-v32/drivers/cryptocop.c 
> b/arch/cris/arch-v32/drivers/cryptocop.c
> index b5698c8..099e170 100644
> --- a/arch/cris/arch-v32/drivers/cryptocop.c
> +++ b/arch/cris/arch-v32/drivers/cryptocop.c
> @@ -2722,7 +2722,6 @@ static int cryptocop_ioctl_process(struct inode *inode, 
> struct file *filp, unsig
>   err = get_user_pages((unsigned long int)(oper.indata + prev_ix),
>noinpages,
>0,  /* read access only for in data */
> -  0, /* no force */
>inpages,
>NULL);
>  
> @@ -2736,8 +2735,7 @@ static int cryptocop_ioctl_process(struct inode *inode, 
> struct file *filp, unsig
>   if (oper.do_cipher){
>   err = get_user_pages((unsigned long int)oper.cipher_outdata,
>nooutpages,
> -  1, /* write access for out data */
> -  0, /* no force */
> +  FOLL_WRITE, /* write access for out data */
>outpages,
>NULL);
>   up_read(¤t->mm->mmap_sem);
> diff --git a/arch/ia64/kernel/err_inject.c b/arch/ia64/kernel/err_inject.c
> index 09f8457..5ed0ea9 100644
> --- a/arch/ia64/kernel/err_inject.c
> +++ b/arch/ia64/kernel/err_inject.c
> @@ -142,7 +142,7 @@ store_virtual_to_phys(struct device *dev, struct 
> device_attribute *attr,
>   u64 virt_addr=simple_strtoull(buf, NULL, 16);
>   int ret;
>  
> - ret = get_user_pages(virt_addr, 1, VM_READ, 0, NULL, NULL);
> + ret = get_user_pages(virt_addr, 1, FOLL_WRITE, NULL, NULL);
>   if (ret<=0) {
>  #ifdef ERR_INJ_DEBUG
>   printk("Virtual address %lx is not existing.\n",virt_addr);
> diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> index 8047687..e4f8009 100644
> --- a/arch/x86/mm/mpx.c
> +++ b/arch/x86/mm/mpx.c
> @@ -544,10 +544,9 @@ static int mpx_resolve_fault(long __user *addr, int 
> write)
>  {
>   long gup_ret;
>   int nr_pages = 1;
> - int force = 0;
>  
> - gup_ret = get_user_pages((unsigned long)addr, nr_pages, write,
> - force, NULL, NULL);
> + gup_ret = get_user_pages((unsigned long)addr, nr_pages,
> + write ? FOLL_WRITE : 0, NULL, NULL);
>   /*
>* get_user_pages() returns number of pages gotten.
>* 0 means we failed to fault in and get anything,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 887483b..dcaf691 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -555,10 +555,13 @@ struct amdgpu_ttm_tt {
>  int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)
>  {
>   struct amdgpu_ttm_tt *gtt = (voi

Re: [PATCH 07/10] mm: replace get_user_pages_remote() write/force parameters with gup_flags

2016-10-19 Thread Jan Kara

On Thu 13-10-16 01:20:17, Lorenzo Stoakes wrote:
> This patch removes the write and force parameters from get_user_pages_remote()
> and replaces them with a gup_flags parameter to make the use of FOLL_FORCE
> explicit in callers as use of this flag can result in surprising behaviour 
> (and
> hence bugs) within the mm subsystem.
> 
> Signed-off-by: Lorenzo Stoakes 

Looks good. You can add:

Reviewed-by: Jan Kara 

Honza

> ---
>  drivers/gpu/drm/etnaviv/etnaviv_gem.c   |  7 +--
>  drivers/gpu/drm/i915/i915_gem_userptr.c |  6 +-
>  drivers/infiniband/core/umem_odp.c  |  7 +--
>  fs/exec.c   |  9 +++--
>  include/linux/mm.h  |  2 +-
>  kernel/events/uprobes.c |  6 --
>  mm/gup.c| 22 +++---
>  mm/memory.c |  6 +-
>  security/tomoyo/domain.c|  2 +-
>  9 files changed, 40 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> index 5ce3603..0370b84 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> @@ -748,19 +748,22 @@ static struct page **etnaviv_gem_userptr_do_get_pages(
>   int ret = 0, pinned, npages = etnaviv_obj->base.size >> PAGE_SHIFT;
>   struct page **pvec;
>   uintptr_t ptr;
> + unsigned int flags = 0;
>  
>   pvec = drm_malloc_ab(npages, sizeof(struct page *));
>   if (!pvec)
>   return ERR_PTR(-ENOMEM);
>  
> + if (!etnaviv_obj->userptr.ro)
> + flags |= FOLL_WRITE;
> +
>   pinned = 0;
>   ptr = etnaviv_obj->userptr.ptr;
>  
>   down_read(&mm->mmap_sem);
>   while (pinned < npages) {
>   ret = get_user_pages_remote(task, mm, ptr, npages - pinned,
> - !etnaviv_obj->userptr.ro, 0,
> - pvec + pinned, NULL);
> + flags, pvec + pinned, NULL);
>   if (ret < 0)
>   break;
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c 
> b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index e537930..c6f780f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -508,6 +508,10 @@ __i915_gem_userptr_get_pages_worker(struct work_struct 
> *_work)
>   pvec = drm_malloc_gfp(npages, sizeof(struct page *), GFP_TEMPORARY);
>   if (pvec != NULL) {
>   struct mm_struct *mm = obj->userptr.mm->mm;
> + unsigned int flags = 0;
> +
> + if (!obj->userptr.read_only)
> + flags |= FOLL_WRITE;
>  
>   ret = -EFAULT;
>   if (atomic_inc_not_zero(&mm->mm_users)) {
> @@ -517,7 +521,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct 
> *_work)
>   (work->task, mm,
>obj->userptr.ptr + pinned * PAGE_SIZE,
>npages - pinned,
> -  !obj->userptr.read_only, 0,
> +  flags,
>pvec + pinned, NULL);
>   if (ret < 0)
>   break;
> diff --git a/drivers/infiniband/core/umem_odp.c 
> b/drivers/infiniband/core/umem_odp.c
> index 75077a0..1f0fe32 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -527,6 +527,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 
> user_virt, u64 bcnt,
>   u64 off;
>   int j, k, ret = 0, start_idx, npages = 0;
>   u64 base_virt_addr;
> + unsigned int flags = 0;
>  
>   if (access_mask == 0)
>   return -EINVAL;
> @@ -556,6 +557,9 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 
> user_virt, u64 bcnt,
>   goto out_put_task;
>   }
>  
> + if (access_mask & ODP_WRITE_ALLOWED_BIT)
> + flags |= FOLL_WRITE;
> +
>   start_idx = (user_virt - ib_umem_start(umem)) >> PAGE_SHIFT;
>   k = start_idx;
>  
> @@ -574,8 +578,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 
> user_virt, u64 bcnt,
>*/
>   npages = get_user_pages_remote(owning_process, owning_mm,
>   user_virt, gup_num_pages,
> - access_mask & ODP_WRITE_ALLOWED_BIT,
> - 0, local_page_list, NULL);
> + flags, local_page_list, NULL);
>   up_read(&owning_mm->mmap_sem);
>  
>   if (npages < 0)
> diff --git a/fs/exec.c b/fs/exec.c
> index 6fcfb3f..4e497b9 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -191,6 +191,7 @@ static struct page *get_arg_p

Re: [RFC 3/6] qedi: Add QLogic FastLinQ offload iSCSI driver framework.

2016-10-19 Thread Hannes Reinecke

On 10/19/2016 07:01 AM, manish.rangan...@cavium.com wrote:
> From: Manish Rangankar 
> 
> The QLogic FastLinQ Driver for iSCSI (qedi) is the iSCSI specific module
> for 41000 Series Converged Network Adapters by QLogic.
> 
> This patch consists of following changes:
>   - MAINTAINERS Makefile and Kconfig changes for qedi,
>   - PCI driver registration,
>   - iSCSI host level initialization,
>   - Debugfs and log level infrastructure.
> 
> Signed-off-by: Nilesh Javali 
> Signed-off-by: Adheer Chandravanshi 
> Signed-off-by: Chad Dupuis 
> Signed-off-by: Saurav Kashyap 
> Signed-off-by: Arun Easi 
> Signed-off-by: Manish Rangankar 
> ---
>  MAINTAINERS |6 +
>  drivers/net/ethernet/qlogic/Kconfig |   12 -
>  drivers/scsi/Kconfig|1 +
>  drivers/scsi/Makefile   |1 +
>  drivers/scsi/qedi/Kconfig   |   10 +
>  drivers/scsi/qedi/Makefile  |5 +
>  drivers/scsi/qedi/qedi.h|  286 +++
>  drivers/scsi/qedi/qedi_dbg.c|  143 
>  drivers/scsi/qedi/qedi_dbg.h|  144 
>  drivers/scsi/qedi/qedi_debugfs.c|  244 ++
>  drivers/scsi/qedi/qedi_hsi.h|   52 ++
>  drivers/scsi/qedi/qedi_main.c   | 1550 
> +++
>  drivers/scsi/qedi/qedi_sysfs.c  |   52 ++
>  drivers/scsi/qedi/qedi_version.h|   14 +
>  14 files changed, 2508 insertions(+), 12 deletions(-)
>  create mode 100644 drivers/scsi/qedi/Kconfig
>  create mode 100644 drivers/scsi/qedi/Makefile
>  create mode 100644 drivers/scsi/qedi/qedi.h
>  create mode 100644 drivers/scsi/qedi/qedi_dbg.c
>  create mode 100644 drivers/scsi/qedi/qedi_dbg.h
>  create mode 100644 drivers/scsi/qedi/qedi_debugfs.c
>  create mode 100644 drivers/scsi/qedi/qedi_hsi.h
>  create mode 100644 drivers/scsi/qedi/qedi_main.c
>  create mode 100644 drivers/scsi/qedi/qedi_sysfs.c
>  create mode 100644 drivers/scsi/qedi/qedi_version.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5e925a2..906d05f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9909,6 +9909,12 @@ F: drivers/net/ethernet/qlogic/qed/
>  F:   include/linux/qed/
>  F:   drivers/net/ethernet/qlogic/qede/
>  
> +QLOGIC QL41xxx ISCSI DRIVER
> +M:   qlogic-storage-upstr...@cavium.com
> +L:   linux-scsi@vger.kernel.org
> +S:   Supported
> +F:   drivers/scsi/qedi/
> +
>  QNX4 FILESYSTEM
>  M:   Anders Larsen 
>  W:   http://www.alarsen.net/linux/qnx4fs/
> diff --git a/drivers/net/ethernet/qlogic/Kconfig 
> b/drivers/net/ethernet/qlogic/Kconfig
> index bad4fae..28b4366 100644
> --- a/drivers/net/ethernet/qlogic/Kconfig
> +++ b/drivers/net/ethernet/qlogic/Kconfig
> @@ -121,16 +121,4 @@ config INFINIBAND_QEDR
>  config QED_ISCSI
>   bool
>  
> -config QEDI
> - tristate "QLogic QED 25/40/100Gb iSCSI driver"
> - depends on QED
> - select QED_LL2
> - select QED_ISCSI
> - default n
> - ---help---
> -   This provides a temporary node that allows the compilation
> -   and logical testing of the hardware offload iSCSI support
> -   for QLogic QED. This would be replaced by the 'real' option
> -   once the QEDI driver is added [+relocated].
> -
>  endif # NET_VENDOR_QLOGIC
Huh? You just introduce this one in patch 1/6.
Please fold them together so that this can be omitted.

> diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
> index 3e2bdb9..5cf03db 100644
> --- a/drivers/scsi/Kconfig
> +++ b/drivers/scsi/Kconfig
> @@ -1254,6 +1254,7 @@ config SCSI_QLOGICPTI
>  
>  source "drivers/scsi/qla2xxx/Kconfig"
>  source "drivers/scsi/qla4xxx/Kconfig"
> +source "drivers/scsi/qedi/Kconfig"
>  
>  config SCSI_LPFC
>   tristate "Emulex LightPulse Fibre Channel Support"
> diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
> index 38d938d..da9e312 100644
> --- a/drivers/scsi/Makefile
> +++ b/drivers/scsi/Makefile
> @@ -132,6 +132,7 @@ obj-$(CONFIG_PS3_ROM) += ps3rom.o
>  obj-$(CONFIG_SCSI_CXGB3_ISCSI)   += libiscsi.o libiscsi_tcp.o cxgbi/
>  obj-$(CONFIG_SCSI_CXGB4_ISCSI)   += libiscsi.o libiscsi_tcp.o cxgbi/
>  obj-$(CONFIG_SCSI_BNX2_ISCSI)+= libiscsi.o bnx2i/
> +obj-$(CONFIG_QEDI)  += libiscsi.o qedi/
>  obj-$(CONFIG_BE2ISCSI)   += libiscsi.o be2iscsi/
>  obj-$(CONFIG_SCSI_ESAS2R)+= esas2r/
>  obj-$(CONFIG_SCSI_PMCRAID)   += pmcraid.o
> diff --git a/drivers/scsi/qedi/Kconfig b/drivers/scsi/qedi/Kconfig
> new file mode 100644
> index 000..23ca8a2
> --- /dev/null
> +++ b/drivers/scsi/qedi/Kconfig
> @@ -0,0 +1,10 @@
> +config QEDI
> + tristate "QLogic QEDI 25/40/100Gb iSCSI Initiator Driver Support"
> + depends on PCI && SCSI
> + depends on QED
> + select SCSI_ISCSI_ATTRS
> + select QED_LL2
> + select QED_ISCSI
> + ---help---
> + This driver supports iSCSI offload for the QLogic FastLinQ
> + 41000 Series Converged Network Adapters.
> diff --git a/drivers/scsi/qedi/Makefile b/drivers/scsi/qedi/Makefile
> new file mode 100644
> index

Re: [PATCH 08/10] mm: replace __access_remote_vm() write parameter with gup_flags

2016-10-19 Thread Michal Hocko

On Wed 19-10-16 09:40:45, Lorenzo Stoakes wrote:
> On Wed, Oct 19, 2016 at 10:13:52AM +0200, Michal Hocko wrote:
> > On Wed 19-10-16 09:59:03, Jan Kara wrote:
> > > On Thu 13-10-16 01:20:18, Lorenzo Stoakes wrote:
> > > > This patch removes the write parameter from __access_remote_vm() and 
> > > > replaces it
> > > > with a gup_flags parameter as use of this function previously _implied_
> > > > FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
> > > >
> > > > We make this explicit as use of FOLL_FORCE can result in surprising 
> > > > behaviour
> > > > (and hence bugs) within the mm subsystem.
> > > >
> > > > Signed-off-by: Lorenzo Stoakes 
> > >
> > > So I'm not convinced this (and the following two patches) is actually
> > > helping much. By grepping for FOLL_FORCE we will easily see that any 
> > > caller
> > > of access_remote_vm() gets that semantics and can thus continue search
> >
> > I am really wondering. Is there anything inherent that would require
> > FOLL_FORCE for access_remote_vm? I mean FOLL_FORCE is a really
> > non-trivial thing. It doesn't obey vma permissions so we should really
> > minimize its usage. Do all of those users really need FOLL_FORCE?
> 
> I wonder about this also, for example by accessing /proc/self/mem you trigger
> access_remote_vm() and consequently get_user_pages_remote() meaning FOLL_FORCE
> is implied and you can use /proc/self/mem to override any VMA permissions. I

yes this is the desirable and expected behavior. 

> wonder if this is desirable behaviour or whether this ought to be limited to
> ptrace system calls. Regardless, by making the flag more visible it makes it
> easier to see that this is happening.

mem_open already enforces PTRACE_MODE_ATTACH

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 08/10] mm: replace __access_remote_vm() write parameter with gup_flags

2016-10-19 Thread Michal Hocko

On Wed 19-10-16 09:59:03, Jan Kara wrote:
> On Thu 13-10-16 01:20:18, Lorenzo Stoakes wrote:
> > This patch removes the write parameter from __access_remote_vm() and 
> > replaces it
> > with a gup_flags parameter as use of this function previously _implied_
> > FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
> > 
> > We make this explicit as use of FOLL_FORCE can result in surprising 
> > behaviour
> > (and hence bugs) within the mm subsystem.
> > 
> > Signed-off-by: Lorenzo Stoakes 
> 
> So I'm not convinced this (and the following two patches) is actually
> helping much. By grepping for FOLL_FORCE we will easily see that any caller
> of access_remote_vm() gets that semantics and can thus continue search

I am really wondering. Is there anything inherent that would require
FOLL_FORCE for access_remote_vm? I mean FOLL_FORCE is a really
non-trivial thing. It doesn't obey vma permissions so we should really
minimize its usage. Do all of those users really need FOLL_FORCE?

Anyway I would rather see the flag explicit and used at more places than
hidden behind a helper function.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 10/11] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code

2016-10-19 Thread Christoph Hellwig

Looks fine,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Device or HBA QD throttling creates holes in Sequential work load

2016-10-19 Thread Kashyap Desai

Hi,

I am doing some performance tuning in MR driver to understand how sdev
queue depth and hba queue depth play role in IO submission from above
layer.

I have 24 JBOD connected to MR 12GB controller and I can see performance
for 4K Sequential work load as below.

HBA QD for MR controller is 4065
Per device QD is set to 32

queue depth from  256 reports 300K IOPS
queue depth from  128 reports 330K IOPS
queue depth from  64 reports 360K IOPS
queue depth from  32 reports 510K IOPS


In MR driver I added debug print and confirm that more IO come to driver
as random IO whenever I have  queue depth more than 32.
I have debug using scsi logging level and blktrace as well. Below is
snippet of logs using scsi logging level.  In summary, if SML do flow
control of IO due to Device QD or HBA QD,  IO coming to LLD is more random
pattern.

I see IO coming to driver is not sequential.

[79546.912041] sd 18:2:21:0: [sdy] tag#854 CDB: Write(10) 2a 00 00 03 c0
3b 00 00 01 00
[79546.912049] sd 18:2:21:0: [sdy] tag#855 CDB: Write(10) 2a 00 00 03 c0
3c 00 00 01 00
[79546.912053] sd 18:2:21:0: [sdy] tag#886 CDB: Write(10) 2a 00 00 03 c0
5b 00 00 01 00 <- After 3c it jumps to 5b. Sequence are overlapped. Due to
sdev QD throttling.
[79546.912056] sd 18:2:21:0: [sdy] tag#887 CDB: Write(10) 2a 00 00 03 c0
5c 00 00 01 00
[79546.912250] sd 18:2:21:0: [sdy] tag#856 CDB: Write(10) 2a 00 00 03 c0
3d 00 00 01 00
[79546.912257] sd 18:2:21:0: [sdy] tag#888 CDB: Write(10) 2a 00 00 03 c0
5d 00 00 01 00
[79546.912259] sd 18:2:21:0: [sdy] tag#857 CDB: Write(10) 2a 00 00 03 c0
3e 00 00 01 00
[79546.912268] sd 18:2:21:0: [sdy] tag#858 CDB: Write(10) 2a 00 00 03 c0
3f 00 00 01 00


If scsi_request_fn() breaks due to unavailability of device queue (due to
below check), will there be any side defect as I observe ?

if (!scsi_dev_queue_ready(q, sdev))
break;

If I reduce HBA QD and make sure IO from above layer is throttled due to
HBA QD, there is a same impact.  MR driver use host wide shared tag map.

Can someone help me if this can be tunable in LLD providing additional
settings or it is expected behavior ? Problem I am facing is, I am not
able to figure out optimal device queue depth for different configuration
and work load.

` Kashyap
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 09/11] SRP transport, scsi-mq: Wait for .queue_rq() if necessary

2016-10-19 Thread Christoph Hellwig

Looks fine,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 5/6] qedi: Add support for iSCSI session management.

2016-10-19 Thread Hannes Reinecke

On 10/19/2016 07:01 AM, manish.rangan...@cavium.com wrote:
> From: Manish Rangankar 
> 
> This patch adds support for iscsi_transport LLD Login,
> Logout, NOP-IN/NOP-OUT, Async, Reject PDU processing
> and Firmware async event handling support.
> 
> Signed-off-by: Nilesh Javali 
> Signed-off-by: Adheer Chandravanshi 
> Signed-off-by: Chad Dupuis 
> Signed-off-by: Saurav Kashyap 
> Signed-off-by: Arun Easi 
> Signed-off-by: Manish Rangankar 
> ---
>  drivers/scsi/qedi/qedi_fw.c| 1123 
>  drivers/scsi/qedi/qedi_gbl.h   |   67 ++
>  drivers/scsi/qedi/qedi_iscsi.c | 1604 
> 
>  drivers/scsi/qedi/qedi_iscsi.h |  228 ++
>  drivers/scsi/qedi/qedi_main.c  |  164 
>  5 files changed, 3186 insertions(+)
>  create mode 100644 drivers/scsi/qedi/qedi_fw.c
>  create mode 100644 drivers/scsi/qedi/qedi_gbl.h
>  create mode 100644 drivers/scsi/qedi/qedi_iscsi.c
>  create mode 100644 drivers/scsi/qedi/qedi_iscsi.h
> 
> diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
> new file mode 100644
> index 000..a820785
> --- /dev/null
> +++ b/drivers/scsi/qedi/qedi_fw.c
> @@ -0,0 +1,1123 @@
> +/*
> + * QLogic iSCSI Offload Driver
> + * Copyright (c) 2016 Cavium Inc.
> + *
> + * This software is available under the terms of the GNU General Public 
> License
> + * (GPL) Version 2, available from the file COPYING in the main directory of
> + * this source tree.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "qedi.h"
> +#include "qedi_iscsi.h"
> +#include "qedi_gbl.h"
> +
> +static int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn,
> +struct iscsi_task *mtask);
> +
> +void qedi_iscsi_unmap_sg_list(struct qedi_cmd *cmd)
> +{
> + struct scsi_cmnd *sc = cmd->scsi_cmd;
> +
> + if (cmd->io_tbl.sge_valid && sc) {
> + scsi_dma_unmap(sc);
> + cmd->io_tbl.sge_valid = 0;
> + }
> +}
> +
> +static void qedi_process_logout_resp(struct qedi_ctx *qedi,
> +  union iscsi_cqe *cqe,
> +  struct iscsi_task *task,
> +  struct qedi_conn *qedi_conn)
> +{
> + struct iscsi_conn *conn = qedi_conn->cls_conn->dd_data;
> + struct iscsi_logout_rsp *resp_hdr;
> + struct iscsi_session *session = conn->session;
> + struct iscsi_logout_response_hdr *cqe_logout_response;
> + struct qedi_cmd *cmd;
> +
> + cmd = (struct qedi_cmd *)task->dd_data;
> + cqe_logout_response = &cqe->cqe_common.iscsi_hdr.logout_response;
> + spin_lock(&session->back_lock);
> + resp_hdr = (struct iscsi_logout_rsp *)&qedi_conn->gen_pdu.resp_hdr;
> + memset(resp_hdr, 0, sizeof(struct iscsi_hdr));
> + resp_hdr->opcode = cqe_logout_response->opcode;
> + resp_hdr->flags = cqe_logout_response->flags;
> + resp_hdr->hlength = 0;
> +
> + resp_hdr->itt = build_itt(cqe->cqe_solicited.itid, conn->session->age);
> + resp_hdr->statsn = cpu_to_be32(cqe_logout_response->stat_sn);
> + resp_hdr->exp_cmdsn = cpu_to_be32(cqe_logout_response->exp_cmd_sn);
> + resp_hdr->max_cmdsn = cpu_to_be32(cqe_logout_response->max_cmd_sn);
> +
> + resp_hdr->t2wait = cpu_to_be32(cqe_logout_response->time2wait);
> + resp_hdr->t2retain = cpu_to_be32(cqe_logout_response->time2retain);
> +
> + QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
> +   "Freeing tid=0x%x for cid=0x%x\n",
> +   cmd->task_id, qedi_conn->iscsi_conn_id);
> +
> + if (likely(cmd->io_cmd_in_list)) {
> + cmd->io_cmd_in_list = false;
> + list_del_init(&cmd->io_cmd);
> + qedi_conn->active_cmd_count--;
> + } else {
> + QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
> +   "Active cmd list node already deleted, tid=0x%x, 
> cid=0x%x, io_cmd_node=%p\n",
> +   cmd->task_id, qedi_conn->iscsi_conn_id,
> +   &cmd->io_cmd);
> + }
> +
> + cmd->state = RESPONSE_RECEIVED;
> + qedi_clear_task_idx(qedi, cmd->task_id);
> + __iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr, NULL, 0);
> +
> + spin_unlock(&session->back_lock);
> +}
> +
> +static void qedi_process_text_resp(struct qedi_ctx *qedi,
> +union iscsi_cqe *cqe,
> +struct iscsi_task *task,
> +struct qedi_conn *qedi_conn)
> +{
> + struct iscsi_conn *conn = qedi_conn->cls_conn->dd_data;
> + struct iscsi_session *session = conn->session;
> + struct iscsi_task_context *task_ctx;
> + struct iscsi_text_rsp *resp_hdr_ptr;
> + struct iscsi_text_response_hdr *cqe_text_response;
> + struct qedi_cmd *cmd;
> + int pld_len;
> + u32 *tmp;
> +
> + cmd = (struct qedi_cmd *)task->dd_data;
> + task_ctx = (struct iscsi_task_context *)qedi_get_task_mem(&qedi->tasks,
> +

Re: [RFC 1/6] qed: Add support for hardware offloaded iSCSI.

2016-10-19 Thread Johannes Thumshirn

Hi Manish,

Some initital comments

On Wed, Oct 19, 2016 at 01:01:08AM -0400, manish.rangan...@cavium.com wrote:
> From: Yuval Mintz 
> 
> This adds the backbone required for the various HW initalizations
> which are necessary for the iSCSI driver (qedi) for QLogic FastLinQ
> 4 line of adapters - FW notification, resource initializations, etc.
> 
> Signed-off-by: Arun Easi 
> Signed-off-by: Yuval Mintz 
> ---
>  drivers/net/ethernet/qlogic/Kconfig|   15 +
>  drivers/net/ethernet/qlogic/qed/Makefile   |1 +
>  drivers/net/ethernet/qlogic/qed/qed.h  |8 +-
>  drivers/net/ethernet/qlogic/qed/qed_dev.c  |   15 +
>  drivers/net/ethernet/qlogic/qed/qed_int.h  |1 -
>  drivers/net/ethernet/qlogic/qed/qed_iscsi.c| 1310 
> 
>  drivers/net/ethernet/qlogic/qed/qed_iscsi.h|   52 +
>  drivers/net/ethernet/qlogic/qed/qed_l2.c   |1 -
>  drivers/net/ethernet/qlogic/qed/qed_ll2.c  |   35 +-
>  drivers/net/ethernet/qlogic/qed/qed_main.c |2 -
>  drivers/net/ethernet/qlogic/qed/qed_mcp.h  |6 -
>  drivers/net/ethernet/qlogic/qed/qed_reg_addr.h |2 +
>  drivers/net/ethernet/qlogic/qed/qed_spq.c  |   15 +
>  include/linux/qed/qed_if.h |2 +
>  include/linux/qed/qed_iscsi_if.h   |  249 +
>  15 files changed, 1692 insertions(+), 22 deletions(-)
>  create mode 100644 drivers/net/ethernet/qlogic/qed/qed_iscsi.c
>  create mode 100644 drivers/net/ethernet/qlogic/qed/qed_iscsi.h
>  create mode 100644 include/linux/qed/qed_iscsi_if.h
> 
> diff --git a/drivers/net/ethernet/qlogic/Kconfig 
> b/drivers/net/ethernet/qlogic/Kconfig
> index 0df1391f9..bad4fae 100644
> --- a/drivers/net/ethernet/qlogic/Kconfig
> +++ b/drivers/net/ethernet/qlogic/Kconfig
> @@ -118,4 +118,19 @@ config INFINIBAND_QEDR
> for QLogic QED. This would be replaced by the 'real' option
> once the QEDR driver is added [+relocated].
>  
> +config QED_ISCSI
> + bool
> +
> +config QEDI
> + tristate "QLogic QED 25/40/100Gb iSCSI driver"
> + depends on QED
> + select QED_LL2
> + select QED_ISCSI
> + default n
> + ---help---
> +   This provides a temporary node that allows the compilation
> +   and logical testing of the hardware offload iSCSI support
> +   for QLogic QED. This would be replaced by the 'real' option
> +   once the QEDI driver is added [+relocated].
> +
>  endif # NET_VENDOR_QLOGIC
> diff --git a/drivers/net/ethernet/qlogic/qed/Makefile 
> b/drivers/net/ethernet/qlogic/qed/Makefile
> index cda0af7..b76669c 100644
> --- a/drivers/net/ethernet/qlogic/qed/Makefile
> +++ b/drivers/net/ethernet/qlogic/qed/Makefile
> @@ -6,3 +6,4 @@ qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o 
> qed_init_ops.o \
>  qed-$(CONFIG_QED_SRIOV) += qed_sriov.o qed_vf.o
>  qed-$(CONFIG_QED_LL2) += qed_ll2.o
>  qed-$(CONFIG_INFINIBAND_QEDR) += qed_roce.o
> +qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o
> diff --git a/drivers/net/ethernet/qlogic/qed/qed.h 
> b/drivers/net/ethernet/qlogic/qed/qed.h
> index 653bb57..a61b1c0 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed.h
> +++ b/drivers/net/ethernet/qlogic/qed/qed.h
> @@ -35,6 +35,7 @@
>  
>  #define QED_WFQ_UNIT 100
>  
> +#define ISCSI_BDQ_ID(_port_id) (_port_id)

This looks a bit odd to me.

[...]

>  #endif
> + if (IS_ENABLED(CONFIG_QEDI) &&
> + p_hwfn->hw_info.personality == QED_PCI_ISCSI)
> + qed_iscsi_free(p_hwfn, p_hwfn->p_iscsi_info);


Why not introduce a small helper like:
static inline bool qed_is_iscsi_personality()
{
return IS_ENABLED(CONFIG_QEDI) && p_hwfn->hw_info.personality ==
QED_PCI_ISCSI;
}

>   qed_iov_free(p_hwfn);

[...]

> +
> + if (!GET_FIELD(p_ramrod->iscsi.flags,
> +ISCSI_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B)) {
> + p_tcp = &p_ramrod->tcp;
> + ucval = p_conn->local_mac[1];
> + ((u8 *)(&p_tcp->local_mac_addr_hi))[0] = ucval;
> + ucval = p_conn->local_mac[0];
> + ((u8 *)(&p_tcp->local_mac_addr_hi))[1] = ucval;
> + ucval = p_conn->local_mac[3];
> + ((u8 *)(&p_tcp->local_mac_addr_mid))[0] = ucval;
> + ucval = p_conn->local_mac[2];
> + ((u8 *)(&p_tcp->local_mac_addr_mid))[1] = ucval;
> + ucval = p_conn->local_mac[5];
> + ((u8 *)(&p_tcp->local_mac_addr_lo))[0] = ucval;
> + ucval = p_conn->local_mac[4];
> + ((u8 *)(&p_tcp->local_mac_addr_lo))[1] = ucval;
> + ucval = p_conn->remote_mac[1];
> + ((u8 *)(&p_tcp->remote_mac_addr_hi))[0] = ucval;
> + ucval = p_conn->remote_mac[0];
> + ((u8 *)(&p_tcp->remote_mac_addr_hi))[1] = ucval;
> + ucval = p_conn->remote_mac[3];
> + ((u8 *)(&p_tcp->remote_mac_addr_mid))[0] = ucval;
> + ucval = p_conn->remote_mac[2];
> +

Re: [PATCH v3 11/11] nvme: Fix a race condition

2016-10-19 Thread Christoph Hellwig

Hi Bart,

this looks great!

Reviewed-by: Christoph Hellwig 

Some minor nitpicks below:

>  void nvme_requeue_req(struct request *req)
>  {
> + blk_mq_requeue_request(req, true);
>  }
>  EXPORT_SYMBOL_GPL(nvme_requeue_req);

Please just remove the nvme_requeue_req wrapper.

>  
> @@ -2074,11 +2068,14 @@ EXPORT_SYMBOL_GPL(nvme_kill_queues);
>  void nvme_stop_queues(struct nvme_ctrl *ctrl)
>  {
>   struct nvme_ns *ns;
> + struct request_queue *q;
>  
>   mutex_lock(&ctrl->namespaces_mutex);
>   list_for_each_entry(ns, &ctrl->namespaces, list) {
> + q = ns->queue;
> + blk_mq_cancel_requeue_work(q);
> + blk_mq_stop_hw_queues(q);
> + blk_mq_quiesce_queue(q);
>   }

I'd keep the q declaration in the minimal scope, e.g.

list_for_each_entry(ns, &ctrl->namespaces, list) {
struct request_queue *q = ns->queue;

blk_mq_cancel_requeue_work(q);
blk_mq_stop_hw_queues(q);
blk_mq_quiesce_queue(q);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 01/11] blk-mq: Do not invoke .queue_rq() for a stopped queue

2016-10-19 Thread Christoph Hellwig

Good catch,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 02/11] blk-mq: Introduce blk_mq_hctx_stopped()

2016-10-19 Thread Christoph Hellwig

On Tue, Oct 18, 2016 at 02:49:09PM -0700, Bart Van Assche wrote:
> Multiple functions test the BLK_MQ_S_STOPPED bit so introduce
> a helper function that performs this test.

Looks sensible.  Any reason to have it in the public blk-mq.h instead
of the private one, though?  I see that dm is using it with this patch,
but that usage should go away once your full series is merged, right?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 03/11] blk-mq: Introduce blk_mq_queue_stopped()

2016-10-19 Thread Christoph Hellwig

Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 05/11] blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request()

2016-10-19 Thread Christoph Hellwig

On Tue, Oct 18, 2016 at 02:51:02PM -0700, Bart Van Assche wrote:
> Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls
> are followed by kicking the requeue list. Hence add an argument to
> these two functions that allows to kick the requeue list. This was
> proposed by Christoph Hellwig.
> 
> Signed-off-by: Bart Van Assche 

Thanks Bart, this looks fine to me:

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 04/11] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-19 Thread Christoph Hellwig

> +/**
> + * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have 
> finished
> + *
> + * Note: this function does not prevent that the struct request end_io()
> + * callback function is invoked. Additionally, it is not prevented that
> + * new queue_rq() calls occur unless the queue has been stopped first.
> + */
> +void blk_mq_quiesce_queue(struct request_queue *q)

If this is intended to be a kerneldoc comment you need to document the 'q'
parameter.  If not you should drop the magic "/**" marker.

> +static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> +{
> + int srcu_idx;
> +
> + WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> + cpu_online(hctx->next_cpu));
> +
> + if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
> + rcu_read_lock();
> + blk_mq_process_rq_list(hctx);
> + rcu_read_unlock();
> + } else {
> + srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> + blk_mq_process_rq_list(hctx);
> + srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
> + }
> +}

Can you document these synchronization changes in detail in the changelog?

> +static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
> +   struct request *rq, blk_qc_t *cookie)
> +{
> + if (blk_mq_hctx_stopped(hctx) ||
> + blk_mq_direct_issue_request(rq, cookie) != 0)
> + blk_mq_insert_request(rq, false, true, true);
> +}

Any reason not to merge this function with blk_mq_direct_issue_request?

Otherwise this change looks fine to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 6/7] sd: Implement support for ZBC devices

2016-10-19 Thread Jeff Moyer

"Martin K. Petersen"  writes:

>> "Jeff" == Jeff Moyer  writes:
>
> Jeff,
>
> Jeff> Are power of 2 zone sizes required by the standard?  I see why
> Jeff> you've done this, but I wonder if we're artificially limiting the
> Jeff> implementation, and whether there will be valid devices on the
> Jeff> market that simply won't work with Linux because of this.
>
> Standards are deliberately written to be permissive. But Linux doesn't
> support arbitrary sector sizes either even though the spec allows it. We
> always pick a reasonably sane subset of features to implement and this
> case is no different.
>
> After some discussion we decided to rip out all the complexity that was
> required to facilitate crazy drive layouts. As a result, the code is now
> in a state where we can actually merge it. The hope is that by picking a
> specific configuration subset and widely advertising it we can influence
> the market.
>
> Also, I am not aware of anybody actually asking the drive vendors to
> support crazy zone configurations.

That's fine with me.  I didn't see any mention of this design decision
in the patch logs, though, and given that I'm not intimately involved, I
asked the question.

Thanks,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 6/7] sd: Implement support for ZBC devices

2016-10-19 Thread Jeff Moyer

Shaun Tancheff  writes:

> On Tue, Oct 18, 2016 at 11:58 AM, Jeff Moyer  wrote:
>> Damien Le Moal  writes:
>>
>>> + if (!is_power_of_2(zone_blocks)) {
>>> + if (sdkp->first_scan)
>>> + sd_printk(KERN_NOTICE, sdkp,
>>> +   "Devices with non power of 2 zone "
>>> +   "size are not supported\n");
>>> + return -ENODEV;
>>> + }
>>
>> Are power of 2 zone sizes required by the standard?  I see why you've
>> done this, but I wonder if we're artificially limiting the
>> implementation, and whether there will be valid devices on the market
>> that simply won't work with Linux because of this.
>
> The standard does not require power of 2 zones.
> That said, I am not aware of any current (or planned) devices other
> than a power of 2.
> Common zone sizes I am aware of: 256MiB, 128MiB and 1GiB.
>
> Also note that we are excluding the runt zone from the power of 2 expectation.
>
> So conforming devices should (excluding a runt zone):
>   - Have zones of the same size.
>   - Choose a zone size that is a power of 2.

OK, thanks for following up.  As long as the drive vendors are on board,
I guess we're okay.  :)

-Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 08/10] mm: replace __access_remote_vm() write parameter with gup_flags

2016-10-19 Thread Lorenzo Stoakes

On Wed, Oct 19, 2016 at 10:52:05AM +0200, Michal Hocko wrote:
> yes this is the desirable and expected behavior.
>
> > wonder if this is desirable behaviour or whether this ought to be limited to
> > ptrace system calls. Regardless, by making the flag more visible it makes it
> > easier to see that this is happening.
>
> mem_open already enforces PTRACE_MODE_ATTACH

Ah I missed this, that makes a lot of sense, thanks!

I still wonder whether other invocations of access_remote_vm() in fs/proc/base.c
(the principle caller of this function) need FOLL_FORCE, for example the various
calls that simply read data from other processes, so I think the point stands
about keeping this explicit.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 08/10] mm: replace __access_remote_vm() write parameter with gup_flags

2016-10-19 Thread Jan Kara

On Thu 13-10-16 01:20:18, Lorenzo Stoakes wrote:
> This patch removes the write parameter from __access_remote_vm() and replaces 
> it
> with a gup_flags parameter as use of this function previously _implied_
> FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
> 
> We make this explicit as use of FOLL_FORCE can result in surprising behaviour
> (and hence bugs) within the mm subsystem.
> 
> Signed-off-by: Lorenzo Stoakes 

So I'm not convinced this (and the following two patches) is actually
helping much. By grepping for FOLL_FORCE we will easily see that any caller
of access_remote_vm() gets that semantics and can thus continue search
accordingly (it is much simpler than searching for all get_user_pages()
users and extracting from parameter lists what they actually pass as
'force' argument). Sure it makes somewhat more visible to callers of
access_remote_vm() that they get FOLL_FORCE semantics but OTOH it also
opens a space for issues where a caller of access_remote_vm() actually
wants FOLL_FORCE (and currently all of them want it) and just mistakenly
does not set it. All in all I'd prefer to keep access_remote_vm() and
friends as is...

Honza

> ---
>  mm/memory.c | 23 +++
>  mm/nommu.c  |  9 ++---
>  2 files changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 20a9adb..79ebed3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3869,14 +3869,11 @@ EXPORT_SYMBOL_GPL(generic_access_phys);
>   * given task for page fault accounting.
>   */
>  static int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
> - unsigned long addr, void *buf, int len, int write)
> + unsigned long addr, void *buf, int len, unsigned int gup_flags)
>  {
>   struct vm_area_struct *vma;
>   void *old_buf = buf;
> - unsigned int flags = FOLL_FORCE;
> -
> - if (write)
> - flags |= FOLL_WRITE;
> + int write = gup_flags & FOLL_WRITE;
>  
>   down_read(&mm->mmap_sem);
>   /* ignore errors, just check how much was successfully transferred */
> @@ -3886,7 +3883,7 @@ static int __access_remote_vm(struct task_struct *tsk, 
> struct mm_struct *mm,
>   struct page *page = NULL;
>  
>   ret = get_user_pages_remote(tsk, mm, addr, 1,
> - flags, &page, &vma);
> + gup_flags, &page, &vma);
>   if (ret <= 0) {
>  #ifndef CONFIG_HAVE_IOREMAP_PROT
>   break;
> @@ -3945,7 +3942,12 @@ static int __access_remote_vm(struct task_struct *tsk, 
> struct mm_struct *mm,
>  int access_remote_vm(struct mm_struct *mm, unsigned long addr,
>   void *buf, int len, int write)
>  {
> - return __access_remote_vm(NULL, mm, addr, buf, len, write);
> + unsigned int flags = FOLL_FORCE;
> +
> + if (write)
> + flags |= FOLL_WRITE;
> +
> + return __access_remote_vm(NULL, mm, addr, buf, len, flags);
>  }
>  
>  /*
> @@ -3958,12 +3960,17 @@ int access_process_vm(struct task_struct *tsk, 
> unsigned long addr,
>  {
>   struct mm_struct *mm;
>   int ret;
> + unsigned int flags = FOLL_FORCE;
>  
>   mm = get_task_mm(tsk);
>   if (!mm)
>   return 0;
>  
> - ret = __access_remote_vm(tsk, mm, addr, buf, len, write);
> + if (write)
> + flags |= FOLL_WRITE;
> +
> + ret = __access_remote_vm(tsk, mm, addr, buf, len, flags);
> +
>   mmput(mm);
>  
>   return ret;
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 70cb844..bde7df3 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -1809,9 +1809,10 @@ void filemap_map_pages(struct fault_env *fe,
>  EXPORT_SYMBOL(filemap_map_pages);
>  
>  static int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
> - unsigned long addr, void *buf, int len, int write)
> + unsigned long addr, void *buf, int len, unsigned int gup_flags)
>  {
>   struct vm_area_struct *vma;
> + int write = gup_flags & FOLL_WRITE;
>  
>   down_read(&mm->mmap_sem);
>  
> @@ -1853,7 +1854,8 @@ static int __access_remote_vm(struct task_struct *tsk, 
> struct mm_struct *mm,
>  int access_remote_vm(struct mm_struct *mm, unsigned long addr,
>   void *buf, int len, int write)
>  {
> - return __access_remote_vm(NULL, mm, addr, buf, len, write);
> + return __access_remote_vm(NULL, mm, addr, buf, len,
> + write ? FOLL_WRITE : 0);
>  }
>  
>  /*
> @@ -1871,7 +1873,8 @@ int access_process_vm(struct task_struct *tsk, unsigned 
> long addr, void *buf, in
>   if (!mm)
>   return 0;
>  
> - len = __access_remote_vm(tsk, mm, addr, buf, len, write);
> + len = __access_remote_vm(tsk, mm, addr, buf, len,
> + write ? FOLL_WRITE : 0);
>  
>   mmput(mm);
>   return len;
> -- 
> 2.10.0
> 
-- 
Jan Kara 
SUSE Labs, CR
--
To unsu

Re: [RFC 5/6] qedi: Add support for iSCSI session management.

2016-10-19 Thread Johannes Thumshirn

On Wed, Oct 19, 2016 at 01:01:12AM -0400, manish.rangan...@cavium.com wrote:
> From: Manish Rangankar 
> 
> This patch adds support for iscsi_transport LLD Login,
> Logout, NOP-IN/NOP-OUT, Async, Reject PDU processing
> and Firmware async event handling support.
> 
> Signed-off-by: Nilesh Javali 
> Signed-off-by: Adheer Chandravanshi 
> Signed-off-by: Chad Dupuis 
> Signed-off-by: Saurav Kashyap 
> Signed-off-by: Arun Easi 
> Signed-off-by: Manish Rangankar 
> ---

[...]

> +void qedi_iscsi_unmap_sg_list(struct qedi_cmd *cmd)
> +{
> + struct scsi_cmnd *sc = cmd->scsi_cmd;
> +
> + if (cmd->io_tbl.sge_valid && sc) {
> + scsi_dma_unmap(sc);
> + cmd->io_tbl.sge_valid = 0;
> + }
> +}

Maybe set sge_valid to 0 and then call scsi_dma_unmap(). I don't know if it's
really racy but it looks like it is.

[...]

> +static void qedi_process_text_resp(struct qedi_ctx *qedi,
> +union iscsi_cqe *cqe,
> +struct iscsi_task *task,
> +struct qedi_conn *qedi_conn)
> +{
> + struct iscsi_conn *conn = qedi_conn->cls_conn->dd_data;
> + struct iscsi_session *session = conn->session;
> + struct iscsi_task_context *task_ctx;
> + struct iscsi_text_rsp *resp_hdr_ptr;
> + struct iscsi_text_response_hdr *cqe_text_response;
> + struct qedi_cmd *cmd;
> + int pld_len;
> + u32 *tmp;
> +
> + cmd = (struct qedi_cmd *)task->dd_data;
> + task_ctx = (struct iscsi_task_context *)qedi_get_task_mem(&qedi->tasks,
> +   cmd->task_id);

No need to cast here, qedi_get_task_mem() returns void *.

[...]

> + cqe_login_response = &cqe->cqe_common.iscsi_hdr.login_response;
> + task_ctx = (struct iscsi_task_context *)qedi_get_task_mem(&qedi->tasks,
> +   cmd->task_id);

Same here.

[...]

> + }
> +
> + pbl = (struct scsi_bd *)qedi->bdq_pbl;
> + pbl += (qedi->bdq_prod_idx % qedi->rq_num_entries);
> + pbl->address.hi =
> +   cpu_to_le32((u32)(((u64)(qedi->bdq[idx].buf_dma)) >> 32));
> + pbl->address.lo =
> + cpu_to_le32(((u32)(((u64)(qedi->bdq[idx].buf_dma)) &
> + 0x)));

Is this LISP or C?

> + QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_CONN,
> +   "pbl [0x%p] pbl->address hi [0x%llx] lo [0x%llx] idx [%d]\n",
> +   pbl, pbl->address.hi, pbl->address.lo, idx);
> + pbl->opaque.hi = cpu_to_le32((u32)(((u64)0) >> 32));

Isn't this plain pbl->opaque.hi = 0; ?

> + pbl->opaque.lo = cpu_to_le32(((u32)(((u64)idx) & 0x)));
> +

[...]

> + switch (comp_type) {
> + case ISCSI_CQE_TYPE_SOLICITED:
> + case ISCSI_CQE_TYPE_SOLICITED_WITH_SENSE:
> + fw_task_ctx =
> +   (struct iscsi_task_context *)qedi_get_task_mem(&qedi->tasks,
> +   cqe->cqe_solicited.itid);

Again, no cast needed.

[...]

> + writel(*(u32 *)&dbell, qedi_conn->ep->p_doorbell);
> + /* Make sure fw idx is coherent */
> + wmb();
> + mmiowb();

Isn't either wmb() or mmiowb() enough?

[..]

> +
> + fw_task_ctx =
> +  (struct iscsi_task_context *)qedi_get_task_mem(&qedi->tasks, tid);

Cast again.

[...]

> + fw_task_ctx =
> +  (struct iscsi_task_context *)qedi_get_task_mem(&qedi->tasks, tid);

^^

[...]

> + fw_task_ctx =
> + (struct iscsi_task_context *)qedi_get_task_mem(&qedi->tasks, tid);


[...]

> + fw_task_ctx =
> +   (struct iscsi_task_context *)qedi_get_task_mem(&qedi->tasks, tid);
> +

[...]

> +
> + qedi = (struct qedi_ctx *)iscsi_host_priv(shost);

Same goes for iscsi_host_priv();

[...]

> + ret = wait_event_interruptible_timeout(qedi_ep->ofld_wait,
> +((qedi_ep->state ==
> + EP_STATE_OFLDCONN_FAILED) ||
> + (qedi_ep->state ==
> + EP_STATE_OFLDCONN_COMPL)),
> + msecs_to_jiffies(timeout_ms));

Maybe:
#define QEDI_OLDCON_STATE(q) ((q)->state == EP_STATE_OFLDCONN_FAILED || \
(q)->state == EP_STATE_OFLDCONN_COMPL)

ret = wait_event_interruptible_timeout(qedi_ep->ofld_wait,
QEDI_OLDCON_STATE(qedi_ep),
msec_to_jiffies(timeout_ms));

But that could be just me hating linewraps.

[...]

Thanks,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 860

Re: [PATCH v3 06/11] dm: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code

2016-10-19 Thread Christoph Hellwig

This looks good:

Reviewed-by: Christoph Hellwig 

On Tue, Oct 18, 2016 at 02:51:33PM -0700, Bart Van Assche wrote:
>  static void dm_mq_start_queue(struct request_queue *q)
>  {
> - unsigned long flags;
> -
> - spin_lock_irqsave(q->queue_lock, flags);
> - queue_flag_clear(QUEUE_FLAG_STOPPED, q);
> - spin_unlock_irqrestore(q->queue_lock, flags);
> -
>   blk_mq_start_stopped_hw_queues(q, true);
>   blk_mq_kick_requeue_list(q);

FYI, I'm tempted to say we should always call blk_mq_kick_requeue_list
from blk_mq_start_stopped_hw_queues, but that's a separate issue.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 6/6] qedi: Add support for data path.

2016-10-19 Thread Hannes Reinecke

On 10/19/2016 07:01 AM, manish.rangan...@cavium.com wrote:
> From: Manish Rangankar 
> 
> This patch adds support for data path and TMF handling.
> 
> Signed-off-by: Nilesh Javali 
> Signed-off-by: Adheer Chandravanshi 
> Signed-off-by: Chad Dupuis 
> Signed-off-by: Saurav Kashyap 
> Signed-off-by: Arun Easi 
> Signed-off-by: Manish Rangankar 
> ---
>  drivers/scsi/qedi/qedi_fw.c| 1282 
> 
>  drivers/scsi/qedi/qedi_gbl.h   |6 +
>  drivers/scsi/qedi/qedi_iscsi.c |6 +
>  drivers/scsi/qedi/qedi_main.c  |4 +
>  4 files changed, 1298 insertions(+)
> 
> diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
> index a820785..af1e14d 100644
> --- a/drivers/scsi/qedi/qedi_fw.c
> +++ b/drivers/scsi/qedi/qedi_fw.c
> @@ -147,6 +147,114 @@ static void qedi_process_text_resp(struct qedi_ctx 
> *qedi,
>   spin_unlock(&session->back_lock);
>  }
>  
> +static void qedi_tmf_resp_work(struct work_struct *work)
> +{
> + struct qedi_cmd *qedi_cmd =
> + container_of(work, struct qedi_cmd, tmf_work);
> + struct qedi_conn *qedi_conn = qedi_cmd->conn;
> + struct qedi_ctx *qedi = qedi_conn->qedi;
> + struct iscsi_conn *conn = qedi_conn->cls_conn->dd_data;
> + struct iscsi_session *session = conn->session;
> + struct iscsi_tm_rsp *resp_hdr_ptr;
> + struct iscsi_cls_session *cls_sess;
> + int rval = 0;
> +
> + set_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
> + resp_hdr_ptr =  (struct iscsi_tm_rsp *)qedi_cmd->tmf_resp_buf;
> + cls_sess = iscsi_conn_to_session(qedi_conn->cls_conn);
> +
> + iscsi_block_session(session->cls_session);
> + rval = qedi_cleanup_all_io(qedi, qedi_conn, qedi_cmd->task, true);
> + if (rval) {
> + clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
> + qedi_clear_task_idx(qedi, qedi_cmd->task_id);
> + iscsi_unblock_session(session->cls_session);
> + return;
> + }
> +
> + iscsi_unblock_session(session->cls_session);
> + qedi_clear_task_idx(qedi, qedi_cmd->task_id);
> +
> + spin_lock(&session->back_lock);
> + __iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr_ptr, NULL, 0);
> + spin_unlock(&session->back_lock);
> + kfree(resp_hdr_ptr);
> + clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
> +}
> +
> +static void qedi_process_tmf_resp(struct qedi_ctx *qedi,
> +   union iscsi_cqe *cqe,
> +   struct iscsi_task *task,
> +   struct qedi_conn *qedi_conn)
> +
> +{
> + struct iscsi_conn *conn = qedi_conn->cls_conn->dd_data;
> + struct iscsi_session *session = conn->session;
> + struct iscsi_tmf_response_hdr *cqe_tmp_response;
> + struct iscsi_tm_rsp *resp_hdr_ptr;
> + struct iscsi_tm *tmf_hdr;
> + struct qedi_cmd *qedi_cmd = NULL;
> + u32 *tmp;
> +
> + cqe_tmp_response = &cqe->cqe_common.iscsi_hdr.tmf_response;
> +
> + qedi_cmd = task->dd_data;
> + qedi_cmd->tmf_resp_buf = kzalloc(sizeof(*resp_hdr_ptr), GFP_KERNEL);
> + if (!qedi_cmd->tmf_resp_buf) {
> + QEDI_ERR(&qedi->dbg_ctx,
> +  "Failed to allocate resp buf, cid=0x%x\n",
> +   qedi_conn->iscsi_conn_id);
> + return;
> + }
> +
> + spin_lock(&session->back_lock);
> + resp_hdr_ptr =  (struct iscsi_tm_rsp *)qedi_cmd->tmf_resp_buf;
> + memset(resp_hdr_ptr, 0, sizeof(struct iscsi_tm_rsp));
> +
> + /* Fill up the header */
> + resp_hdr_ptr->opcode = cqe_tmp_response->opcode;
> + resp_hdr_ptr->flags = cqe_tmp_response->hdr_flags;
> + resp_hdr_ptr->response = cqe_tmp_response->hdr_response;
> + resp_hdr_ptr->hlength = 0;
> +
> + hton24(resp_hdr_ptr->dlength,
> +(cqe_tmp_response->hdr_second_dword &
> + ISCSI_TMF_RESPONSE_HDR_DATA_SEG_LEN_MASK));
> + tmp = (u32 *)resp_hdr_ptr->dlength;
> + resp_hdr_ptr->itt = build_itt(cqe->cqe_solicited.itid,
> +   conn->session->age);
> + resp_hdr_ptr->statsn = cpu_to_be32(cqe_tmp_response->stat_sn);
> + resp_hdr_ptr->exp_cmdsn  = cpu_to_be32(cqe_tmp_response->exp_cmd_sn);
> + resp_hdr_ptr->max_cmdsn = cpu_to_be32(cqe_tmp_response->max_cmd_sn);
> +
> + tmf_hdr = (struct iscsi_tm *)qedi_cmd->task->hdr;
> +
> + if (likely(qedi_cmd->io_cmd_in_list)) {
> + qedi_cmd->io_cmd_in_list = false;
> + list_del_init(&qedi_cmd->io_cmd);
> + qedi_conn->active_cmd_count--;
> + }
> +
> + if (((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
> +   ISCSI_TM_FUNC_LOGICAL_UNIT_RESET) ||
> + ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
> +   ISCSI_TM_FUNC_TARGET_WARM_RESET) ||
> + ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
> +   ISCSI_TM_FUNC_TARGET_COLD_RESET)) {
> + INIT_WORK(&qedi_cmd->tmf_work, qedi_tmf_resp

[Bug 178381] New: Suspend to RAM test failed while CONFIG_SCSI_MQ_DEFAULT is set

2016-10-19 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=178381

Bug ID: 178381
   Summary: Suspend to RAM test failed while
CONFIG_SCSI_MQ_DEFAULT is set
   Product: SCSI Drivers
   Version: 2.5
Kernel Version: 4.8.2
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Other
  Assignee: scsi_drivers-ot...@kernel-bugs.osdl.org
  Reporter: chintz...@gmail.com
Regression: No

Hi, 

OS is Fedora 24 + Linux kernel 4.8.2 with SCSI_MQ is enabled.

I executed below commands to do suspend to RAM test (S3 mode).

$ echo platform > /sys/power/pm_test;echo mem > /sys/power/state

It can resume correctly while only one disk is installed.

However, it can't resume correctly while there are multiple disks.
the system is hanging up.  


I set hung_task_timeout_secs=120 to get the below message.


[ 1280.985118] call :00:01.1+ returned 0 after 5 usecs
[ 1479.326798] INFO: task kworker/u8:2:1772 blocked for more than 120 seconds.
[ 1479.333754] Tainted: G U 4.8.2 #1
[ 1479.338718] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 1479.346533] kworker/u8:2 D 8802596f3bf8 0 1772 2 0x
[ 1479.353622] Workqueue: events_unbound async_run_entry_fn
[ 1479.358951] 8802596f3bf8 00ff8802596f3bd8 880250e71e00
880263dd5a00
[ 1479.366428] 88026dd19500 8802596f4000 7fff
88025f5e2260
[ 1479.373908] 880250e71e00 880260af0740 8802596f3c10
817f5cc5
[ 1479.381385] Call Trace:
[ 1479.383838] [] schedule+0x35/0x80
[ 1479.388800] [] schedule_timeout+0x2c4/0x440
[ 1479.394627] [] ? __enqueue_entity+0x6c/0x70
[ 1479.400454] [] ? enqueue_entity+0x2e8/0x8e0
[ 1479.406279] [] wait_for_completion+0xe1/0x120
[ 1479.412276] [] ? wake_up_q+0x80/0x80
[ 1479.417497] [] ? dpm_wait+0x40/0x40
[ 1479.422629] [] dpm_wait+0x32/0x40
[ 1479.427587] [] dpm_wait_fn+0x11/0x20
[ 1479.432805] [] device_for_each_child+0x50/0x90
[ 1479.438890] [] __device_suspend+0x51/0x380
[ 1479.444642] [] async_suspend+0x1f/0xa0
[ 1479.450035] [] async_run_entry_fn+0x39/0x140
[ 1479.455950] [] process_one_work+0x184/0x430
[ 1479.461776] [] worker_thread+0x4e/0x480
[ 1479.467255] [] ? process_one_work+0x430/0x430
[ 1479.473269] [] kthread+0xd8/0xf0
[ 1479.478144] [] ret_from_fork+0x1f/0x40
[ 1479.483535] [] ? kthread_worker_fn+0x180/0x180
[ 1479.489621] INFO: task md0_resync:2133 blocked for more than 120 seconds.
[ 1479.496400] Tainted: G U 4.8.2 #1
[ 1479.501358] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 1479.509173] md0_resync D 88025717bbc8 0 2133 2 0x
[ 1479.516274] 88025717bbc8 00ff88025717bb78 880263f69e00
880263dd5a00
[ 1479.523766] 58ec83c0 88025717c000 88025717bc48
880258662088
[ 1479.531264] 880258662070 880250a6f300 88025717bbe0
817f5cc5
[ 1479.538753] Call Trace:
[ 1479.541203] [] schedule+0x35/0x80
[ 1479.546163] [] raid1_sync_request+0x2da/0xba0 [raid1]
[ 1479.552865] [] ? prepare_to_wait_event+0xf0/0xf0
[ 1479.559122] [] md_do_sync+0x8bb/0xec0
[ 1479.564428] [] ? prepare_to_wait_event+0xf0/0xf0
[ 1479.570688] [] ? check_preempt_curr+0x7e/0x90
[ 1479.576686] [] ? kernel_sigaction+0x43/0xe0
[ 1479.582538] [] md_thread+0x139/0x150
[ 1479.587757] [] ? find_pers+0x70/0x70
[ 1479.592978] [] kthread+0xd8/0xf0
[ 1479.597863] [] ret_from_fork+0x1f/0x40
[ 1479.603254] [] ? kthread_worker_fn+0x180/0x180
[ 1479.609350] INFO: task ext4lazyinit:2167 blocked for more than 120 seconds.
[ 1479.616297] Tainted: G U 4.8.2 #1
[ 1479.621256] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 1479.629071] ext4lazyinit D 8802530e7be8 0 2167 2 0x
[ 1479.636161] 8802530e7be8 041a2d80 880253cf
880263dd5a00
[ 1479.643637] 00042008 8802530e8000 88026dd99500
7fff
[ 1479.651114] 880253cf 1000 8802530e7c00
817f5cc5
[ 1479.658587] Call Trace:
[ 1479.661034] [] schedule+0x35/0x80
[ 1479.665995] [] schedule_timeout+0x2c4/0x440
[ 1479.671818] [] ? md_make_request+0xf6/0x230
[ 1479.677645] [] ? ktime_get+0x41/0xb0
[ 1479.682864] [] io_schedule_timeout+0xa4/0x110
[ 1479.688863] [] wait_for_completion_io+0xe1/0x120
[ 1479.695122] [] ? wake_up_q+0x80/0x80
[ 1479.700340] [] submit_bio_wait+0x65/0x90
[ 1479.705905] [] blkdev_issue_zeroout+0x172/0x1e0
[ 1479.712073] [] ext4_init_inode_table+0x18d/0x390
[ 1479.718343] [] ext4_lazyinit_thread+0x136/0x330
[ 1479.724514] [] ? init_once+0x80/0x80
[ 1479.729732] [] kthread+0xd8/0xf0
[ 1479.734607] [] ret_from_fork+0x1f/0x40
[ 1479.73] [] ? kthread_worker_fn+0x180/0x180
[ 1479.746082] INFO: task test.sh:2189 blocked for more than 120 seconds.
[ 1479.752599] Tainted: G U 4.8.2 #1
[ 1479.757557] "echo 0 >

Re: [RFC 1/6] qed: Add support for hardware offloaded iSCSI.

2016-10-19 Thread Hannes Reinecke

On 10/19/2016 07:01 AM, manish.rangan...@cavium.com wrote:
> From: Yuval Mintz 
> 
> This adds the backbone required for the various HW initalizations
> which are necessary for the iSCSI driver (qedi) for QLogic FastLinQ
> 4 line of adapters - FW notification, resource initializations, etc.
> 
> Signed-off-by: Arun Easi 
> Signed-off-by: Yuval Mintz 
> ---
>  drivers/net/ethernet/qlogic/Kconfig|   15 +
>  drivers/net/ethernet/qlogic/qed/Makefile   |1 +
>  drivers/net/ethernet/qlogic/qed/qed.h  |8 +-
>  drivers/net/ethernet/qlogic/qed/qed_dev.c  |   15 +
>  drivers/net/ethernet/qlogic/qed/qed_int.h  |1 -
>  drivers/net/ethernet/qlogic/qed/qed_iscsi.c| 1310 
> 
>  drivers/net/ethernet/qlogic/qed/qed_iscsi.h|   52 +
>  drivers/net/ethernet/qlogic/qed/qed_l2.c   |1 -
>  drivers/net/ethernet/qlogic/qed/qed_ll2.c  |   35 +-
>  drivers/net/ethernet/qlogic/qed/qed_main.c |2 -
>  drivers/net/ethernet/qlogic/qed/qed_mcp.h  |6 -
>  drivers/net/ethernet/qlogic/qed/qed_reg_addr.h |2 +
>  drivers/net/ethernet/qlogic/qed/qed_spq.c  |   15 +
>  include/linux/qed/qed_if.h |2 +
>  include/linux/qed/qed_iscsi_if.h   |  249 +
>  15 files changed, 1692 insertions(+), 22 deletions(-)
>  create mode 100644 drivers/net/ethernet/qlogic/qed/qed_iscsi.c
>  create mode 100644 drivers/net/ethernet/qlogic/qed/qed_iscsi.h
>  create mode 100644 include/linux/qed/qed_iscsi_if.h
> 
> diff --git a/drivers/net/ethernet/qlogic/Kconfig 
> b/drivers/net/ethernet/qlogic/Kconfig
> index 0df1391f9..bad4fae 100644
> --- a/drivers/net/ethernet/qlogic/Kconfig
> +++ b/drivers/net/ethernet/qlogic/Kconfig
> @@ -118,4 +118,19 @@ config INFINIBAND_QEDR
> for QLogic QED. This would be replaced by the 'real' option
> once the QEDR driver is added [+relocated].
>  
> +config QED_ISCSI
> + bool
> +
> +config QEDI
> + tristate "QLogic QED 25/40/100Gb iSCSI driver"
> + depends on QED
> + select QED_LL2
> + select QED_ISCSI
> + default n
> + ---help---
> +   This provides a temporary node that allows the compilation
> +   and logical testing of the hardware offload iSCSI support
> +   for QLogic QED. This would be replaced by the 'real' option
> +   once the QEDI driver is added [+relocated].
> +
>  endif # NET_VENDOR_QLOGIC
> diff --git a/drivers/net/ethernet/qlogic/qed/Makefile 
> b/drivers/net/ethernet/qlogic/qed/Makefile
> index cda0af7..b76669c 100644
> --- a/drivers/net/ethernet/qlogic/qed/Makefile
> +++ b/drivers/net/ethernet/qlogic/qed/Makefile
> @@ -6,3 +6,4 @@ qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o 
> qed_init_ops.o \
>  qed-$(CONFIG_QED_SRIOV) += qed_sriov.o qed_vf.o
>  qed-$(CONFIG_QED_LL2) += qed_ll2.o
>  qed-$(CONFIG_INFINIBAND_QEDR) += qed_roce.o
> +qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o
> diff --git a/drivers/net/ethernet/qlogic/qed/qed.h 
> b/drivers/net/ethernet/qlogic/qed/qed.h
> index 653bb57..a61b1c0 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed.h
> +++ b/drivers/net/ethernet/qlogic/qed/qed.h
> @@ -35,6 +35,7 @@
>  
>  #define QED_WFQ_UNIT 100
>  
> +#define ISCSI_BDQ_ID(_port_id) (_port_id)
>  #define QED_WID_SIZE(1024)
>  #define QED_PF_DEMS_SIZE(4)
>  
> @@ -167,6 +168,7 @@ enum QED_RESOURCES {
>   QED_ILT,
>   QED_LL2_QUEUE,
>   QED_RDMA_STATS_QUEUE,
> + QED_CMDQS_CQS,
>   QED_MAX_RESC,
>  };
>  
> @@ -379,6 +381,7 @@ struct qed_hwfn {
>   boolusing_ll2;
>   struct qed_ll2_info *p_ll2_info;
>   struct qed_rdma_info*p_rdma_info;
> + struct qed_iscsi_info   *p_iscsi_info;
>   struct qed_pf_paramspf_params;
>  
>   bool b_rdma_enabled_in_prs;
> @@ -578,6 +581,8 @@ struct qed_dev {
>   /* Linux specific here */
>   struct  qede_dev*edev;
>   struct  pci_dev *pdev;
> + u32 flags;
> +#define QED_FLAG_STORAGE_STARTED (BIT(0))
>   int msg_enable;
>  
>   struct pci_params   pci_params;
> @@ -591,6 +596,7 @@ struct qed_dev {
>   union {
>   struct qed_common_cb_ops*common;
>   struct qed_eth_cb_ops   *eth;
> + struct qed_iscsi_cb_ops *iscsi;
>   } protocol_ops;
>   void*ops_cookie;
>  
> @@ -600,7 +606,7 @@ struct qed_dev {
>   struct qed_cb_ll2_info  *ll2;
>   u8  ll2_mac_address[ETH_ALEN];
>  #endif
> -
> + DECLARE_HASHTABLE(connections, 10);
>   const struct firmware   *firmware;
>  
>   u32 rdma_max_sge;
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c 
> b/drivers/net/ethernet/qlogic/qed/qed_dev.c
> index 754f6a9..a4234c0 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
> +++ b/drivers/net/ethernet

Re: [PATCH 04/10] mm: replace get_user_pages_locked() write/force parameters with gup_flags

2016-10-19 Thread Jan Kara

On Thu 13-10-16 01:20:14, Lorenzo Stoakes wrote:
> This patch removes the write and force parameters from get_user_pages_locked()
> and replaces them with a gup_flags parameter to make the use of FOLL_FORCE
> explicit in callers as use of this flag can result in surprising behaviour 
> (and
> hence bugs) within the mm subsystem.
> 
> Signed-off-by: Lorenzo Stoakes 

After our discussion the patch looks good to me. You can add:

Reviewed-by: Jan Kara 

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 07/11] dm: Fix a race condition related to stopping and starting queues

2016-10-19 Thread Christoph Hellwig

Looks good:

Reviewed-by: Christoph Hellwig 

FYI, I wonder how many of the blk_mq_stop_hw_queues do not need
the quiesce call.  In the long run it might be better to have
blk_mq_stop_hw_queues to stop an quiesce, and have a
__blk_mq_stop_hw_queues variant to just stop.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 3/6] qedi: Add QLogic FastLinQ offload iSCSI driver framework.

2016-10-19 Thread Johannes Thumshirn

On Wed, Oct 19, 2016 at 01:01:10AM -0400, manish.rangan...@cavium.com wrote:
> From: Manish Rangankar 
> 
> The QLogic FastLinQ Driver for iSCSI (qedi) is the iSCSI specific module
> for 41000 Series Converged Network Adapters by QLogic.
> 
> This patch consists of following changes:
>   - MAINTAINERS Makefile and Kconfig changes for qedi,
>   - PCI driver registration,
>   - iSCSI host level initialization,
>   - Debugfs and log level infrastructure.
> 
> Signed-off-by: Nilesh Javali 
> Signed-off-by: Adheer Chandravanshi 
> Signed-off-by: Chad Dupuis 
> Signed-off-by: Saurav Kashyap 
> Signed-off-by: Arun Easi 
> Signed-off-by: Manish Rangankar 
> ---

[...]

> +static inline void *qedi_get_task_mem(struct qed_iscsi_tid *info, u32 tid)
> +{
> + return (void *)(info->blocks[tid / info->num_tids_per_block] +
> + (tid % info->num_tids_per_block) * info->size);
> +}

Unnecessary cast here.


[...]

> +void
> +qedi_dbg_err(struct qedi_dbg_ctx *qedi, const char *func, u32 line,
> +  const char *fmt, ...)
> +{
> + va_list va;
> + struct va_format vaf;
> + char nfunc[32];
> +
> + memset(nfunc, 0, sizeof(nfunc));
> + memcpy(nfunc, func, sizeof(nfunc) - 1);
> +
> + va_start(va, fmt);
> +
> + vaf.fmt = fmt;
> + vaf.va = &va;
> +
> + if (likely(qedi) && likely(qedi->pdev))
> + pr_crit("[%s]:[%s:%d]:%d: %pV", dev_name(&qedi->pdev->dev),
> + nfunc, line, qedi->host_no, &vaf);
> + else
> + pr_crit("[:00:00.0]:[%s:%d]: %pV", nfunc, line, &vaf);

pr_crit, seriously?

[...]

> +static void qedi_int_fp(struct qedi_ctx *qedi)
> +{
> + struct qedi_fastpath *fp;
> + int id;
> +
> + memset((void *)qedi->fp_array, 0, MIN_NUM_CPUS_MSIX(qedi) *
> +sizeof(*qedi->fp_array));
> + memset((void *)qedi->sb_array, 0, MIN_NUM_CPUS_MSIX(qedi) *
> +sizeof(*qedi->sb_array));

I don't think the cast is necessary here.

[...]

> +static int qedi_setup_cid_que(struct qedi_ctx *qedi)
> +{
> + int i;
> +
> + qedi->cid_que.cid_que_base = kmalloc((qedi->max_active_conns *
> +   sizeof(u32)), GFP_KERNEL);
> + if (!qedi->cid_que.cid_que_base)
> + return -ENOMEM;
> +
> + qedi->cid_que.conn_cid_tbl = kmalloc((qedi->max_active_conns *
> +   sizeof(struct qedi_conn *)),
> +  GFP_KERNEL);

Please use kmalloc_array() here.

[...]

> +/* MSI-X fastpath handler code */
> +static irqreturn_t qedi_msix_handler(int irq, void *dev_id)
> +{
> + struct qedi_fastpath *fp = dev_id;
> + struct qedi_ctx *qedi = fp->qedi;
> + bool wake_io_thread = true;
> +
> + qed_sb_ack(fp->sb_info, IGU_INT_DISABLE, 0);
> +
> +process_again:
> + wake_io_thread = qedi_process_completions(fp);
> + if (wake_io_thread) {
> + QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_DISC,
> +   "process already running\n");
> + }
> +
> + if (qedi_fp_has_work(fp) == 0)
> + qed_sb_update_sb_idx(fp->sb_info);
> +
> + /* Check for more work */
> + rmb();
> +
> + if (qedi_fp_has_work(fp) == 0)
> + qed_sb_ack(fp->sb_info, IGU_INT_ENABLE, 1);
> + else
> + goto process_again;
> +
> + return IRQ_HANDLED;
> +}

You might want to consider workqueues here.

[...]

> +static int qedi_alloc_itt(struct qedi_ctx *qedi)
> +{
> + qedi->itt_map = kzalloc((sizeof(struct qedi_itt_map) *
> + MAX_ISCSI_TASK_ENTRIES), GFP_KERNEL);

that screams for kcalloc()

> + if (!qedi->itt_map) {
> + QEDI_ERR(&qedi->dbg_ctx,
> +  "Unable to allocate itt map array memory\n");
> + return -ENOMEM;
> + }
> + return 0;
> +}
> +
> +static void qedi_free_itt(struct qedi_ctx *qedi)
> +{
> + kfree(qedi->itt_map);
> +}
> +
> +static struct qed_ll2_cb_ops qedi_ll2_cb_ops = {
> + .rx_cb = qedi_ll2_rx,
> + .tx_cb = NULL,
> +};
> +
> +static int qedi_percpu_io_thread(void *arg)
> +{
> + struct qedi_percpu_s *p = arg;
> + struct qedi_work *work, *tmp;
> + unsigned long flags;
> + LIST_HEAD(work_list);
> +
> + set_user_nice(current, -20);
> +
> + while (!kthread_should_stop()) {
> + spin_lock_irqsave(&p->p_work_lock, flags);
> + while (!list_empty(&p->work_list)) {
> + list_splice_init(&p->work_list, &work_list);
> + spin_unlock_irqrestore(&p->p_work_lock, flags);
> +
> + list_for_each_entry_safe(work, tmp, &work_list, list) {
> + list_del_init(&work->list);
> + qedi_fp_process_cqes(work->qedi, &work->cqe,
> +  work->que_idx);
> + kfree(work);
> + }
> + spin_lock_irqsa

Re: [PATCH 08/10] mm: replace __access_remote_vm() write parameter with gup_flags

2016-10-19 Thread Lorenzo Stoakes

On Wed, Oct 19, 2016 at 10:13:52AM +0200, Michal Hocko wrote:
> On Wed 19-10-16 09:59:03, Jan Kara wrote:
> > On Thu 13-10-16 01:20:18, Lorenzo Stoakes wrote:
> > > This patch removes the write parameter from __access_remote_vm() and 
> > > replaces it
> > > with a gup_flags parameter as use of this function previously _implied_
> > > FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
> > >
> > > We make this explicit as use of FOLL_FORCE can result in surprising 
> > > behaviour
> > > (and hence bugs) within the mm subsystem.
> > >
> > > Signed-off-by: Lorenzo Stoakes 
> >
> > So I'm not convinced this (and the following two patches) is actually
> > helping much. By grepping for FOLL_FORCE we will easily see that any caller
> > of access_remote_vm() gets that semantics and can thus continue search
>
> I am really wondering. Is there anything inherent that would require
> FOLL_FORCE for access_remote_vm? I mean FOLL_FORCE is a really
> non-trivial thing. It doesn't obey vma permissions so we should really
> minimize its usage. Do all of those users really need FOLL_FORCE?

I wonder about this also, for example by accessing /proc/self/mem you trigger
access_remote_vm() and consequently get_user_pages_remote() meaning FOLL_FORCE
is implied and you can use /proc/self/mem to override any VMA permissions. I
wonder if this is desirable behaviour or whether this ought to be limited to
ptrace system calls. Regardless, by making the flag more visible it makes it
easier to see that this is happening.

Note that this /proc/self/mem behaviour is what triggered the bug that brought
about this discussion in the first place -
https://marc.info/?l=linux-mm&m=147363447105059 - as using /proc/self/mem this
way on PROT_NONE memory broke some assumptions.

On the other hand I see your point Jan in that you know any caller of these
functions will have FOLL_FORCE implied, and you have to decide to stop passing
the flag at _some_ point in the stack, however it seems fairly sane to have that
stopping point exist outside of exported gup functions so the gup interface
forces explicitness but callers can wrap things as they need.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 2/6] qed: Add iSCSI out of order packet handling.

2016-10-19 Thread Johannes Thumshirn

On Wed, Oct 19, 2016 at 01:01:09AM -0400, manish.rangan...@cavium.com wrote:
> From: Yuval Mintz 
> 
> This patch adds out of order packet handling for hardware offloaded
> iSCSI. Out of order packet handling requires driver buffer allocation
> and assistance.
> 
> Signed-off-by: Arun Easi 
> Signed-off-by: Yuval Mintz 
> ---

[...]

> + if (IS_ENABLED(CONFIG_QEDI) &&
> + p_ll2_conn->conn_type == QED_LL2_TYPE_ISCSI_OOO) {

If you're going to implement the qed_is_iscsi_personallity() helper, please
consider a qed_ll2_is_iscsi_() as well.

> + struct qed_ooo_buffer *p_buffer;

[...]

> + while (cq_new_idx != cq_old_idx) {
> + struct core_rx_fast_path_cqe *p_cqe_fp;
> +
> + cqe = qed_chain_consume(&p_rx->rcq_chain);
> + cq_old_idx = qed_chain_get_cons_idx(&p_rx->rcq_chain);
> + cqe_type = cqe->rx_cqe_sp.type;
> +
> + if (cqe_type != CORE_RX_CQE_TYPE_REGULAR) {
> + DP_NOTICE(p_hwfn,
> +   "Got a non-regular LB LL2 completion [type 
> 0x%02x]\n",
> +   cqe_type);
> + return -EINVAL;
> + }
> + p_cqe_fp = &cqe->rx_cqe_fp;
> +
> + placement_offset = p_cqe_fp->placement_offset;
> + parse_flags = le16_to_cpu(p_cqe_fp->parse_flags.flags);
> + packet_length = le16_to_cpu(p_cqe_fp->packet_length);
> + vlan = le16_to_cpu(p_cqe_fp->vlan);
> + iscsi_ooo = (struct ooo_opaque *)&p_cqe_fp->opaque_data;
> + qed_ooo_save_history_entry(p_hwfn, p_hwfn->p_ooo_info,
> +iscsi_ooo);
> + cid = le32_to_cpu(iscsi_ooo->cid);
> +
> + /* Process delete isle first */
> + if (iscsi_ooo->drop_size)
> + qed_ooo_delete_isles(p_hwfn, p_hwfn->p_ooo_info, cid,
> +  iscsi_ooo->drop_isle,
> +  iscsi_ooo->drop_size);
> +
> + if (iscsi_ooo->ooo_opcode == TCP_EVENT_NOP)
> + continue;
> +
> + /* Now process create/add/join isles */
> + if (list_empty(&p_rx->active_descq)) {
> + DP_NOTICE(p_hwfn,
> +   "LL2 OOO RX chain has no submitted 
> buffers\n");
> + return -EIO;
> + }
> +
> + p_pkt = list_first_entry(&p_rx->active_descq,
> +  struct qed_ll2_rx_packet, list_entry);
> +
> + if ((iscsi_ooo->ooo_opcode == TCP_EVENT_ADD_NEW_ISLE) ||
> + (iscsi_ooo->ooo_opcode == TCP_EVENT_ADD_ISLE_RIGHT) ||
> + (iscsi_ooo->ooo_opcode == TCP_EVENT_ADD_ISLE_LEFT) ||
> + (iscsi_ooo->ooo_opcode == TCP_EVENT_ADD_PEN) ||
> + (iscsi_ooo->ooo_opcode == TCP_EVENT_JOIN)) {
> + if (!p_pkt) {
> + DP_NOTICE(p_hwfn,
> +   "LL2 OOO RX packet is not valid\n");
> + return -EIO;
> + }
> + list_del(&p_pkt->list_entry);
> + p_buffer = (struct qed_ooo_buffer *)p_pkt->cookie;
> + p_buffer->packet_length = packet_length;
> + p_buffer->parse_flags = parse_flags;
> + p_buffer->vlan = vlan;
> + p_buffer->placement_offset = placement_offset;
> + qed_chain_consume(&p_rx->rxq_chain);
> + list_add_tail(&p_pkt->list_entry, &p_rx->free_descq);
> +
> + switch (iscsi_ooo->ooo_opcode) {
> + case TCP_EVENT_ADD_NEW_ISLE:
> + qed_ooo_add_new_isle(p_hwfn,
> +  p_hwfn->p_ooo_info,
> +  cid,
> +  iscsi_ooo->ooo_isle,
> +  p_buffer);
> + break;
> + case TCP_EVENT_ADD_ISLE_RIGHT:
> + qed_ooo_add_new_buffer(p_hwfn,
> +p_hwfn->p_ooo_info,
> +cid,
> +iscsi_ooo->ooo_isle,
> +p_buffer,
> +QED_OOO_RIGHT_BUF);
> + break;
> + case TCP_EVENT_ADD_ISLE_LEFT:
> + qed_ooo_add_new_buffer(p_hwfn,
> +p_hwfn->p_ooo_info,
> +cid,
> +

Re: [PATCH v3 02/11] blk-mq: Introduce blk_mq_hctx_stopped()

2016-10-19 Thread Bart Van Assche


On 10/19/2016 06:19 AM, Christoph Hellwig wrote:

On Tue, Oct 18, 2016 at 02:49:09PM -0700, Bart Van Assche wrote:

Multiple functions test the BLK_MQ_S_STOPPED bit so introduce
a helper function that performs this test.


Looks sensible.  Any reason to have it in the public blk-mq.h instead
of the private one, though?  I see that dm is using it with this patch,
but that usage should go away once your full series is merged, right?


Hello Christoph,

Moving the blk_mq_hctx_stopped() declaration from the public to the 
private blk-mq.h header file should be possible. I will look into this.


Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 08/10] mm: replace __access_remote_vm() write parameter with gup_flags

2016-10-19 Thread Michal Hocko

On Wed 19-10-16 10:06:46, Lorenzo Stoakes wrote:
> On Wed, Oct 19, 2016 at 10:52:05AM +0200, Michal Hocko wrote:
> > yes this is the desirable and expected behavior.
> >
> > > wonder if this is desirable behaviour or whether this ought to be limited 
> > > to
> > > ptrace system calls. Regardless, by making the flag more visible it makes 
> > > it
> > > easier to see that this is happening.
> >
> > mem_open already enforces PTRACE_MODE_ATTACH
> 
> Ah I missed this, that makes a lot of sense, thanks!
> 
> I still wonder whether other invocations of access_remote_vm() in 
> fs/proc/base.c
> (the principle caller of this function) need FOLL_FORCE, for example the 
> various
> calls that simply read data from other processes, so I think the point stands
> about keeping this explicit.

I do agree. Making them explicit will help to clean them up later,
should there be a need.

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/10] mm: adjust get_user_pages* functions to explicitly pass FOLL_* flags

2016-10-19 Thread Lorenzo Stoakes

On Tue, Oct 18, 2016 at 05:30:50PM +0200, Michal Hocko wrote:
> I am wondering whether we can go further. E.g. it is not really clear to
> me whether we need an explicit FOLL_REMOTE when we can in fact check
> mm != current->mm and imply that. Maybe there are some contexts which
> wouldn't work, I haven't checked.

This flag is set even when /proc/self/mem is used. I've not looked deeply into
this flag but perhaps accessing your own memory this way can be considered
'remote' since you're not accessing it directly. On the other hand, perhaps this
is just mistaken in this case?

> I guess there is more work in that area and I do not want to impose all
> that work on you, but I couldn't resist once I saw you playing in that
> area ;) Definitely a good start!

Thanks, I am more than happy to go as far down this rabbit hole as is helpful,
no imposition at all :)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/10] mm: adjust get_user_pages* functions to explicitly pass FOLL_* flags

2016-10-19 Thread Michal Hocko

On Wed 19-10-16 09:58:15, Lorenzo Stoakes wrote:
> On Tue, Oct 18, 2016 at 05:30:50PM +0200, Michal Hocko wrote:
> > I am wondering whether we can go further. E.g. it is not really clear to
> > me whether we need an explicit FOLL_REMOTE when we can in fact check
> > mm != current->mm and imply that. Maybe there are some contexts which
> > wouldn't work, I haven't checked.
> 
> This flag is set even when /proc/self/mem is used. I've not looked deeply into
> this flag but perhaps accessing your own memory this way can be considered
> 'remote' since you're not accessing it directly. On the other hand, perhaps 
> this
> is just mistaken in this case?

My understanding of the flag is quite limited as well. All I know it is
related to protection keys and it is needed to bypass protection check.
See arch_vma_access_permitted. See also 1b2ee1266ea6 ("mm/core: Do not
enforce PKEY permissions on remote mm access").

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 08/11] SRP transport: Move queuecommand() wait code to SCSI core

2016-10-19 Thread Christoph Hellwig

Looks fine,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 05/10] mm: replace get_vaddr_frames() write/force parameters with gup_flags

2016-10-19 Thread Jan Kara

On Thu 13-10-16 01:20:15, Lorenzo Stoakes wrote:
> This patch removes the write and force parameters from get_vaddr_frames() and
> replaces them with a gup_flags parameter to make the use of FOLL_FORCE 
> explicit
> in callers as use of this flag can result in surprising behaviour (and hence
> bugs) within the mm subsystem.
> 
> Signed-off-by: Lorenzo Stoakes 

Looks good. You can add:

Reviewed-by: Jan Kara 

Honza

> ---
>  drivers/gpu/drm/exynos/exynos_drm_g2d.c|  3 ++-
>  drivers/media/platform/omap/omap_vout.c|  2 +-
>  drivers/media/v4l2-core/videobuf2-memops.c |  6 +-
>  include/linux/mm.h |  2 +-
>  mm/frame_vector.c  | 13 ++---
>  5 files changed, 11 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c 
> b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> index aa92dec..fbd13fa 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> @@ -488,7 +488,8 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct 
> drm_device *drm_dev,
>   goto err_free;
>   }
>  
> - ret = get_vaddr_frames(start, npages, true, true, g2d_userptr->vec);
> + ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> + g2d_userptr->vec);
>   if (ret != npages) {
>   DRM_ERROR("failed to get user pages from userptr.\n");
>   if (ret < 0)
> diff --git a/drivers/media/platform/omap/omap_vout.c 
> b/drivers/media/platform/omap/omap_vout.c
> index e668dde..a31b95c 100644
> --- a/drivers/media/platform/omap/omap_vout.c
> +++ b/drivers/media/platform/omap/omap_vout.c
> @@ -214,7 +214,7 @@ static int omap_vout_get_userptr(struct videobuf_buffer 
> *vb, u32 virtp,
>   if (!vec)
>   return -ENOMEM;
>  
> - ret = get_vaddr_frames(virtp, 1, true, false, vec);
> + ret = get_vaddr_frames(virtp, 1, FOLL_WRITE, vec);
>   if (ret != 1) {
>   frame_vector_destroy(vec);
>   return -EINVAL;
> diff --git a/drivers/media/v4l2-core/videobuf2-memops.c 
> b/drivers/media/v4l2-core/videobuf2-memops.c
> index 3c3b517..1cd322e 100644
> --- a/drivers/media/v4l2-core/videobuf2-memops.c
> +++ b/drivers/media/v4l2-core/videobuf2-memops.c
> @@ -42,6 +42,10 @@ struct frame_vector *vb2_create_framevec(unsigned long 
> start,
>   unsigned long first, last;
>   unsigned long nr;
>   struct frame_vector *vec;
> + unsigned int flags = FOLL_FORCE;
> +
> + if (write)
> + flags |= FOLL_WRITE;
>  
>   first = start >> PAGE_SHIFT;
>   last = (start + length - 1) >> PAGE_SHIFT;
> @@ -49,7 +53,7 @@ struct frame_vector *vb2_create_framevec(unsigned long 
> start,
>   vec = frame_vector_create(nr);
>   if (!vec)
>   return ERR_PTR(-ENOMEM);
> - ret = get_vaddr_frames(start & PAGE_MASK, nr, write, true, vec);
> + ret = get_vaddr_frames(start & PAGE_MASK, nr, flags, vec);
>   if (ret < 0)
>   goto out_destroy;
>   /* We accept only complete set of PFNs */
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 27ab538..5ff084f6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1305,7 +1305,7 @@ struct frame_vector {
>  struct frame_vector *frame_vector_create(unsigned int nr_frames);
>  void frame_vector_destroy(struct frame_vector *vec);
>  int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> -  bool write, bool force, struct frame_vector *vec);
> +  unsigned int gup_flags, struct frame_vector *vec);
>  void put_vaddr_frames(struct frame_vector *vec);
>  int frame_vector_to_pages(struct frame_vector *vec);
>  void frame_vector_to_pfns(struct frame_vector *vec);
> diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> index 81b6749..db77dcb 100644
> --- a/mm/frame_vector.c
> +++ b/mm/frame_vector.c
> @@ -11,10 +11,7 @@
>   * get_vaddr_frames() - map virtual addresses to pfns
>   * @start:   starting user address
>   * @nr_frames:   number of pages / pfns from start to map
> - * @write:   whether pages will be written to by the caller
> - * @force:   whether to force write access even if user mapping is
> - *   readonly. See description of the same argument of
> - get_user_pages().
> + * @gup_flags:   flags modifying lookup behaviour
>   * @vec: structure which receives pages / pfns of the addresses mapped.
>   *   It should have space for at least nr_frames entries.
>   *
> @@ -34,23 +31,17 @@
>   * This function takes care of grabbing mmap_sem as necessary.
>   */
>  int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> -  bool write, bool force, struct frame_vector *vec)
> +  unsigned int gup_flags, struct frame_vector *vec)
>  {
>   struct mm_struct *mm = current->mm;
>   struct vm_area_struct *vma;
>

Re: [PATCH v3 04/11] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-19 Thread Bart Van Assche


On 10/19/2016 06:23 AM, Christoph Hellwig wrote:

+/**
+ * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ *
+ * Note: this function does not prevent that the struct request end_io()
+ * callback function is invoked. Additionally, it is not prevented that
+ * new queue_rq() calls occur unless the queue has been stopped first.
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)


If this is intended to be a kerneldoc comment you need to document the 'q'
parameter.  If not you should drop the magic "/**" marker.


Good catch. I will document the 'q' parameter.


+static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+{
+   int srcu_idx;
+
+   WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
+   cpu_online(hctx->next_cpu));
+
+   if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
+   rcu_read_lock();
+   blk_mq_process_rq_list(hctx);
+   rcu_read_unlock();
+   } else {
+   srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
+   blk_mq_process_rq_list(hctx);
+   srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
+   }
+}


Can you document these synchronization changes in detail in the changelog?


Sure, I will do that.


+static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
+ struct request *rq, blk_qc_t *cookie)
+{
+   if (blk_mq_hctx_stopped(hctx) ||
+   blk_mq_direct_issue_request(rq, cookie) != 0)
+   blk_mq_insert_request(rq, false, true, true);
+}


Any reason not to merge this function with blk_mq_direct_issue_request?


That sounds like a good idea to me. I will make the proposed change.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 10/20] libfc: Replace ->rport_logoff callback with function call

2016-10-19 Thread Chad Dupuis



On Tue, 18 Oct 2016, 8:01am -, Hannes Reinecke wrote:

> The ->rport_logoff callback only ever had one implementation,
> so we can as well call it directly and drop the callback.
> 
> Signed-off-by: Hannes Reinecke 
> ---
>  drivers/scsi/bnx2fc/bnx2fc_tgt.c |  3 +--
>  drivers/scsi/fcoe/fcoe_ctlr.c|  8 
>  drivers/scsi/libfc/fc_disc.c |  8 
>  drivers/scsi/libfc/fc_lport.c| 10 +-
>  drivers/scsi/libfc/fc_rport.c|  8 +++-
>  include/scsi/libfc.h |  9 +
>  6 files changed, 18 insertions(+), 28 deletions(-)
> 

Straightforward conversion.

Reviewed-by: Chad Dupuis 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 2/6] qed: Add iSCSI out of order packet handling.

2016-10-19 Thread Hannes Reinecke

On 10/19/2016 07:01 AM, manish.rangan...@cavium.com wrote:
> From: Yuval Mintz 
> 
> This patch adds out of order packet handling for hardware offloaded
> iSCSI. Out of order packet handling requires driver buffer allocation
> and assistance.
> 
> Signed-off-by: Arun Easi 
> Signed-off-by: Yuval Mintz 
> ---
>  drivers/net/ethernet/qlogic/qed/Makefile   |   2 +-
>  drivers/net/ethernet/qlogic/qed/qed.h  |   1 +
>  drivers/net/ethernet/qlogic/qed/qed_dev.c  |  14 +-
>  drivers/net/ethernet/qlogic/qed/qed_ll2.c  | 559 
> +++--
>  drivers/net/ethernet/qlogic/qed/qed_ll2.h  |   9 +
>  drivers/net/ethernet/qlogic/qed/qed_ooo.c  | 510 ++
>  drivers/net/ethernet/qlogic/qed/qed_ooo.h  | 116 ++
>  drivers/net/ethernet/qlogic/qed/qed_roce.c |   1 +
>  drivers/net/ethernet/qlogic/qed/qed_spq.c  |   9 +
>  9 files changed, 1195 insertions(+), 26 deletions(-)
>  create mode 100644 drivers/net/ethernet/qlogic/qed/qed_ooo.c
>  create mode 100644 drivers/net/ethernet/qlogic/qed/qed_ooo.h
> 
> diff --git a/drivers/net/ethernet/qlogic/qed/Makefile 
> b/drivers/net/ethernet/qlogic/qed/Makefile
> index b76669c..9121bf0 100644
> --- a/drivers/net/ethernet/qlogic/qed/Makefile
> +++ b/drivers/net/ethernet/qlogic/qed/Makefile
> @@ -6,4 +6,4 @@ qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o 
> qed_init_ops.o \
>  qed-$(CONFIG_QED_SRIOV) += qed_sriov.o qed_vf.o
>  qed-$(CONFIG_QED_LL2) += qed_ll2.o
>  qed-$(CONFIG_INFINIBAND_QEDR) += qed_roce.o
> -qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o
> +qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o qed_ooo.o
> diff --git a/drivers/net/ethernet/qlogic/qed/qed.h 
> b/drivers/net/ethernet/qlogic/qed/qed.h
> index a61b1c0..e5626ae 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed.h
> +++ b/drivers/net/ethernet/qlogic/qed/qed.h
> @@ -380,6 +380,7 @@ struct qed_hwfn {
>   /* Protocol related */
>   boolusing_ll2;
>   struct qed_ll2_info *p_ll2_info;
> + struct qed_ooo_info *p_ooo_info;
>   struct qed_rdma_info*p_rdma_info;
>   struct qed_iscsi_info   *p_iscsi_info;
>   struct qed_pf_paramspf_params;
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c 
> b/drivers/net/ethernet/qlogic/qed/qed_dev.c
> index a4234c0..060e9a4 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
> @@ -32,6 +32,7 @@
>  #include "qed_iscsi.h"
>  #include "qed_ll2.h"
>  #include "qed_mcp.h"
> +#include "qed_ooo.h"
>  #include "qed_reg_addr.h"
>  #include "qed_sp.h"
>  #include "qed_sriov.h"
> @@ -157,8 +158,10 @@ void qed_resc_free(struct qed_dev *cdev)
>   qed_ll2_free(p_hwfn, p_hwfn->p_ll2_info);
>  #endif
>   if (IS_ENABLED(CONFIG_QEDI) &&
> - p_hwfn->hw_info.personality == QED_PCI_ISCSI)
> + p_hwfn->hw_info.personality == QED_PCI_ISCSI) {
>   qed_iscsi_free(p_hwfn, p_hwfn->p_iscsi_info);
> + qed_ooo_free(p_hwfn, p_hwfn->p_ooo_info);
> + }
>   qed_iov_free(p_hwfn);
>   qed_dmae_info_free(p_hwfn);
>   qed_dcbx_info_free(p_hwfn, p_hwfn->p_dcbx_info);
> @@ -416,6 +419,7 @@ int qed_qm_reconf(struct qed_hwfn *p_hwfn, struct qed_ptt 
> *p_ptt)
>  int qed_resc_alloc(struct qed_dev *cdev)
>  {
>   struct qed_iscsi_info *p_iscsi_info;
> + struct qed_ooo_info *p_ooo_info;
>  #ifdef CONFIG_QED_LL2
>   struct qed_ll2_info *p_ll2_info;
>  #endif
> @@ -543,6 +547,10 @@ int qed_resc_alloc(struct qed_dev *cdev)
>   if (!p_iscsi_info)
>   goto alloc_no_mem;
>   p_hwfn->p_iscsi_info = p_iscsi_info;
> + p_ooo_info = qed_ooo_alloc(p_hwfn);
> + if (!p_ooo_info)
> + goto alloc_no_mem;
> + p_hwfn->p_ooo_info = p_ooo_info;
>   }
>  
>   /* DMA info initialization */
> @@ -598,8 +606,10 @@ void qed_resc_setup(struct qed_dev *cdev)
>   qed_ll2_setup(p_hwfn, p_hwfn->p_ll2_info);
>  #endif
>   if (IS_ENABLED(CONFIG_QEDI) &&
> - p_hwfn->hw_info.personality == QED_PCI_ISCSI)
> + p_hwfn->hw_info.personality == QED_PCI_ISCSI) {
>   qed_iscsi_setup(p_hwfn, p_hwfn->p_iscsi_info);
> + qed_ooo_setup(p_hwfn, p_hwfn->p_ooo_info);
> + }
>   }
>  }
>  
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c 
> b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> index e67f3c9..4ce12e9 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> @@ -36,6 +36,7 @@
>  #include "qed_int.h"
>  #include "qed_ll2.h"
>  #include "qed_mcp.h"
> +#include "qed_ooo.h"
>  #include "qed_reg_addr.h"
>  #include "qed

Re: [PATCH 13/20] libfc: Remove fc_rport_init()

2016-10-19 Thread Chad Dupuis



On Tue, 18 Oct 2016, 8:01am -, Hannes Reinecke wrote:

> Function is empty now and can be removed.
> 
> Signed-off-by: Hannes Reinecke 
> ---
>  drivers/scsi/bnx2fc/bnx2fc_fcoe.c |  1 -
>  drivers/scsi/fcoe/fcoe_ctlr.c |  1 -
>  drivers/scsi/libfc/fc_rport.c | 10 --
>  include/scsi/libfc.h  |  1 -
>  4 files changed, 13 deletions(-)
> 
> diff --git a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c 
> b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
> index f9ddb61..0990130 100644
> --- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
> +++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
> @@ -970,7 +970,6 @@ static int bnx2fc_libfc_config(struct fc_lport *lport)
>   sizeof(struct libfc_function_template));
>   fc_elsct_init(lport);
>   fc_exch_init(lport);
> - fc_rport_init(lport);
>   fc_disc_init(lport);
>   fc_disc_config(lport, lport);
>   return 0;
> diff --git a/drivers/scsi/fcoe/fcoe_ctlr.c b/drivers/scsi/fcoe/fcoe_ctlr.c
> index 12efc1d..cea57e2 100644
> --- a/drivers/scsi/fcoe/fcoe_ctlr.c
> +++ b/drivers/scsi/fcoe/fcoe_ctlr.c
> @@ -3235,7 +3235,6 @@ int fcoe_libfc_config(struct fc_lport *lport, struct 
> fcoe_ctlr *fip,
>   fc_exch_init(lport);
>   fc_elsct_init(lport);
>   fc_lport_init(lport);
> - fc_rport_init(lport);
>   fc_disc_init(lport);
>   fcoe_ctlr_mode_set(lport, fip, fip->mode);
>   return 0;
> diff --git a/drivers/scsi/libfc/fc_rport.c b/drivers/scsi/libfc/fc_rport.c
> index 6e50226..110a707 100644
> --- a/drivers/scsi/libfc/fc_rport.c
> +++ b/drivers/scsi/libfc/fc_rport.c
> @@ -2182,16 +2182,6 @@ void fc_rport_flush_queue(void)
>  EXPORT_SYMBOL(fc_rport_flush_queue);
>  
>  /**
> - * fc_rport_init() - Initialize the remote port layer for a local port
> - * @lport: The local port to initialize the remote port layer for
> - */
> -int fc_rport_init(struct fc_lport *lport)
> -{
> - return 0;
> -}
> -EXPORT_SYMBOL(fc_rport_init);
> -
> -/**
>   * fc_rport_fcp_prli() - Handle incoming PRLI for the FCP initiator.
>   * @rdata: remote port private
>   * @spp_len: service parameter page length
> diff --git a/include/scsi/libfc.h b/include/scsi/libfc.h
> index 57630c5..a776901 100644
> --- a/include/scsi/libfc.h
> +++ b/include/scsi/libfc.h
> @@ -978,7 +978,6 @@ static inline bool fc_fcp_is_read(const struct fc_fcp_pkt 
> *fsp)
>  /*
>   * REMOTE PORT LAYER
>   */
> -int fc_rport_init(struct fc_lport *);
>  void fc_rport_terminate_io(struct fc_rport *);
>  struct fc_rport_priv *fc_rport_lookup(const struct fc_lport *lport,
> u32 port_id);
> 

Reviewed-by: Chad Dupuis 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/10] mm: adjust get_user_pages* functions to explicitly pass FOLL_* flags

2016-10-19 Thread Dave Hansen

On 10/19/2016 02:07 AM, Michal Hocko wrote:
> On Wed 19-10-16 09:58:15, Lorenzo Stoakes wrote:
>> On Tue, Oct 18, 2016 at 05:30:50PM +0200, Michal Hocko wrote:
>>> I am wondering whether we can go further. E.g. it is not really clear to
>>> me whether we need an explicit FOLL_REMOTE when we can in fact check
>>> mm != current->mm and imply that. Maybe there are some contexts which
>>> wouldn't work, I haven't checked.
>>
>> This flag is set even when /proc/self/mem is used. I've not looked deeply 
>> into
>> this flag but perhaps accessing your own memory this way can be considered
>> 'remote' since you're not accessing it directly. On the other hand, perhaps 
>> this
>> is just mistaken in this case?
> 
> My understanding of the flag is quite limited as well. All I know it is
> related to protection keys and it is needed to bypass protection check.
> See arch_vma_access_permitted. See also 1b2ee1266ea6 ("mm/core: Do not
> enforce PKEY permissions on remote mm access").

Yeah, we need the flag to tell us when PKEYs should be applied or not.
The current task's PKRU (pkey rights register) should really only be
used to impact access to the task's memory, but has no bearing on how a
given task should access remote memory.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to firmware

2016-10-19 Thread Ching Huang

Hi Tomas,

SCSI command checking in queuecommand function of arcmsr can be removed safely. 
Now driver can pass all scsi command to controller firmware.

diff -uprN a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c
--- a/drivers/scsi/arcmsr/arcmsr_hba.c  2016-10-19 16:18:45.0 +0800
+++ b/drivers/scsi/arcmsr/arcmsr_hba.c  2016-10-19 16:31:59.665524699 +0800
@@ -2623,13 +2623,6 @@ static int arcmsr_queue_command_lck(stru
cmd->scsi_done = done;
cmd->host_scribble = NULL;
cmd->result = 0;
-   if ((scsicmd == SYNCHRONIZE_CACHE) ||(scsicmd == SEND_DIAGNOSTIC)){
-   if(acb->devstate[target][lun] == ARECA_RAID_GONE) {
-   cmd->result = (DID_NO_CONNECT << 16);
-   }
-   cmd->scsi_done(cmd);
-   return 0;
-   }
if (target == 16) {
/* virtual device for iop message transfer */
arcmsr_handle_virtual_command(acb, cmd);

Thanks
Ching
On Tue, 2016-10-18 at 15:56 +0200, Tomas Henzl wrote:
> Hi,
> 
> similar suspicious code path can be found in the queuecommand functions in 
> other drivers too
> these are -
> pmcraid.c
> arcmsr_hba.c
> cc-ing maintainers - 
> (but both drivers seem to be unmaintained for a while -
> I've added Ching for arcmsr and Raghava for pmcraid)
> 
> please read this thread and verify that your driver and firmware is safe.
> 
> It is expected that a raid card controls the disk write cache (switches it 
> off)
> but this might not be true for all modes of operation a raid adapter handles
> - pass through/non-RAID etc. In such case when disk write cache is enabled
> and your driver skips the SYNCHRONIZE_CACHE command a FS corruption
> is very much possible during unexpected power loss or even a clean shutdown.
> 
> tomash
> 
> On 17.10.2016 19:51, James Bottomley wrote:
> > On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote:
> >>> -Original Message-
> >>> From: James Bottomley [mailto:j...@linux.vnet.ibm.com]
> >>> Sent: Monday, October 17, 2016 10:50 PM
> >>> To: Ric Wheeler; Hannes Reinecke; Sumit Saxena; 
> >>> linux-scsi@vger.kernel.org
> >>> Cc: martin.peter...@oracle.com; the...@redhat.com;
> >>> kashyap.de...@broadcom.com; Christoph Hellwig; Martin K. Petersen;
> >>> Jeff
> >>> Moyer; Gris Ge; Ewan Milne; Jens Axboe
> >>> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE
> >>> command
> >>> to firmware
> >>>
> >>> On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote:
>  On 10/17/2016 12:20 PM, James Bottomley wrote:
> > On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote:
> >> On 10/17/2016 07:34 AM, Hannes Reinecke wrote:
> >>> On 10/17/2016 12:24 PM, Sumit Saxena wrote:
>  megaraid_sas driver returns SYNCHRONIZE_CACHE command
>  back to
>  SCSI layer without sending it to firmware as firmware
>  takes
>  care of flushing cache.  This patch will change the
>  driver
>  behavior wrt SYNCHRONIZE_CACHE handling. If underlying
>  firmware has support to handle the SYNCHORNIZE_CACHE,
>  driver
>  will send it for firmware otherwise complete it back to
>  SCSI
>  layer with SUCCESS immediately.  If  Firmware  handle
>  SYNCHORNIZE_CACHE for both VD and JBOD
>  "canHandleSyncCache"
>  bit in scratch pad register(offset
>  0x00B4) will be set.
> 
>  This behavior can be controlled via module parameter and
>  user
>  can fallback to old behavior of returning
>  SYNCHRONIZE_CACHE by
>  driver only without sending it to firmware.
> 
>  Signed-off-by: Sumit Saxena 
>  Signed-off-by: Kashyap Desai 
>  ---
> drivers/scsi/megaraid/megaraid_sas.h|  3 +++
> drivers/scsi/megaraid/megaraid_sas_base.c   | 14
>  ++---
>  -
> drivers/scsi/megaraid/megaraid_sas_fusion.c |  3 +++
> drivers/scsi/megaraid/megaraid_sas_fusion.h |  2 ++
> 4 files changed, 14 insertions(+), 8 deletions(-)
> 
>  diff --git a/drivers/scsi/megaraid/megaraid_sas.h
>  b/drivers/scsi/megaraid/megaraid_sas.h
>  index ca86c88..43fd14f 100644
>  --- a/drivers/scsi/megaraid/megaraid_sas.h
>  +++ b/drivers/scsi/megaraid/megaraid_sas.h
>  @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT {
> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT14
> #define MR_MAX_MSIX_REG_ARRAY   16
> #define MR_RDPQ_MODE_OFFSET   0X0
>  0800
>  000
>  +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0
>  X010
>  
>  0
>  +
> /*
> * register set for both 1068 and 1078 controllers
> * structure extended for 1078 registers @@ -2140,6
>  +2142,7
> >>>

Re: iscsi_trx going into D state

2016-10-19 Thread Robert LeBlanc

Nicholas,

I didn't have high hopes for the patch because we were not seeing
TMR_ABORT_TASK (or 'abort') in dmesg or /var/log/messages, but it
seemed to help regardless. Our clients finally OOMed from the hung
sessions, so we are having to reboot them and we will do some more
testing. We haven't put the updated kernel on our clients yet. Our
clients have iSCSI root disks so I'm not sure if we can get a vmcore
on those, but we will do what we can to get you a vmcore from the
target if it happens again.

As far as our configuration: It is a superMicro box with 6 SAMSUNG
MZ7LM3T8HCJM-5 SSDs. Two are for root and four are in mdadm
RAID-10 for exporting via iSCSI/iSER. We have ZFS on top of the
RAID-10 for checksum and snapshots only and we export ZVols to the
clients (one or more per VM on the client). We do not persist the
export info (targetcli saveconfig), but regenerate it from scripts.
The client receives two or more of these exports and puts them in a
RAID-1 device. The exports are served by iSER one one port and also by
normal iSCSI on a different port for compatibility, but not normally
used. If you need more info about the config, please let me know. It
was kind of a vague request so I'm not sure what exactly is important
to you.

Thanks for helping us with this,
Robert LeBlanc

When we have problems, we usually see this in the logs:
Oct 17 08:57:50 prv-0-12-sanstack kernel: iSCSI Login timeout on
Network Portal 0.0.0.0:3260
Oct 17 08:57:50 prv-0-12-sanstack kernel: Unexpected ret: -104 send data 48
Oct 17 08:57:50 prv-0-12-sanstack kernel: tx_data returned -32, expecting 48.
Oct 17 08:57:50 prv-0-12-sanstack kernel: iSCSI Login negotiation failed.

I found some backtraces in the logs, not sure if this is helpful, this
is before your patch (your patch booted at Oct 18 10:36:59):
Oct 17 15:43:12 prv-0-12-sanstack kernel: INFO: rcu_sched
self-detected stall on CPU
Oct 17 15:43:12 prv-0-12-sanstack kernel: #0115-...: (41725 ticks this
GP) idle=b59/141/0 softirq=535/535 fqs=30992
Oct 17 15:43:12 prv-0-12-sanstack kernel: #011 (t=42006 jiffies g=1550
c=1549 q=0)
Oct 17 15:43:12 prv-0-12-sanstack kernel: Task dump for CPU 5:
Oct 17 15:43:12 prv-0-12-sanstack kernel: kworker/u68:2   R  running
task0 17967  2 0x0008
Oct 17 15:43:12 prv-0-12-sanstack kernel: Workqueue: isert_comp_wq
isert_cq_work [ib_isert]
Oct 17 15:43:12 prv-0-12-sanstack kernel: 883f4c0dca80
af8ca7a4 883f7fb43da8 810ac83f
Oct 17 15:43:12 prv-0-12-sanstack kernel: 0005
81adb680 883f7fb43dc0 810af179
Oct 17 15:43:12 prv-0-12-sanstack kernel: 0006
883f7fb43df0 810e1c10 883f7fb57b80
Oct 17 15:43:12 prv-0-12-sanstack kernel: Call Trace:
Oct 17 15:43:12 prv-0-12-sanstack kernel:   []
sched_show_task+0xaf/0x110
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
dump_cpu_task+0x39/0x40
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
rcu_dump_cpu_stacks+0x80/0xb0
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
rcu_check_callbacks+0x540/0x820
Oct 17 15:43:12 prv-0-12-sanstack kernel: [] ?
account_system_time+0x81/0x110
Oct 17 15:43:12 prv-0-12-sanstack kernel: [] ?
tick_sched_do_timer+0x50/0x50
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
update_process_times+0x39/0x60
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
tick_sched_handle.isra.17+0x25/0x60
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
tick_sched_timer+0x3d/0x70
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
__hrtimer_run_queues+0x102/0x290
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
hrtimer_interrupt+0xa8/0x1a0
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
local_apic_timer_interrupt+0x35/0x60
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
smp_apic_timer_interrupt+0x3d/0x50
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
apic_timer_interrupt+0x87/0x90
Oct 17 15:43:12 prv-0-12-sanstack kernel:   []
? console_unlock+0x41e/0x4e0
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
vprintk_emit+0x2fc/0x500
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
vprintk_default+0x1f/0x30
Oct 17 15:43:12 prv-0-12-sanstack kernel: [] printk+0x5d/0x74
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
transport_lookup_cmd_lun+0x1d1/0x200
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
iscsit_setup_scsi_cmd+0x230/0x540
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
isert_rx_do_work+0x3f3/0x7f0 [ib_isert]
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
isert_cq_work+0x184/0x770 [ib_isert]
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
process_one_work+0x14f/0x400
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
worker_thread+0x114/0x470
Oct 17 15:43:12 prv-0-12-sanstack kernel: [] ?
__schedule+0x34a/0x7f0
Oct 17 15:43:12 prv-0-12-sanstack kernel: [] ?
rescuer_thread+0x310/0x310
Oct 17 15:43:12 prv-0-12-sanstack kernel: [] kthread+0xd8/0xf0
Oct 17 15:43:12 prv-0-12-sanstack kernel: [] ?
kthread_park+0x60/0x60
Oct 17 15:43:12 prv-0-12-sanstack kernel: []
ret_from_fork+0x3f/0x70
Oct 17 15:43:12 prv-0-12-sanstack kernel: [] ?
kthread_park+0x60/0x60

Re: [PATCH 00/10] mm: adjust get_user_pages* functions to explicitly pass FOLL_* flags

2016-10-19 Thread Michal Hocko

On Wed 19-10-16 09:49:43, Dave Hansen wrote:
> On 10/19/2016 02:07 AM, Michal Hocko wrote:
> > On Wed 19-10-16 09:58:15, Lorenzo Stoakes wrote:
> >> On Tue, Oct 18, 2016 at 05:30:50PM +0200, Michal Hocko wrote:
> >>> I am wondering whether we can go further. E.g. it is not really clear to
> >>> me whether we need an explicit FOLL_REMOTE when we can in fact check
> >>> mm != current->mm and imply that. Maybe there are some contexts which
> >>> wouldn't work, I haven't checked.
> >>
> >> This flag is set even when /proc/self/mem is used. I've not looked deeply 
> >> into
> >> this flag but perhaps accessing your own memory this way can be considered
> >> 'remote' since you're not accessing it directly. On the other hand, 
> >> perhaps this
> >> is just mistaken in this case?
> > 
> > My understanding of the flag is quite limited as well. All I know it is
> > related to protection keys and it is needed to bypass protection check.
> > See arch_vma_access_permitted. See also 1b2ee1266ea6 ("mm/core: Do not
> > enforce PKEY permissions on remote mm access").
> 
> Yeah, we need the flag to tell us when PKEYs should be applied or not.
> The current task's PKRU (pkey rights register) should really only be
> used to impact access to the task's memory, but has no bearing on how a
> given task should access remote memory.

The question I had earlier was whether this has to be an explicit FOLL
flag used by g-u-p users or we can just use it internally when mm !=
current->mm

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to firmware

2016-10-19 Thread Tomas Henzl

On 19.10.2016 11:50, Ching Huang wrote:
> Hi Tomas,
>
> SCSI command checking in queuecommand function of arcmsr can be removed 
> safely. 
> Now driver can pass all scsi command to controller firmware.

thanks for your quick reply. Is this (passing the SYNCHRONIZE_CACHE command
to firmware) safe for all arcmsr cards, even the oldest?

Please start with your patch a new thread and add your 'Signed-off-by:' line.

>
> diff -uprN a/drivers/scsi/arcmsr/arcmsr_hba.c 
> b/drivers/scsi/arcmsr/arcmsr_hba.c
> --- a/drivers/scsi/arcmsr/arcmsr_hba.c2016-10-19 16:18:45.0 
> +0800
> +++ b/drivers/scsi/arcmsr/arcmsr_hba.c2016-10-19 16:31:59.665524699 
> +0800
> @@ -2623,13 +2623,6 @@ static int arcmsr_queue_command_lck(stru
>   cmd->scsi_done = done;
>   cmd->host_scribble = NULL;
>   cmd->result = 0;
> - if ((scsicmd == SYNCHRONIZE_CACHE) ||(scsicmd == SEND_DIAGNOSTIC)){
> - if(acb->devstate[target][lun] == ARECA_RAID_GONE) {
> - cmd->result = (DID_NO_CONNECT << 16);
> - }
> - cmd->scsi_done(cmd);
> - return 0;
> - }
>   if (target == 16) {
>   /* virtual device for iop message transfer */
>   arcmsr_handle_virtual_command(acb, cmd);
>
> Thanks
> Ching
> On Tue, 2016-10-18 at 15:56 +0200, Tomas Henzl wrote:
>> Hi,
>>
>> similar suspicious code path can be found in the queuecommand functions in 
>> other drivers too
>> these are -
>> pmcraid.c
>> arcmsr_hba.c
>> cc-ing maintainers - 
>> (but both drivers seem to be unmaintained for a while -
>> I've added Ching for arcmsr and Raghava for pmcraid)
>>
>> please read this thread and verify that your driver and firmware is safe.
>>
>> It is expected that a raid card controls the disk write cache (switches it 
>> off)
>> but this might not be true for all modes of operation a raid adapter handles
>> - pass through/non-RAID etc. In such case when disk write cache is enabled
>> and your driver skips the SYNCHRONIZE_CACHE command a FS corruption
>> is very much possible during unexpected power loss or even a clean shutdown.
>>
>> tomash
>>
>> On 17.10.2016 19:51, James Bottomley wrote:
>>> On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote:
> -Original Message-
> From: James Bottomley [mailto:j...@linux.vnet.ibm.com]
> Sent: Monday, October 17, 2016 10:50 PM
> To: Ric Wheeler; Hannes Reinecke; Sumit Saxena; 
> linux-scsi@vger.kernel.org
> Cc: martin.peter...@oracle.com; the...@redhat.com;
> kashyap.de...@broadcom.com; Christoph Hellwig; Martin K. Petersen;
> Jeff
> Moyer; Gris Ge; Ewan Milne; Jens Axboe
> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE
> command
> to firmware
>
> On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote:
>> On 10/17/2016 12:20 PM, James Bottomley wrote:
>>> On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote:
 On 10/17/2016 07:34 AM, Hannes Reinecke wrote:
> On 10/17/2016 12:24 PM, Sumit Saxena wrote:
>> megaraid_sas driver returns SYNCHRONIZE_CACHE command
>> back to
>> SCSI layer without sending it to firmware as firmware
>> takes
>> care of flushing cache.  This patch will change the
>> driver
>> behavior wrt SYNCHRONIZE_CACHE handling. If underlying
>> firmware has support to handle the SYNCHORNIZE_CACHE,
>> driver
>> will send it for firmware otherwise complete it back to
>> SCSI
>> layer with SUCCESS immediately.  If  Firmware  handle
>> SYNCHORNIZE_CACHE for both VD and JBOD
>> "canHandleSyncCache"
>> bit in scratch pad register(offset
>> 0x00B4) will be set.
>>
>> This behavior can be controlled via module parameter and
>> user
>> can fallback to old behavior of returning
>> SYNCHRONIZE_CACHE by
>> driver only without sending it to firmware.
>>
>> Signed-off-by: Sumit Saxena 
>> Signed-off-by: Kashyap Desai 
>> ---
>>drivers/scsi/megaraid/megaraid_sas.h|  3 +++
>>drivers/scsi/megaraid/megaraid_sas_base.c   | 14
>> ++---
>> -
>>drivers/scsi/megaraid/megaraid_sas_fusion.c |  3 +++
>>drivers/scsi/megaraid/megaraid_sas_fusion.h |  2 ++
>>4 files changed, 14 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h
>> b/drivers/scsi/megaraid/megaraid_sas.h
>> index ca86c88..43fd14f 100644
>> --- a/drivers/scsi/megaraid/megaraid_sas.h
>> +++ b/drivers/scsi/megaraid/megaraid_sas.h
>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT {
>>#define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT14
>>#define MR_MAX_MSIX_REG_ARRAY   16
>>#define MR_RDPQ_MODE_OFFSET   0X0
>>

[PATCH RFC] fcping: allow pings using FC_BSG_RPT_ELS

2016-10-19 Thread Johannes Thumshirn

fcping uses FC-ELS to perform pings. FC ELS is a host as well as rport
service, so allow the rport version of the BSG ioctl() as well as the
host version.

Signed-off-by: Johannes Thumshirn 
---
 fcping.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/fcping.c b/fcping.c
index bf2bc0f..3bc7fe6 100644
--- a/fcping.c
+++ b/fcping.c
@@ -471,6 +471,7 @@ fp_find_hba(void)
char *endptr;
uint32_t fcid;
int hba_cnt;
+   int hostno, chan, num;
 
hba_cnt = get_number_of_adapters();
if (!hba_cnt)
@@ -495,6 +496,16 @@ fp_find_hba(void)
if (*endptr != '\0')
SA_LOG_EXIT("invalid hba name %s", fp_hba);
host = strdup(fp_hba);
+   } else if (sscanf(fp_hba, "rport-%d:%d-%d", &hostno, &chan, &num)) {
+   char *rport = strdup(fp_hba);
+   port_attrs = get_rport_attribs(rport);
+   if (!port_attrs) {
+   free(rport);
+   return 0;
+   }
+   strncpy(fp_dev, rport, sizeof(fp_dev));
+   free(rport);
+   return 1;
} else if (strstr(fp_hba, ":")) {
if (strlen(fp_hba) == strlen("xx:yy:aa:bb:cc:dd:ee:ff")) {
fc_wwn_t wwn1;
@@ -826,10 +837,16 @@ send_els_echo(int fp_fd, void *fp_buf, uint32_t fp_len,
 {
struct fc_bsg_request cdb;
char sense[MAX_SENSE_LEN];
+   char *rport;
struct sg_io_v4 sg_io;
int rc;
 
-   cdb.msgcode = FC_BSG_HST_ELS_NOLOGIN;
+   rport = get_rport_by_fcid(fp_did);
+   if (rport)
+   cdb.msgcode = FC_BSG_RPT_ELS;
+   else
+   cdb.msgcode = FC_BSG_HST_ELS_NOLOGIN;
+
cdb.rqst_data.h_els.command_code = ELS_ECHO;
hton24(cdb.rqst_data.h_els.port_id, fp_did);
 
-- 
2.10.0

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 01/10] phy: qcom-ufs: remove failure when rx/tx_iface_clk are absent

2016-10-19 Thread Vivek Gautam

Hi,


On Wed, Oct 19, 2016 at 2:48 AM, Stephen Boyd  wrote:
> On 10/18/2016 07:28 AM, Vivek Gautam wrote:
>> From: Yaniv Gardi 
>>
>> Since in future UFS Phy's the tx_iface_clk and rx_iface_clk
>> are no longer exist, we should not fail when their initialization
>> fail, but rather just report with debug message.
>>
>> Signed-off-by: Yaniv Gardi 
>> Signed-off-by: Vivek Gautam 
>> ---
>
> Shouldn't we have a different compatible string on future UFS phys so
> that we know which number of clks and what clks are required? That's how
> we typically handle clk configurations changing. Making them optional
> should really only be needed when they're really optional, i.e. things
> will work fine if they're there or not.

Correct. It makes sense to have different compatible strings for different
versions.
I will gather more information about previous versions that required
this clock, and update as suggested.


Regards
Vivek


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/10] mm: adjust get_user_pages* functions to explicitly pass FOLL_* flags

2016-10-19 Thread Dave Hansen

On 10/19/2016 10:01 AM, Michal Hocko wrote:
> The question I had earlier was whether this has to be an explicit FOLL
> flag used by g-u-p users or we can just use it internally when mm !=
> current->mm

The reason I chose not to do that was that deferred work gets run under
a basically random 'current'.  If we just use 'mm != current->mm', then
the deferred work will sometimes have pkeys enforced and sometimes not,
basically randomly.

We want to be consistent with whether they are enforced or not, so we
explicitly indicate that by calling the remote variant vs. plain.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/20] libfc callback cleanup

2016-10-19 Thread Chad Dupuis


On Tue, 18 Oct 2016, 8:01am -, Hannes Reinecke wrote:

> Hi all,
> 
> it has been bugging me for a while that libfc has tons of callbacks
> in an attempt to abstract things away. But as it turned out
> no-one ever used those, rendering them quite pointless.
> As I got some time to kill on the train back from Vienna I've decided
> to finally kill them and call the functions directly.
> 
> As usual, comments and reviews are welcome.
> 
> Hannes Reinecke (20):
>   libfc: Replace ->seq_els_rsp_send callback with function call
>   libfc: Replace ->lport_reset callback with function call
>   libfc: Replace ->lport_recv with function call
>   libfc: Replace ->exch_seq_send callback with function call
>   libfc: Replace ->rport_destroy callback with function call
>   libfc: Replace ->rport_lookup callback with function call
>   scsi_transport_fc: rename 'fc_rport_create' to 'fc_remote_port_create'
>   libfc: Replace ->rport_create callback with function call
>   libfc: Replace ->rport_login callback with function call
>   libfc: Replace ->rport_logoff callback with function call
>   libfc: Replace ->rport_recv_req callback with function call
>   libfc: Replace ->rport_flush_queue callback with function call
>   libfc: Remove fc_rport_init()
>   libfc: Replace ->seq_send callback with function call
>   libfc: Replace ->seq_exch_abort callback with function call
>   libfc: Replace ->exch_done callback with function call
>   libfc: Replace ->seq_start_next callback with function call
>   libfc: Replace ->seq_set_resp callback with direct function call
>   libfc: Replace ->seq_assign callback with function call
>   libfc: Replace ->seq_release callback with function call
> 
>  drivers/scsi/bnx2fc/bnx2fc_fcoe.c |   1 -
>  drivers/scsi/bnx2fc/bnx2fc_tgt.c  |   3 +-
>  drivers/scsi/fcoe/fcoe_ctlr.c |  35 ---
>  drivers/scsi/fnic/fnic_scsi.c |   2 +-
>  drivers/scsi/libfc/fc_disc.c  |  33 ---
>  drivers/scsi/libfc/fc_elsct.c |   2 +-
>  drivers/scsi/libfc/fc_exch.c  | 104 ++--
>  drivers/scsi/libfc/fc_fcp.c   |  45 -
>  drivers/scsi/libfc/fc_libfc.c |   2 +-
>  drivers/scsi/libfc/fc_lport.c |  62 ++--
>  drivers/scsi/libfc/fc_rport.c | 167 ++--
>  drivers/scsi/scsi_transport_fc.c  |   8 +-
>  drivers/target/tcm_fc/tfc_cmd.c   |  20 ++--
>  drivers/target/tcm_fc/tfc_io.c|   4 +-
>  include/scsi/libfc.h  | 195 
> ++
>  15 files changed, 254 insertions(+), 429 deletions(-)
> 

Hannes, thanks for doing this.  Should make the code much more readable. 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 08/10] ufs-qcom: phy/hcd: Refactoring phy clock handling

2016-10-19 Thread Vivek Gautam

Hi,


On Wed, Oct 19, 2016 at 1:43 AM, Subhash Jadavani
 wrote:
> On 2016-10-18 07:28, Vivek Gautam wrote:
>>
>> Add phy clock enable code to phy_power_on/off callbacks, and
>> remove explicit calls to enable these phy clocks from the
>> ufs-qcom hcd driver.
>>
>> Signed-off-by: Vivek Gautam 
>> ---
>>
>> Changes since v1:
>>  - staticized ufs_qcom_phy_enable(/disable)_ref_clk(),
>>  - staticized ufs_qcom_phy_enable(/disable)_iface_clk()
>>  - removed function declaration and export symbol for these APIs.

[snip]

>> --- a/drivers/scsi/ufs/ufs-qcom.c
>> +++ b/drivers/scsi/ufs/ufs-qcom.c
>> @@ -1112,17 +1112,6 @@ static int ufs_qcom_setup_clocks(struct ufs_hba
>> *hba, bool on)
>> return 0;
>>
>> if (on) {
>> -   err = ufs_qcom_phy_enable_iface_clk(host->generic_phy);
>> -   if (err)
>> -   goto out;
>> -
>> -   err = ufs_qcom_phy_enable_ref_clk(host->generic_phy);
>> -   if (err) {
>> -   dev_err(hba->dev, "%s enable phy ref clock failed,
>> err=%d\n",
>> -   __func__, err);
>> -   ufs_qcom_phy_disable_iface_clk(host->generic_phy);
>> -   goto out;
>> -   }
>
>
> Now that you are moving these ref clk enable/disable to phy_power_on/off and
> these phy_power_on/off are called only in runtime suspend/resume (3 seconds
> after last UFS access).
> Goal is to disable the phy reference clock during aggressive gating (10ms
> from last UFS access) so shouldn't we call the phy_power_on/off from these
> setup_clocks() function as well?
>

So setup_clocks() is called for aggressive clock gating as well ?
If that's the case then yes, we may need to call. But we should try to
understand here. The phy_power_off turns off all the clocks - reflclk,
and other interface clocks. Do we want all of them to be turned off ?

phy_power_off will also turn off the PHY. Do we want all this for aggressive
clock gating ?


[snip]


Regards
Vivek

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 1/3] block: Add iocontext priority to request

2016-10-19 Thread Tejun Heo

Hello,

On Mon, Oct 17, 2016 at 11:27:28AM -0700, Adam Manzanares wrote:
> Patch adds an association between iocontext ioprio and the ioprio of a
> request. This is done to enable request based drivers the ability to
> act on priority information stored in the request. An example being
> ATA devices that support command priorities. If the ATA driver discovers
> that the device supports command priorities and the request has valid
> priority information indicating the request is high priority, then a high
> priority command can be sent to the device. This should improve tail
> latencies for high priority IO on any device that queues requests
> internally and can make use of the priority information stored in the
> request.
> 
> The ioprio of the request is set in blk_rq_set_prio which takes the
> request and the ioc as arguments. If the ioc is valid in blk_rq_set_prio
> then the iopriority of the request is set as the iopriority of the ioc.
> In init_request_from_bio a check is made to see if the ioprio of the bio
> is valid and if so then the request prio comes from the bio.
> 
> Signed-off-by: Adam Manzananares 

Jens, if you're okay with it, I can route this through
libata/for-4.10, or this can be applied to block and libata tree can
pull from it.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 1/3] block: Add iocontext priority to request

2016-10-19 Thread Jens Axboe


On 10/19/2016 12:05 PM, Tejun Heo wrote:

Hello,

On Mon, Oct 17, 2016 at 11:27:28AM -0700, Adam Manzanares wrote:

Patch adds an association between iocontext ioprio and the ioprio of a
request. This is done to enable request based drivers the ability to
act on priority information stored in the request. An example being
ATA devices that support command priorities. If the ATA driver discovers
that the device supports command priorities and the request has valid
priority information indicating the request is high priority, then a high
priority command can be sent to the device. This should improve tail
latencies for high priority IO on any device that queues requests
internally and can make use of the priority information stored in the
request.

The ioprio of the request is set in blk_rq_set_prio which takes the
request and the ioc as arguments. If the ioc is valid in blk_rq_set_prio
then the iopriority of the request is set as the iopriority of the ioc.
In init_request_from_bio a check is made to see if the ioprio of the bio
is valid and if so then the request prio comes from the bio.

Signed-off-by: Adam Manzananares 


Jens, if you're okay with it, I can route this through
libata/for-4.10, or this can be applied to block and libata tree can
pull from it.


I'm fine with that, add my Reviewed-by and funnel it through the libata
tree.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 3/3] ata: ATA Command Priority Disabled By Default

2016-10-19 Thread Adam Manzanares

The 10/19/2016 14:05, Tejun Heo wrote:
> On Mon, Oct 17, 2016 at 11:27:30AM -0700, Adam Manzanares wrote:
> > Add a sysfs entry to turn on priority information being passed
> > to a ATA device. By default this feature is turned off.
> > 
> > This patch depends on ata: Enabling ATA Command Priorities
> > 
> > Signed-off-by: Adam Manzanares 
> > ---
> >  drivers/ata/libahci.c |  1 +
> >  drivers/ata/libata-core.c |  2 +-
> >  drivers/ata/libata-scsi.c | 68 
> > +++
> >  include/linux/libata.h|  7 +
> >  4 files changed, 77 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
> > index 0d028ea..0e17285 100644
> > --- a/drivers/ata/libahci.c
> > +++ b/drivers/ata/libahci.c
> > @@ -140,6 +140,7 @@ EXPORT_SYMBOL_GPL(ahci_shost_attrs);
> >  struct device_attribute *ahci_sdev_attrs[] = {
> > &dev_attr_sw_activity,
> > &dev_attr_unload_heads,
> > +   &dev_attr_ncq_prio_on,
> 
> I'll rename it to ncq_prio_enable while applying but otherwise looks
> good to me.

Sounds good.

Take care,
Adam

> 
> Thanks.
> 
> -- 
> tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 3/3] ata: ATA Command Priority Disabled By Default

2016-10-19 Thread Tejun Heo

On Mon, Oct 17, 2016 at 11:27:30AM -0700, Adam Manzanares wrote:
> Add a sysfs entry to turn on priority information being passed
> to a ATA device. By default this feature is turned off.
> 
> This patch depends on ata: Enabling ATA Command Priorities
> 
> Signed-off-by: Adam Manzanares 
> ---
>  drivers/ata/libahci.c |  1 +
>  drivers/ata/libata-core.c |  2 +-
>  drivers/ata/libata-scsi.c | 68 
> +++
>  include/linux/libata.h|  7 +
>  4 files changed, 77 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
> index 0d028ea..0e17285 100644
> --- a/drivers/ata/libahci.c
> +++ b/drivers/ata/libahci.c
> @@ -140,6 +140,7 @@ EXPORT_SYMBOL_GPL(ahci_shost_attrs);
>  struct device_attribute *ahci_sdev_attrs[] = {
>   &dev_attr_sw_activity,
>   &dev_attr_unload_heads,
> + &dev_attr_ncq_prio_on,

I'll rename it to ncq_prio_enable while applying but otherwise looks
good to me.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 2/3] ata: Enabling ATA Command Priorities

2016-10-19 Thread Tejun Heo

Hello,

I removed ata_ncq_prio_enabled() as it was colliding with the sysfs
file rename and these trivial test functions tend to obscure more than
help anything.

Thanks.

-- 8< --
>From 8e061784b51ec4a4efed0deaafb5bd9725bf5b06 Mon Sep 17 00:00:00 2001
From: Adam Manzanares 
Date: Mon, 17 Oct 2016 11:27:29 -0700
Subject: [PATCH 1/2] ata: Enabling ATA Command Priorities

This patch checks to see if an ATA device supports NCQ command priorities.
If so and the user has specified an iocontext that indicates
IO_PRIO_CLASS_RT then we build a tf with a high priority command.

This is done to improve the tail latency of commands that are high
priority by passing priority to the device.

tj: Removed trivial ata_ncq_prio_enabled() and open-coded the test.

Signed-off-by: Adam Manzanares 
Signed-off-by: Tejun Heo 
---
 drivers/ata/libata-core.c | 35 ++-
 drivers/ata/libata-scsi.c |  6 +-
 drivers/ata/libata.h  |  2 +-
 include/linux/ata.h   |  6 ++
 include/linux/libata.h|  3 +++
 5 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 223a770..8346faf 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -739,6 +739,7 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct 
ata_device *dev)
  * @n_block: Number of blocks
  * @tf_flags: RW/FUA etc...
  * @tag: tag
+ * @class: IO priority class
  *
  * LOCKING:
  * None.
@@ -753,7 +754,7 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct 
ata_device *dev)
  */
 int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
u64 block, u32 n_block, unsigned int tf_flags,
-   unsigned int tag)
+   unsigned int tag, int class)
 {
tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
tf->flags |= tf_flags;
@@ -785,6 +786,12 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct 
ata_device *dev,
tf->device = ATA_LBA;
if (tf->flags & ATA_TFLAG_FUA)
tf->device |= 1 << 7;
+
+   if (dev->flags & ATA_DFLAG_NCQ_PRIO) {
+   if (class == IOPRIO_CLASS_RT)
+   tf->hob_nsect |= ATA_PRIO_HIGH <<
+ATA_SHIFT_PRIO;
+   }
} else if (dev->flags & ATA_DFLAG_LBA) {
tf->flags |= ATA_TFLAG_LBA;
 
@@ -2156,6 +2163,30 @@ static void ata_dev_config_ncq_non_data(struct 
ata_device *dev)
}
 }
 
+static void ata_dev_config_ncq_prio(struct ata_device *dev)
+{
+   struct ata_port *ap = dev->link->ap;
+   unsigned int err_mask;
+
+   err_mask = ata_read_log_page(dev,
+ATA_LOG_SATA_ID_DEV_DATA,
+ATA_LOG_SATA_SETTINGS,
+ap->sector_buf,
+1);
+   if (err_mask) {
+   ata_dev_dbg(dev,
+   "failed to get Identify Device data, Emask 0x%x\n",
+   err_mask);
+   return;
+   }
+
+   if (ap->sector_buf[ATA_LOG_NCQ_PRIO_OFFSET] & BIT(3))
+   dev->flags |= ATA_DFLAG_NCQ_PRIO;
+   else
+   ata_dev_dbg(dev, "SATA page does not support priority\n");
+
+}
+
 static int ata_dev_config_ncq(struct ata_device *dev,
   char *desc, size_t desc_sz)
 {
@@ -2205,6 +2236,8 @@ static int ata_dev_config_ncq(struct ata_device *dev,
ata_dev_config_ncq_send_recv(dev);
if (ata_id_has_ncq_non_data(dev->id))
ata_dev_config_ncq_non_data(dev);
+   if (ata_id_has_ncq_prio(dev->id))
+   ata_dev_config_ncq_prio(dev);
}
 
return 0;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 9cceb4a..2bccc3c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "libata.h"
 #include "libata-transport.h"
@@ -1755,6 +1756,8 @@ static unsigned int ata_scsi_rw_xlat(struct 
ata_queued_cmd *qc)
 {
struct scsi_cmnd *scmd = qc->scsicmd;
const u8 *cdb = scmd->cmnd;
+   struct request *rq = scmd->request;
+   int class = IOPRIO_PRIO_CLASS(req_get_ioprio(rq));
unsigned int tf_flags = 0;
u64 block;
u32 n_block;
@@ -1821,7 +1824,8 @@ static unsigned int ata_scsi_rw_xlat(struct 
ata_queued_cmd *qc)
qc->nbytes = n_block * scmd->device->sector_size;
 
rc = ata_build_rw_tf(&qc->tf, qc->dev, block, n_block, tf_flags,
-qc->tag);
+qc->tag, class);
+
if (likely(rc == 0))
return 0;
 
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 3b301a4..8f3a559 1

Re: [PATCH v6 0/3] Enabling ATA Command Priorities

2016-10-19 Thread Tejun Heo

On Mon, Oct 17, 2016 at 11:27:27AM -0700, Adam Manzanares wrote:
> This patch builds ATA commands with high priority if the iocontext of a 
> process
> is set to real time. The goal of the patch is to improve tail latencies of 
> workloads that use higher queue depths. This requires setting the iocontext 
> ioprio on the request when it is initialized
> 
> This patch has been tested with an Ultrastar HE8 HDD and cuts the 
> the p99.99 tail latency of foreground IO from 2s down to 72ms when
> using the deadline scheduler. This patch works independently of the
> scheduler so it can be used with all of the currently available 
> request based schedulers. 
> 
> Foreground IO, for the previously described results, is an async fio job 
> submitting 4K read requests at a QD of 1 to the HDD. The foreground IO is set 
> with the iopriority class of real time. The background workload is another fio
> job submitting read requests at a QD of 32 to the same HDD with default 
> iopriority.
> 
> This feature is enabled for ATA devices by setting the ata ncq_prio_on device 
> attribute to 1. An ATA device is also checked to see if the device supports 
> per
> command priority.

Applied 1-3 to libata/for-4.10 w/ some modifications.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 3/3] ata: ATA Command Priority Disabled By Default

2016-10-19 Thread Tejun Heo

Hello,

Removed ata_ncq_prio_on() and renamed _on to _enable.  If I messed up
anything, please let me know.

Also, can you please send a follow-up patch to make the store function
reject prio enabling if the device doesn't support it?

Thanks.

-- 8< --
>From 84f95243b5439a20c33837075b88926bfa00c4ec Mon Sep 17 00:00:00 2001
From: Adam Manzanares 
Date: Mon, 17 Oct 2016 11:27:30 -0700
Subject: [PATCH 2/2] ata: ATA Command Priority Disabled By Default

Add a sysfs entry to turn on priority information being passed
to a ATA device. By default this feature is turned off.

This patch depends on ata: Enabling ATA Command Priorities

tj: Renamed ncq_prio_on to ncq_prio_enable and removed trivial
ata_ncq_prio_on() and open-coded the test.

Signed-off-by: Adam Manzanares 
Signed-off-by: Tejun Heo 
---
 drivers/ata/libahci.c |  1 +
 drivers/ata/libata-core.c |  3 ++-
 drivers/ata/libata-scsi.c | 68 +++
 include/linux/libata.h|  2 ++
 4 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 0d028ea..ee7db31 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -140,6 +140,7 @@ EXPORT_SYMBOL_GPL(ahci_shost_attrs);
 struct device_attribute *ahci_sdev_attrs[] = {
&dev_attr_sw_activity,
&dev_attr_unload_heads,
+   &dev_attr_ncq_prio_enable,
NULL
 };
 EXPORT_SYMBOL_GPL(ahci_sdev_attrs);
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 8346faf..b294339 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -787,7 +787,8 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct 
ata_device *dev,
if (tf->flags & ATA_TFLAG_FUA)
tf->device |= 1 << 7;
 
-   if (dev->flags & ATA_DFLAG_NCQ_PRIO) {
+   if ((dev->flags & ATA_DFLAG_NCQ_PRIO) &&
+   (dev->flags & ATA_DFLAG_NCQ_PRIO_ENABLE)) {
if (class == IOPRIO_CLASS_RT)
tf->hob_nsect |= ATA_PRIO_HIGH <<
 ATA_SHIFT_PRIO;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 2bccc3c..87597a3 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -271,6 +271,73 @@ DEVICE_ATTR(unload_heads, S_IRUGO | S_IWUSR,
ata_scsi_park_show, ata_scsi_park_store);
 EXPORT_SYMBOL_GPL(dev_attr_unload_heads);
 
+static ssize_t ata_ncq_prio_enable_show(struct device *device,
+   struct device_attribute *attr, char 
*buf)
+{
+   struct scsi_device *sdev = to_scsi_device(device);
+   struct ata_port *ap;
+   struct ata_device *dev;
+   bool ncq_prio_enable;
+   int rc = 0;
+
+   ap = ata_shost_to_port(sdev->host);
+
+   spin_lock_irq(ap->lock);
+   dev = ata_scsi_find_dev(ap, sdev);
+   if (!dev) {
+   rc = -ENODEV;
+   goto unlock;
+   }
+
+   ncq_prio_enable = dev->flags & ATA_DFLAG_NCQ_PRIO_ENABLE;
+
+unlock:
+   spin_unlock_irq(ap->lock);
+
+   return rc ? rc : snprintf(buf, 20, "%u\n", ncq_prio_enable);
+}
+
+static ssize_t ata_ncq_prio_enable_store(struct device *device,
+struct device_attribute *attr,
+const char *buf, size_t len)
+{
+   struct scsi_device *sdev = to_scsi_device(device);
+   struct ata_port *ap;
+   struct ata_device *dev;
+   long int input;
+   unsigned long flags;
+   int rc;
+
+   rc = kstrtol(buf, 10, &input);
+   if (rc)
+   return rc;
+   if ((input < 0) || (input > 1))
+   return -EINVAL;
+
+   ap = ata_shost_to_port(sdev->host);
+
+   spin_lock_irqsave(ap->lock, flags);
+   dev = ata_scsi_find_dev(ap, sdev);
+   if (unlikely(!dev)) {
+   rc = -ENODEV;
+   goto unlock;
+   }
+
+   if (input)
+   dev->flags |= ATA_DFLAG_NCQ_PRIO_ENABLE;
+   else
+   dev->flags &= ~ATA_DFLAG_NCQ_PRIO_ENABLE;
+
+unlock:
+   spin_unlock_irqrestore(ap->lock, flags);
+
+   return rc ? rc : len;
+}
+
+DEVICE_ATTR(ncq_prio_enable, S_IRUGO | S_IWUSR,
+   ata_ncq_prio_enable_show, ata_ncq_prio_enable_store);
+EXPORT_SYMBOL_GPL(dev_attr_ncq_prio_enable);
+
 void ata_scsi_set_sense(struct ata_device *dev, struct scsi_cmnd *cmd,
u8 sk, u8 asc, u8 ascq)
 {
@@ -402,6 +469,7 @@ EXPORT_SYMBOL_GPL(dev_attr_sw_activity);
 
 struct device_attribute *ata_common_sdev_attrs[] = {
&dev_attr_unload_heads,
+   &dev_attr_ncq_prio_enable,
NULL
 };
 EXPORT_SYMBOL_GPL(ata_common_sdev_attrs);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 90b69a6..c170be5 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -167,6 +167,7 @@ enum {
ATA_DFLAG_UNLOCK_HPA= (1 << 18), /*

RE: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to firmware

2016-10-19 Thread Raghava Aditya Renukunta

Hi Tomas,

> -Original Message-
> From: Tomas Henzl [mailto:the...@redhat.com]
> Sent: Wednesday, October 19, 2016 5:56 AM
> To: Ching Huang
> Cc: James Bottomley; Kashyap Desai; Ric Wheeler; Hannes Reinecke; Sumit
> Saxena; linux-scsi@vger.kernel.org; martin.peter...@oracle.com; Christoph
> Hellwig; Martin K. Petersen; Jeff Moyer; Gris Ge; Ewan Milne; Jens Axboe;
> Raghava Aditya Renukunta
> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE
> command to firmware
> 
> EXTERNAL EMAIL
> 
> 
> On 19.10.2016 11:50, Ching Huang wrote:
> > Hi Tomas,
> >
> > SCSI command checking in queuecommand function of arcmsr can be
> removed safely.
> > Now driver can pass all scsi command to controller firmware.
> 
> thanks for your quick reply. Is this (passing the SYNCHRONIZE_CACHE
> command
> to firmware) safe for all arcmsr cards, even the oldest?
> 
> Please start with your patch a new thread and add your 'Signed-off-by:' line.
> 
> >
> > diff -uprN a/drivers/scsi/arcmsr/arcmsr_hba.c
> b/drivers/scsi/arcmsr/arcmsr_hba.c
> > --- a/drivers/scsi/arcmsr/arcmsr_hba.c2016-10-19 16:18:45.0
> +0800
> > +++ b/drivers/scsi/arcmsr/arcmsr_hba.c2016-10-19 16:31:59.665524699
> +0800
> > @@ -2623,13 +2623,6 @@ static int arcmsr_queue_command_lck(stru
> >   cmd->scsi_done = done;
> >   cmd->host_scribble = NULL;
> >   cmd->result = 0;
> > - if ((scsicmd == SYNCHRONIZE_CACHE) ||(scsicmd ==
> SEND_DIAGNOSTIC)){
> > - if(acb->devstate[target][lun] == ARECA_RAID_GONE) {
> > - cmd->result = (DID_NO_CONNECT << 16);
> > - }
> > - cmd->scsi_done(cmd);
> > - return 0;
> > - }
> >   if (target == 16) {
> >   /* virtual device for iop message transfer */
> >   arcmsr_handle_virtual_command(acb, cmd);
> >
> > Thanks
> > Ching
> > On Tue, 2016-10-18 at 15:56 +0200, Tomas Henzl wrote:
> >> Hi,
> >>
> >> similar suspicious code path can be found in the queuecommand
> functions in other drivers too
> >> these are -
> >> pmcraid.c
> >> arcmsr_hba.c
> >> cc-ing maintainers -
> >> (but both drivers seem to be unmaintained for a while -
> >> I've added Ching for arcmsr and Raghava for pmcraid)

We do not support this card  anymore nor do we have the hardware to test
it out, but let me  try and procure the hardware for testing although 
chances are very slim.

Regards,
Raghava Aditya

> >> please read this thread and verify that your driver and firmware is safe.
> >>
> >> It is expected that a raid card controls the disk write cache (switches it 
> >> off)
> >> but this might not be true for all modes of operation a raid adapter
> handles
> >> - pass through/non-RAID etc. In such case when disk write cache is
> enabled
> >> and your driver skips the SYNCHRONIZE_CACHE command a FS corruption
> >> is very much possible during unexpected power loss or even a clean
> shutdown.
> >>
> >> tomash
> >>
> >> On 17.10.2016 19:51, James Bottomley wrote:
> >>> On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote:
> > -Original Message-
> > From: James Bottomley [mailto:j...@linux.vnet.ibm.com]
> > Sent: Monday, October 17, 2016 10:50 PM
> > To: Ric Wheeler; Hannes Reinecke; Sumit Saxena;
> > linux-scsi@vger.kernel.org
> > Cc: martin.peter...@oracle.com; the...@redhat.com;
> > kashyap.de...@broadcom.com; Christoph Hellwig; Martin K.
> Petersen;
> > Jeff
> > Moyer; Gris Ge; Ewan Milne; Jens Axboe
> > Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE
> > command
> > to firmware
> >
> > On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote:
> >> On 10/17/2016 12:20 PM, James Bottomley wrote:
> >>> On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote:
>  On 10/17/2016 07:34 AM, Hannes Reinecke wrote:
> > On 10/17/2016 12:24 PM, Sumit Saxena wrote:
> >> megaraid_sas driver returns SYNCHRONIZE_CACHE command
> >> back to
> >> SCSI layer without sending it to firmware as firmware
> >> takes
> >> care of flushing cache.  This patch will change the
> >> driver
> >> behavior wrt SYNCHRONIZE_CACHE handling. If underlying
> >> firmware has support to handle the SYNCHORNIZE_CACHE,
> >> driver
> >> will send it for firmware otherwise complete it back to
> >> SCSI
> >> layer with SUCCESS immediately.  If  Firmware  handle
> >> SYNCHORNIZE_CACHE for both VD and JBOD
> >> "canHandleSyncCache"
> >> bit in scratch pad register(offset
> >> 0x00B4) will be set.
> >>
> >> This behavior can be controlled via module parameter and
> >> user
> >> can fallback to old behavior of returning
> >> SYNCHRONIZE_CACHE by
> >> driver only without sending it to firmware.
> >>
> >> Signed-off-by: Sumit Saxena 
> >> Signed-off-by: Kashyap Desai
> 
> >>>

Re: [PATCH 08/10] ufs-qcom: phy/hcd: Refactoring phy clock handling

2016-10-19 Thread Subhash Jadavani


On 2016-10-19 10:45, Vivek Gautam wrote:

Hi,


On Wed, Oct 19, 2016 at 1:43 AM, Subhash Jadavani
 wrote:

On 2016-10-18 07:28, Vivek Gautam wrote:


Add phy clock enable code to phy_power_on/off callbacks, and
remove explicit calls to enable these phy clocks from the
ufs-qcom hcd driver.

Signed-off-by: Vivek Gautam 
---

Changes since v1:
 - staticized ufs_qcom_phy_enable(/disable)_ref_clk(),
 - staticized ufs_qcom_phy_enable(/disable)_iface_clk()
 - removed function declaration and export symbol for these APIs.


[snip]


--- a/drivers/scsi/ufs/ufs-qcom.c
+++ b/drivers/scsi/ufs/ufs-qcom.c
@@ -1112,17 +1112,6 @@ static int ufs_qcom_setup_clocks(struct 
ufs_hba

*hba, bool on)
return 0;

if (on) {
-   err = 
ufs_qcom_phy_enable_iface_clk(host->generic_phy);

-   if (err)
-   goto out;
-
-   err = ufs_qcom_phy_enable_ref_clk(host->generic_phy);
-   if (err) {
-   dev_err(hba->dev, "%s enable phy ref clock 
failed,

err=%d\n",
-   __func__, err);
-   
ufs_qcom_phy_disable_iface_clk(host->generic_phy);

-   goto out;
-   }



Now that you are moving these ref clk enable/disable to 
phy_power_on/off and
these phy_power_on/off are called only in runtime suspend/resume (3 
seconds

after last UFS access).
Goal is to disable the phy reference clock during aggressive gating 
(10ms
from last UFS access) so shouldn't we call the phy_power_on/off from 
these

setup_clocks() function as well?



So setup_clocks() is called for aggressive clock gating as well ?
If that's the case then yes, we may need to call. But we should try to
understand here. The phy_power_off turns off all the clocks - reflclk,
and other interface clocks. Do we want all of them to be turned off ?


Yes, we want to turn off the ref clock (& other clocks) during 
aggressive gating.




phy_power_off will also turn off the PHY. Do we want all this for 
aggressive

clock gating ?


Yes, PHY rails can be powered off both during the aggressive clk gating 
and runtime suspend. But as the regulator on/off latencies could be 
higher (especially for the shared rail), we were turning them off doing 
it in runtime suspend only (via phy_power_off()).
Now that phy_power_on/off is managing both clocks and regulators, and we 
will really want to turn off the ref clocks during clock gating, there 
is no option but to call phy_power_on/off during aggressive gating.





[snip]


Regards
Vivek


--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 3/3] ata: ATA Command Priority Disabled By Default

2016-10-19 Thread Adam Manzanares

The 10/19/2016 14:38, Tejun Heo wrote:
> Hello,
> 
> Removed ata_ncq_prio_on() and renamed _on to _enable.  If I messed up
> anything, please let me know.
> 

Works as expected. Thanks for cleaning this up.

> Also, can you please send a follow-up patch to make the store function
> reject prio enabling if the device doesn't support it?
> 

I'll send something out shortly.

Take care,
Adam

> Thanks.
> 
> -- 8< --
> From 84f95243b5439a20c33837075b88926bfa00c4ec Mon Sep 17 00:00:00 2001
> From: Adam Manzanares 
> Date: Mon, 17 Oct 2016 11:27:30 -0700
> Subject: [PATCH 2/2] ata: ATA Command Priority Disabled By Default
> 
> Add a sysfs entry to turn on priority information being passed
> to a ATA device. By default this feature is turned off.
> 
> This patch depends on ata: Enabling ATA Command Priorities
> 
> tj: Renamed ncq_prio_on to ncq_prio_enable and removed trivial
> ata_ncq_prio_on() and open-coded the test.
> 
> Signed-off-by: Adam Manzanares 
> Signed-off-by: Tejun Heo 
> ---
>  drivers/ata/libahci.c |  1 +
>  drivers/ata/libata-core.c |  3 ++-
>  drivers/ata/libata-scsi.c | 68 
> +++
>  include/linux/libata.h|  2 ++
>  4 files changed, 73 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
> index 0d028ea..ee7db31 100644
> --- a/drivers/ata/libahci.c
> +++ b/drivers/ata/libahci.c
> @@ -140,6 +140,7 @@ EXPORT_SYMBOL_GPL(ahci_shost_attrs);
>  struct device_attribute *ahci_sdev_attrs[] = {
>   &dev_attr_sw_activity,
>   &dev_attr_unload_heads,
> + &dev_attr_ncq_prio_enable,
>   NULL
>  };
>  EXPORT_SYMBOL_GPL(ahci_sdev_attrs);
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 8346faf..b294339 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -787,7 +787,8 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct 
> ata_device *dev,
>   if (tf->flags & ATA_TFLAG_FUA)
>   tf->device |= 1 << 7;
>  
> - if (dev->flags & ATA_DFLAG_NCQ_PRIO) {
> + if ((dev->flags & ATA_DFLAG_NCQ_PRIO) &&
> + (dev->flags & ATA_DFLAG_NCQ_PRIO_ENABLE)) {
>   if (class == IOPRIO_CLASS_RT)
>   tf->hob_nsect |= ATA_PRIO_HIGH <<
>ATA_SHIFT_PRIO;
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 2bccc3c..87597a3 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -271,6 +271,73 @@ DEVICE_ATTR(unload_heads, S_IRUGO | S_IWUSR,
>   ata_scsi_park_show, ata_scsi_park_store);
>  EXPORT_SYMBOL_GPL(dev_attr_unload_heads);
>  
> +static ssize_t ata_ncq_prio_enable_show(struct device *device,
> + struct device_attribute *attr, char 
> *buf)
> +{
> + struct scsi_device *sdev = to_scsi_device(device);
> + struct ata_port *ap;
> + struct ata_device *dev;
> + bool ncq_prio_enable;
> + int rc = 0;
> +
> + ap = ata_shost_to_port(sdev->host);
> +
> + spin_lock_irq(ap->lock);
> + dev = ata_scsi_find_dev(ap, sdev);
> + if (!dev) {
> + rc = -ENODEV;
> + goto unlock;
> + }
> +
> + ncq_prio_enable = dev->flags & ATA_DFLAG_NCQ_PRIO_ENABLE;
> +
> +unlock:
> + spin_unlock_irq(ap->lock);
> +
> + return rc ? rc : snprintf(buf, 20, "%u\n", ncq_prio_enable);
> +}
> +
> +static ssize_t ata_ncq_prio_enable_store(struct device *device,
> +  struct device_attribute *attr,
> +  const char *buf, size_t len)
> +{
> + struct scsi_device *sdev = to_scsi_device(device);
> + struct ata_port *ap;
> + struct ata_device *dev;
> + long int input;
> + unsigned long flags;
> + int rc;
> +
> + rc = kstrtol(buf, 10, &input);
> + if (rc)
> + return rc;
> + if ((input < 0) || (input > 1))
> + return -EINVAL;
> +
> + ap = ata_shost_to_port(sdev->host);
> +
> + spin_lock_irqsave(ap->lock, flags);
> + dev = ata_scsi_find_dev(ap, sdev);
> + if (unlikely(!dev)) {
> + rc = -ENODEV;
> + goto unlock;
> + }
> +
> + if (input)
> + dev->flags |= ATA_DFLAG_NCQ_PRIO_ENABLE;
> + else
> + dev->flags &= ~ATA_DFLAG_NCQ_PRIO_ENABLE;
> +
> +unlock:
> + spin_unlock_irqrestore(ap->lock, flags);
> +
> + return rc ? rc : len;
> +}
> +
> +DEVICE_ATTR(ncq_prio_enable, S_IRUGO | S_IWUSR,
> + ata_ncq_prio_enable_show, ata_ncq_prio_enable_store);
> +EXPORT_SYMBOL_GPL(dev_attr_ncq_prio_enable);
> +
>  void ata_scsi_set_sense(struct ata_device *dev, struct scsi_cmnd *cmd,
>   u8 sk, u8 asc, u8 ascq)
>  {
> @@ -402,6 +469,7 @@ EXPORT_SYMBOL_GPL(dev_attr_sw_activity);
>  
>  struct device_attribute *ata_common_sdev_attrs[] = {
>   &dev_attr_unload_heads,
> +

Re: [PATCH v3 04/11] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-19 Thread Bart Van Assche


On 10/18/2016 02:50 PM, Bart Van Assche wrote:

blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
have finished. This function does *not* wait until all outstanding
requests have finished (this means invocation of request.end_io()).


(replying to my own e-mail)

The zero-day kernel test infrastructure reported to me that this patch 
causes a build failure with CONFIG_SRCU=n. Should I add "select SRCU" to 
block/Kconfig (excludes TINY_RCU) or should I rather modify this patch 
such that a mutex or rwsem is used instead of SRCU?


Thanks,

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 0/11] Fix race conditions related to stopping block layer queues

2016-10-19 Thread Keith Busch

Hi Bart,

I'm running linux 4.9-rc1 + linux-block/for-linus, and alternating tests
with and without this series.

Without this, I'm not seeing any problems in a link-down test while
running fio after ~30 runs.

With this series, I only see the test pass infrequently. Most of the
time I observe one of several failures. In all cases, it looks like the
rq->queuelist is in an unexpected state.

I think I've almost got this tracked down, but I have to leave for the
day soon. Rather than having a more useful suggestion, I've put the two
failures below.


First failure:

[  214.782075] [ cut here ]
[  214.782098] kernel BUG at block/blk-mq.c:498!
[  214.782117] invalid opcode:  [#1] SMP
[  214.782133] Modules linked in: nvme nvme_core nf_conntrack_netbios_ns 
nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack 
ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_raw 
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle 
ip6table_security iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security ebtable_filter 
ebtables ip6table_filter ip6_tables vfat fat
[  214.782356] CPU: 6 PID: 160 Comm: kworker/u16:6 Not tainted 4.9.0-rc1+ #28
[  214.782383] Hardware name: Gigabyte Technology Co., Ltd. 
Z97X-UD5H/Z97X-UD5H, BIOS F8 06/17/2014
[  214.782419] Workqueue: nvme nvme_reset_work [nvme]
[  214.782440] task: 8c0815403b00 task.stack: b6ad01384000
[  214.782463] RIP: 0010:[]  [] 
blk_mq_requeue_request+0x35/0x40
[  214.782502] RSP: 0018:b6ad01387b88  EFLAGS: 00010287
[  214.782524] RAX: 8c0814b98400 RBX: 8c0814b98200 RCX: 7530
[  214.782551] RDX: 0007 RSI: 0001 RDI: 8c0814b98200
[  214.782578] RBP: b6ad01387b98 R08:  R09: 9f408680
[  214.783394] R10: 0394 R11: 0388 R12: 0001
[  214.784212] R13: 8c081593a000 R14: 0001 R15: 8c080cdea740
[  214.785033] FS:  () GS:8c081fb8() 
knlGS:
[  214.785869] CS:  0010 DS:  ES:  CR0: 80050033
[  214.786710] CR2: 7ffae4497f34 CR3: 0001dfe06000 CR4: 001406e0
[  214.787559] Stack:
[  214.788406]  8c0814b98200  b6ad01387ba8 
c03451b3
[  214.789287]  b6ad01387bd0 c0357a4a 8c0814b98200 
d6acffc81a00
[  214.790174]  0006 b6ad01387bf8 9f3b8e22 
8c0814b98200
[  214.791066] Call Trace:
[  214.791935]  [] nvme_requeue_req+0x13/0x20 [nvme_core]
[  214.792810]  [] nvme_complete_rq+0x16a/0x1d0 [nvme]
[  214.793680]  [] __blk_mq_complete_request+0x72/0xe0
[  214.794551]  [] blk_mq_complete_request+0x1c/0x20
[  214.795422]  [] nvme_cancel_request+0x50/0x90 [nvme_core]
[  214.796299]  [] bt_tags_iter+0x2e/0x40
[  214.797157]  [] blk_mq_tagset_busy_iter+0x173/0x1e0
[  214.798005]  [] ? nvme_shutdown_ctrl+0x100/0x100 
[nvme_core]
[  214.798852]  [] ? nvme_shutdown_ctrl+0x100/0x100 
[nvme_core]
[  214.799682]  [] nvme_dev_disable+0x11d/0x380 [nvme]
[  214.800511]  [] ? acpi_unregister_gsi_ioapic+0x3a/0x40
[  214.801344]  [] ? dev_warn+0x6c/0x90
[  214.802157]  [] nvme_reset_work+0xa4/0xdc0 [nvme]
[  214.802961]  [] ? __switch_to+0x2b6/0x5f0
[  214.803773]  [] process_one_work+0x15f/0x430
[  214.804593]  [] worker_thread+0x4e/0x490
[  214.805419]  [] ? process_one_work+0x430/0x430
[  214.806255]  [] kthread+0xd9/0xf0
[  214.807096]  [] ? kthread_park+0x60/0x60
[  214.807946]  [] ret_from_fork+0x25/0x30
[  214.808801] Code: 54 53 48 89 fb 41 89 f4 e8 a9 fa ff ff 48 8b 03 48 39 c3 
75 16 41 0f b6 d4 48 89 df be 01 00 00 00 e8 10 ff ff ff 5b 41 5c 5d c3 <0f> 0b 
66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 be 40 00 00
[  214.810714] RIP  [] blk_mq_requeue_request+0x35/0x40
[  214.811628]  RSP 
[  214.812545] ---[ end trace 6ef3a3b6f8cea418 ]---
[  214.813437] [ cut here ]


Second failure, warning followed by NMI watchdog:

[  410.736619] [ cut here ]
[  410.736624] WARNING: CPU: 2 PID: 577 at lib/list_debug.c:29 
__list_add+0x62/0xb0
[  410.736883] list_add corruption. next->prev should be prev 
(acf481847d78), but was 931f8fb78000. (next=931f8fb78000).
[  410.736884] Modules linked in: nvme nvme_core nf_conntrack_netbios_ns 
nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter xt_conntrack 
ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_security 
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw 
ip6table_mangle iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_raw iptable_mangle ebtable_filter 
ebtables ip6table_filter ip6_tables vfat fat
[  410.736902] CPU: 2 PID: 577 Comm: kworker/2:1H Not tainted 4.9.0-rc1+ #28
[  410.736903] Hardware name: Gigabyte Technology Co., Ltd. 
Z97X-UD5H/Z97X-UD5H,

Re: [RFC 1/6] qed: Add support for hardware offloaded iSCSI.

2016-10-19 Thread Arun Easi

Thanks Hannes for the review. Please see my comments inline..

On Wed, 19 Oct 2016, 12:31am, Hannes Reinecke wrote:

> On 10/19/2016 07:01 AM, manish.rangan...@cavium.com wrote:
> > From: Yuval Mintz 
> > 
> > This adds the backbone required for the various HW initalizations
> > which are necessary for the iSCSI driver (qedi) for QLogic FastLinQ
> > 4 line of adapters - FW notification, resource initializations, etc.
> > 
> > Signed-off-by: Arun Easi 
> > Signed-off-by: Yuval Mintz 
> > ---
> >  drivers/net/ethernet/qlogic/Kconfig|   15 +
> >  drivers/net/ethernet/qlogic/qed/Makefile   |1 +
> >  drivers/net/ethernet/qlogic/qed/qed.h  |8 +-
> >  drivers/net/ethernet/qlogic/qed/qed_dev.c  |   15 +
> >  drivers/net/ethernet/qlogic/qed/qed_int.h  |1 -
> >  drivers/net/ethernet/qlogic/qed/qed_iscsi.c| 1310 
> > 
> >  drivers/net/ethernet/qlogic/qed/qed_iscsi.h|   52 +
> >  drivers/net/ethernet/qlogic/qed/qed_l2.c   |1 -
> >  drivers/net/ethernet/qlogic/qed/qed_ll2.c  |   35 +-
> >  drivers/net/ethernet/qlogic/qed/qed_main.c |2 -
> >  drivers/net/ethernet/qlogic/qed/qed_mcp.h  |6 -
> >  drivers/net/ethernet/qlogic/qed/qed_reg_addr.h |2 +
> >  drivers/net/ethernet/qlogic/qed/qed_spq.c  |   15 +
> >  include/linux/qed/qed_if.h |2 +
> >  include/linux/qed/qed_iscsi_if.h   |  249 +
> >  15 files changed, 1692 insertions(+), 22 deletions(-)
> >  create mode 100644 drivers/net/ethernet/qlogic/qed/qed_iscsi.c
> >  create mode 100644 drivers/net/ethernet/qlogic/qed/qed_iscsi.h
> >  create mode 100644 include/linux/qed/qed_iscsi_if.h
> > 

-- snipped --

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c 
> > b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
> > new file mode 100644
> > index 000..cb22dad
> > --- /dev/null
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
> > @@ -0,0 +1,1310 @@
> > +/* QLogic qed NIC Driver
> 
> Shouldn't that be qedi iSCSI Driver?

Actually, this is the common module under drivers/net/, which was 
submitted along with the NIC driver, qede, so the comment stayed.

In this driver architecture, for all protocols, there is this
common module, qed, as well as a protocol module (qede, qedr, qedi
etc.).

This comment needs to be changed in all files under qed/. We will submit 
another patch to do that.

> > +static int qed_sp_iscsi_conn_offload(struct qed_hwfn *p_hwfn,
> > +struct qed_iscsi_conn *p_conn,
> > +enum spq_mode comp_mode,
> > +struct qed_spq_comp_cb *p_comp_addr)
> > +{
> > +   struct iscsi_spe_conn_offload *p_ramrod = NULL;
> > +   struct tcp_offload_params_opt2 *p_tcp2 = NULL;
> > +   struct tcp_offload_params *p_tcp = NULL;
> > +   struct qed_spq_entry *p_ent = NULL;
> > +   struct qed_sp_init_data init_data;
> > +   union qed_qm_pq_params pq_params;
> > +   u16 pq0_id = 0, pq1_id = 0;
> > +   dma_addr_t r2tq_pbl_addr;
> > +   dma_addr_t xhq_pbl_addr;
> > +   dma_addr_t uhq_pbl_addr;
> > +   int rc = 0;
> > +   u32 dval;
> > +   u16 wval;
> > +   u8 ucval;
> > +   u8 i;
> > +
> > +   /* Get SPQ entry */
> > +   memset(&init_data, 0, sizeof(init_data));
> > +   init_data.cid = p_conn->icid;
> > +   init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
> > +   init_data.comp_mode = comp_mode;
> > +   init_data.p_comp_data = p_comp_addr;
> > +
> > +   rc = qed_sp_init_request(p_hwfn, &p_ent,
> > +ISCSI_RAMROD_CMD_ID_OFFLOAD_CONN,
> > +PROTOCOLID_ISCSI, &init_data);
> > +   if (rc)
> > +   return rc;
> > +
> > +   p_ramrod = &p_ent->ramrod.iscsi_conn_offload;
> > +
> > +   /* Transmission PQ is the first of the PF */
> > +   memset(&pq_params, 0, sizeof(pq_params));
> > +   pq0_id = qed_get_qm_pq(p_hwfn, PROTOCOLID_ISCSI, &pq_params);
> > +   p_conn->physical_q0 = cpu_to_le16(pq0_id);
> > +   p_ramrod->iscsi.physical_q0 = cpu_to_le16(pq0_id);
> > +
> > +   /* iSCSI Pure-ACK PQ */
> > +   pq_params.iscsi.q_idx = 1;
> > +   pq1_id = qed_get_qm_pq(p_hwfn, PROTOCOLID_ISCSI, &pq_params);
> > +   p_conn->physical_q1 = cpu_to_le16(pq1_id);
> > +   p_ramrod->iscsi.physical_q1 = cpu_to_le16(pq1_id);
> > +
> > +   p_ramrod->hdr.op_code = ISCSI_RAMROD_CMD_ID_OFFLOAD_CONN;
> > +   SET_FIELD(p_ramrod->hdr.flags, ISCSI_SLOW_PATH_HDR_LAYER_CODE,
> > + p_conn->layer_code);
> > +
> > +   p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);
> > +   p_ramrod->fw_cid = cpu_to_le32(p_conn->icid);
> > +
> > +   DMA_REGPAIR_LE(p_ramrod->iscsi.sq_pbl_addr, p_conn->sq_pbl_addr);
> > +
> > +   r2tq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->r2tq);
> > +   DMA_REGPAIR_LE(p_ramrod->iscsi.r2tq_pbl_addr, r2tq_pbl_addr);
> > +
> > +   xhq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->xhq);
> > +   DMA_REGPAIR_LE(p_ramrod->iscsi.xhq_pbl_addr, xhq_pbl_addr);
> > +
> > +   uhq_pbl_addr = q

Re: [PATCH v3 01/11] blk-mq: Do not invoke .queue_rq() for a stopped queue

2016-10-19 Thread Ming Lei

On Wed, Oct 19, 2016 at 5:48 AM, Bart Van Assche
 wrote:
> The meaning of the BLK_MQ_S_STOPPED flag is "do not call
> .queue_rq()". Hence modify blk_mq_make_request() such that requests
> are queued instead of issued if a queue has been stopped.
>
> Signed-off-by: Bart Van Assche 
> Cc: Christoph Hellwig 
> Cc: Hannes Reinecke 
> Cc: Sagi Grimberg 
> Cc: Johannes Thumshirn 
> Cc: 

Reviewed-by: Ming Lei 

> ---
>  block/blk-mq.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ddc2eed..b5dcafb 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1332,9 +1332,9 @@ static blk_qc_t blk_mq_make_request(struct 
> request_queue *q, struct bio *bio)
> blk_mq_put_ctx(data.ctx);
> if (!old_rq)
> goto done;
> -   if (!blk_mq_direct_issue_request(old_rq, &cookie))
> -   goto done;
> -   blk_mq_insert_request(old_rq, false, true, true);
> +   if (test_bit(BLK_MQ_S_STOPPED, &data.hctx->state) ||
> +   blk_mq_direct_issue_request(old_rq, &cookie) != 0)
> +   blk_mq_insert_request(old_rq, false, true, true);
> goto done;
> }
>
> --
> 2.10.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 04/11] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-19 Thread Ming Lei

On Thu, Oct 20, 2016 at 5:04 AM, Bart Van Assche
 wrote:
> On 10/18/2016 02:50 PM, Bart Van Assche wrote:
>>
>> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
>> have finished. This function does *not* wait until all outstanding
>> requests have finished (this means invocation of request.end_io()).
>
>
> (replying to my own e-mail)
>
> The zero-day kernel test infrastructure reported to me that this patch
> causes a build failure with CONFIG_SRCU=n. Should I add "select SRCU" to
> block/Kconfig (excludes TINY_RCU) or should I rather modify this patch such

Select SRCU is fine, and you can see it is done in lots of
places(btrfs, net, quota, kvm, power,...)

> that a mutex or rwsem is used instead of SRCU?

Both should be much worse than SRCU, even not as good as atomic_t.

Thanks,
Ming

>
> Thanks,
>
> Bart.
>



-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 0/11] Fix race conditions related to stopping block layer queues

2016-10-19 Thread Bart Van Assche

On 10/19/2016 03:14 PM, Keith Busch wrote:

I'm running linux 4.9-rc1 + linux-block/for-linus, and alternating tests
with and without this series.

Without this, I'm not seeing any problems in a link-down test while
running fio after ~30 runs.

With this series, I only see the test pass infrequently. Most of the
time I observe one of several failures. In all cases, it looks like the
rq->queuelist is in an unexpected state.

I think I've almost got this tracked down, but I have to leave for the
day soon. Rather than having a more useful suggestion, I've put the two
failures below.

> First failure:
>

[  214.782098] kernel BUG at block/blk-mq.c:498!

Hello Keith,

Thank you for having taken the time to test this patch series. Since I 
think that the second and third failures are consequences of the first, 
I will focus on the first failure triggered by your tests.

I assume that line 498 in blk-mq.c corresponds to 
BUG_ON(blk_queued_rq(rq))? Anyway, it seems to me like this is a bug in 
the NVMe code and also that this bug is completely unrelated to my patch 
series. In nvme_complete_rq() I see that blk_mq_requeue_request() is 
called. I don't think this is allowed from the context of 
nvme_cancel_request() because blk_mq_requeue_request() assumes that a 
request has already been removed from the request list. However, neither 
blk_mq_tagset_busy_iter() nor nvme_cancel_request() remove a request 
from the request list before nvme_complete_rq() is called. I think this 
is what triggers the BUG_ON() statement in blk_mq_requeue_request(). 
Have you noticed that e.g. the scsi-mq code only calls 
blk_mq_requeue_request() after __blk_mq_end_request() has finished? Have 
you considered to follow the same approach in nvme_cancel_request()?

Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 1/6] qed: Add support for hardware offloaded iSCSI.

2016-10-19 Thread Arun Easi

Thanks Johannes for the review, please see my response below.

On Wed, 19 Oct 2016, 2:09am, Johannes Thumshirn wrote:

> Hi Manish,
> 
> Some initital comments
> 
> On Wed, Oct 19, 2016 at 01:01:08AM -0400, manish.rangan...@cavium.com wrote:
> > From: Yuval Mintz 
> > 
> > This adds the backbone required for the various HW initalizations
> > which are necessary for the iSCSI driver (qedi) for QLogic FastLinQ
> > 4 line of adapters - FW notification, resource initializations, etc.
> > 
> > Signed-off-by: Arun Easi 
> > Signed-off-by: Yuval Mintz 
> > ---
> >  drivers/net/ethernet/qlogic/Kconfig|   15 +
> >  drivers/net/ethernet/qlogic/qed/Makefile   |1 +
> >  drivers/net/ethernet/qlogic/qed/qed.h  |8 +-
> >  drivers/net/ethernet/qlogic/qed/qed_dev.c  |   15 +
> >  drivers/net/ethernet/qlogic/qed/qed_int.h  |1 -
> >  drivers/net/ethernet/qlogic/qed/qed_iscsi.c| 1310 
> > 
> >  drivers/net/ethernet/qlogic/qed/qed_iscsi.h|   52 +
> >  drivers/net/ethernet/qlogic/qed/qed_l2.c   |1 -
> >  drivers/net/ethernet/qlogic/qed/qed_ll2.c  |   35 +-
> >  drivers/net/ethernet/qlogic/qed/qed_main.c |2 -
> >  drivers/net/ethernet/qlogic/qed/qed_mcp.h  |6 -
> >  drivers/net/ethernet/qlogic/qed/qed_reg_addr.h |2 +
> >  drivers/net/ethernet/qlogic/qed/qed_spq.c  |   15 +
> >  include/linux/qed/qed_if.h |2 +
> >  include/linux/qed/qed_iscsi_if.h   |  249 +
> >  15 files changed, 1692 insertions(+), 22 deletions(-)
> >  create mode 100644 drivers/net/ethernet/qlogic/qed/qed_iscsi.c
> >  create mode 100644 drivers/net/ethernet/qlogic/qed/qed_iscsi.h
> >  create mode 100644 include/linux/qed/qed_iscsi_if.h
> > 
> > diff --git a/drivers/net/ethernet/qlogic/Kconfig 
> > b/drivers/net/ethernet/qlogic/Kconfig
> > index 0df1391f9..bad4fae 100644
> > --- a/drivers/net/ethernet/qlogic/Kconfig
> > +++ b/drivers/net/ethernet/qlogic/Kconfig
> > @@ -118,4 +118,19 @@ config INFINIBAND_QEDR
> >   for QLogic QED. This would be replaced by the 'real' option
> >   once the QEDR driver is added [+relocated].
> >  
> > +config QED_ISCSI
> > +   bool
> > +
> > +config QEDI
> > +   tristate "QLogic QED 25/40/100Gb iSCSI driver"
> > +   depends on QED
> > +   select QED_LL2
> > +   select QED_ISCSI
> > +   default n
> > +   ---help---
> > + This provides a temporary node that allows the compilation
> > + and logical testing of the hardware offload iSCSI support
> > + for QLogic QED. This would be replaced by the 'real' option
> > + once the QEDI driver is added [+relocated].
> > +
> >  endif # NET_VENDOR_QLOGIC
> > diff --git a/drivers/net/ethernet/qlogic/qed/Makefile 
> > b/drivers/net/ethernet/qlogic/qed/Makefile
> > index cda0af7..b76669c 100644
> > --- a/drivers/net/ethernet/qlogic/qed/Makefile
> > +++ b/drivers/net/ethernet/qlogic/qed/Makefile
> > @@ -6,3 +6,4 @@ qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o 
> > qed_init_ops.o \
> >  qed-$(CONFIG_QED_SRIOV) += qed_sriov.o qed_vf.o
> >  qed-$(CONFIG_QED_LL2) += qed_ll2.o
> >  qed-$(CONFIG_INFINIBAND_QEDR) += qed_roce.o
> > +qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed.h 
> > b/drivers/net/ethernet/qlogic/qed/qed.h
> > index 653bb57..a61b1c0 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed.h
> > +++ b/drivers/net/ethernet/qlogic/qed/qed.h
> > @@ -35,6 +35,7 @@
> >  
> >  #define QED_WFQ_UNIT   100
> >  
> > +#define ISCSI_BDQ_ID(_port_id) (_port_id)
> 
> This looks a bit odd to me.
> 
> [...]
> 
> >  #endif
> > +   if (IS_ENABLED(CONFIG_QEDI) &&
> > +   p_hwfn->hw_info.personality == QED_PCI_ISCSI)
> > +   qed_iscsi_free(p_hwfn, p_hwfn->p_iscsi_info);
> 
> 
> Why not introduce a small helper like:
> static inline bool qed_is_iscsi_personality()
> {
>   return IS_ENABLED(CONFIG_QEDI) && p_hwfn->hw_info.personality ==
>   QED_PCI_ISCSI;
> }

I think I can remove the IS_ENABLED() check in places like this
and have the check contained in header file. qed_iscsi_free()
already is taken care, if I do the same fore qed_ooo*, I think
the check would just be "p_hwfn->hw_info.personality ==
QED_PCI_ISCSI", which would keep it consistent with the other
areas where similar check is done for other protocols.

> 
> > qed_iov_free(p_hwfn);
> 
> [...]
> 
> > +
> > +   if (!GET_FIELD(p_ramrod->iscsi.flags,
> > +  ISCSI_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B)) {
> > +   p_tcp = &p_ramrod->tcp;
> > +   ucval = p_conn->local_mac[1];
> > +   ((u8 *)(&p_tcp->local_mac_addr_hi))[0] = ucval;
> > +   ucval = p_conn->local_mac[0];
> > +   ((u8 *)(&p_tcp->local_mac_addr_hi))[1] = ucval;
> > +   ucval = p_conn->local_mac[3];
> > +   ((u8 *)(&p_tcp->local_mac_addr_mid))[0] = ucval;
> > +   ucval = p_conn->local_mac[2];
> > +

Re: [RFC 2/6] qed: Add iSCSI out of order packet handling.

2016-10-19 Thread Arun Easi

On Wed, 19 Oct 2016, 2:39am, Johannes Thumshirn wrote:

> On Wed, Oct 19, 2016 at 01:01:09AM -0400, manish.rangan...@cavium.com wrote:
> > From: Yuval Mintz 
> > 
> > This patch adds out of order packet handling for hardware offloaded
> > iSCSI. Out of order packet handling requires driver buffer allocation
> > and assistance.
> > 
> > Signed-off-by: Arun Easi 
> > Signed-off-by: Yuval Mintz 
> > ---
> 
> [...]
> 
> > +   if (IS_ENABLED(CONFIG_QEDI) &&
> > +   p_ll2_conn->conn_type == QED_LL2_TYPE_ISCSI_OOO) {
> 
> If you're going to implement the qed_is_iscsi_personallity() helper, please
> consider a qed_ll2_is_iscsi_() as well.

I see that I can avoid the IS_ENABLED() here as well. I will fix this
in the next revision.

> 
> > +   struct qed_ooo_buffer *p_buffer;
> 
> [...]
> 
> > +   while (cq_new_idx != cq_old_idx) {
> > +   struct core_rx_fast_path_cqe *p_cqe_fp;
> > +
> > +   cqe = qed_chain_consume(&p_rx->rcq_chain);
> > +   cq_old_idx = qed_chain_get_cons_idx(&p_rx->rcq_chain);
> > +   cqe_type = cqe->rx_cqe_sp.type;
> > +
> > +   if (cqe_type != CORE_RX_CQE_TYPE_REGULAR) {
> > +   DP_NOTICE(p_hwfn,
> > + "Got a non-regular LB LL2 completion [type 
> > 0x%02x]\n",
> > + cqe_type);
> > +   return -EINVAL;
> > +   }
> > +   p_cqe_fp = &cqe->rx_cqe_fp;
> > +
> > +   placement_offset = p_cqe_fp->placement_offset;
> > +   parse_flags = le16_to_cpu(p_cqe_fp->parse_flags.flags);
> > +   packet_length = le16_to_cpu(p_cqe_fp->packet_length);
> > +   vlan = le16_to_cpu(p_cqe_fp->vlan);
> > +   iscsi_ooo = (struct ooo_opaque *)&p_cqe_fp->opaque_data;
> > +   qed_ooo_save_history_entry(p_hwfn, p_hwfn->p_ooo_info,
> > +  iscsi_ooo);
> > +   cid = le32_to_cpu(iscsi_ooo->cid);
> > +
> > +   /* Process delete isle first */
> > +   if (iscsi_ooo->drop_size)
> > +   qed_ooo_delete_isles(p_hwfn, p_hwfn->p_ooo_info, cid,
> > +iscsi_ooo->drop_isle,
> > +iscsi_ooo->drop_size);
> > +
> > +   if (iscsi_ooo->ooo_opcode == TCP_EVENT_NOP)
> > +   continue;
> > +
> > +   /* Now process create/add/join isles */
> > +   if (list_empty(&p_rx->active_descq)) {
> > +   DP_NOTICE(p_hwfn,
> > + "LL2 OOO RX chain has no submitted 
> > buffers\n");
> > +   return -EIO;
> > +   }
> > +
> > +   p_pkt = list_first_entry(&p_rx->active_descq,
> > +struct qed_ll2_rx_packet, list_entry);
> > +
> > +   if ((iscsi_ooo->ooo_opcode == TCP_EVENT_ADD_NEW_ISLE) ||
> > +   (iscsi_ooo->ooo_opcode == TCP_EVENT_ADD_ISLE_RIGHT) ||
> > +   (iscsi_ooo->ooo_opcode == TCP_EVENT_ADD_ISLE_LEFT) ||
> > +   (iscsi_ooo->ooo_opcode == TCP_EVENT_ADD_PEN) ||
> > +   (iscsi_ooo->ooo_opcode == TCP_EVENT_JOIN)) {
> > +   if (!p_pkt) {
> > +   DP_NOTICE(p_hwfn,
> > + "LL2 OOO RX packet is not valid\n");
> > +   return -EIO;
> > +   }
> > +   list_del(&p_pkt->list_entry);
> > +   p_buffer = (struct qed_ooo_buffer *)p_pkt->cookie;
> > +   p_buffer->packet_length = packet_length;
> > +   p_buffer->parse_flags = parse_flags;
> > +   p_buffer->vlan = vlan;
> > +   p_buffer->placement_offset = placement_offset;
> > +   qed_chain_consume(&p_rx->rxq_chain);
> > +   list_add_tail(&p_pkt->list_entry, &p_rx->free_descq);
> > +
> > +   switch (iscsi_ooo->ooo_opcode) {
> > +   case TCP_EVENT_ADD_NEW_ISLE:
> > +   qed_ooo_add_new_isle(p_hwfn,
> > +p_hwfn->p_ooo_info,
> > +cid,
> > +iscsi_ooo->ooo_isle,
> > +p_buffer);
> > +   break;
> > +   case TCP_EVENT_ADD_ISLE_RIGHT:
> > +   qed_ooo_add_new_buffer(p_hwfn,
> > +  p_hwfn->p_ooo_info,
> > +  cid,
> > +  iscsi_ooo->ooo_isle,
> > +  p_buffer,
> > +  QED_OOO_RIGHT_BUF);
> > +   break;
> > +   case TCP_EVENT_ADD_ISLE_LEFT:
> >

Re: [PATCH 08/10] ufs-qcom: phy/hcd: Refactoring phy clock handling

2016-10-19 Thread Vivek Gautam

On Thu, Oct 20, 2016 at 12:48 AM, Subhash Jadavani
 wrote:
> On 2016-10-19 10:45, Vivek Gautam wrote:
>>
>> Hi,
>>
>>
>> On Wed, Oct 19, 2016 at 1:43 AM, Subhash Jadavani
>>  wrote:
>>>
>>> On 2016-10-18 07:28, Vivek Gautam wrote:


 Add phy clock enable code to phy_power_on/off callbacks, and
 remove explicit calls to enable these phy clocks from the
 ufs-qcom hcd driver.

 Signed-off-by: Vivek Gautam 
 ---

 Changes since v1:
  - staticized ufs_qcom_phy_enable(/disable)_ref_clk(),
  - staticized ufs_qcom_phy_enable(/disable)_iface_clk()
  - removed function declaration and export symbol for these APIs.
>>
>>
>> [snip]
>>
 --- a/drivers/scsi/ufs/ufs-qcom.c
 +++ b/drivers/scsi/ufs/ufs-qcom.c
 @@ -1112,17 +1112,6 @@ static int ufs_qcom_setup_clocks(struct ufs_hba
 *hba, bool on)
 return 0;

 if (on) {
 -   err = ufs_qcom_phy_enable_iface_clk(host->generic_phy);
 -   if (err)
 -   goto out;
 -
 -   err = ufs_qcom_phy_enable_ref_clk(host->generic_phy);
 -   if (err) {
 -   dev_err(hba->dev, "%s enable phy ref clock
 failed,
 err=%d\n",
 -   __func__, err);
 -
 ufs_qcom_phy_disable_iface_clk(host->generic_phy);
 -   goto out;
 -   }
>>>
>>>
>>>
>>> Now that you are moving these ref clk enable/disable to phy_power_on/off
>>> and
>>> these phy_power_on/off are called only in runtime suspend/resume (3
>>> seconds
>>> after last UFS access).
>>> Goal is to disable the phy reference clock during aggressive gating (10ms
>>> from last UFS access) so shouldn't we call the phy_power_on/off from
>>> these
>>> setup_clocks() function as well?
>>>
>>
>> So setup_clocks() is called for aggressive clock gating as well ?
>> If that's the case then yes, we may need to call. But we should try to
>> understand here. The phy_power_off turns off all the clocks - reflclk,
>> and other interface clocks. Do we want all of them to be turned off ?
>
>
> Yes, we want to turn off the ref clock (& other clocks) during aggressive
> gating.

Ok.

>
>>
>> phy_power_off will also turn off the PHY. Do we want all this for
>> aggressive
>> clock gating ?
>
>
> Yes, PHY rails can be powered off both during the aggressive clk gating and
> runtime suspend. But as the regulator on/off latencies could be higher
> (especially for the shared rail), we were turning them off doing it in
> runtime suspend only (via phy_power_off()).
> Now that phy_power_on/off is managing both clocks and regulators, and we
> will really want to turn off the ref clocks during clock gating, there is no
> option but to call phy_power_on/off during aggressive gating.

Alright then, i will update the change to add phy_power_off/on()
during aggressive clock gating.


Thanks
Vivek

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Device or HBA QD throttling creates holes in Sequential work load

2016-10-19 Thread Kashyap Desai

Reply to see if email reached to linux-scsi@vger.kernel.org.


On Thu, Oct 20, 2016 at 12:07 PM, Kashyap Desai
 wrote:
> Hi,
>
> I am doing some performance tuning in MR driver to understand how sdev queue
> depth and hba queue depth play role in IO submission from above layer.
>
>  I have 24 JBOD connected to MR 12GB controller and I can see performance
> for 4K Sequential work load as below.
>
>  HBA QD for MR controller is 4065 and Per device QD is set to 32
>
>
>
> queue depth from  256 reports 300K IOPS
>
> queue depth from  128 reports 330K IOPS
>
> queue depth from  64 reports 360K IOPS
>
> queue depth from  32 reports 510K IOPS
>
>
>
> In MR driver I added debug print and confirm that more IO come to driver as
> random IO whenever I have  queue depth more than 32.
>
> I have debug using scsi logging level and blktrace as well. Below is snippet
> of logs using scsi logging level.  In summary, if SML do flow control of IO
> due to Device QD or HBA QD,  IO coming to LLD is more random pattern.
>
>
>
> I see IO coming to driver is not sequential.
>
>
>
> [79546.912041] sd 18:2:21:0: [sdy] tag#854 CDB: Write(10) 2a 00 00 03 c0 3b
> 00 00 01 00 [79546.912049] sd 18:2:21:0: [sdy] tag#855 CDB: Write(10) 2a 00
> 00 03 c0 3c 00 00 01 00 [79546.912053] sd 18:2:21:0: [sdy] tag#886 CDB:
> Write(10) 2a 00 00 03 c0 5b 00 00 01 00 <- After 3c it jumps to 5b. Sequence
> are overlapped. Due to sdev QD throttling.
>
> [79546.912056] sd 18:2:21:0: [sdy] tag#887 CDB: Write(10) 2a 00 00 03 c0 5c
> 00 00 01 00 [79546.912250] sd 18:2:21:0: [sdy] tag#856 CDB: Write(10) 2a 00
> 00 03 c0 3d 00 00 01 00 [79546.912257] sd 18:2:21:0: [sdy] tag#888 CDB:
> Write(10) 2a 00 00 03 c0 5d 00 00 01 00 [79546.912259] sd 18:2:21:0: [sdy]
> tag#857 CDB: Write(10) 2a 00 00 03 c0 3e 00 00 01 00 [79546.912268] sd
> 18:2:21:0: [sdy] tag#858 CDB: Write(10) 2a 00 00 03 c0 3f 00 00 01 00
>
>
>
> If scsi_request_fn() breaks due to unavailability of device queue (due to
> below check), will there be any side defect as I observe ?
>
> if (!scsi_dev_queue_ready(q, sdev))
>
>  break;
>
>  If I reduce HBA QD and make sure IO from above layer is throttled due to
> HBA QD, there is a same impact.  MR driver use host wide shared tag map.
>
>  Can someone help me if this can be tunable in LLD providing additional
> settings or it is expected behavior ? Problem I am facing is, I am not able
> to figure out optimal device queue depth for different configuration and
> work load.
>
>  ` Kashyap
>
>
>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

72 matches

Mail list logo