from:"Steve Wise"

Re: [PATCH] IB/cma: Fix RDMA port validation for iWarp

2016-01-06 Thread Steve Wise


On 1/6/2016 7:06 AM, Matan Barak wrote:

cma_validate_port wrongly assumed that Ethernet devices are RoCE
devices and thus their ndev should be matched in the GID table.
This broke the iWrap support. Fixing that matching the ndev only if


Typo "iWrap"

Reviewed-by: Steve Wise <sw...@opengridcomputing.com>


we work on a RoCE port.

Fixes: abae1b71dd37 ('IB/cma: cma_validate_port should verify the port
 and netdevice')
Reported-by: Hariprasad Shenai <haripra...@chelsio.com>
Tested-by: Hariprasad Shenai <haripra...@chelsio.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
---

Hi Doug,

This patch fixes an iWarp issue that was introduced in the RoCE
refactoring series.

Regards,
Matan

  drivers/infiniband/core/cma.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2d762a2..17a15c5 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -453,7 +453,7 @@ static inline int cma_validate_port(struct ib_device *device, u8 port,
 	if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
 		return ret;
 
-	if (dev_type == ARPHRD_ETHER)
+	if (dev_type == ARPHRD_ETHER && rdma_protocol_roce(device, port))
 		ndev = dev_get_by_index(&init_net, bound_if_index);
 
 	ret = ib_find_cached_gid_by_port(device, gid, port, ndev, NULL);
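
For anyone following along outside the kernel tree, the effect of the one-line change can be modelled in userspace. This is only an illustrative sketch, not the kernel code: the enum port_proto and both needs_ndev_match*() helpers are hypothetical names standing in for the rdma_protocol_*() checks and the dev_get_by_index() branch. The ARPHRD values are the ones from linux/if_arp.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Link-layer types, same numeric values as <linux/if_arp.h>. */
#define ARPHRD_ETHER       1
#define ARPHRD_INFINIBAND  32

/* Hypothetical stand-in for the rdma_protocol_*() port queries. */
enum port_proto { PROTO_IB, PROTO_ROCE, PROTO_IWARP };

/* Before the fix: every Ethernet port had its netdev matched in the GID table. */
bool needs_ndev_match_old(int dev_type, enum port_proto proto)
{
	(void)proto;
	return dev_type == ARPHRD_ETHER;
}

/* After the fix: only RoCE ports are matched; iWARP ports skip the lookup. */
bool needs_ndev_match(int dev_type, enum port_proto proto)
{
	return dev_type == ARPHRD_ETHER && proto == PROTO_ROCE;
}
```

With this model an iWARP port on an Ethernet device is matched by the old predicate and skipped by the new one, which is the behaviour the patch description asks for.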


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: device attr cleanup

2016-01-05 Thread Steve Wise



> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org 
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Doug Ledford
> Sent: Wednesday, December 23, 2015 9:19 PM
> To: J. Bruce Fields; Chuck Lever
> Cc: Christoph Hellwig; Anna Schumaker; Jason Gunthorpe; 
> linux-rdma@vger.kernel.org; ira.weiny; Or Gerlitz; Steve Wise; Or Gerlitz;
> Sagi Grimberg
> Subject: Re: device attr cleanup
> 
> On 12/23/2015 04:31 PM, J. Bruce Fields wrote:
> > On Thu, Dec 10, 2015 at 07:49:59PM -0500, Chuck Lever wrote:
> >>
> >>> On Dec 10, 2015, at 6:30 PM, Christoph Hellwig <h...@infradead.org> wrote:
> >>>
> >>> On Thu, Dec 10, 2015 at 11:07:03AM -0700, Jason Gunthorpe wrote:
> >>>> The ARM folks do this sort of stuff on a regular basis.. Very early on
> >>>> Doug prepares a topic branch with only the big change, NFS folks pull
> >>>> it and then pull your work. Then Doug would send the topic branch to
> >>>> Linus as soon as the merge window opens, then NFS would send theirs.
> >>>>
> >>>> This is a lot less work overall than trying to sequence multiple
> >>>> patches over multiple releases..
> >>>
> >>> Agreed.  Staging has always been a giant pain and things tend to never
> >>> finish moving over that way if they are non-trivial enough.
> >>
> >> In that case:
> >>
> >> You need to make sure you have all the right Acks. I've added
> >> Anna and Bruce to Ack the NFS-related portions. Santosh should
> >> Ack the RDS part.
> >>
> >> http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/ib_device_attr
> >
> > Fine by me.
> >
> >> Given the proximity to the holidays and the next merge window,
> >> Doug will need to get a properly-acked topic branch published
> >> in the next day or two so the rest of us can rebase and start
> >> testing before the relevant parties disappear for the holiday.
> >
> > What branch should I be working from?
> 
> That patch was very intrusive (and I didn't like the change to the
> structure organization).  An alternative patch was proposed and I took
> it instead.  The patch I took is much less intrusive, but you might
> still need to adjust things slightly.  I've pushed my current WIP
> for-next branch to my github repo:
> 
> git://github.com/dledford/linux.git for-next
> 
> This branch might get rebased still yet before it gets pushed to my
> official repo at k.o, but it is perfectly fine to check that your
> patches will merge with my for-next branch without conflicts.
>

Hey Doug, I don't see this branch.  Which branch has the accepted device attr 
change?

Steve.


[ANNOUNCE] libcxgb4-1.3.6 release

2015-12-18 Thread Steve Wise

Hello,

A new release of libcxgb4, version 1.3.6, is available at:

https://www.openfabrics.org/downloads/cxgb4/libcxgb4-1.3.6.tar.gz

with md5sum:

129b8cd955c6de29258697f98dfbb351  libcxgb4-1.3.6.tar.gz

I also pushed this to:

git://git.openfabrics.org/~swise/libcxgb4

The tag is named v1.3.6.

Change shortlog:

Hariprasad Shenai (1):
  Gracefully handle unknown CQE status errors

Steve Wise (6):
  libcxgb4: remove config/missing
  libcxgb4: update config.sub and config.guess
  libcxgb4: add NULL check for wr->sg_list
  libcxgb4:Eliminate warnings seen with buildall.sh on a 32 bit system.
  libcxgb4: skip TERMINATE CQEs before testing for errors
  Spin release 1.3.6.


RE: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)

2015-12-10 Thread Steve Wise



> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org 
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Chuck Lever
> Sent: Thursday, December 10, 2015 10:08 AM
> To: linux-rdma@vger.kernel.org
> Cc: ira.weiny; Christoph Hellwig; Jason Gunthorpe; Or Gerlitz; Steve Wise; Or 
> Gerlitz; Sagi Grimberg; Doug Ledford
> Subject: Re: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)
> 
> 
> > On Dec 10, 2015, at 3:27 AM, Sagi Grimberg <sa...@dev.mellanox.co.il> wrote:
> >
> >
> >
> >> Doug this is going to conflict with the rdmavt work.  So if you take this 
> >> could
> >> you respond on the list.
> >
> > It will also conflict with the iser remote invalidate series.
> >
> > Doug it would help if you share your plans so people can rebase
> > accordingly.
> 
> I would be remiss not to mention that it probably also
> conflicts with the NFS server bi-directional RPC/RDMA
> series.
> 
> Invasive IB core changes like this clean up are especially
> burdensome for me because NFS/RDMA changes do not normally
> go through Doug's tree, so it takes extra co-ordination.
> 
> Here is a modest proposal. An obvious way to split the
> device attr cleanup might go like this:
> 
> a. first patch: add new fields to ib_device
> b. then one patch for each provider to populate these fields
> c. then one patch for each kernel ULP to use the new fields
> d. then one patch for each provider to remove ->query_attr
> e. last patch: remove ib_device_attr from the IB core
> 
> That way each provider and ULP maintainer can review and
> ack the portion of the changes that he or she is responsible
> for, and it should help make it much easier to merge with
> conflicting changes.
> 
> Splitting it across more than one kernel release would be
> helpful too, IMO. a. and b. can go into 4.5, c. into 4.6,
> and d. and e. can go in any time after that.
> 
> This adds more "process" but given the long chain of core
> changes now in plan, we should acknowledge how disruptive
> they will be, and come up with ways to make it possible to
> get other work done while the core maintenance work
> progresses.
> 
> 

The approach sounds reasonable to me.

Steve.
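
To make the a. through e. ordering concrete, here is a rough userspace sketch of what the transition could look like for a single attribute. Everything in it is an assumption for illustration: struct device, struct dev_attr, provider_register(), legacy_query_attr() and ulp_max_read_sge() are made-up names, not the ib_device API.

```c
#include <assert.h>
#include <stddef.h>

/* Legacy attribute container, filled by a per-call query (removed in step e). */
struct dev_attr {
	int max_sge;
	int max_sge_rd;
};

struct device {
	/* Step a: new fields cached directly on the device. */
	int max_sge;
	int max_sge_rd;
	/* Legacy callback, dropped in step d once no ULP calls it. */
	int (*query_attr)(const struct device *dev, struct dev_attr *attr);
};

/* Step b: the provider populates the new fields at registration time. */
void provider_register(struct device *dev, int max_sge, int max_sge_rd)
{
	dev->max_sge = max_sge;
	dev->max_sge_rd = max_sge_rd;
}

/* During the transition the legacy path can simply copy the cached values. */
int legacy_query_attr(const struct device *dev, struct dev_attr *attr)
{
	attr->max_sge = dev->max_sge;
	attr->max_sge_rd = dev->max_sge_rd;
	return 0;
}

/* Step c: a ULP reads the field directly instead of calling query_attr. */
int ulp_max_read_sge(const struct device *dev)
{
	return dev->max_sge_rd;
}
```

Because both paths return the same values while they coexist, each provider and ULP patch can land independently, which is the point of the split.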



RE: warning in ext4 with nfs/rdma server

2015-12-08 Thread Steve Wise



> -----Original Message-----
> From: Chuck Lever [mailto:chuck.le...@oracle.com]
> Sent: Monday, December 07, 2015 9:45 AM
> To: Steve Wise
> Cc: linux-rdma@vger.kernel.org; Veeresh U. Kokatnur; Linux NFS Mailing List
> Subject: Re: warning in ext4 with nfs/rdma server
> 
> Hi Steve-
> 
> > On Dec 7, 2015, at 10:38 AM, Steve Wise <sw...@opengridcomputing.com> wrote:
> >
> > Hey Chuck/NFS developers,
> >
> > We're hitting this warning in ext4 on the linux-4.3 nfs server running over 
> > RDMA/cxgb4.  We're still gathering data, like if it
> > happens with NFS/TCP.  But has anyone seen this warning on 4.3?  Is it 
> > likely to indicate some bug in the xprtrdma transport or
> > above it in NFS?
> 
> Yes, please confirm with NFS/TCP. Thanks!
>

The same thing happens with NFS/TCP, so this isn't related to xprtrdma.
 
> 
> > We can hit this running cthon tests over 2 mount points:
> >
> > -
> > #!/bin/bash
> > rm -rf /root/cthon04/loop_iter.txt
> > i=0
> > while [ 1 ]
> > do
> > {
> >
> > ./server -s -m /mnt/share1 -o rdma,port=20049,vers=4 -p /mnt/share1 -N 100 102.1.1.162 &
> > ./server -s -m /mnt/share2 -o rdma,port=20049,vers=3,rsize=65535,wsize=65535 -p /mnt/share2 -N 100 102.2.2.162 &
> > wait
> > i=$((i + 1))
> > echo "iteration $i" >>/root/cthon04/loop_iter.txt
> > date >>/root/cthon04/loop_iter.txt
> > }
> > done
> > --
> >
> > Thanks,
> >
> > Steve.
> >
> > [ cut here ]
> > WARNING: CPU: 14 PID: 6689 at fs/ext4/inode.c:231 
> > ext4_evict_inode+0x41e/0x490
> > [ext4]()
> > Modules linked in: nfsd(E) lockd(E) grace(E) nfs_acl(E) exportfs(E)
> > auth_rpcgss(E) rpcrdma(E) sunrpc(E) rdma_ucm(E) ib_uverbs(E) rdma_cm(E)
> > ib_cm(E) ib_sa(E) ib_mad(E) iw_cxgb4(E) iw_cm(E) ib_core(E) ib_addr(E) 
> > cxgb4(E)
> > autofs4(E) target_core_iblock(E) target_core_file(E) target_core_pscsi(E)
> > target_core_mod(E) configfs(E) bnx2fc(E) cnic(E) uio(E) fcoe(E) libfcoe(E)
> > 8021q(E) libfc(E) garp(E) stp(E) llc(E) cpufreq_ondemand(E) cachefiles(E)
> > fscache(E) ipv6(E) dm_mirror(E) dm_region_hash(E) dm_log(E) vhost_net(E)
> > macvtap(E) macvlan(E) vhost(E) tun(E) kvm(E) uinput(E) microcode(E) sg(E)
> > pcspkr(E) serio_raw(E) fam15h_power(E) k10temp(E) amd64_edac_mod(E)
> > edac_core(E) edac_mce_amd(E) i2c_piix4(E) igb(E) dca(E) i2c_algo_bit(E)
> > i2c_core(E) ptp(E) pps_core(E) scsi_transport_fc(E) acpi_cpufreq(E) 
> > dm_mod(E)
> > ext4(E) jbd2(E) mbcache(E) sr_mod(E) cdrom(E) sd_mod(E) ahci(E) libahci(E)
> > [last unloaded: cxgb4]
> > CPU: 14 PID: 6689 Comm: nfsd Tainted: GE   4.3.0 #1
> > Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.512/19/2013
> > 00e7 88400634fad8 812a4084 a00c96eb
> >  88400634fb18 81059fd5 88400634fbd8
> > 880fd1a460c8 880fd1a461d8 880fd1a46008 88400634fbd8
> > Call Trace:
> > [] dump_stack+0x48/0x64
> > [] warn_slowpath_common+0x95/0xe0
> > [] warn_slowpath_null+0x1a/0x20
> > [] ext4_evict_inode+0x41e/0x490 [ext4]
> > [] evict+0xae/0x1a0
> > [] iput_final+0xe5/0x170
> > [] iput+0xa3/0xf0
> > [] ? fsnotify_destroy_marks+0x64/0x80
> > [] dentry_unlink_inode+0xa9/0xe0
> > [] d_delete+0xa6/0xb0
> > [] vfs_unlink+0x138/0x140
> > [] nfsd_unlink+0x165/0x200 [nfsd]
> > [] ? lru_put_end+0x5c/0x70 [nfsd]
> > [] nfsd3_proc_remove+0x83/0x120 [nfsd]
> > [] nfsd_dispatch+0xdc/0x210 [nfsd]
> > [] svc_process_common+0x311/0x620 [sunrpc]
> > [] ? nfsd_set_nrthreads+0x1b0/0x1b0 [nfsd]
> > [] svc_process+0x128/0x1b0 [sunrpc]
> > [] nfsd+0xf3/0x160 [nfsd]
> > [] kthread+0xcc/0xf0
> > [] ? schedule_tail+0x1e/0xc0
> > [] ? kthread_freezable_should_stop+0x70/0x70
> > [] ret_from_fork+0x3f/0x70
> > [] ? kthread_freezable_should_stop+0x70/0x70
> > ---[ end trace 39afe9aeef2cfb34 ]---
> > [ cut here ]
> 
> --
> Chuck Lever
> 
> 



warning in ext4 with nfs/rdma server

2015-12-07 Thread Steve Wise

Hey Chuck/NFS developers,

We're hitting this warning in ext4 on the linux-4.3 nfs server running over 
RDMA/cxgb4.  We're still gathering data, like if it
happens with NFS/TCP.  But has anyone seen this warning on 4.3?  Is it likely 
to indicate some bug in the xprtrdma transport or
above it in NFS?

We can hit this running cthon tests over 2 mount points:

-
#!/bin/bash
rm -rf /root/cthon04/loop_iter.txt
i=0
while [ 1 ]
do
{

./server -s -m /mnt/share1 -o rdma,port=20049,vers=4 -p /mnt/share1 -N 100 102.1.1.162 &
./server -s -m /mnt/share2 -o rdma,port=20049,vers=3,rsize=65535,wsize=65535 -p /mnt/share2 -N 100 102.2.2.162 &
wait
i=$((i + 1))
echo "iteration $i" >>/root/cthon04/loop_iter.txt
date >>/root/cthon04/loop_iter.txt
}
done
--

Thanks,

Steve.

[ cut here ]
WARNING: CPU: 14 PID: 6689 at fs/ext4/inode.c:231 ext4_evict_inode+0x41e/0x490
[ext4]()
Modules linked in: nfsd(E) lockd(E) grace(E) nfs_acl(E) exportfs(E)
auth_rpcgss(E) rpcrdma(E) sunrpc(E) rdma_ucm(E) ib_uverbs(E) rdma_cm(E)
ib_cm(E) ib_sa(E) ib_mad(E) iw_cxgb4(E) iw_cm(E) ib_core(E) ib_addr(E) cxgb4(E)
autofs4(E) target_core_iblock(E) target_core_file(E) target_core_pscsi(E)
target_core_mod(E) configfs(E) bnx2fc(E) cnic(E) uio(E) fcoe(E) libfcoe(E)
8021q(E) libfc(E) garp(E) stp(E) llc(E) cpufreq_ondemand(E) cachefiles(E)
fscache(E) ipv6(E) dm_mirror(E) dm_region_hash(E) dm_log(E) vhost_net(E)
macvtap(E) macvlan(E) vhost(E) tun(E) kvm(E) uinput(E) microcode(E) sg(E)
pcspkr(E) serio_raw(E) fam15h_power(E) k10temp(E) amd64_edac_mod(E)
edac_core(E) edac_mce_amd(E) i2c_piix4(E) igb(E) dca(E) i2c_algo_bit(E)
i2c_core(E) ptp(E) pps_core(E) scsi_transport_fc(E) acpi_cpufreq(E) dm_mod(E)
ext4(E) jbd2(E) mbcache(E) sr_mod(E) cdrom(E) sd_mod(E) ahci(E) libahci(E)
[last unloaded: cxgb4]
CPU: 14 PID: 6689 Comm: nfsd Tainted: GE   4.3.0 #1
Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.512/19/2013
 00e7 88400634fad8 812a4084 a00c96eb
  88400634fb18 81059fd5 88400634fbd8
 880fd1a460c8 880fd1a461d8 880fd1a46008 88400634fbd8
Call Trace:
 [] dump_stack+0x48/0x64
 [] warn_slowpath_common+0x95/0xe0
 [] warn_slowpath_null+0x1a/0x20
 [] ext4_evict_inode+0x41e/0x490 [ext4]
 [] evict+0xae/0x1a0
 [] iput_final+0xe5/0x170
 [] iput+0xa3/0xf0
 [] ? fsnotify_destroy_marks+0x64/0x80
 [] dentry_unlink_inode+0xa9/0xe0
 [] d_delete+0xa6/0xb0
 [] vfs_unlink+0x138/0x140
 [] nfsd_unlink+0x165/0x200 [nfsd]
 [] ? lru_put_end+0x5c/0x70 [nfsd]
 [] nfsd3_proc_remove+0x83/0x120 [nfsd]
 [] nfsd_dispatch+0xdc/0x210 [nfsd]
 [] svc_process_common+0x311/0x620 [sunrpc]
 [] ? nfsd_set_nrthreads+0x1b0/0x1b0 [nfsd]
 [] svc_process+0x128/0x1b0 [sunrpc]
 [] nfsd+0xf3/0x160 [nfsd]
 [] kthread+0xcc/0xf0
 [] ? schedule_tail+0x1e/0xc0
 [] ? kthread_freezable_should_stop+0x70/0x70
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_freezable_should_stop+0x70/0x70
---[ end trace 39afe9aeef2cfb34 ]---
[ cut here ]


RE: [PATCH v1 03/10] IB/iser: Don't register memory for all immediatedata writes

2015-11-24 Thread Steve Wise



> -----Original Message-----
> From: Sagi Grimberg [mailto:sa...@mellanox.com]
> Sent: Tuesday, November 24, 2015 10:24 AM
> To: linux-rdma@vger.kernel.org; target-de...@vger.kernel.org
> Cc: Nicholas A. Bellinger; Or Gerlitz; Jenny Derzhavetz; Steve Wise
> Subject: [PATCH v1 03/10] IB/iser: Don't register memory for all 
> immediatedata writes
> 
> From: Jenny Derzhavetz <jen...@mellanox.com>
> 
> When all the task data is sent as immeidatedata, we are

nit: the above should be "immediate data,"

> allowed to use the local_dma_lkey as it is not sent to
> the wire. In the long run we'd really need to rework
> the memory registration flow only when we need rkeys.
> 
> Signed-off-by: Jenny Derzhavetz <jen...@mellanox.com>
> Signed-off-by: Sagi Grimberg <sa...@mellanox.com>
> ---
>  drivers/infiniband/ulp/iser/iscsi_iser.h |  3 ++-
>  drivers/infiniband/ulp/iser/iser_initiator.c |  5 +++--
>  drivers/infiniband/ulp/iser/iser_memory.c| 13 +
>  3 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
> b/drivers/infiniband/ulp/iser/iscsi_iser.h
> index 233ec0c2ae3d..7b5cf1332ddb 100644
> --- a/drivers/infiniband/ulp/iser/iscsi_iser.h
> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
> @@ -650,7 +650,8 @@ void iser_finalize_rdma_unaligned_sg(struct 
> iscsi_iser_task *iser_task,
>enum iser_data_dir cmd_dir);
> 
>  int iser_reg_rdma_mem(struct iscsi_iser_task *task,
> -   enum iser_data_dir dir);
> +   enum iser_data_dir dir,
> +   bool all_imm);
>  void iser_unreg_rdma_mem(struct iscsi_iser_task *task,
>enum iser_data_dir dir);
> 
> diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
> b/drivers/infiniband/ulp/iser/iser_initiator.c
> index ffd00c420729..07bf26427ee7 100644
> --- a/drivers/infiniband/ulp/iser/iser_initiator.c
> +++ b/drivers/infiniband/ulp/iser/iser_initiator.c
> @@ -72,7 +72,7 @@ static int iser_prepare_read_cmd(struct iscsi_task *task)
>   return err;
>   }
> 
> - err = iser_reg_rdma_mem(iser_task, ISER_DIR_IN);
> + err = iser_reg_rdma_mem(iser_task, ISER_DIR_IN, false);
>   if (err) {
>   iser_err("Failed to set up Data-IN RDMA\n");
>   return err;
> @@ -126,7 +126,8 @@ iser_prepare_write_cmd(struct iscsi_task *task,
>   return err;
>   }
> 
> - err = iser_reg_rdma_mem(iser_task, ISER_DIR_OUT);
> + err = iser_reg_rdma_mem(iser_task, ISER_DIR_OUT,
> + buf_out->data_len == imm_sz);
>   if (err != 0) {
>   iser_err("Failed to register write cmd RDMA mem\n");
>   return err;
> diff --git a/drivers/infiniband/ulp/iser/iser_memory.c 
> b/drivers/infiniband/ulp/iser/iser_memory.c
> index b7a2b88f48ce..62d0578388d3 100644
> --- a/drivers/infiniband/ulp/iser/iser_memory.c
> +++ b/drivers/infiniband/ulp/iser/iser_memory.c
> @@ -190,7 +190,11 @@ iser_reg_dma(struct iser_device *device, struct 
> iser_data_buf *mem,
>   struct scatterlist *sg = mem->sg;
> 
>   reg->sge.lkey = device->pd->local_dma_lkey;
> - reg->rkey = device->mr->rkey;
> + /*
> +  * FIXME: rework the registration code path to differentiate
> +  * rkey/lkey use cases
> +  */
> + reg->rkey = device->mr ? device->mr->rkey : 0;
>   reg->sge.addr = ib_sg_dma_address(device->ib_device, &sg[0]);
>   reg->sge.length = ib_sg_dma_len(device->ib_device, &sg[0]);
> 
> @@ -495,7 +499,8 @@ iser_reg_data_sg(struct iscsi_iser_task *task,
>  }
> 
>  int iser_reg_rdma_mem(struct iscsi_iser_task *task,
> -   enum iser_data_dir dir)
> +   enum iser_data_dir dir,
> +   bool all_imm)
>  {
>   struct ib_conn *ib_conn = &task->iser_conn->ib_conn;
>   struct iser_device *device = ib_conn->device;
> @@ -506,8 +511,8 @@ int iser_reg_rdma_mem(struct iscsi_iser_task *task,
>   bool use_dma_key;
>   int err;
> 
> - use_dma_key = (mem->dma_nents == 1 && !iser_always_reg &&
> -scsi_get_prot_op(task->sc) == SCSI_PROT_NORMAL);
> + use_dma_key = mem->dma_nents == 1 && (all_imm || !iser_always_reg) &&
> +   scsi_get_prot_op(task->sc) == SCSI_PROT_NORMAL;
> 
>   if (!use_dma_key) {
>   desc = device->reg_ops->reg_desc_get(ib_conn);
> --
> 1.8.4.3
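
For reference, the change to the use_dma_key condition in the last hunk can be written as two standalone predicates. This is just a userspace sketch of the boolean logic; the function names and the prot_normal parameter (standing in for scsi_get_prot_op(task->sc) == SCSI_PROT_NORMAL) are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Old rule: a single DMA entry may use the DMA key unless always-register is set. */
bool use_dma_key_old(int dma_nents, bool always_reg, bool prot_normal)
{
	return dma_nents == 1 && !always_reg && prot_normal;
}

/* New rule: an all-immediate write may use the DMA key even with always-register. */
bool use_dma_key_new(int dma_nents, bool always_reg, bool prot_normal, bool all_imm)
{
	return dma_nents == 1 && (all_imm || !always_reg) && prot_normal;
}
```

The only input where the two differ is a single-entry, unprotected, all-immediate write with always-register set: the old rule registers memory, the new one does not.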


RE: [PATCH v1 00/10] iSER support for remote invalidate

2015-11-24 Thread Steve Wise



> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org 
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Sagi Grimberg
> Sent: Tuesday, November 24, 2015 10:24 AM
> To: linux-rdma@vger.kernel.org; target-de...@vger.kernel.org
> Cc: Nicholas A. Bellinger; Or Gerlitz; Jenny Derzhavetz; Steve Wise
> Subject: [PATCH v1 00/10] iSER support for remote invalidate
> 
> This patchset adds remote invalidation support to iser initiator and
> target. The support negotiation for this feature is based on IBTA
> annex 12 "Support for iSCSI Extensions for RDMA" carried in rdma_cm
> private data.
> 
> Remote invalidation allows a peer host to invalidate a remote key
> as part of a SEND operation. This feature allows a host to avoid
> invalidating an rkey locally. By supporting this feature iser initiator
> can save extra latency and processing time yielded by invalidating
> the memory key locally.
> 
> The initiator feature support is dependent on:
> - fastreg is used (not FMR)
> - always_register=Y
> 
> In this case the initiator will expose support for remote invalidation,
> however it will not blindly rely on the target to do so and will verify
> that in the work completion information. The iser target now looks into
> the iser header in the CM request, and if the initiator supports
> remote invalidation it will respond that it will use remote invalidation
> for the provided remote keys.
>

Series looks ok to me.

Reviewed-by: Steve Wise <sw...@opengridcomputing.com> 
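
The "do not blindly rely on the target" check described in the cover letter boils down to something like the sketch below. It is a userspace model under assumptions: struct completion_info, WC_WITH_INVALIDATE and need_local_invalidate() are hypothetical names for the example, not the verbs API or the iser code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: the peer invalidated an rkey as part of its SEND. */
#define WC_WITH_INVALIDATE 0x1u

struct completion_info {
	unsigned int flags;
	uint32_t invalidated_rkey;	/* meaningful only if the flag is set */
};

/* Returns true if the initiator still has to invalidate rkey locally. */
bool need_local_invalidate(const struct completion_info *wc, uint32_t rkey)
{
	if ((wc->flags & WC_WITH_INVALIDATE) && wc->invalidated_rkey == rkey)
		return false;	/* the target already invalidated this key */
	return true;		/* nothing invalidated, or a different key was */
}
```

The local invalidate is skipped only when the completion both reports a remote invalidation and names the expected key, so a target that ignores the negotiation costs latency but not correctness.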



RE: [PATCH 3/9] IB: add a helper to safely drain a QP

2015-11-23 Thread Steve Wise



> -----Original Message-----
> From: Sagi Grimberg [mailto:sa...@dev.mellanox.co.il]
> Sent: Monday, November 23, 2015 4:29 AM
> To: Steve Wise; 'Christoph Hellwig'; linux-rdma@vger.kernel.org
> Cc: bart.vanass...@sandisk.com; ax...@fb.com; linux-s...@vger.kernel.org; 
> linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 3/9] IB: add a helper to safely drain a QP
> 
> 
> > That won't work for iWARP.  Is this code new?  I didn't see any errors that 
> > would result from this code when I tested iSER over
> > cxgb4 with the old iwarp support patches.
> 
> Steve,
> 
> I think I figured out why this works with iWARP.
> 
> For iWARP, rdma_disconnect() calls iw_cm_disconnect() with abrupt=0
> which would make iw_cm_disconnect() move the QP into SQ_DRAIN state.
>

Yes.  Note:  SQ_DRAIN == CLOSING state for iWARP QPs.   CLOSING state means the 
transport will try and do an orderly shutdown.
More on this below.
 
> int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt)
> {
>   ...
> 
>  if (qp) {
>  if (abrupt)
>  ret = iwcm_modify_qp_err(qp);
>  else
>  ret = iwcm_modify_qp_sqd(qp);
> 
>  /*
>   * If both sides are disconnecting the QP could
>   * already be in ERR or SQD states
>   */
>  ret = 0;
>   }
> }
> 
> AFAIK, SQD state allows the ULP to post work requests on the send
> queue and expect these work requests to FLUSH.
> 

The iWARP QP states are different from IB, unfortunately, and because of the
way iWARP was plugged into the original IB-centric RDMA subsystem, this
difference is not very visible.  Moving an iWARP QP to CLOSING/SQD begins an
"orderly close" of the TCP connection, i.e. TCP FIN, FIN/ACK, ACK.

> So Maybe we should have:
> void ib_drain_qp(struct ib_qp *qp)
> {
>  struct ib_qp_attr attr = { };
>  struct ib_stop_cqe rstop, sstop;
>  struct ib_recv_wr rwr = {}, *bad_rwr;
>  struct ib_send_wr swr = {}, *bad_swr;
>  enum ib_qp_state state;
>  int ret;
> 
>  if (rdma_cap_ib_cm(id->device, id->port_num))
>  state = IB_QPS_ERR;
>  else if (rdma_cap_iw_cm(id->device, id->port_num))
>  state = IB_QPS_SQD;
>  else
>  return;
> 
>  rwr.wr_cqe = &rstop.cqe;
>  rstop.cqe.done = ib_stop_done;
>  init_completion(&rstop.done);
> 
>  swr.wr_cqe = &sstop.cqe;
>  sstop.cqe.done = ib_stop_done;
>  swr.send_flags = IB_SEND_SIGNALED;
>  init_completion(&sstop.done);
> 
>  attr.qp_state = state;
>  ret = ib_modify_qp(qp, &attr, IB_QP_STATE);
>  if (ret) {
>  WARN_ONCE(ret, "failed to drain QP: %d\n", ret);
>  return;
>  }
> 
>  ret = ib_post_recv(qp, &rwr, &bad_rwr);
>  if (ret) {
>  WARN_ONCE(ret, "failed to drain recv queue: %d\n", ret);
>  return;
>  }
> 
>  ret = ib_post_send(qp, &swr, &bad_swr);
>  if (ret) {
>  WARN_ONCE(ret, "failed to drain send queue: %d\n", ret);
>  return;
>  }
> 
>  wait_for_completion(&rstop.done);
>  wait_for_completion(&sstop.done);
> }
> 
> Thoughts?

The problem with moving the QP -> CLOSING (aka SQD) is this: as per the iWARP
Verbs spec, ULPs _must_ quiesce the SQ before moving it to CLOSING, i.e. make
sure there are no outstanding SQ WRs.  So the drain operation really has to be
done _before_ the move to CLOSING/SQD. :(  If there _are_ outstanding SQ WRs
when an attempt is made to move the QP to CLOSING, or an ingress RDMA operation
arrives while the QP is in CLOSING (and doing a TCP fin/fin-ack exchange), the
QP is immediately moved to ERROR.  Also, no WR posts are allowed while the QP
is in CLOSING, unlike the IB SQD state.
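
To spell those rules out, here is a toy userspace model of the transitions. It is only a sketch of the behaviour described above: the iw_* names and the three-state enum are made up for illustration, and real providers track far more state than this.

```c
#include <assert.h>
#include <errno.h>

enum iw_qp_state { IW_RTS, IW_CLOSING, IW_ERROR };

struct iw_qp {
	enum iw_qp_state state;
	int sq_outstanding;	/* send WRs posted but not yet completed */
};

/* Request an orderly close; a non-quiesced SQ sends the QP to ERROR instead. */
enum iw_qp_state iw_move_to_closing(struct iw_qp *qp)
{
	if (qp->state != IW_RTS)
		return qp->state;
	qp->state = qp->sq_outstanding ? IW_ERROR : IW_CLOSING;
	return qp->state;
}

/* In this model posts are accepted only in RTS, so CLOSING refuses them. */
int iw_post_send(struct iw_qp *qp)
{
	if (qp->state != IW_RTS)
		return -EINVAL;
	qp->sq_outstanding++;
	return 0;
}

/* An ingress RDMA operation arriving during CLOSING moves the QP to ERROR. */
void iw_ingress_rdma(struct iw_qp *qp)
{
	if (qp->state == IW_CLOSING)
		qp->state = IW_ERROR;
}
```

In this model a drain helper that first moves the QP to CLOSING and then posts its marker WRs can never succeed: either the move lands in ERROR because the SQ was busy, or the post is refused.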

The valid drain logic that I think needs to be implemented to support iWARP is
one of two methods:

1) As I said before, enhance the ib_qp struct to have a "flush complete"
completion object, and change the providers to all complete that object when
a) they are in ERROR and b) the SQ and RQ become empty (or are already empty).
Then ib_drain_qp() just waits for this completion.

2) Change the iwarp providers to allow posting WRs while in ERROR.  One way to
do this and still support the requirement that "at some point while in error,
the provider must synchronously fail posts" is to allow the posts if the SQ or
RQ still has pending WRs, but fail immediately if the SQ or RQ is already
empty.  Thus the "drain" WRs issued by iw_drain_qp() would work if they were
needed, and fail immediately if they are not needed.  In either case, the
flush operation is complete.

I really wish the iWARP spec architects had avoided these sorts of diversions
from the IB spec.

Steve.




RE: [PATCH 3/9] IB: add a helper to safely drain a QP

2015-11-23 Thread Steve Wise



> -----Original Message-----
> From: linux-kernel-ow...@vger.kernel.org 
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Sagi Grimberg
> Sent: Monday, November 23, 2015 4:36 AM
> To: Steve Wise; 'Christoph Hellwig'; linux-rdma@vger.kernel.org
> Cc: bart.vanass...@sandisk.com; ax...@fb.com; linux-s...@vger.kernel.org; 
> linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 3/9] IB: add a helper to safely drain a QP
> 
> 
> > So Maybe we should have:
> > void ib_drain_qp(struct ib_qp *qp)
> 
> Christoph suggested that this flushing would be taken care
> of by rdma_disconnect which sounds even better I think..
> --

Agreed. 


Re: memory registration updates

2015-11-23 Thread Steve Wise


On 11/22/2015 11:46 AM, Christoph Hellwig wrote:

This series removes huge chunks of code related to old memory
registration methods that we don't use anymore, and then simplifies the
current memory registration API

This expects my "IB: merge struct ib_device_attr into struct ib_device"
patch to be already applied.

Also available as a git tree:

http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-mr
git://git.infradead.org/users/hch/rdma.git rdma-mr




Series looks good.

Reviewed-by: Steve Wise <sw...@opengridcomputing.com>

Re: [PATCH 3/9] IB: add a helper to safely drain a QP

2015-11-18 Thread Steve Wise


On 11/18/2015 8:06 AM, Christoph Hellwig wrote:

On Wed, Nov 18, 2015 at 01:32:19PM +0200, Sagi Grimberg wrote:

Christoph,

Given the discussion around this patch I think it would
be a good idea to remove it from the patchset since it's not
mandatory for the CQ abstraction. I think that we should
take it with Steve to come up with a complete solution for
this bit.

Thoughts?

Yes, let's drop it for now.



Fine with me.


Re: [PATCH 7/9] amso1100: fold c2_reg_phys_mr into c2_get_dma_mr

2015-11-16 Thread Steve Wise


I think Doug is removing amso1100 now so this patch isn't needed.



Re: [PATCH 5/9] cxgb3: simplify iwch_get_dma_wr

2015-11-16 Thread Steve Wise


On 11/15/2015 12:05 PM, Christoph Hellwig wrote:

Fold simplified versions of build_phys_page_list and
iwch_register_phys_mem into iwch_get_dma_wr now that no other callers
are left.

Signed-off-by: Christoph Hellwig 
---
  drivers/infiniband/hw/cxgb3/iwch_mem.c  | 71 ---
  drivers/infiniband/hw/cxgb3/iwch_provider.c | 75 -
  drivers/infiniband/hw/cxgb3/iwch_provider.h |  8 ---
  3 files changed, 30 insertions(+), 124 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c 
b/drivers/infiniband/hw/cxgb3/iwch_mem.c
index 3a5e27d..1d04c87 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -99,74 +99,3 @@ int iwch_write_pbl(struct iwch_mr *mhp, __be64 *pages, int npages, int offset)
 	return cxio_write_pbl(&mhp->rhp->rdev, pages,
 			      mhp->attr.pbl_addr + (offset << 3), npages);
 }
-
-int build_phys_page_list(struct ib_phys_buf *buffer_list,
-   int num_phys_buf,
-   u64 *iova_start,
-   u64 *total_size,
-   int *npages,
-   int *shift,
-   __be64 **page_list)
-{
-   u64 mask;
-   int i, j, n;
-
-   mask = 0;
-   *total_size = 0;
-   for (i = 0; i < num_phys_buf; ++i) {
-   if (i != 0 && buffer_list[i].addr & ~PAGE_MASK)
-   return -EINVAL;
-   if (i != 0 && i != num_phys_buf - 1 &&
-   (buffer_list[i].size & ~PAGE_MASK))
-   return -EINVAL;
-   *total_size += buffer_list[i].size;
-   if (i > 0)
-   mask |= buffer_list[i].addr;
-   else
-   mask |= buffer_list[i].addr & PAGE_MASK;
-   if (i != num_phys_buf - 1)
-   mask |= buffer_list[i].addr + buffer_list[i].size;
-   else
-   mask |= (buffer_list[i].addr + buffer_list[i].size +
-   PAGE_SIZE - 1) & PAGE_MASK;
-   }
-
-   if (*total_size > 0xFFFFFFFFULL)
-   return -ENOMEM;
-
-   /* Find largest page shift we can use to cover buffers */
-   for (*shift = PAGE_SHIFT; *shift < 27; ++(*shift))
-   if ((1ULL << *shift) & mask)
-   break;
-
-   buffer_list[0].size += buffer_list[0].addr & ((1ULL << *shift) - 1);
-   buffer_list[0].addr &= ~0ull << *shift;
-
-   *npages = 0;
-   for (i = 0; i < num_phys_buf; ++i)
-   *npages += (buffer_list[i].size +
-   (1ULL << *shift) - 1) >> *shift;
-
-   if (!*npages)
-   return -EINVAL;
-
-   *page_list = kmalloc(sizeof(u64) * *npages, GFP_KERNEL);
-   if (!*page_list)
-   return -ENOMEM;
-
-   n = 0;
-   for (i = 0; i < num_phys_buf; ++i)
-   for (j = 0;
-j < (buffer_list[i].size + (1ULL << *shift) - 1) >> *shift;
-++j)
-   (*page_list)[n++] = cpu_to_be64(buffer_list[i].addr +
-   ((u64) j << *shift));
-
-   PDBG("%s va 0x%llx mask 0x%llx shift %d len %lld pbl_size %d\n",
-__func__, (unsigned long long) *iova_start,
-(unsigned long long) mask, *shift, (unsigned long long) *total_size,
-*npages);
-
-   return 0;
-
-}
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c 
b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index b184933..9642ec4a 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -479,24 +479,26 @@ static int iwch_dereg_mr(struct ib_mr *ib_mr)
return 0;
  }
  
-static struct ib_mr *iwch_register_phys_mem(struct ib_pd *pd,

-   struct ib_phys_buf *buffer_list,
-   int num_phys_buf,
-   int acc,
-   u64 *iova_start)
+static struct ib_mr *iwch_get_dma_mr(struct ib_pd *pd, int acc)
  {
+   const u64 total_size = 0xffffffff;
+   const u64 mask = (total_size + PAGE_SIZE - 1) & PAGE_MASK;
+   struct iwch_pd *php = to_iwch_pd(pd);
+   struct iwch_dev *rhp = php->rhp;
__be64 *page_list;
int shift;
-   u64 total_size;
-   int npages;
-   struct iwch_dev *rhp;
-   struct iwch_pd *php;
struct iwch_mr *mhp;
-   int ret;
+   int ret, npages, i;
  
  	PDBG("%s ib_pd %p\n", __func__, pd);

-   php = to_iwch_pd(pd);
-   rhp = php->rhp;
+
+   /*
+* T3 only supports 32 bits of size.
+*/
+   if (sizeof(phys_addr_t) > 4) {
+   pr_warn_once(MOD "Cannot support dma_mrs on this platform.\n");
+

Re: [PATCH 3/9] IB: add a helper to safely drain a QP

2015-11-16 Thread Steve Wise


On 11/15/2015 3:34 AM, Sagi Grimberg wrote:



+
+struct ib_stop_cqe {
+struct ib_cqe cqe;
+struct completion done;
+};
+
+static void ib_stop_done(struct ib_cq *cq, struct ib_wc *wc)
+{
+struct ib_stop_cqe *stop =
+container_of(wc->wr_cqe, struct ib_stop_cqe, cqe);
+
+complete(&stop->done);
+}
+
+/*
+ * Change a queue pair into the error state and wait until all receive
+ * completions have been processed before destroying it. This avoids that
+ * the receive completion handler can access the queue pair while it is
+ * being destroyed.
+ */
+void ib_drain_qp(struct ib_qp *qp)
+{
+struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
+struct ib_stop_cqe stop = { };
+struct ib_recv_wr wr, *bad_wr;
+int ret;
+
+wr.wr_cqe = &stop.cqe;
+stop.cqe.done = ib_stop_done;
+init_completion(&stop.done);
+
+ret = ib_modify_qp(qp, &attr, IB_QP_STATE);
+if (ret) {
+WARN_ONCE(ret, "failed to drain QP: %d\n", ret);
+return;
+}
+
+ret = ib_post_recv(qp, &wr, &bad_wr);
+if (ret) {
+WARN_ONCE(ret, "failed to drain QP: %d\n", ret);
+return;
+}
+
+wait_for_completion(&stop.done);
+}


This is taken from srp, and srp drains using a recv wr due to a race
causing a use-after-free condition in srp which re-posts a recv buffer
in the recv completion handler. srp does not really care if there are
pending send flushes.

I'm not sure if there are ordering rules for send/recv queues in
terms of flush completions, meaning that even if all recv flushes
were consumed maybe there are send flushes still pending.

I think that for a general drain helper it would be useful to
make sure that both the recv _and_ send flushes were drained.

So, something like:

void ib_drain_qp(struct ib_qp *qp)
{
struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
struct ib_stop_cqe rstop, sstop;
struct ib_recv_wr rwr = {}, *bad_rwr;
struct ib_send_wr swr = {}, *bad_swr;
int ret;

rwr.wr_cqe = &rstop.cqe;
rstop.cqe.done = ib_stop_done;
init_completion(&rstop.done);

swr.wr_cqe = &sstop.cqe;
sstop.cqe.done = ib_stop_done;
init_completion(&sstop.done);

ret = ib_modify_qp(qp, &attr, IB_QP_STATE);
if (ret) {
WARN_ONCE(ret, "failed to drain QP: %d\n", ret);
return;
}

ret = ib_post_recv(qp, &rwr, &bad_rwr);
if (ret) {
WARN_ONCE(ret, "failed to drain recv queue: %d\n", ret);
return;
}

ret = ib_post_send(qp, &swr, &bad_swr);
if (ret) {
WARN_ONCE(ret, "failed to drain send queue: %d\n", ret);
return;
}

wait_for_completion(&rstop.done);
wait_for_completion(&sstop.done);
}

Thoughts?


This won't work for iWARP as per my previous email.  But I will code 
something up that will.


Steve

RE: [PATCH 3/9] IB: add a helper to safely drain a QP

2015-11-16 Thread Steve Wise



> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org 
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Sagi Grimberg
> Sent: Monday, November 16, 2015 12:38 PM
> To: Steve Wise; 'Christoph Hellwig'; linux-rdma@vger.kernel.org
> Cc: bart.vanass...@sandisk.com; ax...@fb.com; linux-s...@vger.kernel.org; 
> linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 3/9] IB: add a helper to safely drain a QP
> 
> 
> > After looking at the nes driver, I don't see any common way to support
> > drain w/o some serious driver mods.  Since SRP is the only
> > user, perhaps we can ignore iWARP for this function...
> 
> But iser/isert essentially does it too (and I think xprtrdma will have
> it soon)...
> 
> the modify_qp is invoked from rdma_disconnect() and we do post
> an 'empty' wr to wait for all the flushes to drain (see
> iser_conn_terminate).

That won't work for iWARP.  Is this code new?  I didn't see any errors that
would result from this code when I tested iSER over cxgb4 with the old iwarp
support patches.

Perhaps we need another way to do this?  Like a completion object in the QP
that gets triggered when the SQ and RQ become empty after a transition to
ERROR (and CLOSING for iwarp).  Then a core service that just waits until
the QP is empty.  Implementation of this design would hit the providers
though since only they know when the flush is completed.
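
The "completion object in the QP" idea can be modeled in plain userspace C.
This is only a sketch under stated assumptions: struct toy_qp and every
toy_* function are hypothetical names, not part of any kernel or provider
API, and a simple flag stands in for a struct completion.

```c
#include <stdbool.h>

/* Hypothetical model of a QP that counts its outstanding work requests. */
struct toy_qp {
	unsigned int sq_pending; /* send WRs not yet completed or flushed */
	unsigned int rq_pending; /* recv WRs not yet completed or flushed */
	bool in_error;           /* QP moved to ERROR (or CLOSING for iWARP) */
	bool drained;            /* stands in for a struct completion */
};

void toy_qp_init(struct toy_qp *qp, unsigned int sq, unsigned int rq)
{
	qp->sq_pending = sq;
	qp->rq_pending = rq;
	qp->in_error = false;
	qp->drained = false;
}

/* Fire the "completion" once the QP left RTS and both queues are empty. */
void toy_qp_check_drained(struct toy_qp *qp)
{
	if (qp->in_error && qp->sq_pending == 0 && qp->rq_pending == 0)
		qp->drained = true;
}

/* Provider side: the QP transitions to the error state. */
void toy_qp_set_error(struct toy_qp *qp)
{
	qp->in_error = true;
	toy_qp_check_drained(qp);
}

/* Provider side: one WR was completed or flushed on the given queue. */
void toy_qp_flush_one(struct toy_qp *qp, bool is_send)
{
	if (is_send && qp->sq_pending > 0)
		qp->sq_pending--;
	else if (!is_send && qp->rq_pending > 0)
		qp->rq_pending--;
	toy_qp_check_drained(qp);
}

/* Core side: what a generic "wait until the QP is empty" would poll. */
bool toy_qp_is_drained(const struct toy_qp *qp)
{
	return qp->drained;
}
```

In this model only the provider-side functions know when a flush finishes,
which mirrors the concern that such a design would touch every provider.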

Alternatively, I could enable post-while-in-error support in cxgb4 and
ignore the spec in this regard.  But I'd rather not do that. :)

Steve.


RE: [PATCH 3/9] IB: add a helper to safely drain a QP

2015-11-16 Thread Steve Wise



> -Original Message-
> From: Steve Wise [mailto:sw...@opengridcomputing.com]
> Sent: Monday, November 16, 2015 10:38 AM
> To: Sagi Grimberg; Christoph Hellwig; linux-rdma@vger.kernel.org
> Cc: bart.vanass...@sandisk.com; ax...@fb.com; linux-s...@vger.kernel.org; 
> linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 3/9] IB: add a helper to safely drain a QP
> 
> On 11/15/2015 3:34 AM, Sagi Grimberg wrote:
> >
> >> +
> >> +struct ib_stop_cqe {
> >> +struct ib_cqe cqe;
> >> +struct completion done;
> >> +};
> >> +
> >> +static void ib_stop_done(struct ib_cq *cq, struct ib_wc *wc)
> >> +{
> >> +struct ib_stop_cqe *stop =
> >> +container_of(wc->wr_cqe, struct ib_stop_cqe, cqe);
> >> +
> >> +complete(&stop->done);
> >> +}
> >> +
> >> +/*
> >> + * Change a queue pair into the error state and wait until all receive
> >> + * completions have been processed before destroying it. This avoids that
> >> + * the receive completion handler can access the queue pair while it is
> >> + * being destroyed.
> >> + */
> >> +void ib_drain_qp(struct ib_qp *qp)
> >> +{
> >> +struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
> >> +struct ib_stop_cqe stop = { };
> >> +struct ib_recv_wr wr, *bad_wr;
> >> +int ret;
> >> +
> >> +wr.wr_cqe = &stop.cqe;
> >> +stop.cqe.done = ib_stop_done;
> >> +init_completion(&stop.done);
> >> +
> >> +ret = ib_modify_qp(qp, &attr, IB_QP_STATE);
> >> +if (ret) {
> >> +WARN_ONCE(ret, "failed to drain QP: %d\n", ret);
> >> +return;
> >> +}
> >> +
> >> +ret = ib_post_recv(qp, &wr, &bad_wr);
> >> +if (ret) {
> >> +WARN_ONCE(ret, "failed to drain QP: %d\n", ret);
> >> +return;
> >> +}
> >> +
> >> +wait_for_completion(&stop.done);
> >> +}
> >
> > This is taken from srp, and srp drains using a recv wr due to a race
> > causing a use-after-free condition in srp which re-posts a recv buffer
> > in the recv completion handler. srp does not really care if there are
> > pending send flushes.
> >
> > I'm not sure if there are ordering rules for send/recv queues in
> > terms of flush completions, meaning that even if all recv flushes
> > were consumed maybe there are send flushes still pending.
> >
> > I think that for a general drain helper it would be useful to
> > make sure that both the recv _and_ send flushes were drained.
> >
> > So, something like:
> >
> > void ib_drain_qp(struct ib_qp *qp)
> > {
> > struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
> > struct ib_stop_cqe rstop, sstop;
> > struct ib_recv_wr rwr = {}, *bad_rwr;
> > struct ib_send_wr swr = {}, *bad_swr;
> > int ret;
> >
> > rwr.wr_cqe = &rstop.cqe;
> > rstop.cqe.done = ib_stop_done;
> > init_completion(&rstop.done);
> >
> > swr.wr_cqe = &sstop.cqe;
> > sstop.cqe.done = ib_stop_done;
> > init_completion(&sstop.done);
> >
> > ret = ib_modify_qp(qp, &attr, IB_QP_STATE);
> > if (ret) {
> > WARN_ONCE(ret, "failed to drain QP: %d\n", ret);
> > return;
> > }
> >
> > ret = ib_post_recv(qp, &rwr, &bad_rwr);
> > if (ret) {
> > WARN_ONCE(ret, "failed to drain recv queue: %d\n", ret);
> > return;
> > }
> >
> > ret = ib_post_send(qp, &swr, &bad_swr);
> > if (ret) {
> > WARN_ONCE(ret, "failed to drain send queue: %d\n", ret);
> > return;
> > }
> >
> > wait_for_completion(&rstop.done);
> > wait_for_completion(&sstop.done);
> > }
> >
> > Thoughts?
> 
> This won't work for iWARP as per my previous email.  But I will code
> something up that will.
> 
> Steve

After looking at the nes driver, I don't see any common way to support drain
w/o some serious driver mods.  Since SRP is the only user, perhaps we can
ignore iWARP for this function...


Re: [PATCH 3/9] IB: add a helper to safely drain a QP

2015-11-13 Thread Steve Wise


On 11/13/2015 7:46 AM, Christoph Hellwig wrote:

Signed-off-by: Christoph Hellwig 
---
  drivers/infiniband/core/cq.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
  include/rdma/ib_verbs.h  |  2 ++
  2 files changed, 48 insertions(+)

diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index d9eb796..bf2a079 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -206,3 +206,49 @@ void ib_free_cq(struct ib_cq *cq)
WARN_ON_ONCE(ret);
  }
  EXPORT_SYMBOL(ib_free_cq);
+
+struct ib_stop_cqe {
+   struct ib_cqe   cqe;
+   struct completion done;
+};
+
+static void ib_stop_done(struct ib_cq *cq, struct ib_wc *wc)
+{
+   struct ib_stop_cqe *stop =
+   container_of(wc->wr_cqe, struct ib_stop_cqe, cqe);
+
+   complete(&stop->done);
+}
+
+/*
+ * Change a queue pair into the error state and wait until all receive
+ * completions have been processed before destroying it. This avoids that
+ * the receive completion handler can access the queue pair while it is
+ * being destroyed.
+ */
+void ib_drain_qp(struct ib_qp *qp)
+{
+   struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
+   struct ib_stop_cqe stop = { };
+   struct ib_recv_wr wr, *bad_wr;
+   int ret;
+
+   wr.wr_cqe = &stop.cqe;
+   stop.cqe.done = ib_stop_done;
+   init_completion(&stop.done);
+
+   ret = ib_modify_qp(qp, &attr, IB_QP_STATE);
+   if (ret) {
+       WARN_ONCE(ret, "failed to drain QP: %d\n", ret);
+       return;
+   }
+
+   ret = ib_post_recv(qp, &wr, &bad_wr);
+   if (ret) {
+       WARN_ONCE(ret, "failed to drain QP: %d\n", ret);
+       return;
+   }
+
+   wait_for_completion(&stop.done);
+}
+EXPORT_SYMBOL(ib_drain_qp);


This won't work with iwarp qps.  Once the QP is in ERROR state, 
post_send/post_recv can return a synchronous error vs async via the 
cq.   The IB spec explicitly states that posts while in ERROR will be 
completed with "flushed" via the CQ.



From http://tools.ietf.org/html/draft-hilland-rddp-verbs-00#section-6.2.4:


   *   At some point in the execution of the flushing operation, the RI
   MUST begin to return an Immediate Error for any attempt to post
   a WR to a Work Queue; prior to that point, any WQEs posted to a
   Work Queue MUST be enqueued and then flushed as described above
   (e.g. The PostSQ is done in Non-Privileged Mode and the Non-
   Privileged Mode portion of the RI has not yet been informed that
   the QP is in the Error state).


Also pending send work requests can be completed with status "flushed", 
and I would think we need to do something similar for send wrs.  We 
definitely can see this with cxgb4 in the presence of unsignaled wrs 
that aren't followed by a signaled wr at the time the QP is moved out of 
RTS.   The driver has no way to know if these pending unsignaled wrs 
completed or not.  So it completes them with "flushed" status.


So how can we do this for iwarp?  It seems like all that might be needed 
is to modify the QP state to idle, retrying until it succeeds:


   If the QP is transitioning to the Error state, or has not yet
   finished flushing the Work Queues, a Modify QP request to transition
   to the IDLE state MUST fail with an Immediate Error. If none of the
   prior conditions are true, a Modify QP to the Idle state MUST take
   the QP to the Idle state. No other state transitions out of Error
   are supported. Any attempt to transition the QP to a state other
   than Idle MUST result in an Immediate Error.
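
The "modify to idle, retrying until it succeeds" idea can be sketched as a
userspace model. Everything here is an assumption made for illustration:
the toy_* names, the use of -EBUSY for the Immediate Error, and the
flush_pending counter that stands in for hardware flush progress. It is not
how any provider reports this today.

```c
#include <errno.h>

enum toy_state { TOY_RTS, TOY_ERROR, TOY_IDLE };

/* Hypothetical iWARP QP model. */
struct toy_iw_qp {
	enum toy_state state;
	unsigned int flush_pending; /* WQEs still waiting to be flushed */
};

/*
 * Model of the quoted rule: while the work queues are still being
 * flushed, a transition to IDLE fails immediately; from any state
 * other than ERROR it is not a valid request in this sketch.
 */
int toy_modify_to_idle(struct toy_iw_qp *qp)
{
	if (qp->state != TOY_ERROR)
		return -EINVAL;
	if (qp->flush_pending > 0)
		return -EBUSY;
	qp->state = TOY_IDLE;
	return 0;
}

/* One unit of flush progress; stands in for time passing. */
void toy_flush_step(struct toy_iw_qp *qp)
{
	if (qp->flush_pending > 0)
		qp->flush_pending--;
}

/*
 * Retry the transition until it succeeds. Returns the number of
 * attempts used, or a negative errno on a hard failure or when
 * max_attempts is exhausted.
 */
int toy_drain_by_idle(struct toy_iw_qp *qp, int max_attempts)
{
	int attempt;

	for (attempt = 1; attempt <= max_attempts; attempt++) {
		int ret = toy_modify_to_idle(qp);

		if (ret == 0)
			return attempt;
		if (ret != -EBUSY)
			return ret;
		toy_flush_step(qp); /* a real caller would sleep here */
	}
	return -ETIMEDOUT;
}
```

A bounded retry count is a design choice of this sketch; the quoted text
only says the transition must fail until flushing has finished.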


Steve.


diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e11e038..f59a8d3 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3075,4 +3075,6 @@ int ib_sg_to_pages(struct ib_mr *mr,
   int sg_nents,
   int (*set_page)(struct ib_mr *, u64));
  
+void ib_drain_qp(struct ib_qp *qp);

+
  #endif /* IB_VERBS_H */



Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Steve Wise


On 11/10/2015 5:46 AM, Sagi Grimberg wrote:



On 10/11/2015 13:41, Christoph Hellwig wrote:

Oh, and while we're at it.  Can someone explain why we're even
using rdma_read_chunk_frmr for IB?  It seems to work around the
fact that iWarp only allows a single RDMA READ SGE, but it's used
whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
wrong.


I think Steve can answer it better than I can. I think that it is
just to have a single code path for both IB and iWARP. I agree that
the condition seems wrong and for small transfers rdma_read_chunk_frmr
is really a performance loss.


This was probably just an oversight/mistake.  The focus was on enabling 
iWARP at the time.


Re: [PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags

2015-11-10 Thread Steve Wise


On 11/10/2015 5:31 AM, Christoph Hellwig wrote:

On Tue, Nov 10, 2015 at 12:44:12PM +0200, Sagi Grimberg wrote:

Instead of each ULP being aware of iWARP/IB protocol in order
to determine the rdma_read access flags, have it accessible
as an attribute in the ib_device.

Patch 2,3 fixes RDS and svcrdma which gave remote access to rdma_reads
unconditionally.

This patchset goes on top of Christoph's device attributes merge into
struct ib_device.

FYI, I've updated the git branch to be based on current linus' tree
which required a few bits to be fixed.  I'd also like to note that while
everyone but Or seemed to be generally fine with it I'd really prefer
an actual reviewed-by or acked-by tag.
--


Acked-by: Steve Wise <sw...@opengridcomputing.com>


RE: [PATCH 0/2] Expose max_sge_rd correctly

2015-10-27 Thread Steve Wise



> -Original Message-
> From: Sagi Grimberg [mailto:sa...@mellanox.com]
> Sent: Tuesday, October 27, 2015 4:41 AM
> To: linux-rdma@vger.kernel.org; target-de...@vger.kernel.org
> Cc: Steve Wise; Nicholas A. Bellinger; Or Gerlitz; Doug Ledford
> Subject: [PATCH 0/2] Expose max_sge_rd correctly
> 
> This addresses a specific mlx4 issue where the max_sge_rd
> is actually smaller than max_sge (rdma reads with max_sge
> entries completes with error).
> 
> The second patch removes the explicit work-around from the
> iser target code.
> 
> This applies on top of Christoph's device attributes modification.
> 


Looks correct to me.

Series Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
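
As a small worked example of why the separate limit matters: if a ULP sizes
its RDMA READ work requests by max_sge_rd instead of max_sge, the number of
READ WRs needed for a scatterlist is a ceiling division. The helper name
toy_rdma_read_wrs and the numbers in the comment are hypothetical and not
taken from any driver.

```c
/*
 * Number of RDMA READ work requests needed to cover nents
 * scatter/gather entries when each READ may carry at most
 * max_sge_rd entries. Example: 32 entries with a limit of 30
 * need 2 work requests, while a limit of 32 would need only 1.
 */
unsigned int toy_rdma_read_wrs(unsigned int nents, unsigned int max_sge_rd)
{
	if (max_sge_rd == 0)
		return 0; /* device cannot issue RDMA READs in this model */
	return (nents + max_sge_rd - 1) / max_sge_rd;
}
```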


Re: merge struct ib_device_attr into struct ib_device V2

2015-10-21 Thread Steve Wise


On 10/21/2015 11:43 AM, Jason Gunthorpe wrote:

On Wed, Oct 21, 2015 at 08:48:10AM -0700, Bart Van Assche wrote:

On 10/21/2015 12:11 AM, Or Gerlitz wrote:

haven't found any review or ack to your giant patch that touches the
whole subsystem (drivers, core and ULPs) except from Sagi's -- let's
hear more opinions.

Although I have not yet had the time to review the entire patch, removing
ib_query_device() seems a great idea to me and an idea that I welcome very
much. The ib_device_attr structure is too large to be allocated on the
stack. This means that with Christoph's patch it is no longer needed to call
kmalloc() + ib_query_device() + kfree() when a device attribute is needed
from kernel code.

I agree, this is absolutely the right way to go.

The bikeshedding is not important, nobody has come up with a reason
why we need to maintain the attr structure as-is and Christoph already
has a patch - I say go with it.

Jason



While I don't really like the uber patch review-wise, I'm all for nuking 
ib_query_device() and the attr struct.


Steve.

RE: [PATCH for-next 0/2] RDMA/cxgb4: Add iWARP support for T6 adapter

2015-10-12 Thread Steve Wise



> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On 
> Behalf Of Hariprasad Shenai
> Sent: Wednesday, September 23, 2015 6:49 AM
> To: linux-rdma@vger.kernel.org; net...@vger.kernel.org
> Cc: dledf...@redhat.com; da...@davemloft.net; sw...@opengridcomputing.com; 
> lee...@chelsio.com; nirran...@chelsio.com;
> Hariprasad Shenai
> Subject: [PATCH for-next 0/2] RDMA/cxgb4: Add iWARP support for T6 adapter
> 
> Hi,
> 
> PATCH 1/2 adds changes like new register, structure and functions in cxgb4
> driver for iw_cxgb4 driver, and PATCH 2/2 adds iw_cxgb4 specific code to
> support T6 adapter.
> 
> This patch series has been created against Doug's linux tree and includes
> patches on cxgb4 and iw_cxgb4 driver.
> 
> We have included all the maintainers of respective drivers. Kindly review
> the change and let us know in case of any review comments.
> 
> Thanks
> 

These look ok to me. 

Series Reviewed-by: Steve Wise <sw...@opengridcomputing.com>

Doug, should these get staged through your tree?

Steve.


RE: [PATCH] svcrdma: Fix NFS server crash triggered by 1MB NFS WRITE

2015-10-05 Thread Steve Wise



> -Original Message-
> From: linux-nfs-ow...@vger.kernel.org 
> [mailto:linux-nfs-ow...@vger.kernel.org] On Behalf Of Chuck Lever
> Sent: Sunday, October 04, 2015 10:03 PM
> To: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org
> Subject: [PATCH] svcrdma: Fix NFS server crash triggered by 1MB NFS WRITE
> 
> Now that the NFS server advertises a maximum payload size of 1MB
> for RPC/RDMA again, it crashes in svc_process_common() when NFS
> client sends a 1MB NFS WRITE on an NFS/RDMA mount.
> 
> The server has set up a 259 element array of struct page pointers
> in rq_pages[] for each incoming request. The last element of the
> array is NULL.
> 
> When an incoming request has been completely received,
> rdma_read_complete() attempts to set the starting page of the
> incoming page vector:
> 
>   rqstp->rq_arg.pages = &rqstp->rq_pages[head->hdr_count];
> 
> and the page to use for the reply:
> 
>   rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
> 
> But the value of page_no has already accounted for head->hdr_count.
> Thus rq_respages now points past the end of the incoming pages. For
> NFS WRITE operations smaller than the maximum, this is harmless.
> 
> But when the NFS WRITE operation is as large as the server's max
> payload size, rq_respages now points at the last entry in rq_pages,
> which is NULL.
> 
> Fixes: cc9a903d915c ('svcrdma: Change maximum server payload . . .')
> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270
> Signed-off-by: Chuck Lever <chuck.le...@oracle.com>
> ---
> 
> This fixes a 4.3-rc regression. Please apply to 4.3-rc when this
> patch passes review.
> 
> It could also be appropriate for stable kernels which do not have
> commit 7e5be28827bf ("svcrdma: advertise the correct max payload"),
> though I have not tested them with this patch.
> 
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index cb51742..37b4341 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -531,7 +531,7 @@ static int rdma_read_complete(struct svc_rqst *rqstp,
>   rqstp->rq_arg.page_base = head->arg.page_base;
> 
>   /* rq_respages starts after the last arg page */
> - rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
> + rqstp->rq_respages = &rqstp->rq_pages[page_no];
>   rqstp->rq_next_page = rqstp->rq_respages + 1;
> 
>   /* Rebuild rq_arg head and tail. */
> 

Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
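
The indexing error described in the quoted patch can be reproduced in a
userspace model. This is a sketch only: struct toy_rqst and
toy_respages_index are hypothetical stand-ins for struct svc_rqst and the
code in rdma_read_complete(), and the hdr_count/page_no values used in the
comment are made up to hit the boundary case.

```c
#include <stddef.h>

#define TOY_RQ_PAGES 259 /* array size quoted in the patch description */

/* Hypothetical stand-in for the page-vector fields of struct svc_rqst. */
struct toy_rqst {
	void *rq_pages[TOY_RQ_PAGES];
	void **arg_pages;   /* models rq_arg.pages */
	void **rq_respages;
};

/*
 * Set up the pointers the way rdma_read_complete() does and return the
 * index into rq_pages[] that rq_respages ends up pointing at.
 * fixed == 0 models the old code, fixed != 0 the patched code.
 *
 * With hdr_count = 1 and page_no = 257 the old code lands on index 258,
 * the last array entry, while the patched code lands on 257.
 */
ptrdiff_t toy_respages_index(struct toy_rqst *r, int hdr_count,
			     int page_no, int fixed)
{
	r->arg_pages = &r->rq_pages[hdr_count];

	if (fixed)
		r->rq_respages = &r->rq_pages[page_no];
	else
		/* page_no already includes hdr_count, so it is added twice */
		r->rq_respages = &r->arg_pages[page_no];

	return r->rq_respages - r->rq_pages;
}
```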


Re: libmlx4 and libmlx5 git trees? Who is handling those?

2015-09-28 Thread Steve Wise


On 9/28/2015 12:39 PM, Jason Gunthorpe wrote:

On Mon, Sep 28, 2015 at 05:28:20PM +, Woodruff, Robert J wrote:

On Mon, 28 Sep 2015, Christoph Lameter wrote:


Right. It's really nasty when you are trying to add features that require 
libibverbs and libmlx? changes. Plus it may depend on kernel changes.

On the other hand, combining everything into one package limits the
ability of the maintainer of the individual components to release
packages with simple bug fixes or enhancements that are component
specific and don't require kernel core or libibverbs changes. Thus,
they would have to wait till a new combined package gets released
and makes the maintainer of that package the bottleneck for getting
new code out.

That is certainly the minority of work these days, by my observation.

Nearly everything is being driven by kernel side changes now and
requires cross-library work.

Or it is maintenance activity, which is so hard now (I've done a few
patches over the years) it isn't worth doing unless it is really
important.

Jason



I've released many libcxgb4 releases that fixed bugs, and even added new 
device support w/o any libibverbs changes.  So I'm not sure I see the 
benefit of munging all the RDMA libraries into some uber-release...


Steve.

[PATCH RESEND] svcrdma: handle rdma read with a non-zero initial page offset

2015-09-28 Thread Steve Wise

The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
were not taking into account the initial page_offset when determining
the rdma read length.  This resulted in a read whose starting address
and length exceeded the base/bounds of the frmr.

Most work loads don't tickle this bug apparently, but one test hit it
every time: building the linux kernel on a 16 core node with 'make -j
16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.

This bug seems to only be tripped with devices having small fastreg page
list depths.  I didn't see it with mlx4, for instance.

Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
Tested-by: Chuck Lever <chuck.le...@oracle.com>
---

 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index cb51742..5f6ca47 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -136,7 +136,8 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
ctxt->direction = DMA_FROM_DEVICE;
ctxt->read_hdr = head;
pages_needed = min_t(int, pages_needed, xprt->sc_max_sge_rd);
-   read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
+   read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
+rs_length);
 
for (pno = 0; pno < pages_needed; pno++) {
int len = min_t(int, rs_length, PAGE_SIZE - pg_off);
@@ -235,7 +236,8 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
ctxt->direction = DMA_FROM_DEVICE;
ctxt->frmr = frmr;
pages_needed = min_t(int, pages_needed, xprt->sc_frmr_pg_list_len);
-   read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
+   read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
+rs_length);
 
frmr->kva = page_address(rqstp->rq_arg.pages[pg_no]);
frmr->direction = DMA_FROM_DEVICE;
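
The arithmetic behind the fix above can be checked with a small userspace
model. This is a sketch under assumptions: the toy_* helpers are
hypothetical, a 4 KiB page (shift of 12) is assumed, and the sample numbers
in the comment are invented for illustration.

```c
#define TOY_PAGE_SHIFT 12 /* assumes 4 KiB pages */

static int toy_min(int a, int b)
{
	return a < b ? a : b;
}

/* Read length as computed before the patch: page_offset is ignored. */
int toy_read_len_old(int pages_needed, int page_offset, int rs_length)
{
	(void)page_offset;
	return toy_min(pages_needed << TOY_PAGE_SHIFT, rs_length);
}

/* Read length as computed by the patch. */
int toy_read_len_new(int pages_needed, int page_offset, int rs_length)
{
	return toy_min((pages_needed << TOY_PAGE_SHIFT) - page_offset,
		       rs_length);
}

/*
 * Does a read of 'read' bytes that starts page_offset bytes into the
 * first page stay inside the pages_needed pages that were mapped?
 *
 * Example: 2 pages, offset 512, 16384 bytes remaining. The old code
 * reads 8192 bytes, which ends 512 bytes past the mapping; the new
 * code reads 7680 bytes, which ends exactly at the mapping boundary.
 */
int toy_read_fits(int pages_needed, int page_offset, int read)
{
	return page_offset + read <= (pages_needed << TOY_PAGE_SHIFT);
}
```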


RE: [PATCH 2/3] svcrdma: handle rdma read with a non-zero initial page offset

2015-09-28 Thread Steve Wise



> -Original Message-
> From: J. Bruce Fields [mailto:bfie...@fieldses.org]
> Sent: Monday, September 28, 2015 4:05 PM
> To: Steve Wise
> Cc: trond.mykleb...@primarydata.com; linux-...@vger.kernel.org; 
> linux-rdma@vger.kernel.org
> Subject: Re: [PATCH 2/3] svcrdma: handle rdma read with a non-zero initial 
> page offset
> 
> On Mon, Sep 28, 2015 at 09:31:25AM -0500, Steve Wise wrote:
> > On 9/21/2015 12:24 PM, Steve Wise wrote:
> > >The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
> > >were not taking into account the initial page_offset when determining
> > >the rdma read length.  This resulted in a read whose starting address
> > >and length exceeded the base/bounds of the frmr.
> > >
> > >Most work loads don't tickle this bug apparently, but one test hit it
> > >every time: building the linux kernel on a 16 core node with 'make -j
> > >16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.
> > >
> > >This bug seems to only be tripped with devices having small fastreg page
> > >list depths.  I didn't see it with mlx4, for instance.
> > >
> > >Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
> > >Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
> > >Tested-by: Chuck Lever <chuck.le...@oracle.com>
> > >---
> > >
> > >
> >
> > Hey Bruce, can this make 4.3-rc?  Also, what do you think about
> > pushing it to stable?
> 
> It looks like a reasonable candidate for stable.  Apologies, somehow I
> missed it when you posted it--would you mind resending?
> 
> --b.

resent this one patch.

What is your process for pushing to stable?

Thanks,

Steve.


Re: [PATCH 2/3] svcrdma: handle rdma read with a non-zero initial page offset

2015-09-28 Thread Steve Wise


On 9/21/2015 12:24 PM, Steve Wise wrote:

The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
were not taking into account the initial page_offset when determining
the rdma read length.  This resulted in a read whose starting address
and length exceeded the base/bounds of the frmr.

Most work loads don't tickle this bug apparently, but one test hit it
every time: building the linux kernel on a 16 core node with 'make -j
16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.

This bug seems to only be tripped with devices having small fastreg page
list depths.  I didn't see it with mlx4, for instance.

Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
Tested-by: Chuck Lever <chuck.le...@oracle.com>
---




Hey Bruce, can this make 4.3-rc?  Also, what do you think about pushing 
it to stable?


Thanks,

Steve.

RE: [PATCH 1/3] xprtrdma: disconnect and flush cqs before freeing buffers

2015-09-28 Thread Steve Wise



> -Original Message-
> From: Anna Schumaker [mailto:anna.schuma...@netapp.com]
> Sent: Monday, September 28, 2015 9:45 AM
> To: Steve Wise; trond.mykleb...@primarydata.com; bfie...@fieldses.org
> Cc: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org
> Subject: Re: [PATCH 1/3] xprtrdma: disconnect and flush cqs before freeing 
> buffers
> 
> Hi Steve,
> 
> On 09/28/2015 10:30 AM, Steve Wise wrote:
> > On 9/21/2015 12:24 PM, Steve Wise wrote:
> >> Otherwise a FRMR completion can cause a touch-after-free crash.
> >>
> >> In xprt_rdma_destroy(), call rpcrdma_buffer_destroy() only after calling
> >> rpcrdma_ep_destroy().
> >>
> >> In rpcrdma_ep_destroy(), disconnect the cm_id first which should flush the
> >> qp, then drain the cqs, then destroy the qp, and finally destroy the cqs.
> >>
> >> Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
> >> Tested-by: Chuck Lever <chuck.le...@oracle.com>
> >> ---
> >
> > Hey Trond,  I'm hoping this can make 4.3-rc (and stable if you agree).
> 
> This patch looks fine to me.  I'll pass it on to Trond!
> 
> I'll save patch 3/3 for the Linux 4.4 merge.
> 
> Thanks,
> Anna
> 

Thanks.  Going forward I'll make sure you are CC'd for client patches too!  I 
wasn't sure if you are formally taking all xprtrdma patches and sending them to 
Trond...

Steve.



RE: [PATCH for-next 0/2] RDMA/cxgb4: Add iWARP support for T6 adapter

2015-09-25 Thread Steve Wise

Acked-by: Steve Wise <sw...@opengridcomputing.com>




RE: message size, was Re: merge struct ib_device_attr into struct ib_device

2015-09-23 Thread Steve Wise



> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, September 22, 2015 6:08 PM
> 
> > How do we change the message size limits?  Reviewing w/o it being
> > inline is painful for the (many) reviewers...
> 
> I've increased it.

Thanks!


RE: merge struct ib_device_attr into struct ib_device

2015-09-22 Thread Steve Wise



> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org 
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Christoph Hellwig
> Sent: Tuesday, September 22, 2015 3:32 PM
> To: Yann Droneaud
> Cc: linux-rdma@vger.kernel.org
> Subject: Re: merge struct ib_device_attr into struct ib_device
> 
> Hi Yann,
> 
> looks like the patch was too large and majordomo ate it
> 
> Here is a link:
> 
> http://git.infradead.org/users/hch/rdma.git/commitdiff/0e46553467cd01b63ab9c985f87c18c5328880bb

Hey Christoph,

Can you create a series of smaller patches that will fit on the list?  That 
would make it easier for everyone to review/comment. 

Steve.


[PATCH 2/3] svcrdma: handle rdma read with a non-zero initial page offset

2015-09-21 Thread Steve Wise

The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
were not taking into account the initial page_offset when determining
the rdma read length.  This resulted in a read whose starting address
and length exceeded the base/bounds of the frmr.

Most work loads don't tickle this bug apparently, but one test hit it
every time: building the linux kernel on a 16 core node with 'make -j
16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.

This bug seems to only be tripped with devices having small fastreg page
list depths.  I didn't see it with mlx4, for instance.

Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
Tested-by: Chuck Lever <chuck.le...@oracle.com>
---

 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index cb51742..5f6ca47 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -136,7 +136,8 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
 	ctxt->direction = DMA_FROM_DEVICE;
 	ctxt->read_hdr = head;
 	pages_needed = min_t(int, pages_needed, xprt->sc_max_sge_rd);
-	read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
+	read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
+		     rs_length);
 
 	for (pno = 0; pno < pages_needed; pno++) {
 		int len = min_t(int, rs_length, PAGE_SIZE - pg_off);
@@ -235,7 +236,8 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
 	ctxt->direction = DMA_FROM_DEVICE;
 	ctxt->frmr = frmr;
 	pages_needed = min_t(int, pages_needed, xprt->sc_frmr_pg_list_len);
-	read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
+	read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
+		     rs_length);
 
 	frmr->kva = page_address(rqstp->rq_arg.pages[pg_no]);
 	frmr->direction = DMA_FROM_DEVICE;
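
A minimal sketch of the arithmetic behind this fix, written as plain
user-space C rather than kernel code.  The names sketch_read_len and
SKETCH_PAGE_SHIFT are hypothetical, and a 4 KiB page size is assumed; only
the "minimum of mapped capacity and remaining length" computation mirrors
the patch above.

```c
/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper: number of bytes a single RDMA read may cover.
 *   pages_needed - pages the chunk would like to use
 *   max_pages    - device limit (SGE count or fastreg page list depth)
 *   page_offset  - offset into the first page where the data starts
 *   rs_length    - bytes remaining in the read segment
 */
static int sketch_read_len(int pages_needed, int max_pages,
			   int page_offset, int rs_length)
{
	int capacity;

	if (pages_needed > max_pages)
		pages_needed = max_pages;

	/* The first page is only partly usable, so subtract the offset. */
	capacity = (pages_needed << SKETCH_PAGE_SHIFT) - page_offset;

	return capacity < rs_length ? capacity : rs_length;
}
```

With a two-page limit, a 512-byte initial offset and 16384 bytes still
outstanding, the sketch allows 7680 bytes rather than 8192, so the read
stays inside the two mapped pages.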


[PATCH 3/3] xprtrdma: don't log warnings for flushed completions

2015-09-21 Thread Steve Wise

Unsignaled send WRs can get flushed as part of normal unmount, so don't
log them as warnings.

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---

 net/sunrpc/xprtrdma/frwr_ops.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index d6653f5..f1868ba 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -257,8 +257,11 @@ frwr_sendcompletion(struct ib_wc *wc)
 
 	/* WARNING: Only wr_id and status are reliable at this point */
 	r = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
-	pr_warn("RPC:   %s: frmr %p flushed, status %s (%d)\n",
-		__func__, r, ib_wc_status_msg(wc->status), wc->status);
+	if (wc->status == IB_WC_WR_FLUSH_ERR)
+		dprintk("RPC:   %s: frmr %p flushed\n", __func__, r);
+	else
+		pr_warn("RPC:   %s: frmr %p error, status %s (%d)\n",
+			__func__, r, ib_wc_status_msg(wc->status), wc->status);
 	r->r.frmr.fr_state = FRMR_IS_STALE;
 }
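
A small user-space sketch of the severity split this patch makes.  The enum
and function names are hypothetical stand-ins (they are not the real
enum ib_wc_status or the xprtrdma handler); the point is only that a flush
status is treated as expected and anything else as a warning.

```c
#include <stdio.h>

/* Hypothetical stand-ins for completion statuses. */
enum sketch_wc_status {
	SKETCH_WC_SUCCESS,
	SKETCH_WC_WR_FLUSH_ERR,
	SKETCH_WC_OTHER_ERR
};

/*
 * Intended for failed completions only: returns 1 when the failure
 * deserves a warning, 0 when it is an expected flush.
 */
static int sketch_should_warn(enum sketch_wc_status status)
{
	return status != SKETCH_WC_WR_FLUSH_ERR;
}

/* Log a failed send completion at the severity chosen above. */
static void sketch_log_failed_completion(enum sketch_wc_status status)
{
	if (sketch_should_warn(status))
		fprintf(stderr, "warning: frmr error, status %d\n",
			(int)status);
	else
		fprintf(stderr, "debug: frmr flushed\n");
}
```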
 


[PATCH 1/3] xprtrdma: disconnect and flush cqs before freeing buffers

2015-09-21 Thread Steve Wise

Otherwise a FRMR completion can cause a touch-after-free crash.

In xprt_rdma_destroy(), call rpcrdma_buffer_destroy() only after calling
rpcrdma_ep_destroy().

In rpcrdma_ep_destroy(), disconnect the cm_id first which should flush the
qp, then drain the cqs, then destroy the qp, and finally destroy the cqs.

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
Tested-by: Chuck Lever <chuck.le...@oracle.com>
---

 net/sunrpc/xprtrdma/transport.c |    2 +-
 net/sunrpc/xprtrdma/verbs.c     |    9 ++++++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 64443eb..41e452b 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -270,8 +270,8 @@ xprt_rdma_destroy(struct rpc_xprt *xprt)
 
 	xprt_clear_connected(xprt);
 
-	rpcrdma_buffer_destroy(&r_xprt->rx_buf);
 	rpcrdma_ep_destroy(&r_xprt->rx_ep, &r_xprt->rx_ia);
+	rpcrdma_buffer_destroy(&r_xprt->rx_buf);
 	rpcrdma_ia_close(&r_xprt->rx_ia);
 
 	xprt_rdma_free_addresses(xprt);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 6829967..01a314a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -755,19 +755,22 @@ rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 
 	cancel_delayed_work_sync(&ep->rep_connect_worker);
 
-	if (ia->ri_id->qp) {
+	if (ia->ri_id->qp)
 		rpcrdma_ep_disconnect(ep, ia);
+
+	rpcrdma_clean_cq(ep->rep_attr.recv_cq);
+	rpcrdma_clean_cq(ep->rep_attr.send_cq);
+
+	if (ia->ri_id->qp) {
 		rdma_destroy_qp(ia->ri_id);
 		ia->ri_id->qp = NULL;
 	}
 
-	rpcrdma_clean_cq(ep->rep_attr.recv_cq);
 	rc = ib_destroy_cq(ep->rep_attr.recv_cq);
 	if (rc)
 		dprintk("RPC:   %s: ib_destroy_cq returned %i\n",
 			__func__, rc);
 
-	rpcrdma_clean_cq(ep->rep_attr.send_cq);
 	rc = ib_destroy_cq(ep->rep_attr.send_cq);
 	if (rc)
 		dprintk("RPC:   %s: ib_destroy_cq returned %i\n",
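
The teardown order described in the commit message can be sketched as a toy
user-space model.  Everything here (struct sketch_xprt and the sketch_*
functions) is hypothetical and only models the ordering, not the xprtrdma
data structures: completions are drained while the buffers they reference
still exist, and the buffers are freed last.

```c
/* Hypothetical model of a transport being torn down. */
struct sketch_xprt {
	int connected;
	int qp_alive;
	int cqs_alive;
	int buffers_alive;
	int pending_completions;	/* completions that reference buffers */
	int touched_freed_buffer;	/* set if one ran after the free */
};

/* Process every outstanding completion; each one touches a buffer. */
static void sketch_drain_cqs(struct sketch_xprt *x)
{
	while (x->pending_completions > 0) {
		if (!x->buffers_alive)
			x->touched_freed_buffer = 1;
		x->pending_completions--;
	}
}

/* Teardown in the order the commit message describes. */
static void sketch_destroy(struct sketch_xprt *x)
{
	x->connected = 0;	/* 1. disconnect, which should flush the QP */
	sketch_drain_cqs(x);	/* 2. drain the CQs while buffers still exist */
	x->qp_alive = 0;	/* 3. destroy the QP */
	x->cqs_alive = 0;	/* 4. destroy the CQs */
	x->buffers_alive = 0;	/* 5. only now free the buffers */
}
```

Moving step 5 ahead of step 2 in this model sets touched_freed_buffer,
which is the touch-after-free the patch is meant to avoid.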


RE: [PATCH for-next 0/5] set MPA revision to 2 and misc. fixes for iw_cxgb4

2015-09-08 Thread Steve Wise

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>

> -Original Message-
> From: Hariprasad Shenai [mailto:haripra...@chelsio.com]
> Sent: Monday, September 07, 2015 11:27 PM
> To: dledf...@redhat.com
> Cc: linux-rdma@vger.kernel.org; sw...@opengridcomputing.com; Hariprasad Shenai
> Subject: [PATCH for-next 0/5] set MPA revision to 2 and misc. fixes for 
> iw_cxgb4
> 
> Hi,
> 
> This patch series adds the following.
> Detect errors while creating listening servers, pass ird/ord info in
> connect reply events, fix misuse of ord for ird calculation and for the
> ESTABLISHED event we should have the peer's ord/ird so swap the values in
> the event before the upcall. Set default MPA version to 2.
> 
> This patch series has been created against Doug's linux tree and includes
> patches on iw_cxgb4 driver.
> 
> We have included all the maintainers of respective drivers. Kindly review
> the change and let us know in case of any review comments.
> 
> Thanks
> 
> 
> Hariprasad Shenai (5):
>   iw_cxgb4: detect fatal errors while creating listening filters
>   iw_cxgb4: pass the ord/ird in connect reply events
>   iw_cxgb4: fix misuse of ep->ord for minimum ird calculation
>   iw_cxgb4: reverse the ord/ird in the ESTABLISHED upcall
>   iw_cxgb4: set the default MPA version to 2
> 
>  drivers/infiniband/hw/cxgb4/cm.c | 18 +-
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> --
> 2.3.4


RE: [PATCH] infiniband:cxgb4:Fix incorrect return statement in the function c4iw_destroy_cq

2015-09-01 Thread Steve Wise



> -Original Message-
> From: Nicholas Krause [mailto:xerofo...@gmail.com]
> Sent: Sunday, August 30, 2015 3:12 PM
> To: sw...@chelsio.com
> Cc: dledf...@redhat.com; sean.he...@intel.com; hal.rosenst...@gmail.com; 
> linux-rdma@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: [PATCH] infiniband:cxgb4:Fix incorrect return statement in the 
> function c4iw_destroy_cq
> 
> This fixes the incorrect return statement at the end of the function
> c4iw_destroy_cq's body that returns zero to instead correctly return
> the return value of the call to the function destroy_cq as all callers
> of c4iw_destroy_cq should be signaled when this call fails in order
> for them to handle it in their own intended error paths.
> 
> Signed-off-by: Nicholas Krause 
> ---
>  drivers/infiniband/hw/cxgb4/cq.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/cxgb4/cq.c 
> b/drivers/infiniband/hw/cxgb4/cq.c
> index 92d5183..4f7af20 100644
> --- a/drivers/infiniband/hw/cxgb4/cq.c
> +++ b/drivers/infiniband/hw/cxgb4/cq.c
> @@ -848,6 +848,7 @@ int c4iw_destroy_cq(struct ib_cq *ib_cq)
>  {
>   struct c4iw_cq *chp;
>   struct c4iw_ucontext *ucontext;
> + int ret;
> 
>   PDBG("%s ib_cq %p\n", __func__, ib_cq);
>   chp = to_c4iw_cq(ib_cq);
> @@ -858,10 +859,10 @@ int c4iw_destroy_cq(struct ib_cq *ib_cq)
> 
>   ucontext = ib_cq->uobject ? to_c4iw_ucontext(ib_cq->uobject->context)
> : NULL;
> - destroy_cq(&chp->rhp->rdev, &chp->cq,
> -            ucontext ? &ucontext->uctx : &chp->cq.rdev->uctx);
> + ret = destroy_cq(&chp->rhp->rdev, &chp->cq,
> +                  ucontext ? &ucontext->uctx : &chp->cq.rdev->uctx);
>   kfree(chp);
> - return 0;
> + return ret;
>  }


The SW CQ is destroyed regardless of any errors returned by destroy_cq().  So
c4iw_destroy_cq() shouldn't return non-zero since it is freeing the CQ memory.
I think the correct change here is to only kfree(chp) if destroy_cq() returns 0.

Steve.


> 
>  struct ib_cq *c4iw_create_cq(struct ib_device *ibdev,
> --
> 2.1.4
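
A user-space sketch of the alternative suggested in the reply above: free
the software object only when the low-level destroy succeeds, so a non-zero
return always means the object is still valid.  struct sketch_cq and both
functions are hypothetical stand-ins, not the cxgb4 code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical CQ object; hw_busy simulates a hardware teardown failure. */
struct sketch_cq {
	int hw_busy;
};

/* Stand-in for the low-level destroy step; returns 0 or a negative errno. */
static int sketch_hw_destroy(struct sketch_cq *cq)
{
	return cq->hw_busy ? -EBUSY : 0;
}

/* Free the software state only when the low-level destroy succeeded. */
static int sketch_destroy_cq(struct sketch_cq *cq)
{
	int ret = sketch_hw_destroy(cq);

	if (ret)
		return ret;	/* cq is still valid; the caller may retry */

	free(cq);
	return 0;
}
```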


RE: [PATCH] infiniband:cxgb4:Fix if statement check in the function pick_local_ip6adddrs

2015-08-27 Thread Steve Wise

Acked-by: Steve Wise <sw...@opengridcomputing.com>


RE: [PATCH] infiniband:cxgb4:Fix incorrect return statement in the function c4iw_reject_cr

2015-08-27 Thread Steve Wise

> -Original Message-
> From: Nicholas Krause [mailto:xerofo...@gmail.com]
> Sent: Wednesday, August 26, 2015 7:22 PM
> To: sw...@chelsio.com
> Cc: dledf...@redhat.com; sean.he...@intel.com; hal.rosenst...@gmail.com;
> linux-rdma@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: [PATCH] infiniband:cxgb4:Fix incorrect return statement in the
> function c4iw_reject_cr
>
> This fixes the incorrect return statement in the function
> c4iw_reject_cr that returns the value zero directly to instead
> return the variable err as this function can fail when called
> and if so we will incorrectly return success rather than the
> correct status of a failed call to the caller of this particular
> function.
>
> Signed-off-by: Nicholas Krause <xerofo...@gmail.com>
> ---

NAK.  

The return code for these cpl handlers indicates whether process_work() or
other callers need to free the skb.  They are supposed to return 0.


RE: [PATCH for-4.3] iw_cxgb4: Add support for clip

2015-08-25 Thread Steve Wise

Acked-by: Steve Wise <sw...@opengridcomputing.com>



Re: [PATCH 1/3] IB/uverbs: reject invalid or unknown opcodes

2015-08-20 Thread Steve Wise


On 8/20/2015 3:49 AM, Sagi Grimberg wrote:
> On 8/19/2015 8:54 PM, Jason Gunthorpe wrote:
> > On Wed, Aug 19, 2015 at 07:48:02PM +0200, Christoph Hellwig wrote:
> > > On Wed, Aug 19, 2015 at 11:46:14AM -0600, Jason Gunthorpe wrote:
> > > > Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com>
> > > >
> > > > AFAIK, this path is rarely (never?) actually used. I think all the
> > > > drivers we have can post directly from userspace.
> > >
> > > Oh, interesting.  Is there any chance to deprecate it?  Not having
> > > to care for the uverbs command would really help with some of the
> > > upcoming changes I have in my mind.
> >
> > Hmm, we'd need a survey of the userspace side to see if it is rarely
> > or never...
> >
> > And we'd have to talk to the soft XXX guys to see if they plan to use
> > it..
>
> Checked in librxe (user-space softroce). Looks like posts are going via
> this path...

Ditto for the soft iWARP stack, which is still out-of-linux.



RE: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-08-12 Thread Steve Wise

> > > > > Hey Sagi, how is this coming along?  How can I help?
> > > >
> > > > Hi Steve,
> > > >
> > > > This is taking longer than I expected, the changes needed seem
> > > > pretty extensive throughout the IO path. I don't think it will be ready
> > > > for 4.3
> > >
> > > Perhaps then we should go with my version that adds iwarp-only FRMR IO
> > > handlers for 4.3.  Then they can be phased out as the rework
> > > matures.  Thoughts?
> >
> > Shall I send out the series again for merging into 4.3?
>
> Hi Steve,
>
> Not sure about that.. I'm a bit reluctant in adding a code that
> branches the isert code even more than it already is.
>
> Nic, WDYT?

Nic is silent... 

Sagi, do you have an ETA on when you can have the recode ready for detailed 
review and test? If we can't make linux-4.3, can we be early in staging it for 
linux-4.4?

Thanks,

Steve.


RE: [PATCH for-4.3 11/15] iw_cxgb4: Support ib_alloc_mr verb

2015-08-07 Thread Steve Wise

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of 'Christoph Hellwig'
> Sent: Friday, August 07, 2015 11:26 AM
> To: Steve Wise
> Cc: 'Christoph Hellwig'; 'Sagi Grimberg'; 'Doug Ledford';
> linux-rdma@vger.kernel.org; linux-...@vger.kernel.org;
> target-de...@vger.kernel.org
> Subject: Re: [PATCH for-4.3 11/15] iw_cxgb4: Support ib_alloc_mr verb
>
> On Fri, Aug 07, 2015 at 11:19:59AM -0500, Steve Wise wrote:
> > I guess I'll post two patches, the NFS fix that precedes af78181/b7e06cd,
> > and a reworked patch to replace e20684a.
> >
> > Is that the way to go in your opinion?
>
> To me this sounds good.  We have a couple patches from Jason's series
> that already need to be replaced, so the tree will need a rebase anyway,
> so I don't see a problem with replacing ones in Sagi's series either.
> If Sagi needs to do a repost for some reason he can just include your
> NFS patch in front of the series.

I misspoke.  I had the order reversed. The order is such that we can add my new 
NFS patch after:

e20684a xprtrdma, svcrdma: Convert to ib_alloc_mr

and before these:

af78181 cxgb3: Support ib_alloc_mr verb
b7e06cd iw_cxgb4: Support ib_alloc_mr verb

Steve


[PATCH] svcrdma: limit FRMR page list lengths to device max

2015-08-07 Thread Steve Wise

Svcrdma was incorrectly allocating fastreg MRs and page lists using
RPCSVC_MAXPAGES, which can exceed the device capabilities.  So limit
the depth to the minimum of RPCSVC_MAXPAGES and xprt->sc_frmr_pg_list_len.

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---
Doug, this patch needs to be added after this commit:

e20684a xprtrdma, svcrdma: Convert to ib_alloc_mr

and before these commits:

af78181 cxgb3: Support ib_alloc_mr verb
b7e06cd iw_cxgb4: Support ib_alloc_mr verb

This will avoid a bisect window where NFSRDMA over cxgb4 is broken.

Bruce, please ACK if this commit looks good, and also if you're ok with
this flowing through Doug's rdma tree due to the dependencies.
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 8752a2d..11d5133 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -733,17 +733,19 @@ static struct svc_rdma_fastreg_mr *rdma_alloc_frmr(struct svcxprt_rdma *xprt)
 	struct ib_mr *mr;
 	struct ib_fast_reg_page_list *pl;
 	struct svc_rdma_fastreg_mr *frmr;
+	u32 num_sg;
 
 	frmr = kmalloc(sizeof(*frmr), GFP_KERNEL);
 	if (!frmr)
 		goto err;
 
-	mr = ib_alloc_mr(xprt->sc_pd, IB_MR_TYPE_MEM_REG, RPCSVC_MAXPAGES);
+	num_sg = min_t(u32, RPCSVC_MAXPAGES, xprt->sc_frmr_pg_list_len);
+	mr = ib_alloc_mr(xprt->sc_pd, IB_MR_TYPE_MEM_REG, num_sg);
 	if (IS_ERR(mr))
 		goto err_free_frmr;
 
 	pl = ib_alloc_fast_reg_page_list(xprt->sc_cm_id->device,
-					 RPCSVC_MAXPAGES);
+					 num_sg);
 	if (IS_ERR(pl))
 		goto err_free_mr;
 
-- 
1.7.1
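
A minimal user-space sketch of the clamp this patch introduces.  The value
given to SKETCH_RPCSVC_MAXPAGES is purely illustrative (the real
RPCSVC_MAXPAGES depends on the kernel configuration), and sketch_frmr_depth
is a hypothetical name.

```c
#include <stdint.h>

/* Illustrative value only; not the real RPCSVC_MAXPAGES. */
#define SKETCH_RPCSVC_MAXPAGES 259u

/* Depth to request for one fastreg MR, never above what the device reports. */
static uint32_t sketch_frmr_depth(uint32_t device_max_pages)
{
	return device_max_pages < SKETCH_RPCSVC_MAXPAGES ?
		device_max_pages : SKETCH_RPCSVC_MAXPAGES;
}
```

The same clamped depth is then used for both the MR and its page list, so
the two allocations cannot disagree.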


Re: [PATCH for-4.3 11/15] iw_cxgb4: Support ib_alloc_mr verb

2015-08-07 Thread Steve Wise

On 8/7/2015 11:19 AM, Steve Wise wrote:
> > -Original Message-
> > From: linux-rdma-ow...@vger.kernel.org
> > [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Christoph Hellwig
> > Sent: Friday, August 07, 2015 10:13 AM
> > To: Steve Wise
> > Cc: 'Sagi Grimberg'; 'Doug Ledford'; linux-rdma@vger.kernel.org;
> > linux-...@vger.kernel.org; target-de...@vger.kernel.org
> > Subject: Re: [PATCH for-4.3 11/15] iw_cxgb4: Support ib_alloc_mr verb
> >
> > On Fri, Aug 07, 2015 at 10:06:26AM -0500, Steve Wise wrote:
> > > If it is too much of a pain to alter this patch, then I'll just
> > > submit the NFSRDMA fix and live with the bisect issue...
> >
> > Doug's tree is still to be rebased.  So please submit your NFS
> > fix now and ask Doug to merge it before Sagi's series in the final
> > tree.
>
> My new NFS fix needs to land before these two:
>
> af78181 cxgb3: Support ib_alloc_mr verb
> b7e06cd iw_cxgb4: Support ib_alloc_mr verb
>
> But it will cause the following patch, which is after the above two, to need
> rework because it hits the same lines:
>
> e20684a xprtrdma, svcrdma: Convert to ib_alloc_mr
>
> I guess I'll post two patches, the NFS fix that precedes af78181/b7e06cd, and
> a reworked patch to replace e20684a.
>
> Is that the way to go in your opinion?
>
> Steve.

Ignore this.  I think I was looking at the wrong staging branch.


RE: [PATCH for-4.3 11/15] iw_cxgb4: Support ib_alloc_mr verb

2015-08-07 Thread Steve Wise



> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Sagi Grimberg
> Sent: Thursday, July 30, 2015 2:33 AM
> To: linux-rdma@vger.kernel.org; linux-...@vger.kernel.org;
> target-de...@vger.kernel.org
> Subject: [PATCH for-4.3 11/15] iw_cxgb4: Support ib_alloc_mr verb
>
> Signed-off-by: Sagi Grimberg <sa...@mellanox.com>
> ---
>  drivers/infiniband/hw/cxgb4/iw_cxgb4.h |  4 +++-
>  drivers/infiniband/hw/cxgb4/mem.c      | 12 +++++++++---
>  drivers/infiniband/hw/cxgb4/provider.c |  2 +-
>  3 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
> index cc77844..c7bb38c 100644
> --- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
> +++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
> @@ -970,7 +970,9 @@ void c4iw_free_fastreg_pbl(struct ib_fast_reg_page_list *page_list);
>  struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(
>  					struct ib_device *device,
>  					int page_list_len);
> -struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth);
> +struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
> +			    enum ib_mr_type mr_type,
> +			    u32 max_num_sg);
>  int c4iw_dealloc_mw(struct ib_mw *mw);
>  struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
>  struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start,
> diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
> index cff815b..026b91e 100644
> --- a/drivers/infiniband/hw/cxgb4/mem.c
> +++ b/drivers/infiniband/hw/cxgb4/mem.c
> @@ -853,7 +853,9 @@ int c4iw_dealloc_mw(struct ib_mw *mw)
>  	return 0;
>  }
>
> -struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth)
> +struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
> +			    enum ib_mr_type mr_type,
> +			    u32 max_num_sg)
>  {
>  	struct c4iw_dev *rhp;
>  	struct c4iw_pd *php;
> @@ -862,6 +864,10 @@ struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth)
>  	u32 stag = 0;
>  	int ret = 0;
>
> +	if (mr_type != IB_MR_TYPE_MEM_REG ||
> +	    max_num_sg > t4_max_fr_depth(use_dsgl))
> +		return ERR_PTR(-EINVAL);
> +
>  	php = to_c4iw_pd(pd);
>  	rhp = php->rhp;
>  	mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);

Hey Sagi/Doug,

The above change introduces a regression with NFSRDMA over cxgb4.  Prior to
this commit, cxgb4 allowed frmr and page list allocations that exceeded the
device max.  It does enforce the max when processing a fastreg WR though.
This is definitely a bug in cxgb4, but was benign.  Further, the NFSRDMA
server currently allocates frmrs and page_lists using RPCSVC_MAXPAGES for
max_num_sg, which exceeds the max supported by cxgb4.  Thus with the above
patch, NFSRDMA doesn't work at all over cxgb4. :(

So I need to fix NFSRDMA for sure.  But adding a fix on top of the above patch
will leave a bisectable window where NFSRDMA is broken on cxgb4.  Is it too
late to change the above patch so the regression is avoided?  Then I can
provide a new series of patches to fix the NFS server to mind the device max,
and fix cxgb4 to enforce the device max.

If it is too much of a pain to alter this patch, then I'll just submit the
NFSRDMA fix and live with the bisect issue...

Thoughts?

Steve.
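
A user-space sketch of why the new check turns a previously benign
over-allocation into a hard failure.  SKETCH_DEVICE_MAX_DEPTH and
sketch_alloc_mr are hypothetical; the limit value is illustrative and not
the real cxgb4 limit.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative device limit; a real driver would query its hardware. */
#define SKETCH_DEVICE_MAX_DEPTH 128u

/* Provider side: reject allocations deeper than the device supports. */
static int sketch_alloc_mr(uint32_t max_num_sg)
{
	if (max_num_sg > SKETCH_DEVICE_MAX_DEPTH)
		return -EINVAL;
	return 0;
}
```

A consumer that always asks for a fixed depth above the limit fails here on
every allocation, whereas before the check the oversized request was
accepted and the limit only mattered when a work request was posted.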


Re: [RFC] split struct ib_send_wr

2015-08-07 Thread Steve Wise


On 8/6/2015 11:24 AM, Christoph Hellwig wrote:
> I've pushed out a new version.  Updates:
>
>   - the ib_recv_wr change Bart notices has been fixed.
>   - iser and isert have been converted
>   - the handling of the embedded WR in the qib software queue entry
>     has been fixed.
>
> Which means we're basically done now and the patch could use
> broader testing.
>
> The full patch will be too much for the list again, so here is the
> git commit:
>
> http://git.infradead.org/users/hch/scsi.git/commitdiff/a0027ed00fc3ae2686d8a843a724b50597115a71

This tests ok over iwarp/cxgb4 using NFSRDMA and user mode RDMA apps.

Tested-by: Steve Wise <sw...@opengridcomputing.com>


RE: [PATCH for-4.3 11/15] iw_cxgb4: Support ib_alloc_mr verb

2015-08-07 Thread Steve Wise

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Christoph Hellwig
> Sent: Friday, August 07, 2015 10:13 AM
> To: Steve Wise
> Cc: 'Sagi Grimberg'; 'Doug Ledford'; linux-rdma@vger.kernel.org;
> linux-...@vger.kernel.org; target-de...@vger.kernel.org
> Subject: Re: [PATCH for-4.3 11/15] iw_cxgb4: Support ib_alloc_mr verb
>
> On Fri, Aug 07, 2015 at 10:06:26AM -0500, Steve Wise wrote:
> > If it is too much of a pain to alter this patch, then I'll just
> > submit the NFSRDMA fix and live with the bisect issue...
>
> Doug's tree is still to be rebased.  So please submit your NFS
> fix now and ask Doug to merge it before Sagi's series in the final
> tree.

My new NFS fix needs to land before these two:

af78181 cxgb3: Support ib_alloc_mr verb
b7e06cd iw_cxgb4: Support ib_alloc_mr verb

But it will cause the following patch, which is after the above two, to need 
rework because it hits the same lines:

e20684a xprtrdma, svcrdma: Convert to ib_alloc_mr

I guess I'll post two patches, the NFS fix that precedes af78181/b7e06cd, and
a reworked patch to replace e20684a.

Is that the way to go in your opinion?

Steve.


Re: [RFC] split struct ib_send_wr

2015-08-06 Thread Steve Wise


On 8/6/2015 11:24 AM, Christoph Hellwig wrote:
> I've pushed out a new version.  Updates:
>
>   - the ib_recv_wr change Bart notices has been fixed.
>   - iser and isert have been converted
>   - the handling of the embedded WR in the qib software queue entry
>     has been fixed.
>
> Which means we're basically done now and the patch could use
> broader testing.
>
> The full patch will be too much for the list again, so here is the
> git commit:
>
> http://git.infradead.org/users/hch/scsi.git/commitdiff/a0027ed00fc3ae2686d8a843a724b50597115a71



Hey Christoph,

You missed amso1100 (and probably ipath) that have been moved to 
drivers/staging...


  CC [M]  drivers/staging/amso1100/c2_qp.o
drivers/staging/amso1100/c2_qp.c: In function ‘c2_post_send’:
drivers/staging/amso1100/c2_qp.c:863: error: ‘struct ib_send_wr’ has no member named ‘wr’


I'll disable them from my config so I can test your code on cxgb4, but I 
wanted to let you know...


Steve.



RE: [RFC] split struct ib_send_wr

2015-08-06 Thread Steve Wise

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Christoph Hellwig
> Sent: Thursday, August 06, 2015 12:32 PM
> To: Steve Wise
> Cc: Christoph Hellwig; linux-rdma@vger.kernel.org; Sagi Grimberg
> Subject: Re: [RFC] split struct ib_send_wr
>
> On Thu, Aug 06, 2015 at 12:04:32PM -0500, Steve Wise wrote:
> > You missed amso1100 (and probably ipath) that have been moved to
> > drivers/staging...
>
> Driver/staging isn't considered in tree for global API change
> perspective, so I didn't bother with all these staging drivers.

The kbuild test bot will probably catch this. 


RE: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-08-05 Thread Steve Wise


 
> > > Hey Sagi, how is this coming along?  How can I help?
> >
> > Hi Steve,
> >
> > This is taking longer than I expected, the changes needed seem
> > pretty extensive throughout the IO path. I don't think it will be ready
> > for 4.3
>
> Perhaps then we should go with my version that adds iwarp-only FRMR IO
> handlers for 4.3.  Then they can be phased out as the rework
> matures.  Thoughts?

Shall I send out the series again for merging into 4.3?

Thanks,

Steve.


RE: [PATCH] RDMA/amso1100: deprecate the amso1100 provider

2015-08-04 Thread Steve Wise

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Marciniszyn, Mike
> Sent: Tuesday, August 04, 2015 11:33 AM
> To: Doug Ledford; Steve Wise
> Cc: linux-rdma@vger.kernel.org; t...@opengridcomputing.com
> Subject: RE: [PATCH] RDMA/amso1100: deprecate the amso1100 provider
>
> > Subject: Re: [PATCH] RDMA/amso1100: deprecate the amso1100 provider
> >
> > On 07/29/2015 10:44 AM, Steve Wise wrote:
> > > The HW hasn't been sold since 2005, and the SW has definite bit rot.
> > > It's time to remove it.  So move it to staging for a few releases and
> > > then remove it after that.
> > >
> > > Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
> >
> > Thanks, applied.
>
> This might run into the same issue as in
> https://lists.01.org/pipermail/kbuild-all/2015-August/011216.html.

Hey Mike, what is the issue exactly?


RE: [PATCH] RDMA/amso1100: deprecate the amso1100 provider

2015-08-04 Thread Steve Wise

> > > This might run into the same issue as in
> > > https://lists.01.org/pipermail/kbuild-all/2015-August/011216.html.
> >
> > Hey Mike, what is the issue exactly?
>
> The problem is when CONFIG_INFINIBAND=n and CONFIG_INFINIBAND_HFI1=y. Here is
> what I did for ipath which had the same issue:
>
> http://marc.info/?l=linux-rdma&m=143870511731459&w=2
>
> -Denny

Thanks.  I'll submit a patch for amso1100 shortly.

Steve.


[PATCH] RDMA/amso1100: Do not build amso1100 without infiniband subsystem

2015-08-04 Thread Steve Wise

Moving the amso1100 driver to staging now requires a guard on the
INFINIBAND subsystem in order to prevent the driver from being built
without infiniband enabled, which would lead to a broken build.

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---

 drivers/staging/amso1100/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/amso1100/Kconfig b/drivers/staging/amso1100/Kconfig
index e6ce5f2..809cb14 100644
--- a/drivers/staging/amso1100/Kconfig
+++ b/drivers/staging/amso1100/Kconfig
@@ -1,6 +1,6 @@
 config INFINIBAND_AMSO1100
 	tristate "Ammasso 1100 HCA support"
-	depends on PCI && INET
+	depends on PCI && INET && INFINIBAND
 	---help---
 	  This is a low-level driver for the Ammasso 1100 host
 	  channel adapter (HCA).


RE: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-08-04 Thread Steve Wise

> -Original Message-
> From: Sagi Grimberg [mailto:sa...@dev.mellanox.co.il]
> Sent: Tuesday, August 04, 2015 12:26 PM
> To: Steve Wise; dledf...@redhat.com
> Cc: infinip...@intel.com; sa...@mellanox.com; ogerl...@mellanox.com;
> r...@mellanox.com; linux-rdma@vger.kernel.org;
> e...@mellanox.com; target-de...@vger.kernel.org; linux-...@vger.kernel.org;
> bfie...@fieldses.org
> Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive
> names
>
> > Hey Sagi, how is this coming along?  How can I help?
>
> Hi Steve,
>
> This is taking longer than I expected, the changes needed seem
> pretty extensive throughout the IO path. I don't think it will be ready
> for 4.3

Perhaps then we should go with my version that adds iwarp-only FRMR IO handlers 
for 4.3.  Then they can be phased out as the rework matures.  Thoughts?

> I'll try to send you soon a preliminary version to play with.
> Acceptable?

I can test the iwarp parts once you think the code is ready to try.

Steve.


RE: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-08-03 Thread Steve Wise


> > Steve,
> >
> > I've given this some thought and I think we should avoid splitting
> > logic from PI and iWARP. The reason (other than code duplication) is
> > that currently the iser target support only up to 1MB IOs. I have some
> > code (not done yet) to support larger IOs by using multiple
> > registrations  per IO (with or without PI).
> > With a little tweaking I think we can get iwarp to fit in too...
> >
> > So, do you mind if I take a crack at it?
>
> Sure, go ahead.  Let me know how I can help.  Certainly I can test it
> for you.  I'm very keen to get this in for 4.3 if possible...

Hey Sagi, how is this coming along?  How can I help?

Steve.


Re: [PATCH v2 00/12] IB: Replace safe uses for ib_get_dma_mr with pd->local_dma_lkey

2015-07-31 Thread Steve Wise



Series looks good.

Reviewed-by: Steve Wise <sw...@opengridcomputing.com>


Re: [PATCH for-4.3 02/15] IB: Modify ib_create_mr API

2015-07-30 Thread Steve Wise


On 7/30/2015 2:32 AM, Sagi Grimberg wrote:

Use ib_alloc_mr with specific parameters.
Change the existing callers.

Signed-off-by: Sagi Grimberg <sa...@mellanox.com>
---
  drivers/infiniband/core/verbs.c  | 31 --
  drivers/infiniband/hw/mlx5/main.c|  2 +-
  drivers/infiniband/hw/mlx5/mlx5_ib.h |  5 +++--
  drivers/infiniband/hw/mlx5/mr.c  | 17 ++-
  drivers/infiniband/ulp/iser/iser_verbs.c |  6 ++
  drivers/infiniband/ulp/isert/ib_isert.c  |  6 +-
  include/rdma/ib_verbs.h  | 37 +---
  7 files changed, 58 insertions(+), 46 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 003bb62..2ac599b 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1272,15 +1272,32 @@ int ib_dereg_mr(struct ib_mr *mr)
  }
  EXPORT_SYMBOL(ib_dereg_mr);
  
-struct ib_mr *ib_create_mr(struct ib_pd *pd,

-  struct ib_mr_init_attr *mr_init_attr)
+/**
+ * ib_alloc_mr() - Allocates a memory region
+ * @pd:protection domain associated with the region
+ * @mr_type:   memory region type
+ * @max_num_sg:maximum sg entries available for registration.
+ *
+ * Notes:
+ * Memory registeration page/sg lists must not exceed max_num_sg.
+ * For mr_type IB_MR_TYPE_MEM_REG, the total length cannot exceed
+ * max_num_sg * used_page_size.
+ *


Nit:  the above sounds like used_page_size is a variable.  Something 
like this might work?


max_num_sg * the page size used for this sg list.



+ */
+struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
+ enum ib_mr_type mr_type,
+ u32 max_num_sg)
  {
struct ib_mr *mr;
  
-	if (!pd->device->create_mr)
-		return ERR_PTR(-ENOSYS);
-
-	mr = pd->device->create_mr(pd, mr_init_attr);
+	if (pd->device->alloc_mr) {
+		mr = pd->device->alloc_mr(pd, mr_type, max_num_sg);
+	} else {
+		if (mr_type != IB_MR_TYPE_MEM_REG ||
+		    !pd->device->alloc_fast_reg_mr)
+			return ERR_PTR(-ENOSYS);
+		mr = pd->device->alloc_fast_reg_mr(pd, max_num_sg);
+	}
 
 	if (!IS_ERR(mr)) {
 		mr->device  = pd->device;
@@ -1292,7 +1309,7 @@ struct ib_mr *ib_create_mr(struct ib_pd *pd,
  
  	return mr;

  }
-EXPORT_SYMBOL(ib_create_mr);
+EXPORT_SYMBOL(ib_alloc_mr);
  
  struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)

  {
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 46d1383..2c2a461 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1489,7 +1489,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.attach_mcast	= mlx5_ib_mcg_attach;
 	dev->ib_dev.detach_mcast	= mlx5_ib_mcg_detach;
 	dev->ib_dev.process_mad		= mlx5_ib_process_mad;
-	dev->ib_dev.create_mr		= mlx5_ib_create_mr;
+	dev->ib_dev.alloc_mr		= mlx5_ib_alloc_mr;
 	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 537f42e..3030abe 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -572,8 +572,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 
start, u64 length,
  int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 start_page_index,
   int npages, int zap);
  int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
-struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
-   struct ib_mr_init_attr *mr_init_attr);
+struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
+  enum ib_mr_type mr_type,
+  u32 max_num_sg);
  struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
int max_page_list_len);
  struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct 
ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 03cf74e..b0b68bb 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1246,14 +1246,15 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr)
return 0;
  }
  
-struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,

-   struct ib_mr_init_attr *mr_init_attr)
+struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
+  enum ib_mr_type mr_type,
+  u32 max_num_sg)
  {
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
struct mlx5_create_mkey_mbox_in *in;
struct mlx5_ib_mr *mr;
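
A minimal userspace sketch of the dispatch in the ib_alloc_mr() hunk above, together with the length bound raised in the review nit. The types, the demo_* callbacks, and the use of NULL in place of ERR_PTR(-ENOSYS) are stand-ins invented for illustration; they are not the kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for enum ib_mr_type and struct ib_mr. */
enum mr_type { MR_TYPE_MEM_REG, MR_TYPE_SIGNATURE };

struct mr {
	enum mr_type type;
	uint32_t max_num_sg;
};

/* Hypothetical stand-in for the two per-device verbs the hunk consults. */
struct device_ops {
	struct mr *(*alloc_mr)(enum mr_type type, uint32_t max_num_sg);
	struct mr *(*alloc_fast_reg_mr)(uint32_t max_num_sg);
};

static struct mr mr_slot; /* single static MR, enough for a demo */

static struct mr *demo_alloc_mr(enum mr_type type, uint32_t max_num_sg)
{
	mr_slot.type = type;
	mr_slot.max_num_sg = max_num_sg;
	return &mr_slot;
}

static struct mr *demo_alloc_fast_reg_mr(uint32_t max_num_sg)
{
	mr_slot.type = MR_TYPE_MEM_REG;
	mr_slot.max_num_sg = max_num_sg;
	return &mr_slot;
}

/*
 * Same branch order as the quoted hunk: prefer the new verb, and fall
 * back to the legacy fast-reg verb only for plain memory registration.
 * NULL stands in for ERR_PTR(-ENOSYS).
 */
static struct mr *alloc_mr(const struct device_ops *ops,
			   enum mr_type type, uint32_t max_num_sg)
{
	if (ops->alloc_mr)
		return ops->alloc_mr(type, max_num_sg);
	if (type != MR_TYPE_MEM_REG || !ops->alloc_fast_reg_mr)
		return NULL;
	return ops->alloc_fast_reg_mr(max_num_sg);
}

/* Upper bound on the registered length: max_num_sg * page size in use. */
static uint64_t max_reg_len(uint32_t max_num_sg, uint32_t page_size)
{
	return (uint64_t)max_num_sg * page_size;
}
```

With 256 entries and 4096-byte pages, for example, the bound works out to 1 MiB.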

Re: [PATCH 20/22] IB/iser: Support up to 8MB data transfer in a single command

2015-07-30 Thread Steve Wise


On 7/30/2015 3:06 AM, Sagi Grimberg wrote:

iser support up to 512KB data transfer in a single scsi
command. In order to support up to 8MB, iser needs to pre-allocate
larger memory regions and larger page vectors.

Given that a few target implementations don't support data transfers
of more than 512KB by default and the fact that larger IO sizes require
more resources, we introduce a module parameter to determine the
maximum number of 512B sectors in a single scsi command.
Users that are interested in larger transfers can change this value given
that the target supports larger transfers.

IO operations that consists of N pages will need a page vector
of size N+1 in case the first SG element contains an offset. Given
that some devices allocates memory regions in powers of 2, this
means that allocating a region with N+1 pages, will result in
region resources allocation of the next power of 2. Since we don't
want that to happen, in case we are in the limit of IO size supported
and the first SG element has an offset, we align the SG list using a
bounce buffer (which is OK given that this is not likely to happen a lot).

Signed-off-by: Sagi Grimberg <sa...@mellanox.com>
---
  drivers/infiniband/ulp/iser/iscsi_iser.c | 19 ---
  drivers/infiniband/ulp/iser/iscsi_iser.h | 14 --
  drivers/infiniband/ulp/iser/iser_initiator.c |  2 +-
  drivers/infiniband/ulp/iser/iser_memory.c| 14 --
  drivers/infiniband/ulp/iser/iser_verbs.c | 27 +++
  5 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
b/drivers/infiniband/ulp/iser/iscsi_iser.c
index e3cea61..9eeefc8 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -93,6 +93,10 @@ static unsigned int iscsi_max_lun = 512;
  module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO);
  MODULE_PARM_DESC(max_lun, "Max LUNs to allow per session (default:512");
  
+unsigned int iser_max_sectors = ISER_DEF_MAX_SECTORS;
+module_param_named(max_sectors, iser_max_sectors, uint, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(max_sectors, "Max number of sectors in a single scsi command (default:1024");
+
  bool iser_pi_enable = false;
  module_param_named(pi_enable, iser_pi_enable, bool, S_IRUGO);
  MODULE_PARM_DESC(pi_enable, "Enable T10-PI offload support (default:disabled)");
@@ -625,6 +629,8 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
 	if (ep) {
 		iser_conn = ep->dd_data;
 		max_cmds = iser_conn->max_cmds;
+		shost->sg_tablesize = iser_conn->scsi_sg_tablesize;
+		shost->max_sectors = iser_conn->scsi_max_sectors;
 
 		mutex_lock(&iser_conn->state_mutex);
 		if (iser_conn->state != ISER_CONN_UP) {
@@ -643,15 +649,6 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
   SHOST_DIX_GUARD_CRC);
}
  
-		/*
-		 * Limit the sg_tablesize and max_sectors based on the device
-		 * max fastreg page list length.
-		 */
-		shost->sg_tablesize = min_t(unsigned short, shost->sg_tablesize,
-			ib_conn->device->dev_attr.max_fast_reg_page_list_len);
-		shost->max_sectors = min_t(unsigned int,
-			1024, (shost->sg_tablesize * PAGE_SIZE) >> 9);
-
 		if (iscsi_host_add(shost,
 				   ib_conn->device->ib_device->dma_device)) {
 			mutex_unlock(&iser_conn->state_mutex);
@@ -966,8 +963,8 @@ static struct scsi_host_template iscsi_iser_sht = {
 	.name                   = "iSCSI Initiator over iSER",
.queuecommand   = iscsi_queuecommand,
.change_queue_depth = scsi_change_queue_depth,
-   .sg_tablesize   = ISCSI_ISER_SG_TABLESIZE,
-   .max_sectors= 1024,
+   .sg_tablesize   = ISCSI_ISER_DEF_SG_TABLESIZE,
+   .max_sectors= ISER_DEF_MAX_SECTORS,
.cmd_per_lun= ISER_DEF_CMD_PER_LUN,
.eh_abort_handler   = iscsi_eh_abort,
.eh_device_reset_handler= iscsi_eh_device_reset,
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
b/drivers/infiniband/ulp/iser/iscsi_iser.h
index e9ebe0b..8a32e20 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -98,8 +98,13 @@
  #define SHIFT_4K  12
  #define SIZE_4K	(1ULL << SHIFT_4K)
  #define MASK_4K	(~(SIZE_4K-1))
-	/* support up to 512KB in one RDMA */
-#define ISCSI_ISER_SG_TABLESIZE (0x80000 >> SHIFT_4K)
+
+/* Default support is 512KB I/O size */
+#define ISER_DEF_MAX_SECTORS		1024
+#define ISCSI_ISER_DEF_SG_TABLESIZE	((ISER_DEF_MAX_SECTORS * 512) >> SHIFT_4K)
+/* Maximum support is 8MB I/O size */
+#define ISCSI_ISER_MAX_SG_TABLESIZE	(16384 * 512 >> SHIFT_4K)
+
  #define
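
The sizing arithmetic in the commit message and the new macros can be checked with a small sketch. The helper names below are made up for illustration; only the 512-byte sector and the 4K shift come from the patch.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u          /* bytes per sector, per the patch text */
#define SHIFT_4K    12            /* same shift the driver header uses */
#define SIZE_4K     (1u << SHIFT_4K)

/* 4K pages needed to hold max_sectors sectors of page-aligned data. */
static uint32_t sectors_to_sg_tablesize(uint32_t max_sectors)
{
	return (max_sectors * SECTOR_SIZE) >> SHIFT_4K;
}

/*
 * 4K pages touched by a buffer of len bytes whose first byte sits
 * first_offset bytes into its first page. A non-zero offset on a
 * full-size I/O is what produces the "N+1" page vector.
 */
static uint32_t pages_spanned(uint32_t first_offset, uint32_t len)
{
	return (first_offset + len + SIZE_4K - 1) >> SHIFT_4K;
}
```

The default of 1024 sectors gives 128 pages and the 16384-sector maximum gives 2048. An 8MB transfer that starts 512 bytes into a page touches 2049 pages, which is the case the commit message handles with a bounce buffer instead of a larger region.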

Re: [PATCH for-4.3 00/15] Modify MR allocation API

2015-07-30 Thread Steve Wise


On 7/30/2015 2:32 AM, Sagi Grimberg wrote:

This patch set is detached from my WIP for modifying our
fast registration kernel API. I incorporated some comments
from Jason and Christoph. The current set is a drop-in replacement
of ib_alloc_fast_reg_mr to ib_alloc_mr which receives a memory
region type (which can be IB_MR_TYPE_MEM_REG for normal memory
registration, IB_MR_TYPE_SIGNATURE for a data-integrity capable
memory region and future arbitrary SG support capable memory
region).




Series looks good.

Reviewed-by: Steve Wise <sw...@opengridcomputing.com>

Re: [PATCH 12/22] IB/iser: Introduce iser_reg_ops

2015-07-30 Thread Steve Wise


On 7/30/2015 3:06 AM, Sagi Grimberg wrote:

Move all the per-device function pointers to an easy
extensible iser_reg_ops structure that contains all
the iser registration operations.

Signed-off-by: Sagi Grimberg <sa...@mellanox.com>
---
  drivers/infiniband/ulp/iser/iscsi_iser.h | 39 ++--
  drivers/infiniband/ulp/iser/iser_initiator.c | 16 ++--
  drivers/infiniband/ulp/iser/iser_memory.c| 35 +
  drivers/infiniband/ulp/iser/iser_verbs.c | 30 +
  4 files changed, 75 insertions(+), 45 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 70bf6e7..9ce090c 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -326,6 +326,25 @@ struct iser_comp {
  };
  
  /**

+ * struct iser_device - Memory registration operations
+ * per-device registration schemes
+ *
+ * @alloc_reg_res: Allocate registration resources
+ * @free_reg_res:  Free registration resources
+ * @reg_rdma_mem:  Register memory buffers
+ * @unreg_rdma_mem:Un-register memory buffers
+ */
+struct iser_reg_ops {
+   int(*alloc_reg_res)(struct ib_conn *ib_conn,
+   unsigned cmds_max);
+   void   (*free_reg_res)(struct ib_conn *ib_conn);
+   int(*reg_rdma_mem)(struct iscsi_iser_task *iser_task,
+  enum iser_data_dir cmd_dir);
+   void   (*unreg_rdma_mem)(struct iscsi_iser_task *iser_task,
+enum iser_data_dir cmd_dir);
+};
+
+/**
   * struct iser_device - iSER device handle
   *
   * @ib_device: RDMA device
@@ -338,11 +357,7 @@ struct iser_comp {
   * @comps_used:Number of completion contexts used, Min between online
   * cpus and device max completion vectors
   * @comps: Dinamically allocated array of completion handlers
- * Memory registration pool Function pointers (FMR or Fastreg):
- * @iser_alloc_rdma_reg_res: Allocation of memory regions pool
- * @iser_free_rdma_reg_res:  Free of memory regions pool
- * @iser_reg_rdma_mem:   Memory registration routine
- * @iser_unreg_rdma_mem: Memory deregistration routine
+ * @reg_ops:   Registration ops
   */
  struct iser_device {
struct ib_device *ib_device;
@@ -354,13 +369,7 @@ struct iser_device {
int  refcount;
int  comps_used;
struct iser_comp *comps;
-	int                          (*iser_alloc_rdma_reg_res)(struct ib_conn *ib_conn,
-								unsigned cmds_max);
-	void                         (*iser_free_rdma_reg_res)(struct ib_conn *ib_conn);
-	int                          (*iser_reg_rdma_mem)(struct iscsi_iser_task *iser_task,
-							  enum iser_data_dir cmd_dir);
-	void                         (*iser_unreg_rdma_mem)(struct iscsi_iser_task *iser_task,
-							    enum iser_data_dir cmd_dir);
+   struct iser_reg_ops  *reg_ops;
  };
  
  #define ISER_CHECK_GUARD	0xc0

@@ -563,6 +572,8 @@ extern int iser_debug_level;
  extern bool iser_pi_enable;
  extern int iser_pi_guard;
  
+int iser_assign_reg_ops(struct iser_device *device);

+
  int iser_send_control(struct iscsi_conn *conn,
  struct iscsi_task *task);
  
@@ -636,9 +647,9 @@ int  iser_initialize_task_headers(struct iscsi_task *task,

struct iser_tx_desc *tx_desc);
  int iser_alloc_rx_descriptors(struct iser_conn *iser_conn,
  struct iscsi_session *session);
-int iser_create_fmr_pool(struct ib_conn *ib_conn, unsigned cmds_max);
+int iser_alloc_fmr_pool(struct ib_conn *ib_conn, unsigned cmds_max);
  void iser_free_fmr_pool(struct ib_conn *ib_conn);
-int iser_create_fastreg_pool(struct ib_conn *ib_conn, unsigned cmds_max);
+int iser_alloc_fastreg_pool(struct ib_conn *ib_conn, unsigned cmds_max);
  void iser_free_fastreg_pool(struct ib_conn *ib_conn);
  u8 iser_check_task_pi_status(struct iscsi_iser_task *iser_task,
 enum iser_data_dir cmd_dir, sector_t *sector);
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
b/drivers/infiniband/ulp/iser/iser_initiator.c
index 42d6f42..88d8a89 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -73,7 +73,7 @@ static int iser_prepare_read_cmd(struct iscsi_task *task)
return err;
}
  
-	err = device->iser_reg_rdma_mem(iser_task, ISER_DIR_IN);
+	err = device->reg_ops->reg_rdma_mem(iser_task, ISER_DIR_IN);
 	if (err) {
 		iser_err("Failed to set up Data-IN RDMA\n");
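
The ops-table pattern this patch introduces can be sketched in userspace as follows. Everything here is hypothetical scaffolding: the capability flags, the pool counter, and in particular the selection policy (FMR first, then fastreg) are assumptions for illustration, not a copy of iser_assign_reg_ops().

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced version of the ops table: one pointer per registration step. */
struct reg_ops {
	const char *name;
	int  (*alloc_reg_res)(unsigned cmds_max);
	void (*free_reg_res)(void);
};

static int pool_size; /* stand-in for per-connection registration resources */

static int  fmr_alloc(unsigned cmds_max)     { pool_size = (int)cmds_max; return 0; }
static int  fastreg_alloc(unsigned cmds_max) { pool_size = (int)cmds_max; return 0; }
static void pool_free(void)                  { pool_size = 0; }

static const struct reg_ops fmr_ops     = { "fmr",     fmr_alloc,     pool_free };
static const struct reg_ops fastreg_ops = { "fastreg", fastreg_alloc, pool_free };

/* Hypothetical capability summary for a device. */
struct device_caps {
	bool has_fmr;
	bool has_fastreg;
};

/*
 * Pick one table per device at setup time. Callers then go through
 * ops->... and never test the registration scheme again.
 * Returns NULL when the device supports neither scheme.
 */
static const struct reg_ops *assign_reg_ops(const struct device_caps *caps)
{
	if (caps->has_fmr)
		return &fmr_ops;
	if (caps->has_fastreg)
		return &fastreg_ops;
	return NULL;
}
```

Grouping the pointers in one const table means a new registration scheme only has to add a table and one branch in the selection routine.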

RE: deprecating amso1100

2015-07-29 Thread Steve Wise

 -Original Message-
 From: Doug Ledford [mailto:dledf...@redhat.com]
 Sent: Wednesday, July 29, 2015 9:28 AM
 To: Dennis Dalessandro
 Cc: Steve Wise; linux-rdma@vger.kernel.org
 Subject: Re: deprecating amso1100

 On 07/21/2015 04:45 PM, Doug Ledford wrote:

  On Jul 21, 2015, at 4:42 PM, Dalessandro, Dennis 
  dennis.dalessan...@intel.com wrote:

  -Original Message-
  From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
  ow...@vger.kernel.org] On Behalf Of Doug Ledford
  Sent: Tuesday, July 21, 2015 2:49 PM
  To: Steve Wise
  Cc: linux-rdma@vger.kernel.org
  Subject: Re: deprecating amso1100

  On Jul 21, 2015, at 2:48 PM, Steve Wise sw...@opengridcomputing.com
  wrote:

  Hey Doug,

  How should I submit changes to deprecate amso1100?  The HW hasn't been
  sold since 2005, and the SW has definite bit rot.  It's time to remove 
  it...

  Thanks,

  Steve.

  Send me a git patch that uses git mv to move it to staging and we’ll 
  leave it
  there for a release or two and then remove it after that.

  Steve, git format-patch -M does a nice job creating a patch that can 
  actually be looked at via email.

  http://marc.info/?l=linux-kernel&m=143593782206479&w=2

  For the Ipath driver I was going to send directly to the staging 
  maintainer/list. Doug, should that come to you through linux-rdma
 instead?

  I sent an email to Greg K-H to ask that question.  It doesn’t matter to me. 
   If you want, send the email to this list and if I get a response
 from GKH that he wants to take it, I’ll just point him to it once he lands.

 I'll be sending these through to Linus (as agreed with Greg).  However,
 the patches need a TODO file added to the driver directory when it is
 placed in the staging tree (that's a mandatory requirement of being in
 that tree).  In this case, the TODO would just be to remove the driver
 at some point in time.  That point is pretty arbitrary, but I would
 suggest removing the driver in the 4.6 merge window.

Sure.  Will you create this file or do you want a patch from me?


[PATCH] RDMA/amso1100: deprecate the amso1100 provider

2015-07-29 Thread Steve Wise

The HW hasn't been sold since 2005, and the SW has definite bit rot.
It's time to remove it.  So move it to staging for a few releases and
then remove it after that.

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/Kconfig |1 -
 drivers/infiniband/hw/Makefile |1 -
 drivers/staging/Kconfig|2 ++
 drivers/staging/Makefile   |1 +
 drivers/{infiniband/hw => staging}/amso1100/Kbuild |0
 .../{infiniband/hw => staging}/amso1100/Kconfig|0
 drivers/staging/amso1100/TODO  |4 
 drivers/{infiniband/hw => staging}/amso1100/c2.c   |0
 drivers/{infiniband/hw => staging}/amso1100/c2.h   |0
 .../{infiniband/hw => staging}/amso1100/c2_ae.c|0
 .../{infiniband/hw => staging}/amso1100/c2_ae.h|0
 .../{infiniband/hw => staging}/amso1100/c2_alloc.c |0
 .../{infiniband/hw => staging}/amso1100/c2_cm.c|0
 .../{infiniband/hw => staging}/amso1100/c2_cq.c|0
 .../{infiniband/hw => staging}/amso1100/c2_intr.c  |0
 .../{infiniband/hw => staging}/amso1100/c2_mm.c|0
 .../{infiniband/hw => staging}/amso1100/c2_mq.c|0
 .../{infiniband/hw => staging}/amso1100/c2_mq.h|0
 .../{infiniband/hw => staging}/amso1100/c2_pd.c|0
 .../hw => staging}/amso1100/c2_provider.c  |0
 .../hw => staging}/amso1100/c2_provider.h  |0
 .../{infiniband/hw => staging}/amso1100/c2_qp.c|0
 .../{infiniband/hw => staging}/amso1100/c2_rnic.c  |0
 .../hw => staging}/amso1100/c2_status.h|0
 .../{infiniband/hw => staging}/amso1100/c2_user.h  |0
 .../{infiniband/hw => staging}/amso1100/c2_vq.c|0
 .../{infiniband/hw => staging}/amso1100/c2_vq.h|0
 .../{infiniband/hw => staging}/amso1100/c2_wr.h|0
 28 files changed, 7 insertions(+), 2 deletions(-)
 rename drivers/{infiniband/hw => staging}/amso1100/Kbuild (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/Kconfig (100%)
 create mode 100644 drivers/staging/amso1100/TODO
 rename drivers/{infiniband/hw => staging}/amso1100/c2.c (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2.h (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_ae.c (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_ae.h (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_alloc.c (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_cm.c (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_cq.c (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_intr.c (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_mm.c (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_mq.c (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_mq.h (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_pd.c (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_provider.c (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_provider.h (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_qp.c (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_rnic.c (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_status.h (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_user.h (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_vq.c (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_vq.h (100%)
 rename drivers/{infiniband/hw => staging}/amso1100/c2_wr.h (100%)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index b899531..c3c184e 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -58,7 +58,6 @@ source "drivers/infiniband/hw/mthca/Kconfig"
 source "drivers/infiniband/hw/ipath/Kconfig"
 source "drivers/infiniband/hw/qib/Kconfig"
 source "drivers/infiniband/hw/ehca/Kconfig"
-source "drivers/infiniband/hw/amso1100/Kconfig"
 source "drivers/infiniband/hw/cxgb3/Kconfig"
 source "drivers/infiniband/hw/cxgb4/Kconfig"
 source "drivers/infiniband/hw/mlx4/Kconfig"
diff --git a/drivers/infiniband/hw/Makefile b/drivers/infiniband/hw/Makefile
index e900b03..e179dfb 100644
--- a/drivers/infiniband/hw/Makefile
+++ b/drivers/infiniband/hw/Makefile
@@ -2,7 +2,6 @@ obj-$(CONFIG_INFINIBAND_MTHCA)  += mthca/
 obj-$(CONFIG_INFINIBAND_IPATH) += ipath/
 obj-$(CONFIG_INFINIBAND_QIB)   += qib/
 obj-$(CONFIG_INFINIBAND_EHCA)  += ehca/
-obj-$(CONFIG_INFINIBAND_AMSO1100)  += amso1100/
 obj-$(CONFIG_INFINIBAND_CXGB3) += cxgb3/
 obj-$(CONFIG_INFINIBAND_CXGB4) += cxgb4/
 obj-$(CONFIG_MLX4_INFINIBAND)  += mlx4/
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 7f6cae5..cec20d2 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -112,4 +112,6 @@ source "drivers/staging/fsl-mc/Kconfig"
 
 source "drivers/staging/wilc1000/Kconfig"
 
+source "drivers/staging/amso1100/Kconfig"
+
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers

[PATCH] RDMA/iser: Limit sgs to the device fastreg depth

2015-07-28 Thread Steve Wise

Currently the sg tablesize, which dictates fast register page list
depth to use, does not take into account the limits of the rdma device.
So adjust it once we discover the device fastreg max depth limit.  Also
adjust the max_sectors based on the resulting sg tablesize.

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---
Note: This patch was originally part of
http://www.spinics.net/lists/linux-rdma/msg27436.html. The isert work
will be a separate series, so I'm submitting this one to go ahead and
get it merged.
---


 drivers/infiniband/ulp/iser/iscsi_iser.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 6a594aa..de8730d 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -640,6 +640,15 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
   SHOST_DIX_GUARD_CRC);
}
 
+		/*
+		 * Limit the sg_tablesize and max_sectors based on the device
+		 * max fastreg page list length.
+		 */
+		shost->sg_tablesize = min_t(unsigned short, shost->sg_tablesize,
+			ib_conn->device->dev_attr.max_fast_reg_page_list_len);
+		shost->max_sectors = min_t(unsigned int,
+			1024, (shost->sg_tablesize * PAGE_SIZE) >> 9);
+
 		if (iscsi_host_add(shost,
 				   ib_conn->device->ib_device->dma_device)) {
 			mutex_unlock(&iser_conn->state_mutex);
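
A worked sketch of the clamp this hunk adds. The struct and helper names are hypothetical, and a 4096-byte page is assumed (PAGE_SIZE is architecture-dependent in the kernel). The sketch takes the minimum in 32 bits before narrowing, which differs slightly from min_t(unsigned short, ...) for device limits above 65535.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u /* assumed page size for the example */

/* Hypothetical stand-in for the two Scsi_Host fields the hunk touches. */
struct host_limits {
	uint16_t sg_tablesize;
	uint32_t max_sectors;
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Clamp the SG table to the device's fast-reg page list depth, then
 * derive max_sectors (512-byte units, hence the shift by 9) from the
 * clamped table, capped at the driver's 1024-sector default.
 */
static void limit_to_device(struct host_limits *h, uint32_t dev_max_page_list_len)
{
	h->sg_tablesize = (uint16_t)min_u32(h->sg_tablesize, dev_max_page_list_len);
	h->max_sectors = min_u32(1024,
				 ((uint32_t)h->sg_tablesize * DEMO_PAGE_SIZE) >> 9);
}
```

A device that allows only 32 pages per fast registration ends up with sg_tablesize 32 and max_sectors 256 (128KB), while a device that allows 512 pages keeps the 128-entry, 1024-sector defaults.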


Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API

2015-07-27 Thread Steve Wise


On 7/27/2015 12:14 PM, Jason Gunthorpe wrote:

On Sun, Jul 26, 2015 at 12:45:10PM +0300, Sagi Grimberg wrote:

On 7/23/2015 9:51 PM, Jason Gunthorpe wrote:

On Thu, Jul 23, 2015 at 07:47:14PM +0300, Sagi Grimberg wrote:


So we force ULPs to think about what they are doing properly, and we
get a chance to actually force lkey to be local use only for IB.

The lkey/rkey decision is passed in the fastreg post_send().

That is too late to check the access flags.

Why? the access permissions are kept in the mr context?

Sure, one could do if (key == mr->lkey) .. check lkey flags in the
post, but that seems silly considering we want the post inlined..

Why should we check the lkey/rkey access flags in the post?

Eh? It was your idea..

I just want to check the access flags and force lkey's to not have
ACCESS_REMOTE set without complaining loudly.

To do that you need to know if the mr is a lkey/rkey, and you need to
know the flags.


I can move it to the post interface if it makes more sense.
the access is kind of out of place in the mapping routine anyway...

All the dma routines have an access equivalent during map, I don't
think it is out of place..

To my mind, the map is the point where the MR should crystallize into
an rkey or lkey MR, not at the post.

I'm not sure I understand why the lkey/rkey should be set at the map
routine. To me, it seems more natural to map_mr_sg and then either
register the lkey or the rkey.

We need to check the access flags to put a stop to this remote access
lkey security problem. That means we need to label every MR as a lkey
or rkey MR.

No more MR's can be both nonsense.


Well technically an MR with REMOTE_WRITE also has LOCAL_WRITE set. So 
you are proposing the core disallow a ULP from using the lkey for this 
type of MR?  Say in a RECV sge?
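
The tension in that question can be made concrete with a small sketch. The flag values below are illustrative (the real ones are in enum ib_access_flags), and lkey_use_allowed() is a hypothetical rendering of the strict rule under discussion, not an existing kernel check.

```c
#include <stdbool.h>

/* Illustrative access bits; names shortened from the IB_ACCESS_* flags. */
enum {
	ACCESS_LOCAL_WRITE   = 1 << 0,
	ACCESS_REMOTE_WRITE  = 1 << 1,
	ACCESS_REMOTE_READ   = 1 << 2,
	ACCESS_REMOTE_ATOMIC = 1 << 3,
};

#define ACCESS_REMOTE_ANY \
	(ACCESS_REMOTE_WRITE | ACCESS_REMOTE_READ | ACCESS_REMOTE_ATOMIC)

/* Remote write or remote atomic access requires local write as well. */
static bool access_flags_valid(int flags)
{
	if ((flags & (ACCESS_REMOTE_WRITE | ACCESS_REMOTE_ATOMIC)) &&
	    !(flags & ACCESS_LOCAL_WRITE))
		return false;
	return true;
}

/*
 * Hypothetical strict rule: an MR that is used through its lkey may
 * carry no remote access bits at all.
 */
static bool lkey_use_allowed(int flags)
{
	return !(flags & ACCESS_REMOTE_ANY);
}
```

An MR registered with REMOTE_WRITE | LOCAL_WRITE passes the first check but fails the second, so under the strict rule its lkey could not be placed in a RECV SGE even though the local-write permission is there.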





RE: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-27 Thread Steve Wise

 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org 
 [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Steve Wise
 Sent: Sunday, July 26, 2015 3:17 PM
 To: Sagi Grimberg; dledf...@redhat.com
 Cc: infinip...@intel.com; sa...@mellanox.com; ogerl...@mellanox.com; 
 r...@mellanox.com; linux-rdma@vger.kernel.org;
 e...@mellanox.com; target-de...@vger.kernel.org; linux-...@vger.kernel.org; 
 bfie...@fieldses.org
 Subject: Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive 
 names

 On 7/26/2015 5:08 AM, Sagi Grimberg wrote:
  On 7/24/2015 7:18 PM, Steve Wise wrote:
  This is in preparation for adding new FRMR-only IO handlers
  for devices that support FRMR and not PI.

  Steve,

  I've given this some thought and I think we should avoid splitting
  logic from PI and iWARP. The reason (other than code duplication) is
  that currently the iser target support only up to 1MB IOs. I have some
  code (not done yet) to support larger IOs by using multiple
  registrations  per IO (with or without PI).
  With a little tweaking I think we can get iwarp to fit in too...

  So, do you mind if I take a crack at it?

 Sure, go ahead.  Let me know how I can help.  Certainly I can test it
 for you.  I'm very keen to get this in for 4.3 if possible...

Since Sagi is going to work on isert to support iwarp as part of his current 
isert large work, I'll drop the isert patches.  I also want to split up the 
max_sge_rd patches to their own submission.  So I will send out 2 new series 
for a final submission:

1) iser support for iwarp

2) use max_sge_rd and remove rdma_cap_read_multi_sge().

Steve.


[PATCH RESEND 2/4] ipath,qib: Expose max_sge_rd correctly

2015-07-27 Thread Steve Wise

Applications must not assume that max_sge and max_sge_rd are the same.
Hence expose max_sge_rd correctly as well.

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
Acked-by: Mike Marciniszyn <mike.marcinis...@intel.com>
---

 drivers/infiniband/hw/ipath/ipath_verbs.c |1 +
 drivers/infiniband/hw/qib/qib_verbs.c |1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c 
b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 30ba49c..ed2bbc2 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1521,6 +1521,7 @@ static int ipath_query_device(struct ib_device *ibdev, 
struct ib_device_attr *pr
 	props->max_qp = ib_ipath_max_qps;
 	props->max_qp_wr = ib_ipath_max_qp_wrs;
 	props->max_sge = ib_ipath_max_sges;
+	props->max_sge_rd = ib_ipath_max_sges;
 	props->max_cq = ib_ipath_max_cqs;
 	props->max_ah = ib_ipath_max_ahs;
 	props->max_cqe = ib_ipath_max_cqes;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c 
b/drivers/infiniband/hw/qib/qib_verbs.c
index a05d1a3..bc723b5 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1574,6 +1574,7 @@ static int qib_query_device(struct ib_device *ibdev, 
struct ib_device_attr *prop
 	props->max_qp = ib_qib_max_qps;
 	props->max_qp_wr = ib_qib_max_qp_wrs;
 	props->max_sge = ib_qib_max_sges;
+	props->max_sge_rd = ib_qib_max_sges;
 	props->max_cq = ib_qib_max_cqs;
 	props->max_ah = ib_qib_max_ahs;
 	props->max_cqe = ib_qib_max_cqes;


[PATCH RESEND 4/4] RDMA/Core: remove rdma_cap_read_multi_sge() helper

2015-07-27 Thread Steve Wise

This functionality already exists via the max_sge_rd
device capability.

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---

 include/rdma/ib_verbs.h |   28 
 1 files changed, 0 insertions(+), 28 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index b0f898e..7448a27 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2071,34 +2071,6 @@ static inline bool rdma_cap_eth_ah(const struct 
ib_device *device, u8 port_num)
 }
 
 /**
- * rdma_cap_read_multi_sge - Check if the port of device has the capability
- * RDMA Read Multiple Scatter-Gather Entries.
- * @device: Device to check
- * @port_num: Port number to check
- *
- * iWARP has a restriction that RDMA READ requests may only have a single
- * Scatter/Gather Entry (SGE) in the work request.
- *
- * NOTE: although the linux kernel currently assumes all devices are either
- * single SGE RDMA READ devices or identical SGE maximums for RDMA READs and
- * WRITEs, according to Tom Talpey, this is not accurate.  There are some
- * devices out there that support more than a single SGE on RDMA READ
- * requests, but do not support the same number of SGEs as they do on
- * RDMA WRITE requests.  The linux kernel would need rearchitecting to
- * support these imbalanced READ/WRITE SGEs allowed devices.  So, for now,
- * suffice with either the device supports the same READ/WRITE SGEs, or
- * it only gets one READ sge.
- *
- * Return: true for any device that allows more than one SGE in RDMA READ
- * requests.
- */
-static inline bool rdma_cap_read_multi_sge(struct ib_device *device,
-  u8 port_num)
-{
-	return !(device->port_immutable[port_num].core_cap_flags & RDMA_CORE_CAP_PROT_IWARP);
-}
-
-/**
  * rdma_max_mad_size - Return the max MAD size required by this RDMA Port.
  *
  * @device: Device
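
To illustrate why the helper becomes redundant, here is a sketch contrasting the protocol test it implemented with a per-device limit. The function names are invented, and the "new" variant is an assumption about how a consumer might use max_sge_rd, not code taken from this series.

```c
/*
 * Old approach, as described in the removed comment: iWARP ports get a
 * single SGE per RDMA READ, every other port gets the general max_sge.
 */
static int read_sge_old(int is_iwarp, int sge_count, int max_sge)
{
	if (is_iwarp)
		return 1;
	return sge_count < max_sge ? sge_count : max_sge;
}

/* Capability approach: the device reports its own READ limit. */
static int read_sge_new(int sge_count, int max_sge_rd)
{
	return sge_count < max_sge_rd ? sge_count : max_sge_rd;
}

/* RDMA READ work requests needed for `pages` pages, one SGE per page. */
static int reads_needed(int pages, int max_sge_rd)
{
	return (pages + max_sge_rd - 1) / max_sge_rd;
}
```

A device that reports max_sge_rd = 1 gets the same result the iWARP test used to give, while a device with distinct READ and WRITE limits (the case the removed comment says the kernel could not express) is handled by reporting its real value.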


[PATCH RESEND 0/4] Use max_sge_rd device capability

2015-07-27 Thread Steve Wise

Resending because I forgot to cc linux-rdma :(

Some devices were not setting this capability, so fix those devices,
and svcrdma to use max_sge_rd.  Also remove rdma_cap_read_multi_sge()
since it isn't needed.

These patches were originally part of:

http://www.spinics.net/lists/linux-rdma/msg27436.html

They really aren't part of iSER/iWARP at all, so I've split 
them out.  

Bruce: This hits svcrdma, but I suggest they get merged via Doug's tree
to avoid any merge problems.

---

Sagi Grimberg (1):
  mlx4, mlx5, mthca: Expose max_sge_rd correctly

Steve Wise (3):
  RDMA/Core: remove rdma_cap_read_multi_sge() helper
  svcrdma: Use max_sge_rd for destination read depths
  ipath,qib: Expose max_sge_rd correctly


 drivers/infiniband/hw/ipath/ipath_verbs.c|1 +
 drivers/infiniband/hw/mlx4/main.c|1 +
 drivers/infiniband/hw/mlx5/main.c|1 +
 drivers/infiniband/hw/mthca/mthca_provider.c |1 +
 drivers/infiniband/hw/qib/qib_verbs.c|1 +
 include/linux/sunrpc/svc_rdma.h  |1 +
 include/rdma/ib_verbs.h  |   28 --
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |   12 +--
 net/sunrpc/xprtrdma/svc_rdma_transport.c |4 
 9 files changed, 11 insertions(+), 39 deletions(-)

-- 

Steve

[PATCH RESEND 3/4] svcrdma: Use max_sge_rd for destination read depths

2015-07-27 Thread Steve Wise

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---

 include/linux/sunrpc/svc_rdma.h  |1 +
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |   12 +---
 net/sunrpc/xprtrdma/svc_rdma_transport.c |4 
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index cb94ee4..83211bc 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -132,6 +132,7 @@ struct svcxprt_rdma {
struct list_head sc_accept_q;   /* Conn. waiting accept */
int  sc_ord;/* RDMA read limit */
int  sc_max_sge;
+   int  sc_max_sge_rd; /* max sge for read target */
 
int  sc_sq_depth;   /* Depth of SQ */
atomic_t sc_sq_count;   /* Number of SQ WR on queue */
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c 
b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 2e1348b..cb51742 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -115,15 +115,6 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
rqstp->rq_arg.tail[0].iov_len = 0;
 }
 
-static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
-{
-   if (!rdma_cap_read_multi_sge(xprt->sc_cm_id->device,
-xprt->sc_cm_id->port_num))
-   return 1;
-   else
-   return min_t(int, sge_count, xprt->sc_max_sge);
-}
-
 /* Issue an RDMA_READ using the local lkey to map the data sink */
 int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
struct svc_rqst *rqstp,
@@ -144,8 +135,7 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
 
ctxt->direction = DMA_FROM_DEVICE;
ctxt->read_hdr = head;
-   pages_needed =
-   min_t(int, pages_needed, rdma_read_max_sge(xprt, pages_needed));
+   pages_needed = min_t(int, pages_needed, xprt->sc_max_sge_rd);
read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
 
for (pno = 0; pno < pages_needed; pno++) {
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c 
b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 6b36279..fdc850f 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -872,6 +872,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt 
*xprt)
 * capabilities of this particular device */
newxprt->sc_max_sge = min((size_t)devattr.max_sge,
  (size_t)RPCSVC_MAXPAGES);
+   newxprt->sc_max_sge_rd = min_t(size_t, devattr.max_sge_rd,
+  RPCSVC_MAXPAGES);
newxprt->sc_max_requests = min((size_t)devattr.max_qp_wr,
   (size_t)svcrdma_max_requests);
newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_max_requests;
@@ -1046,6 +1048,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt 
*xprt)
"remote_ip   : %pI4\n"
"remote_port : %d\n"
"max_sge : %d\n"
+   "max_sge_rd  : %d\n"
"sq_depth: %d\n"
"max_requests: %d\n"
"ord : %d\n",
@@ -1059,6 +1062,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt 
*xprt)
ntohs(((struct sockaddr_in *)&newxprt->sc_cm_id->
   route.addr.dst_addr)->sin_port),
newxprt->sc_max_sge,
+   newxprt->sc_max_sge_rd,
newxprt->sc_sq_depth,
newxprt->sc_max_requests,
newxprt->sc_ord);


[PATCH RESEND 1/4] mlx4, mlx5, mthca: Expose max_sge_rd correctly

2015-07-27 Thread Steve Wise

From: Sagi Grimberg <sa...@mellanox.com>

Applications must not assume that max_sge and max_sge_rd are the same,
hence expose max_sge_rd correctly as well.

Reported-by: Steve Wise <sw...@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sa...@mellanox.com>
---

 drivers/infiniband/hw/mlx4/main.c|1 +
 drivers/infiniband/hw/mlx5/main.c|1 +
 drivers/infiniband/hw/mthca/mthca_provider.c |1 +
 3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index 8be6db8..05166b7 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -229,6 +229,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
props->max_qp_wr   = dev->dev->caps.max_wqes -
MLX4_IB_SQ_MAX_SPARE;
props->max_sge = min(dev->dev->caps.max_sq_sg,
 dev->dev->caps.max_rq_sg);
+   props->max_sge_rd = props->max_sge;
props->max_cq  = dev->dev->quotas.cq;
props->max_cqe = dev->dev->caps.max_cqes;
props->max_mr  = dev->dev->quotas.mpt;
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 085c24b..bf27f21 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -273,6 +273,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 sizeof(struct mlx5_wqe_ctrl_seg)) /
 sizeof(struct mlx5_wqe_data_seg);
props->max_sge = min(max_rq_sg, max_sq_sg);
+   props->max_sge_rd = props->max_sge;
props->max_cq  = 1 << MLX5_CAP_GEN(mdev, log_max_cq);
props->max_cqe = (1 << MLX5_CAP_GEN(mdev, log_max_eq_sz)) - 1;
props->max_mr  = 1 << MLX5_CAP_GEN(mdev, log_max_mkey);
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c 
b/drivers/infiniband/hw/mthca/mthca_provider.c
index 93ae51d..dc2d48c 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -97,6 +97,7 @@ static int mthca_query_device(struct ib_device *ibdev, struct 
ib_device_attr *pr
props->max_qp  = mdev->limits.num_qps -
mdev->limits.reserved_qps;
props->max_qp_wr   = mdev->limits.max_wqes;
props->max_sge = mdev->limits.max_sg;
+   props->max_sge_rd  = props->max_sge;
props->max_cq  = mdev->limits.num_cqs -
mdev->limits.reserved_cqs;
props->max_cqe = mdev->limits.max_cqes;
props->max_mr  = mdev->limits.num_mpts -
mdev->limits.reserved_mrws;


Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-26 Thread Steve Wise


On 7/26/2015 12:40 PM, Sagi Grimberg wrote:



Ideally, the post contains a chain of all 4 registrations and the
rdma_read (and an opportunistic good scsi response).


Just to be clear: This example is for IB only, correct?  IW would
require rkeys with REMOTE_WRITE and 4 read wrs.


My assumption is that it would depend on max_sge_rd.



yea.


IB only? iWARP by definition isn't capable of doing rdma_read to
more than one scatter? Anyway, we'll need to calculate the number
of RDMA_READs.



The wire protocol limits the destination to a single stg/to/len (aka 
rkey/addr/len).  Devices/fw/sw could implement some magic to support a 
single stg/to/len that maps to a scatter gather list of stags/tos/lens.
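To make the max_sge_rd dependency concrete, here is a minimal sketch that assumes the target already knows how many local scatter entries it has to fill. The helper name rdma_reads_needed() is hypothetical and is not a kernel API. With a read depth of 1, which is what the single-sink wire limit implies for iWARP, a 4-entry transfer needs 4 READs; a device that reports a depth of 4 or more could do it in one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel API): number of RDMA READ work
 * requests needed to fill nr_sge local scatter entries when each READ
 * may reference at most max_sge_rd of them.
 */
static size_t rdma_reads_needed(size_t nr_sge, size_t max_sge_rd)
{
    assert(max_sge_rd > 0);
    /* Round up: a partial final batch still needs its own READ. */
    return (nr_sge + max_sge_rd - 1) / max_sge_rd;
}
```

For example, rdma_reads_needed(4, 1) gives 4 and rdma_reads_needed(4, 4) gives 1.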


And you're ignoring invalidation wrs (or read-with-inv) in the 
example...


Yes, didn't want to inflate the example too much...



Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-26 Thread Steve Wise


On 7/26/2015 5:08 AM, Sagi Grimberg wrote:

On 7/24/2015 7:18 PM, Steve Wise wrote:

This is in preparation for adding new FRMR-only IO handlers
for devices that support FRMR and not PI.


Steve,

I've given this some thought and I think we should avoid splitting
logic from PI and iWARP. The reason (other than code duplication) is
that currently the iser target support only up to 1MB IOs. I have some
code (not done yet) to support larger IOs by using multiple
registrations  per IO (with or without PI).
With a little tweaking I think we can get iwarp to fit in too...

So, do you mind if I take a crack at it?


Sure, go ahead.  Let me know how I can help.  Certainly I can test it 
for you.  I'm very keen to get this in for 4.3 if possible...




Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-26 Thread Steve Wise


On 7/26/2015 6:00 AM, Sagi Grimberg wrote:

On 7/26/2015 1:43 PM, Christoph Hellwig wrote:

On Sun, Jul 26, 2015 at 01:08:16PM +0300, Sagi Grimberg wrote:

I've given this some thought and I think we should avoid splitting
logic from PI and iWARP. The reason (other than code duplication) is
that currently the iser target support only up to 1MB IOs. I have some
code (not done yet) to support larger IOs by using multiple
registrations  per IO (with or without PI).


Just curious: How is this going to work with iSER only having a single
rkey/offset/len field?



Good question,

On the wire iser sends a single rkey, but the target is allowed to
transfer the data however it wants to.

Say that the local target HCA supports only 32 pages (128K bytes for 4K
pages) registration and the initiator sent:
rkey=0x1234
address=0x
length=512K

The target would allocate a 512K buffer and:
register offset 0-128K to lkey=0x1
register offset 128K-256K to lkey=0x2
register offset 256K-384K to lkey=0x3
register offset 384K-512K to lkey=0x4

then constructs sg_list as:
sg_list[0] = {addr=buf, length=128K, lkey=0x1}
sg_list[1] = {addr=buf+128K, length=128K, lkey=0x2}
sg_list[2] = {addr=buf+256K, length=128K, lkey=0x3}
sg_list[3] = {addr=buf+384K, length=128K, lkey=0x4}

Then set rdma_read wr with:
rdma_r_wr.sg_list=sg_list
rdma_r_wr.rdma.addr=0x
rdma_r_wr.rdma.rkey=0x1234

post_send(rdma_r_wr);

Ideally, the post contains a chain of all 4 registrations and the
rdma_read (and an opportunistic good scsi response).


Just to be clear: This example is for IB only, correct?  IW would 
require rkeys with REMOTE_WRITE and 4 read wrs.  And you're ignoring 
invalidation wrs (or read-with-inv) in the example...
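The chunking in the quoted example can be sketched as follows. This is only an illustration under stated assumptions: struct sketch_sge and sketch_build_sg_list() are made-up names rather than verbs API, the lkeys are faked as 1, 2, 3, ... to mirror the example, and no registration or invalidation work requests are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for one local scatter entry; not struct ib_sge. */
struct sketch_sge {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

/*
 * Split [buf, buf + len) into chunks of at most max_reg_bytes, one per
 * registration. Returns the number of entries written, or 0 if out[]
 * is too small (or len is 0).
 */
static size_t sketch_build_sg_list(uint64_t buf, uint64_t len,
                                   uint32_t max_reg_bytes,
                                   struct sketch_sge *out, size_t out_cap)
{
    size_t n = 0;
    uint64_t off = 0;

    assert(max_reg_bytes > 0);
    while (off < len) {
        uint64_t chunk = len - off;

        if (chunk > max_reg_bytes)
            chunk = max_reg_bytes;
        if (n == out_cap)
            return 0;
        out[n].addr = buf + off;
        out[n].length = (uint32_t)chunk;
        out[n].lkey = (uint32_t)(n + 1); /* fake lkey, as in the example */
        off += chunk;
        n++;
    }
    return n;
}
```

With len = 512K and max_reg_bytes = 128K this produces the four entries listed above. On IB they could all go into one RDMA READ; on iWARP each entry would need its own READ.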



RE: [PATCH WIP 38/43] iser-target: Port to new memory registration API

2015-07-24 Thread Steve Wise

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Jason Gunthorpe
> Sent: Friday, July 24, 2015 11:27 AM
> To: Chuck Lever
> Cc: Sagi Grimberg; Christoph Hellwig; linux-rdma; Liran Liss; Oren Duer
> Subject: Re: [PATCH WIP 38/43] iser-target: Port to new memory registration
> API
>
> On Fri, Jul 24, 2015 at 10:36:07AM -0400, Chuck Lever wrote:
>
> > Unfinished, but operational:
> >
> > http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future
>
> Nice..
>
> Can you spend some time and reflect on how some of this could be
> lowered into the core code? The FMR and FRWR side have many
> similarities now..
>
> > FRWR is seeing a 10-15% throughput reduction with 8-thread dbench,
> > but a 5% improvement with 16-thread fio IOPS. 4K and 8K direct
> > read and write are negatively impacted.
>
> I'm not surprised since invalidate is sync. I believe you need to
> incorporate SEND WITH INVALIDATE to substantially recover this
> overhead.
>
> It would be neat if the RQ could continue to advance while waiting for
> the invalidate.. That looks almost doable..
>
> > I converted the RPC reply handler tasklet to a work queue context
> > to allow sleeping. A new .ro_unmap_sync method is invoked after
> > the RPC/RDMA header is parsed but before xprt_complete_rqst()
> > wakes up the waiting RPC.
>
> .. so the issue is the RPC must be substantially parsed to learn which
> MR it is associated with to schedule the invalidate?
>
> > This is actually much more efficient than the current logic,
> > which serially does an ib_unmap_fmr() for each MR the RPC owns.
> > So FMR overall performs better with this change.
>
> Interesting..
>
> > Because the next RPC cannot awaken until the last send completes,
> > send queue accounting is based on RPC/RDMA credit flow control.
>
> So for FRWR the sync invalidate effectively guarantees all SQEs
> related to this RPC are flushed. That seems reasonable, if the number
> of SQEs and CQEs are properly sized in relation to the RPC slot count
> it should be workable..
>
> How does FMR and PHYS synchronize?
>
> > I’m sure there are some details here that still need to be
> > addressed, but this fixes the big problem with FRWR send queue
> > accounting, which was that LOCAL_INV WRs would continue to
> > consume SQEs while another RPC was allowed to start.
>
> Did you test without that artificial limit you mentioned before?
>
> I'm also wondering about this:
>
> > During some other testing I found that when a completion upcall
> > returns to the provider leaving CQEs still on the completion queue,
> > there is a non-zero probability that a completion will be lost.
>
> What does lost mean?
>
> The CQ is edge triggered, so if you don't drain it you might not get
> another timely CQ callback (which is bad), but CQEs themselves should
> not be lost.

This condition (not fully draining the CQEs) is due to SQ flow control, yes?  
If so, then when the SQ resumes can it wake up the appropriate thread 
(simulating another CQE insertion)?
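For reference, the usual way to cope with an edge-triggered CQ is to drain it, re-arm, and poll again if entries arrived before the re-arm took effect. The sketch below models that loop with a toy queue; struct toy_cq, toy_poll(), toy_arm() and toy_drain() are all assumptions for illustration, standing in for what real code would do with ib_poll_cq() and ib_req_notify_cq().

```c
#include <assert.h>

/* Toy completion queue: just a count of pending entries plus an
 * "armed" flag. An assumption for illustration only. */
struct toy_cq {
    int pending;
    int armed;
};

/* Consume one entry if there is one; returns 1 when an entry was taken. */
static int toy_poll(struct toy_cq *cq)
{
    assert(cq != 0);
    if (cq->pending == 0)
        return 0;
    cq->pending--;
    return 1;
}

/* Arm the CQ; report non-zero if entries are already waiting, which
 * would mean the edge was missed and the caller must poll again. */
static int toy_arm(struct toy_cq *cq)
{
    cq->armed = 1;
    return cq->pending > 0;
}

/* Drain fully, re-arm, and repeat while entries raced in before arming. */
static int toy_drain(struct toy_cq *cq)
{
    int handled = 0;

    do {
        while (toy_poll(cq))
            handled++;
    } while (toy_arm(cq));
    return handled;
}
```

A handler that returns while entries are still queued and the CQ is not re-armed has no guarantee of another callback, which matches the "not lost, but not timely" description above.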


RE: [PATCH] RDMA/cxgb3: fail get_dma_mr if the memory footprint can exceed 32b

2015-07-24 Thread Steve Wise

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Doug Ledford
> Sent: Friday, July 24, 2015 9:45 AM
> To: Steve Wise
> Cc: linux-rdma@vger.kernel.org
> Subject: Re: [PATCH] RDMA/cxgb3: fail get_dma_mr if the memory footprint can
> exceed 32b
>
> On 07/23/2015 06:47 PM, Steve Wise wrote:
> > -Original Message- From: Doug Ledford
> > Should this be a static check of the pointer size versus installed
> > memory?  Would it be possible to have this work for machines with
> > less than 4GB of physical memory even if they have 64bit pointers,
> > or are you concerned that hotplug memory could take us over the
> > limit after registration and cause problems?
> >
> > NFSRDMA doesn't need dma-mrs for T3 since it has FRMR + local dma
> > lkey support.  And since the deficiency really can cause problems on
> > 64b systems if the memory grows > 4GB after dma-mr allocation, I
> > decided to just not allow them for potential large memory systems.
>
> Ok.  I've pulled this for 4.2-rc then.

The problem has been there since day one, so it doesn't represent a regression. 
 It is your call, but I think 4.3 is fine.

Thanks,

Steve.


[PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-24 Thread Steve Wise

This is in preparation for adding new FRMR-only IO handlers
for devices that support FRMR and not PI.

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---

 drivers/infiniband/ulp/isert/ib_isert.c |   28 ++--
 1 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index 6af3dd4..dcd3c55 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -48,15 +48,15 @@ static struct workqueue_struct *isert_comp_wq;
 static struct workqueue_struct *isert_release_wq;
 
 static void
-isert_unmap_cmd(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
+isert_unmap_lkey(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
 static int
-isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
+isert_map_lkey(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
   struct isert_rdma_wr *wr);
 static void
-isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
+isert_unreg_frmr_pi(struct isert_cmd *isert_cmd, struct isert_conn 
*isert_conn);
 static int
-isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
-  struct isert_rdma_wr *wr);
+isert_reg_frmr_pi(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
+ struct isert_rdma_wr *wr);
 static int
 isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd);
 static int
@@ -367,12 +367,12 @@ isert_create_device_ib_res(struct isert_device *device)
if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
dev_attr->device_cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
device->use_fastreg = 1;
-   device->reg_rdma_mem = isert_reg_rdma;
-   device->unreg_rdma_mem = isert_unreg_rdma;
+   device->reg_rdma_mem = isert_reg_frmr_pi;
+   device->unreg_rdma_mem = isert_unreg_frmr_pi;
} else {
device->use_fastreg = 0;
-   device->reg_rdma_mem = isert_map_rdma;
-   device->unreg_rdma_mem = isert_unmap_cmd;
+   device->reg_rdma_mem = isert_map_lkey;
+   device->unreg_rdma_mem = isert_unmap_lkey;
}
 
ret = isert_alloc_comps(device, dev_attr);
@@ -1701,7 +1701,7 @@ isert_unmap_data_buf(struct isert_conn *isert_conn, 
struct isert_data_buf *data)
 
 
 static void
-isert_unmap_cmd(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
+isert_unmap_lkey(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
 {
struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
 
@@ -1726,7 +1726,7 @@ isert_unmap_cmd(struct isert_cmd *isert_cmd, struct 
isert_conn *isert_conn)
 }
 
 static void
-isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
+isert_unreg_frmr_pi(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
 {
struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
 
@@ -2442,7 +2442,7 @@ isert_build_rdma_wr(struct isert_conn *isert_conn, struct 
isert_cmd *isert_cmd,
 }
 
 static int
-isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
+isert_map_lkey(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
   struct isert_rdma_wr *wr)
 {
struct se_cmd *se_cmd = &cmd->se_cmd;
@@ -2848,8 +2848,8 @@ unmap_prot_cmd:
 }
 
 static int
-isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
-  struct isert_rdma_wr *wr)
+isert_reg_frmr_pi(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
+ struct isert_rdma_wr *wr)
 {
struct se_cmd *se_cmd = &cmd->se_cmd;
struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);


[PATCH V6 3/9] ipath,qib: Expose max_sge_rd correctly

2015-07-24 Thread Steve Wise

Applications must not assume that max_sge and max_sge_rd are the same,
hence expose max_sge_rd correctly as well.

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
Acked-by: Mike Marciniszyn <mike.marcinis...@intel.com>
---

 drivers/infiniband/hw/ipath/ipath_verbs.c |1 +
 drivers/infiniband/hw/qib/qib_verbs.c |1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c 
b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 30ba49c..ed2bbc2 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1521,6 +1521,7 @@ static int ipath_query_device(struct ib_device *ibdev, 
struct ib_device_attr *pr
props->max_qp = ib_ipath_max_qps;
props->max_qp_wr = ib_ipath_max_qp_wrs;
props->max_sge = ib_ipath_max_sges;
+   props->max_sge_rd = ib_ipath_max_sges;
props->max_cq = ib_ipath_max_cqs;
props->max_ah = ib_ipath_max_ahs;
props->max_cqe = ib_ipath_max_cqes;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c 
b/drivers/infiniband/hw/qib/qib_verbs.c
index a05d1a3..bc723b5 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1574,6 +1574,7 @@ static int qib_query_device(struct ib_device *ibdev, 
struct ib_device_attr *prop
props->max_qp = ib_qib_max_qps;
props->max_qp_wr = ib_qib_max_qp_wrs;
props->max_sge = ib_qib_max_sges;
+   props->max_sge_rd = ib_qib_max_sges;
props->max_cq = ib_qib_max_cqs;
props->max_ah = ib_qib_max_ahs;
props->max_cqe = ib_qib_max_cqes;


[PATCH V6 8/9] isert: Use local_dma_lkey whenever possible.

2015-07-24 Thread Steve Wise

No need to allocate a dma_mr if the device provides a local_dma_lkey.

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---

 drivers/infiniband/ulp/isert/ib_isert.c |   47 ++-
 drivers/infiniband/ulp/isert/ib_isert.h |1 +
 2 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index 8ae9208..47bb790 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -249,7 +249,7 @@ isert_alloc_rx_descriptors(struct isert_conn *isert_conn)
rx_sg = &rx_desc->rx_sg;
rx_sg->addr = rx_desc->dma_addr;
rx_sg->length = ISER_RX_PAYLOAD_SIZE;
-   rx_sg->lkey = device->mr->lkey;
+   rx_sg->lkey = device->local_dma_lkey;
}
 
isert_conn-rx_desc_head = 0;
@@ -375,8 +375,9 @@ isert_create_device_ib_res(struct isert_device *device)
if (ret)
return ret;
 
-   /* asign function handlers */
+   /* assign function handlers */
if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
+   dev_attr->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY &&
dev_attr->device_cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
device->use_fastreg = 1;
device->reg_rdma_mem = isert_reg_frmr_pi;
@@ -399,12 +400,17 @@ isert_create_device_ib_res(struct isert_device *device)
goto out_cq;
}
 
-   device->mr = ib_get_dma_mr(device->pd, IB_ACCESS_LOCAL_WRITE);
-   if (IS_ERR(device->mr)) {
-   ret = PTR_ERR(device->mr);
-   isert_err("failed to create dma mr, device %p, ret=%d\n",
- device, ret);
-   goto out_mr;
+   if (device->use_fastreg)
+   device->local_dma_lkey = device->ib_device->local_dma_lkey;
+   else {
+   device->mr = ib_get_dma_mr(device->pd, IB_ACCESS_LOCAL_WRITE);
+   if (IS_ERR(device->mr)) {
+   ret = PTR_ERR(device->mr);
+   isert_err("failed to create dma mr, device %p, ret=%d\n",
+ device, ret);
+   goto out_mr;
+   }
+   device->local_dma_lkey = device->mr->lkey;
}
 
/* Check signature cap */
@@ -425,7 +431,8 @@ isert_free_device_ib_res(struct isert_device *device)
 {
isert_info("device %p\n", device);
 
-   ib_dereg_mr(device->mr);
+   if (!device->use_fastreg)
+   ib_dereg_mr(device->mr);
ib_dealloc_pd(device->pd);
isert_free_comps(device);
 }
@@ -1108,8 +1115,8 @@ isert_create_send_desc(struct isert_conn *isert_conn,
tx_desc->num_sge = 1;
tx_desc->isert_cmd = isert_cmd;
 
-   if (tx_desc->tx_sg[0].lkey != device->mr->lkey) {
-   tx_desc->tx_sg[0].lkey = device->mr->lkey;
+   if (tx_desc->tx_sg[0].lkey != device->local_dma_lkey) {
+   tx_desc->tx_sg[0].lkey = device->local_dma_lkey;
isert_dbg("tx_desc %p lkey mismatch, fixing\n", tx_desc);
}
 }
@@ -1132,7 +1139,7 @@ isert_init_tx_hdrs(struct isert_conn *isert_conn,
tx_desc->dma_addr = dma_addr;
tx_desc->tx_sg[0].addr  = tx_desc->dma_addr;
tx_desc->tx_sg[0].length = ISER_HEADERS_LEN;
-   tx_desc->tx_sg[0].lkey = device->mr->lkey;
+   tx_desc->tx_sg[0].lkey = device->local_dma_lkey;
 
isert_dbg("Setup tx_sg[0].addr: 0x%llx length: %u lkey: 0x%x\n",
  tx_desc->tx_sg[0].addr, tx_desc->tx_sg[0].length,
@@ -1165,7 +1172,7 @@ isert_rdma_post_recvl(struct isert_conn *isert_conn)
memset(&sge, 0, sizeof(struct ib_sge));
sge.addr = isert_conn->login_req_dma;
sge.length = ISER_RX_LOGIN_SIZE;
-   sge.lkey = isert_conn->device->mr->lkey;
+   sge.lkey = isert_conn->device->local_dma_lkey;
 
isert_dbg("Setup sge: addr: %llx length: %d 0x%08x\n",
sge.addr, sge.length, sge.lkey);
@@ -1215,7 +1222,7 @@ isert_put_login_tx(struct iscsi_conn *conn, struct 
iscsi_login *login,
 
tx_dsg->addr= isert_conn->login_rsp_dma;
tx_dsg->length  = length;
-   tx_dsg->lkey= isert_conn->device->mr->lkey;
+   tx_dsg->lkey= isert_conn->device->local_dma_lkey;
tx_desc->num_sge = 2;
}
if (!login->login_failed) {
@@ -1734,6 +1741,7 @@ isert_unmap_lkey(struct isert_cmd *isert_cmd, struct 
isert_conn *isert_conn)
 
if (wr->send_wr) {
isert_dbg("Cmd %p free send_wr\n", isert_cmd);
+   wr->send_wr_num = 0;
kfree(wr->send_wr);
wr->send_wr = NULL;
}
@@ -1967,7 +1975,6 @@ isert_completion_rdma_read(struct iser_tx_desc *tx_desc,
iscsit_stop_dataout_timer(cmd);
device->unreg_rdma_mem(isert_cmd, isert_conn);
cmd->write_data_done = wr->data.len;
-   wr->send_wr_num = 0;
 
isert_dbg(Cmd: %p

[PATCH V6 0/9] iSER support for iWARP

2015-07-24 Thread Steve Wise

The following series implements support for iWARP transports in the iSER
initiator and target.  This is based on v4.2-rc3.

I know we're in the middle of some API changes that will affect the isert
patches in this series, but I wanted to get these out for another round
of review.  I can merge the isert changes on top of the new API once
it solidifies and we have a staging tree for them.  My goal is to get
this series in for 4.3, so if the API changes don't gel fairly soon,
I ask that we consider pulling this series in first.  But I'm definitely
willing to help get this -and- the API stuff in for 4.3.

Changes since V5:

The big change in V6 is to introduce new register/unregister functions in isert
to use FRMRs + local dma lkey for devices that support this yet do not
support DIF/PI.  So iWARP devices do not require a DMA-MR.

svcrdma patch added to use max_sge_rd.

reordered the series some:
patch 1 is the small iser changes to support iwarp
patches 2..5 fix drivers to set max_sge_rd and ULPs to use them
patches 6-9 enhance isert to support iwarp

Changes since V4:

iser: fixed compiler warning

isert: back to setting REMOTE_WRITE only for iWARP devices

Changes since V3:

Fixed commit messages based on feedback.

iser: adjust max_sectors

isert: split into 2 patches

isert: always set REMOTE_WRITE for dma mrs

Changes since V2:

The transport independent work is removed from this series and will
be submitted in a subsequent series.  This V3 series now enables iWARP
using existing core services.

Changes since V1:

Introduce and use transport-independent RDMA core services for allocating
DMA MRs and computing fast register access flags.

Correctly set the device max_sge_rd capability in several rdma device
drivers.

isert: use device capability max_sge_rd for the read sge depth.

isert: change max_sge to max_write_sge in struct isert_conn.

---

Sagi Grimberg (1):
  mlx4, mlx5, mthca: Expose max_sge_rd correctly

Steve Wise (8):
  isert: Support iWARP transports using FRMRs
  isert: Use local_dma_lkey whenever possible.
  isert: Use the device's max fastreg page list depth
  isert: Rename IO functions to more descriptive names
  RDMA/isert: Limit read depth based on the device max_sge_rd capability
  svcrdma: Use max_sge_rd for destination read depths
  ipath,qib: Expose max_sge_rd correctly
  RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth


 drivers/infiniband/hw/ipath/ipath_verbs.c|1 
 drivers/infiniband/hw/mlx4/main.c|1 
 drivers/infiniband/hw/mlx5/main.c|1 
 drivers/infiniband/hw/mthca/mthca_provider.c |1 
 drivers/infiniband/hw/qib/qib_verbs.c|1 
 drivers/infiniband/ulp/iser/iscsi_iser.c |9 +
 drivers/infiniband/ulp/isert/ib_isert.c  |  345 ++
 drivers/infiniband/ulp/isert/ib_isert.h  |5 
 include/linux/sunrpc/svc_rdma.h  |1 
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |   12 -
 net/sunrpc/xprtrdma/svc_rdma_transport.c |4 
 11 files changed, 320 insertions(+), 61 deletions(-)

-- 
Steve

[PATCH V6 1/9] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth

2015-07-24 Thread Steve Wise

Currently the sg tablesize, which dictates fast register page list
depth to use, does not take into account the limits of the rdma device.
So adjust it once we discover the device fastreg max depth limit.  Also
adjust the max_sectors based on the resulting sg tablesize.

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---

 drivers/infiniband/ulp/iser/iscsi_iser.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 6a594aa..de8730d 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -640,6 +640,15 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
   SHOST_DIX_GUARD_CRC);
}
 
+   /*
+* Limit the sg_tablesize and max_sectors based on the device
+* max fastreg page list length.
+*/
+   shost->sg_tablesize = min_t(unsigned short, shost->sg_tablesize,
+   ib_conn->device->dev_attr.max_fast_reg_page_list_len);
+   shost->max_sectors = min_t(unsigned int,
+   1024, (shost->sg_tablesize * PAGE_SIZE) >> 9);
+
if (iscsi_host_add(shost,
   ib_conn->device->ib_device->dma_device)) {
mutex_unlock(&iser_conn->state_mutex);
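As a worked example of the arithmetic in this patch, here is a small user-space sketch. It assumes 4K pages, and the sketch_* names are hypothetical; the min is done in unsigned int and then narrowed, which is slightly different from the min_t(unsigned short, ...) cast used in the patch itself.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed 4K pages */

static unsigned int sketch_min(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Clamp the SCSI host sg_tablesize to the device fastreg depth. */
static unsigned short sketch_sg_tablesize(unsigned short sg_tablesize,
                                          unsigned int max_fast_reg_page_list_len)
{
    return (unsigned short)sketch_min(sg_tablesize,
                                      max_fast_reg_page_list_len);
}

/* max_sectors in 512-byte units, capped at 1024 as in the patch. */
static unsigned int sketch_max_sectors(unsigned short sg_tablesize)
{
    assert(sg_tablesize > 0);
    return sketch_min(1024u, (sg_tablesize * SKETCH_PAGE_SIZE) >> 9);
}
```

So a device with a fastreg depth of 32 pages ends up with sg_tablesize 32 and max_sectors 256 (128K per IO), while a depth of 128 or more keeps the 1024-sector (512K) cap.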


[PATCH V6 2/9] mlx4, mlx5, mthca: Expose max_sge_rd correctly

2015-07-24 Thread Steve Wise

From: Sagi Grimberg <sa...@mellanox.com>

Applications must not assume that max_sge and max_sge_rd are the same,
hence expose max_sge_rd correctly as well.

Reported-by: Steve Wise <sw...@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sa...@mellanox.com>
---

 drivers/infiniband/hw/mlx4/main.c|1 +
 drivers/infiniband/hw/mlx5/main.c|1 +
 drivers/infiniband/hw/mthca/mthca_provider.c |1 +
 3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index 8be6db8..05166b7 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -229,6 +229,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
props->max_qp_wr   = dev->dev->caps.max_wqes -
MLX4_IB_SQ_MAX_SPARE;
props->max_sge = min(dev->dev->caps.max_sq_sg,
 dev->dev->caps.max_rq_sg);
+   props->max_sge_rd = props->max_sge;
props->max_cq  = dev->dev->quotas.cq;
props->max_cqe = dev->dev->caps.max_cqes;
props->max_mr  = dev->dev->quotas.mpt;
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 085c24b..bf27f21 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -273,6 +273,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 sizeof(struct mlx5_wqe_ctrl_seg)) /
 sizeof(struct mlx5_wqe_data_seg);
props->max_sge = min(max_rq_sg, max_sq_sg);
+   props->max_sge_rd = props->max_sge;
props->max_cq  = 1 << MLX5_CAP_GEN(mdev, log_max_cq);
props->max_cqe = (1 << MLX5_CAP_GEN(mdev, log_max_eq_sz)) - 1;
props->max_mr  = 1 << MLX5_CAP_GEN(mdev, log_max_mkey);
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c 
b/drivers/infiniband/hw/mthca/mthca_provider.c
index 93ae51d..dc2d48c 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -97,6 +97,7 @@ static int mthca_query_device(struct ib_device *ibdev, struct 
ib_device_attr *pr
props->max_qp  = mdev->limits.num_qps -
mdev->limits.reserved_qps;
props->max_qp_wr   = mdev->limits.max_wqes;
props->max_sge = mdev->limits.max_sg;
+   props->max_sge_rd  = props->max_sge;
props->max_cq  = mdev->limits.num_cqs -
mdev->limits.reserved_cqs;
props->max_cqe = mdev->limits.max_cqes;
props->max_mr  = mdev->limits.num_mpts -
mdev->limits.reserved_mrws;


RE: [PATCH WIP 38/43] iser-target: Port to new memory registration API

2015-07-24 Thread Steve Wise

 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org 
 [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Jason Gunthorpe
 Sent: Friday, July 24, 2015 3:25 PM
 To: Chuck Lever
 Cc: Sagi Grimberg; Christoph Hellwig; linux-rdma; Liran Liss; Oren Duer
 Subject: Re: [PATCH WIP 38/43] iser-target: Port to new memory registration 
 API

 On Fri, Jul 24, 2015 at 03:59:06PM -0400, Chuck Lever wrote:
  And RPC-over-RDMA version 1 does not have any way to signal that
  the server has invalidated the MRs. Such signaling would be a
  pre-requisite to allow the Linux NFS/RDMA client to interoperate
  with non-Linux NFS/RDMA servers that do not have such support.

 You can implement client support immediately, nothing special is
 required.

 When processing a SEND WC check ex.invalidate_rkey and
 IB_WC_WITH_INVALIDATE. If that rkey matches the MR associated with
 that RPC slot then skip the invalidate.

 No protocol negotiation is required at that point.

 I am unclear what happens server side if the server starts issuing
 SEND_WITH_INVALIDATE to a client that doesn't expect it. The net
 result is a MR would be invalidated twice. I don't know if this is OK
 or not.

It is ok to invalidate an already-invalid MR.
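
The client-side check described above can be sketched in plain C. This is only a userspace model under stated assumptions: `struct recv_completion`, `WC_WITH_INVALIDATE` and `needs_local_invalidate()` are hypothetical stand-ins for the verbs completion (`IB_WC_WITH_INVALIDATE`, `ex.invalidate_rkey`), not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the IB_WC_WITH_INVALIDATE completion flag. */
#define WC_WITH_INVALIDATE 0x1u

/* Hypothetical, reduced view of a receive completion. */
struct recv_completion {
	uint32_t flags;           /* completion flags reported by the device */
	uint32_t invalidate_rkey; /* only meaningful if WC_WITH_INVALIDATE is set */
};

/*
 * Returns true when the ULP still has to post its own local invalidate
 * for the MR identified by mr_rkey, false when the peer already did it.
 */
static bool needs_local_invalidate(const struct recv_completion *wc,
				   uint32_t mr_rkey)
{
	assert(wc != NULL);
	if ((wc->flags & WC_WITH_INVALIDATE) && wc->invalidate_rkey == mr_rkey)
		return false; /* remotely invalidated: skip the local invalidate */
	return true;          /* no remote invalidate, or it hit a different rkey */
}
```

Since invalidating an already-invalid MR is harmless, a caller that cannot match the rkey can simply fall back to the local invalidate.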

 If it is OK, then the server can probably just start using it as
 well without negotiation.

 Otherwise the client has to signal the server it supports it once at
 connection setup.

  For FRWR, you could post LINV from the receive completion upcall
  handler, and handle the rest of the invalidation from the send
  completion upcall, then poke the RPC reply handler.

 Yes

  But this wouldn’t work at all for FMR, whose unmap verb is
  synchronous, would it?

 It could run the FMR unmap in a thread/workqueue/tasklet and then
 complete the RPC side from that context. Same basic idea, using your
 tasklet, not the driver's sendq context.

  I’m not sure we’d buy more than a few microseconds here, and
  the receive upcall is single-threaded.

 Not sure on how that matches your performance goals, just remarking
 that launching the invalidate in the recv upcall and completing
 processing from the sendq upcall is the very best performance you can
 expect from this API.

 Jason

RE: [PATCH V6 4/9] svcrdma: Use max_sge_rd for destination read depths

2015-07-24 Thread Steve Wise



 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org 
 [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Steve Wise
 Sent: Friday, July 24, 2015 11:19 AM
 To: dledf...@redhat.com
 Cc: infinip...@intel.com; sa...@mellanox.com; ogerl...@mellanox.com; 
 r...@mellanox.com; linux-rdma@vger.kernel.org;
 e...@mellanox.com; target-de...@vger.kernel.org; linux-...@vger.kernel.org; 
 bfie...@fieldses.org
 Subject: [PATCH V6 4/9] svcrdma: Use max_sge_rd for destination read depths
 
 Signed-off-by: Steve Wise sw...@opengridcomputing.com
 ---
 
  include/linux/sunrpc/svc_rdma.h  |1 +
  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |   12 +---
  net/sunrpc/xprtrdma/svc_rdma_transport.c |4 
  3 files changed, 6 insertions(+), 11 deletions(-)
 
 diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
 index cb94ee4..83211bc 100644
 --- a/include/linux/sunrpc/svc_rdma.h
 +++ b/include/linux/sunrpc/svc_rdma.h
 @@ -132,6 +132,7 @@ struct svcxprt_rdma {
   struct list_head sc_accept_q;   /* Conn. waiting accept */
   int  sc_ord;/* RDMA read limit */
   int  sc_max_sge;
 + int  sc_max_sge_rd; /* max sge for read target */
 
   int  sc_sq_depth;   /* Depth of SQ */
   atomic_t sc_sq_count;   /* Number of SQ WR on queue */
 diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c 
 b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
 index 2e1348b..cb51742 100644
 --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
 +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
 @@ -115,15 +115,6 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
   rqstp->rq_arg.tail[0].iov_len = 0;
  }
 
 -static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
 -{
 - if (!rdma_cap_read_multi_sge(xprt->sc_cm_id->device,
 -  xprt->sc_cm_id->port_num))
 - return 1;
 - else
 - return min_t(int, sge_count, xprt->sc_max_sge);
 -}
 -
  /* Issue an RDMA_READ using the local lkey to map the data sink */
  int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
   struct svc_rqst *rqstp,
 @@ -144,8 +135,7 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
 
   ctxt->direction = DMA_FROM_DEVICE;
   ctxt->read_hdr = head;
 - pages_needed =
 - min_t(int, pages_needed, rdma_read_max_sge(xprt, pages_needed));
 + pages_needed = min_t(int, pages_needed, xprt->sc_max_sge_rd);
   read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
 
   for (pno = 0; pno < pages_needed; pno++) {
 diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c 
 b/net/sunrpc/xprtrdma/svc_rdma_transport.c
 index 6b36279..fdc850f 100644
 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
 +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
 @@ -872,6 +872,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt 
 *xprt)
* capabilities of this particular device */
   newxprt->sc_max_sge = min((size_t)devattr.max_sge,
 (size_t)RPCSVC_MAXPAGES);
 + newxprt->sc_max_sge_rd = min_t(size_t, devattr.max_sge_rd,
 +RPCSVC_MAXPAGES);
   newxprt->sc_max_requests = min((size_t)devattr.max_qp_wr,
  (size_t)svcrdma_max_requests);
   newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_max_requests;
 @@ -1046,6 +1048,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt 
 *xprt)
   "remote_ip   : %pI4\n"
   "remote_port : %d\n"
   "max_sge : %d\n"
 + "max_sge_rd  : %d\n"
   "sq_depth: %d\n"
   "max_requests: %d\n"
   "ord : %d\n",
 @@ -1059,6 +1062,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt 
 *xprt)
   ntohs(((struct sockaddr_in *)&newxprt->sc_cm_id->
  route.addr.dst_addr)->sin_port),
   newxprt->sc_max_sge,
 + newxprt->sc_max_sge_rd,
   newxprt->sc_sq_depth,
   newxprt->sc_max_requests,
   newxprt->sc_ord);
 

With the above patch change, we have no more users of the recently created 
rdma_cap_read_multi_sge().  Should I add a patch to remove it?
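
As a rough illustration of what the patch does with the new attribute, the following userspace sketch models the two clamps: one at accept time and one per RDMA READ. The helper names and `SKETCH_PAGE_SHIFT` are assumptions for the example; only the `min()` logic is taken from the diff above.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB pages */

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Accept-time clamp: never plan for more read SGEs than either the
 * device (max_sge_rd) or the transport (max_pages) allows.
 */
static int clamp_max_sge_rd(int dev_max_sge_rd, int max_pages)
{
	assert(dev_max_sge_rd > 0 && max_pages > 0);
	return min_int(dev_max_sge_rd, max_pages);
}

/*
 * Bytes a single RDMA READ can carry, given the pages still needed, the
 * per-connection read SGE limit, and the remaining chunk length.
 */
static int read_bytes_this_wr(int pages_needed, int sc_max_sge_rd,
			      int rs_length)
{
	int pages = min_int(pages_needed, sc_max_sge_rd);

	return min_int(pages << SKETCH_PAGE_SHIFT, rs_length);
}
```

A device that reports `max_sge_rd` of 1 ends up reading one page per work request, which is the case the removed `rdma_read_max_sge()` helper used to special-case.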


RE: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs

2015-07-24 Thread Steve Wise

 -Original Message-
 From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
 Sent: Friday, July 24, 2015 11:57 AM
 To: Steve Wise
 Cc: dledf...@redhat.com; infinip...@intel.com; sa...@mellanox.com; 
 ogerl...@mellanox.com; r...@mellanox.com; linux-
 r...@vger.kernel.org; e...@mellanox.com; target-de...@vger.kernel.org; 
 linux-...@vger.kernel.org; bfie...@fieldses.org
 Subject: Re: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs

 On Fri, Jul 24, 2015 at 11:19:05AM -0500, Steve Wise wrote:
  Add new register and unregister functions to be used with devices that
  support FRMRs, provide a local dma lkey, yet do not support DIF/PI.

  isert_reg_frmr() only needs to use FRMRs for RDMA READ since RDMA WRITE
  can be handled entirely with the local dma lkey.  So for RDMA READ,
  it calls isert_reg_read_frmr().  Otherwise it uses the lkey map service
  isert_map_lkey() for RDMA WRITEs.

  isert_reg_read_frmr() will create a linked list of WR triplets of the
  form: INV->FRWR->READ.  The number of these triplets is dependent on
  the device's fast reg page list length limit.

 That ordering seems strange, surely it should be

 FRWR->READ->INV

 And use IB_WR_RDMA_READ_WITH_INV if possible?

 ACCESS_REMOTE rkey's should not be left open across the FROM_DEVICE
 DMA flush.

You're correct.  I was thinking to simplify the IO by always invalidating 
before re-registering.  But it does leave the FRMR
registered and exposes a security hole.  

I'll have to rework this.

  /* assign function handlers */
  -   if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
  -   dev_attr->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY &&
  -   dev_attr->device_cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
  -   device->use_fastreg = 1;
  -   device->reg_rdma_mem = isert_reg_frmr_pi;
  -   device->unreg_rdma_mem = isert_unreg_frmr_pi;
  +   cap_flags = dev_attr->device_cap_flags;
  +   if (cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
  +   cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
  +   if (cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
  +   device->use_fastreg = 1;
  +   device->reg_rdma_mem = isert_reg_frmr_pi;
  +   device->unreg_rdma_mem = isert_unreg_frmr_pi;
  +   } else {
  +   device->use_fastreg = 1;
  +   device->reg_rdma_mem = isert_reg_frmr;
  +   device->unreg_rdma_mem = isert_unreg_frmr;
  +   }

 The use of FRWR for RDMA READ should be iWarp specific, IB shouldn't
 pay that overhead. I am expecting to see a cap_rdma_read_rkey or
 something in here ?

Ok.  But cap_rdma_read_rkey() doesn't really describe the requirement.  The 
requirement is rkey + REMOTE_WRITE.  So it is more like
rdma_cap_read_requires_remote_write() which is ugly and too long (but 
descriptive)...
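
To make the requirement concrete, here is a small userspace sketch of how a ULP might pick access flags for the sink of an RDMA READ once such a capability exists. Everything named here (`struct port_caps`, `read_sink_needs_remote_write`, the flag values) is hypothetical; no such helper is confirmed by this thread, only the rule that iWARP needs REMOTE_WRITE on the sink and IB does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values standing in for IB_ACCESS_LOCAL_WRITE and
 * IB_ACCESS_REMOTE_WRITE. */
#define ACCESS_LOCAL_WRITE  0x1
#define ACCESS_REMOTE_WRITE 0x2

/* Hypothetical per-port capability record. */
struct port_caps {
	bool read_sink_needs_remote_write; /* assumed true for iWARP ports */
};

/* Access flags to request on the memory that receives RDMA READ data. */
static int read_sink_access(const struct port_caps *caps)
{
	int access = ACCESS_LOCAL_WRITE;

	assert(caps != NULL);
	if (caps->read_sink_needs_remote_write)
		access |= ACCESS_REMOTE_WRITE;
	return access;
}

/*
 * The local dma lkey cannot carry REMOTE_WRITE, so a port with this
 * requirement must register the sink (for example with an FRWR).
 */
static bool read_sink_needs_frwr(const struct port_caps *caps)
{
	assert(caps != NULL);
	return caps->read_sink_needs_remote_write;
}
```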


RE: [PATCH V6 8/9] isert: Use local_dma_lkey whenever possible.

2015-07-24 Thread Steve Wise



 -Original Message-
 From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
 Sent: Friday, July 24, 2015 11:49 AM
 To: Steve Wise
 Cc: dledf...@redhat.com; infinip...@intel.com; sa...@mellanox.com; 
 ogerl...@mellanox.com; r...@mellanox.com; linux-
 r...@vger.kernel.org; e...@mellanox.com; target-de...@vger.kernel.org; 
 linux-...@vger.kernel.org; bfie...@fieldses.org
 Subject: Re: [PATCH V6 8/9] isert: Use local_dma_lkey whenever possible.
 
 On Fri, Jul 24, 2015 at 11:18:59AM -0500, Steve Wise wrote:
  No need to allocate a dma_mr if the device provides a local_dma_lkey.
 
 It is probably safe to put your series on top of mine, which
 incorporates this patch already.
 
 https://github.com/jgunthorpe/linux/commits/remove-ib_get_dma_mr
 
 Jason

I will rebase on this. 

 


RE: [PATCH V6 1/9] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth

2015-07-24 Thread Steve Wise

 -Original Message-
 From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
 Sent: Friday, July 24, 2015 11:41 AM
 To: Steve Wise
 Cc: dledf...@redhat.com; infinip...@intel.com; sa...@mellanox.com; 
 ogerl...@mellanox.com; r...@mellanox.com; linux-
 r...@vger.kernel.org; e...@mellanox.com; target-de...@vger.kernel.org; 
 linux-...@vger.kernel.org; bfie...@fieldses.org
 Subject: Re: [PATCH V6 1/9] RDMA/iser: Limit sg tablesize and max_sectors to 
 device fastreg max depth

 On Fri, Jul 24, 2015 at 11:18:21AM -0500, Steve Wise wrote:
  Currently the sg tablesize, which dictates fast register page list
  depth to use, does not take into account the limits of the rdma device.
  So adjust it once we discover the device fastreg max depth limit.  Also
  adjust the max_sectors based on the resulting sg tablesize.

 Huh. How does this relate to the max_page_list_len argument:

  struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)

 Shouldn't max_fast_reg_page_list_len be checked during the above?

 Ie does this still make sense:

 drivers/infiniband/ulp/iser/iser_verbs.c:   desc-data_mr = 
 ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE + 1);

 ?

 The only ULP that checks this is SRP, so basically, all our ULPs are
 probably quietly broken? cxgb3 has a limit of 10 (!?!?!!)

Yea seems like some drivers need to enforce this in ib_alloc_fast_reg_mr() as 
well as ib_alloc_fast_reg_page_list(), and ULPs need
to not exceed the device max.

I will fix iser to limit the mr and page_list allocation based on the device 
max.

Steve.
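
A minimal sketch of the limit being discussed, in userspace C: the ULP's default sg tablesize is capped by the device's fast register page list limit, and max_sectors is derived from the result. The constants (`SKETCH_SG_TABLESIZE` of 128, 4 KiB pages, 512-byte sectors) and the function name are assumptions for illustration, not the actual iser patch.

```c
#include <assert.h>

#define SKETCH_SG_TABLESIZE 128u  /* assumed ULP default */
#define SKETCH_PAGE_SIZE    4096u /* assumed page size */
#define SKETCH_SECTOR_SIZE  512u

struct io_limits {
	unsigned int sg_tablesize; /* pages per fast registration */
	unsigned int max_sectors;  /* largest I/O, in 512-byte sectors */
};

/* Cap the ULP defaults by what the device can register in one MR. */
static struct io_limits limit_to_device(unsigned int dev_max_page_list_len)
{
	struct io_limits lim;

	assert(dev_max_page_list_len > 0);
	lim.sg_tablesize = dev_max_page_list_len < SKETCH_SG_TABLESIZE ?
			   dev_max_page_list_len : SKETCH_SG_TABLESIZE;
	lim.max_sectors = lim.sg_tablesize * SKETCH_PAGE_SIZE /
			  SKETCH_SECTOR_SIZE;
	return lim;
}
```

With the cxgb3 limit of 10 mentioned above, this yields a tablesize of 10 and 80 sectors per I/O under the stated assumptions.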


RE: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs

2015-07-24 Thread Steve Wise

 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org 
 [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Steve Wise
 Sent: Friday, July 24, 2015 2:58 PM
 To: 'Jason Gunthorpe'
 Cc: dledf...@redhat.com; infinip...@intel.com; sa...@mellanox.com; 
 ogerl...@mellanox.com; r...@mellanox.com; linux-
 r...@vger.kernel.org; e...@mellanox.com; target-de...@vger.kernel.org; 
 linux-...@vger.kernel.org; bfie...@fieldses.org
 Subject: RE: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs

  -Original Message-
  From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
  Sent: Friday, July 24, 2015 2:24 PM
  To: Steve Wise
  Cc: dledf...@redhat.com; infinip...@intel.com; sa...@mellanox.com; 
  ogerl...@mellanox.com; r...@mellanox.com; linux-
  r...@vger.kernel.org; e...@mellanox.com; target-de...@vger.kernel.org; 
  linux-...@vger.kernel.org; bfie...@fieldses.org
  Subject: Re: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs

  On Fri, Jul 24, 2015 at 01:48:09PM -0500, Steve Wise wrote:
The use of FRWR for RDMA READ should be iWarp specific, IB shouldn't
pay that overhead. I am expecting to see a cap_rdma_read_rkey or
something in here ?

   Ok.  But cap_rdma_read_rkey() doesn't really describe the
   requirement.  The requirement is rkey + REMOTE_WRITE.  So it is more
   like rdma_cap_read_requires_remote_write() which is ugly and too
   long (but descriptive)...

  I don't care much what name you pick, just jam something like this in
  the description

   If set then RDMA_READ must be performed by mapping the local
   buffers through a rkey MR with ACCESS_REMOTE_WRITE enabled.
   The rkey of this MR should be passed in as the sg_list's lkey for
   IB_WR_RDMA_READ_WITH_INV.

   FRWR should be used to register the buffer in the send queue,
   and the read should be issued using IB_WR_RDMA_READ_WITH_INV (xx
   can we just implicitly rely on this? Are there any iWarp cards that
   support FRWR but not WITH_INV?)

 No.  And iWARP devices must support READ_WITH_INV from my reading of the 
 iWARP verbs spec.

 I will add all these comments and make use of READ_WITH_INV.

   Finally, only a single SGE can be used with RDMA_READ, all scattering
   must be accomplished with the MR.

  This quite dramatically changes what is an allowed scatter for the
  transfer, IB can support arbitrary unaligned S/G lists, while this is
  now forced into gapless page aligned elements.

  Your patch takes care of this? And only impacts IB?

 Did you mean only impacts iWARP?

 Yes the patch takes care of sg->fr page list packing. And it uses the
 existing isert_map_fr_pagelist() to pack the sg into the
 fastreg page list.  This same routine is used by the IB FRMR/PI reg/unreg 
 routines as well.

By the way, just to be clear: If you use a FRWR, you by definition only have 
one SGE entry as the result of the registration.  So
regardless of what a device/protocol can do with the destination SGE of an RDMA 
READ operation, if you use FRWR to register the
destination region, you need only 1 SGE in the RDMA READ WR.

Stevo.
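
The arithmetic behind this point can be sketched as follows: each fast registration covers at most the device's page list length, and each resulting READ needs exactly one SGE. The struct and function names are hypothetical; this is a userspace model, not isert code.

```c
#include <assert.h>

struct read_plan {
	unsigned int num_read_wrs;  /* one FRWR plus one READ per entry */
	unsigned int sges_per_read; /* always 1 when the sink is an FRWR MR */
};

/*
 * Plan the READs for a transfer of npages pages when each fast
 * registration can cover at most max_page_list_len pages.
 */
static struct read_plan plan_frwr_read(unsigned int npages,
				       unsigned int max_page_list_len)
{
	struct read_plan plan;

	assert(npages > 0 && max_page_list_len > 0);
	/* round up: a partial final registration still needs its own READ */
	plan.num_read_wrs = (npages + max_page_list_len - 1) /
			    max_page_list_len;
	plan.sges_per_read = 1;
	return plan;
}
```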


RE: [PATCH] RDMA/cxgb3: fail get_dma_mr if the memory footprint can exceed 32b

2015-07-23 Thread Steve Wise

 -Original Message-
 From: Doug Ledford [mailto:dledf...@redhat.com]
 Sent: Thursday, July 23, 2015 4:33 PM
 To: Steve Wise
 Cc: linux-rdma@vger.kernel.org
 Subject: Re: [PATCH] RDMA/cxgb3: fail get_dma_mr if the memory footprint can 
 exceed 32b

 On 07/22/2015 03:14 PM, Steve Wise wrote:
  T3 HW only supports MRs of length < 4GB.  If the system can have more
  than that we need to fail dma mr allocation so we can't create a MR that
  cannot span the entire possible memory space.

  Signed-off-by: Steve Wise sw...@opengridcomputing.com
  ---

   drivers/infiniband/hw/cxgb3/iwch_provider.c |4 
   1 files changed, 4 insertions(+), 0 deletions(-)

  diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c 
  b/drivers/infiniband/hw/cxgb3/iwch_provider.c
  index b1b7323..bbbe018 100644
  --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
  +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
  @@ -736,6 +736,10 @@ static struct ib_mr *iwch_get_dma_mr(struct ib_pd *pd, 
  int acc)
  /*
   * T3 only supports 32 bits of size.
   */
  +   if (sizeof(phys_addr_t) > 4) {
  +   pr_warn_once(MOD "Cannot support dma_mrs on this platform.\n");
  +   return ERR_PTR(-ENOTSUPP);
  +   }
  bl.size = 0xffffffff;
  bl.addr = 0;
  kva = 0;

 Should this be a static check of the pointer size versus installed
 memory?  Would it be possible to have this work for machines with less
 than 4GB of physical memory even if they have 64bit pointers, or are you
 concerned that hotplug memory could take us over the limit after
 registration and cause problems?

NFSRDMA doesn't need dma-mrs for T3 since it has FRMR + local dma lkey support.
And since the deficiency really can cause problems on 64b systems if the
memory grows > 4GB after dma-mr allocation, I decided to just not allow them
on potentially large-memory systems.

Steve
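
The trade-off between the static check and a runtime memory check can be modelled like this. It is a userspace sketch with hypothetical names: `dma_mr_allowed()` mirrors the `sizeof(phys_addr_t) > 4` test from the patch, while `mr_covers_memory()` is the runtime alternative that the reply rejects because memory can grow after the MR is created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define T3_MAX_MR_LEN 0xffffffffULL /* the MR length field is 32 bits wide */

/*
 * Static policy from the patch: refuse dma MRs whenever physical
 * addresses are wider than 32 bits, regardless of installed memory.
 */
static bool dma_mr_allowed(size_t phys_addr_bytes)
{
	assert(phys_addr_bytes == 4 || phys_addr_bytes == 8);
	return phys_addr_bytes <= 4;
}

/*
 * Runtime alternative: true only while the highest physical address
 * still fits the 32-bit MR. Hotplug can invalidate this answer later.
 */
static bool mr_covers_memory(uint64_t highest_phys_addr)
{
	return highest_phys_addr <= T3_MAX_MR_LEN;
}
```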


RE: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API

2015-07-23 Thread Steve Wise



 -Original Message-
 From: Sagi Grimberg [mailto:sa...@dev.mellanox.co.il]
 Sent: Thursday, July 23, 2015 5:21 AM
 To: Steve Wise; Sagi Grimberg; linux-rdma@vger.kernel.org
 Cc: Liran Liss; Oren Duer
 Subject: Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API
 
 On 7/22/2015 10:21 PM, Steve Wise wrote:
 
  On 7/22/2015 1:55 AM, Sagi Grimberg wrote:
  Signed-off-by: Sagi Grimberg sa...@mellanox.com
  ---
net/sunrpc/xprtrdma/frwr_ops.c  | 80
  ++---
net/sunrpc/xprtrdma/xprt_rdma.h |  4 ++-
2 files changed, 47 insertions(+), 37 deletions(-)
 
  Did you intend to change svcrdma as well?
 
 All the ULPs need to convert. I didn't have a chance to convert
 svcrdma yet. Want to take it?

Not right now.  My focus is still on enabling iSER.



RE: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-23 Thread Steve Wise

 -Original Message-
 From: Hefty, Sean [mailto:sean.he...@intel.com]
 Sent: Thursday, July 23, 2015 1:53 PM
 To: Steve Wise; 'Christoph Hellwig'
 Cc: 'Sagi Grimberg'; 'Steve Wise'; 'Jason Gunthorpe'; 'Tom Talpey'; 'Doug 
 Ledford'; sa...@mellanox.com; ogerl...@mellanox.com;
 r...@mellanox.com; linux-rdma@vger.kernel.org; e...@mellanox.com; 
 target-de...@vger.kernel.org; linux-...@vger.kernel.org;
 trond.mykleb...@primarydata.com; bfie...@fieldses.org; 'Oren Duer'
 Subject: RE: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

  There is confusion about lkeys and rkeys with regard to iWARP.  In the
  iWARP verbs, there is no distinction between an lkey and
  rkey: they are the same key, called a Steering Tag or STAG.  When you
  create a MR, the lkey == rkey == STAG for iwarp transports.
  Somewhat related, but really a different issue, is that SGEs that are the
  target of a read need REMOTE_WRITE access flags on their
  STAG for iWARP.

  Clear as mud? :)

 This may be a nit, but IMO, the use of the term 'rkey' versus 'stag' matters.
 They convey different ways of finding a data buffer.  For example, do you
 locate a buffer using the stag, then verify that the offset + length fits
 into the target buffer?

Yes.  HW always uses the stag to locate a record that contains the stag state 
(valid or invalid), the access flags, the 8b key, the
va_base, length, PBL describing the host pages, etc.  HW validates all that 
before using the buffer.  NOTE: An stag of 0 is the
special local-dma-lkey which HW treats differently: If the stag is 0, then the 
address in the SGE is the bus/dma address itself and
no lookup of a MR/PBL/etc is needed.   Stag 0 can ONLY be used by kernel users 
and MUST never be accepted/used from an ingress
packet and MUST never be emitted on the wire in a READ or WRITE.
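
The lookup-then-validate order described above can be sketched in userspace C. The table layout is an assumption made for the example (index in the upper 24 bits of the stag, 8-bit key in the low byte), and `struct stag_entry` and `stag_check()` are hypothetical names; real hardware records also carry a PBL and more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACC_REMOTE_READ  0x1u
#define ACC_REMOTE_WRITE 0x2u

/* Hypothetical, reduced per-stag record. */
struct stag_entry {
	bool     valid;   /* stag state: valid or invalid */
	uint8_t  key;     /* 8-bit key, compared with the low byte of the stag */
	uint32_t access;  /* granted access flags */
	uint64_t va_base; /* start of the registered range */
	uint64_t length;  /* length of the registered range */
};

/*
 * Locate the record by stag first, then validate state, key, access and
 * bounds. Stag 0 is the local-dma-lkey: no lookup, and never acceptable
 * when it arrives in an ingress packet.
 */
static bool stag_check(const struct stag_entry *tbl, size_t nentries,
		       uint32_t stag, uint64_t addr, uint64_t len,
		       uint32_t needed_access, bool from_wire)
{
	assert(tbl != NULL);
	if (stag == 0)
		return !from_wire;

	size_t idx = stag >> 8;
	if (idx >= nentries)
		return false;

	const struct stag_entry *e = &tbl[idx];
	if (!e->valid || e->key != (uint8_t)(stag & 0xff))
		return false;
	if ((e->access & needed_access) != needed_access)
		return false;
	/* bounds check written to avoid overflow in addr + len */
	if (addr < e->va_base || len > e->length ||
	    addr - e->va_base > e->length - len)
		return false;
	return true;
}
```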

 Or do you locate the buffer
 by address, then verify that the key matches?

This is never done.

 Consider if we allow an app to specify the rkey/stag, or reference the buffer 
 using an offset, rather than a virtual address.

 This seems to be part of the difference between an lkey and an rkey.


Re: [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb

2015-07-22 Thread Steve Wise


On 7/22/2015 12:22 PM, Sagi Grimberg wrote:

On 7/22/2015 7:58 PM, Jason Gunthorpe wrote:

On Wed, Jul 22, 2015 at 09:55:02AM +0300, Sagi Grimberg wrote:


+struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
+   enum ib_mr_type mr_type,
+   u32 max_entries,
+   u32 flags)
+{


This is just a copy of mlx4_ib_alloc_fast_reg_mr with
this added:


+if (mr_type != IB_MR_TYPE_FAST_REG || flags)
+return ERR_PTR(-EINVAL);


Are all the driver updates the same? It looks like it.

I'd suggest shortening this patch series, have the core provide the
wrapper immediately:

struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
{
...

    if (pd->device->alloc_mr) {
        mr = pd->device->alloc_mr(pd, mr_type, max_entries, flags);
    } else {
        if (mr_type != IB_MR_TYPE_FAST_REG || flags ||
            !ib_dev->alloc_fast_reg_mr)
            return ERR_PTR(-ENOSYS);
        mr = pd->device->alloc_fast_reg_mr(..);
    }
}

Then go through the series to remove ib_alloc_fast_reg_mr

Then go through one series to migrate the drivers from
alloc_fast_reg_mr to alloc_mr

Then entirely drop alloc_fast_reg_mr from the driver API.

That should be shorter and easier to read the driver diffs, which is
the major change here.


Yea, it would be better...


43 patches overflows my stack ;)  I agree with Jason's suggestion.

Steve.

[PATCH] RDMA/cxgb3: fail get_dma_mr if the memory footprint can exceed 32b

2015-07-22 Thread Steve Wise

T3 HW only supports MRs of length < 4GB.  If the system can have more
than that we need to fail dma mr allocation so we can't create a MR that
cannot span the entire possible memory space.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
---

 drivers/infiniband/hw/cxgb3/iwch_provider.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c 
b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index b1b7323..bbbe018 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -736,6 +736,10 @@ static struct ib_mr *iwch_get_dma_mr(struct ib_pd *pd, int 
acc)
/*
 * T3 only supports 32 bits of size.
 */
+   if (sizeof(phys_addr_t) > 4) {
+   pr_warn_once(MOD "Cannot support dma_mrs on this platform.\n");
+   return ERR_PTR(-ENOTSUPP);
+   }
bl.size = 0xffffffff;
bl.addr = 0;
kva = 0;


Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API

2015-07-22 Thread Steve Wise



On 7/22/2015 1:55 AM, Sagi Grimberg wrote:

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
  net/sunrpc/xprtrdma/frwr_ops.c  | 80 ++---
  net/sunrpc/xprtrdma/xprt_rdma.h |  4 ++-
  2 files changed, 47 insertions(+), 37 deletions(-)


Did you intend to change svcrdma as well?


diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 517efed..e28246b 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -151,9 +151,13 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct 
ib_device *device,
f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, depth, 0);
if (IS_ERR(f->fr_mr))
goto out_mr_err;
-   f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth);
-   if (IS_ERR(f->fr_pgl))
+
+   f->sg = kcalloc(sizeof(*f->sg), depth, GFP_KERNEL);
+   if (IS_ERR(f->sg))
goto out_list_err;
+
+   sg_init_table(f->sg, depth);
+
return 0;
  
  out_mr_err:

@@ -163,7 +167,7 @@ out_mr_err:
return rc;
  
  out_list_err:

-   rc = PTR_ERR(f->fr_pgl);
+   rc = -ENOMEM;
dprintk("RPC:   %s: ib_alloc_fast_reg_page_list status %i\n",
__func__, rc);
ib_dereg_mr(f->fr_mr);
@@ -179,7 +183,7 @@ __frwr_release(struct rpcrdma_mw *r)
if (rc)
dprintk("RPC:   %s: ib_dereg_mr status %i\n",
__func__, rc);
-   ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
+   kfree(r->r.frmr.sg);
  }
  
  static int

@@ -320,10 +324,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct 
rpcrdma_mr_seg *seg,
struct ib_send_wr fastreg_wr, *bad_wr;
u8 key;
int len, pageoff;
-   int i, rc;
-   int seg_len;
-   u64 pa;
-   int page_no;
+   int i, rc, access;
  
  	mw = seg1->rl_mw;

seg1->rl_mw = NULL;
@@ -344,39 +345,46 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct 
rpcrdma_mr_seg *seg,
if (nsegs > ia->ri_max_frmr_depth)
nsegs = ia->ri_max_frmr_depth;
  
-	for (page_no = i = 0; i < nsegs;) {

-   rpcrdma_map_one(device, seg, direction);
-   pa = seg->mr_dma;
-   for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
-   frmr->fr_pgl->page_list[page_no++] = pa;
-   pa += PAGE_SIZE;
-   }
+   for (i = 0; i < nsegs;) {
+   sg_set_page(&frmr->sg[i], seg->mr_page,
+   seg->mr_len, offset_in_page(seg->mr_offset));
len += seg->mr_len;
-   ++seg;
++i;
-   /* Check for holes */
+   ++seg;
+
+   /* Check for holes - needed?? */
if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
break;
}
+
+   frmr->sg_nents = i;
+   frmr->dma_nents = ib_dma_map_sg(device, frmr->sg,
+   frmr->sg_nents, direction);
+   if (!frmr->dma_nents) {
+   pr_err("RPC:   %s: failed to dma map sg %p sg_nents %d\n",
+   __func__, frmr->sg, frmr->sg_nents);
+   return -ENOMEM;
+   }
+
dprintk("RPC:   %s: Using frmr %p to map %d segments (%d bytes)\n",
__func__, mw, i, len);
  
-	memset(&fastreg_wr, 0, sizeof(fastreg_wr));

-   fastreg_wr.wr_id = (unsigned long)(void *)mw;
-   fastreg_wr.opcode = IB_WR_FAST_REG_MR;
-   fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma + pageoff;
-   fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl;
-   fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
-   fastreg_wr.wr.fast_reg.page_list_len = page_no;
-   fastreg_wr.wr.fast_reg.length = len;
-   fastreg_wr.wr.fast_reg.access_flags = writing ?
-   IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
-   IB_ACCESS_REMOTE_READ;
mr = frmr->fr_mr;
+   access = writing ? IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
+  IB_ACCESS_REMOTE_READ;
+   rc = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, access);
+   if (rc) {
+   pr_err("RPC:   %s: failed to map mr %p rc %d\n",
+   __func__, frmr->fr_mr, rc);
+   return rc;
+   }
+
key = (u8)(mr->rkey & 0x000000FF);
ib_update_fast_reg_key(mr, ++key);
-   fastreg_wr.wr.fast_reg.rkey = mr->rkey;
+
+   memset(&fastreg_wr, 0, sizeof(fastreg_wr));
+   ib_set_fastreg_wr(mr, mr->rkey, (uintptr_t)mw, false, &fastreg_wr);
  
  	DECR_CQCOUNT(&r_xprt->rx_ep);

rc = ib_post_send(ia->ri_id->qp, &fastreg_wr, &bad_wr);
@@ -385,15 +393,14 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct 
rpcrdma_mr_seg *seg,
  
  	seg1->rl_mw = mw;

seg1->mr_rkey = mr->rkey;
-   seg1->mr_base = seg1->mr_dma + pageoff;
+   seg1->mr_base = mr->iova;

RE: [PATCH v3 05/15] xprtrdma: Remove last ib_reg_phys_mr() call site

2015-07-22 Thread Steve Wise

 -Original Message-
 From: linux-nfs-ow...@vger.kernel.org 
 [mailto:linux-nfs-ow...@vger.kernel.org] On Behalf Of Jason Gunthorpe
 Sent: Tuesday, July 21, 2015 5:55 PM
 To: Steve Wise
 Cc: 'Tom Talpey'; 'Chuck Lever'; linux-rdma@vger.kernel.org; 
 linux-...@vger.kernel.org
 Subject: Re: [PATCH v3 05/15] xprtrdma: Remove last ib_reg_phys_mr() call site

 On Tue, Jul 21, 2015 at 05:41:22PM -0500, Steve Wise wrote:
  On 7/20/2015 5:42 PM, Jason Gunthorpe wrote:
  On Mon, Jul 20, 2015 at 05:41:27PM -0500, Steve Wise wrote:
  B) why bother to check? Are machines with <4GB interesting, and worth
  supporting a special optimization?
  No, but cxgb3 is still interesting to user applications, and perhaps 
  NFSRDMA using FRMRs.
  Doesn't look like the NFS client will work. It requires an all
  physical memory lkey for SEND and RECV buffers..

  Jason

  Looks like cxgb3 supports LOCAL_DMA_LKEY and MEM_MGT_EXTENSIONS so dma mrs
  aren't required for NFSRDMA:

  t4:~/linux-2.6/drivers/infiniband/hw/cxgb3 # grep IB_DEVICE_ iwch_provider.c
  strlcpy(dev->ibdev.name, "cxgb3_%d", IB_DEVICE_NAME_MAX);
  dev->device_cap_flags = IB_DEVICE_LOCAL_DMA_LKEY |
  IB_DEVICE_MEM_WINDOW |
  IB_DEVICE_MEM_MGT_EXTENSIONS;

 Neat. Is the dma_mask set properly (I don't see any set at all)?

iw_cxgb3 isn't a PCI driver.  It sits on top of cxgb3, which is the PCI device
driver and is the one that calls pci_set_dma_mask().

  So cxgb3 can still support NFSRDMA and user verbs w/o get_dma_mr(). I'll
  submit a patch soon to only support get_dma_mr() if unsigned long is 4
  bytes...

 So, NFS and RDS seem to be the only iWarp compatible ULPs?

 NFS has worked, and will continue to work with the global lkey.

 RDS looks like it relies on an insecure all physical rkey, so it won't
 work until that is fixed.

 So, I'd just use sizeof(physaddr_t) > 4 as the test. The only people
 that could be impacted are RDS users using distro kernels on machines
 with less than 4G of ram. I somehow doubt there are any of those...

 Jason

deprecating amso1100

2015-07-21 Thread Steve Wise


Hey Doug,

How should I submit changes to deprecate amso1100?  The HW hasn't been
sold since 2005, and the SW has definite bit rot.  It's time to remove it...


Thanks,

Steve.

[PATCH] RDMA/amso1100: deprecate the amso1100 provider

2015-07-21 Thread Steve Wise

The HW hasn't been sold since 2005, and the SW has definite bit rot.
It's time to remove it.  So move it to staging for a few releases and
then remove it after that.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
---
 drivers/infiniband/Kconfig |1 -
 drivers/infiniband/hw/Makefile |1 -
 drivers/staging/Kconfig|2 ++
 drivers/staging/Makefile   |1 +
 drivers/{infiniband/hw => staging}/amso1100/Kbuild |0
 .../{infiniband/hw => staging}/amso1100/Kconfig|0
 drivers/{infiniband/hw => staging}/amso1100/c2.c   |0
 drivers/{infiniband/hw => staging}/amso1100/c2.h   |0
 .../{infiniband/hw => staging}/amso1100/c2_ae.c|0
 .../{infiniband/hw => staging}/amso1100/c2_ae.h|0
 .../{infiniband/hw => staging}/amso1100/c2_alloc.c |0
 .../{infiniband/hw => staging}/amso1100/c2_cm.c|0
 .../{infiniband/hw => staging}/amso1100/c2_cq.c|0
 .../{infiniband/hw => staging}/amso1100/c2_intr.c  |0
 .../{infiniband/hw => staging}/amso1100/c2_mm.c|0
 .../{infiniband/hw => staging}/amso1100/c2_mq.c|0
 .../{infiniband/hw => staging}/amso1100/c2_mq.h|0
 .../{infiniband/hw => staging}/amso1100/c2_pd.c|0
 .../hw => staging}/amso1100/c2_provider.c  |0
 .../hw => staging}/amso1100/c2_provider.h  |0
 .../{infiniband/hw => staging}/amso1100/c2_qp.c|0
 .../{infiniband/hw => staging}/amso1100/c2_rnic.c  |0
 .../hw => staging}/amso1100/c2_status.h|0
 .../{infiniband/hw => staging}/amso1100/c2_user.h  |0
 .../{infiniband/hw => staging}/amso1100/c2_vq.c|0
 .../{infiniband/hw => staging}/amso1100/c2_vq.h|0
 .../{infiniband/hw => staging}/amso1100/c2_wr.h|0
 27 files changed, 3 insertions(+), 2 deletions(-)
 rename drivers/{infiniband/hw = staging}/amso1100/Kbuild (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/Kconfig (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2.c (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2.h (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_ae.c (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_ae.h (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_alloc.c (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_cm.c (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_cq.c (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_intr.c (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_mm.c (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_mq.c (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_mq.h (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_pd.c (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_provider.c (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_provider.h (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_qp.c (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_rnic.c (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_status.h (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_user.h (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_vq.c (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_vq.h (100%)
 rename drivers/{infiniband/hw = staging}/amso1100/c2_wr.h (100%)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index b899531..c3c184e 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -58,7 +58,6 @@ source "drivers/infiniband/hw/mthca/Kconfig"
 source "drivers/infiniband/hw/ipath/Kconfig"
 source "drivers/infiniband/hw/qib/Kconfig"
 source "drivers/infiniband/hw/ehca/Kconfig"
-source "drivers/infiniband/hw/amso1100/Kconfig"
 source "drivers/infiniband/hw/cxgb3/Kconfig"
 source "drivers/infiniband/hw/cxgb4/Kconfig"
 source "drivers/infiniband/hw/mlx4/Kconfig"
diff --git a/drivers/infiniband/hw/Makefile b/drivers/infiniband/hw/Makefile
index e900b03..e179dfb 100644
--- a/drivers/infiniband/hw/Makefile
+++ b/drivers/infiniband/hw/Makefile
@@ -2,7 +2,6 @@ obj-$(CONFIG_INFINIBAND_MTHCA)  += mthca/
 obj-$(CONFIG_INFINIBAND_IPATH) += ipath/
 obj-$(CONFIG_INFINIBAND_QIB)   += qib/
 obj-$(CONFIG_INFINIBAND_EHCA)  += ehca/
-obj-$(CONFIG_INFINIBAND_AMSO1100)  += amso1100/
 obj-$(CONFIG_INFINIBAND_CXGB3) += cxgb3/
 obj-$(CONFIG_INFINIBAND_CXGB4) += cxgb4/
 obj-$(CONFIG_MLX4_INFINIBAND)  += mlx4/
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 7f6cae5..cec20d2 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -112,4 +112,6 @@ source "drivers/staging/fsl-mc/Kconfig"
 
 source "drivers/staging/wilc1000/Kconfig"
 
+source "drivers/staging/amso1100/Kconfig"
+
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index 347f647..4ca8633 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile

RE: [PATCH v3 05/15] xprtrdma: Remove last ib_reg_phys_mr() call site

2015-07-21 Thread Steve Wise

 -Original Message-
 From: Tom Talpey [mailto:t...@talpey.com]
 Sent: Tuesday, July 21, 2015 3:47 PM
 To: Steve Wise; 'Jason Gunthorpe'
 Cc: 'Chuck Lever'; linux-rdma@vger.kernel.org; linux-...@vger.kernel.org
 Subject: Re: [PATCH v3 05/15] xprtrdma: Remove last ib_reg_phys_mr() call site

 On 7/21/2015 7:33 AM, Steve Wise wrote:

  -Original Message-
  From: linux-rdma-ow...@vger.kernel.org 
  [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Tom Talpey
  Sent: Monday, July 20, 2015 7:15 PM
  To: Steve Wise; 'Jason Gunthorpe'
  Cc: 'Chuck Lever'; linux-rdma@vger.kernel.org; linux-...@vger.kernel.org
  Subject: Re: [PATCH v3 05/15] xprtrdma: Remove last ib_reg_phys_mr() call 
  site

  On 7/20/2015 3:41 PM, Steve Wise wrote:

  -Original Message-
  From: Tom Talpey [mailto:t...@talpey.com]
  Sent: Monday, July 20, 2015 5:04 PM
  To: Steve Wise; 'Jason Gunthorpe'
  Cc: 'Chuck Lever'; linux-rdma@vger.kernel.org; linux-...@vger.kernel.org
  Subject: Re: [PATCH v3 05/15] xprtrdma: Remove last ib_reg_phys_mr() 
  call site

  On 7/20/2015 2:16 PM, Steve Wise wrote:

  -Original Message-
  From: linux-nfs-ow...@vger.kernel.org 
  [mailto:linux-nfs-ow...@vger.kernel.org] On Behalf Of Jason Gunthorpe
  Sent: Monday, July 20, 2015 4:06 PM
  To: Tom Talpey; Steve Wise
  Cc: Chuck Lever; linux-rdma@vger.kernel.org; linux-...@vger.kernel.org
  Subject: Re: [PATCH v3 05/15] xprtrdma: Remove last ib_reg_phys_mr() 
  call site

  On Mon, Jul 20, 2015 at 01:34:16PM -0700, Tom Talpey wrote:
  On 7/20/2015 12:03 PM, Chuck Lever wrote:
  All HCA providers have an ib_get_dma_mr() verb. Thus
  rpcrdma_ia_open() will either grab the device's local_dma_key if one
  is available, or it will call ib_get_dma_mr() which is a 100%
  guaranteed fallback.

  I recall that in the past, some providers did not support mapping
  all of the machine's potential physical memory with a single dma_mr.
  If an rnic did/does not support 44-ish bits of length per region,
  for example.

  Looks like you are right, but the standard in kernel is to require
  ib_get_dma_mr, if the HCA can't do that, then it cannot be used on a
  big memory machine with kernel ULPs.

  Looking deeper, both amso1100 and cxgb3 seem limited to 32 bits of
  physical memory, and silently break all kernel ULPs if they are used
  on a modern machine with > 4G.

  Is that right Steve?

  Yes.

  Based on that, should we remove the cxgb3 driver as well? Or at least
  can you fix it up to at least fail get_dma_mr if there is too much
  ram?

  I would like to keep cxgb3 around.  I can add code to fail if the 
  memory is > 32b.  Do you know how I get the amount of available ram?

  A) are you sure it's an unsigned length, i.e. is it really 31 bits?

  yes.

  B) why bother to check? Are machines with <4GB interesting, and worth
  supporting a special optimization?

  No, but cxgb3 is still interesting to user applications, and perhaps 
  NFSRDMA using FRMRs.

  I'm obviously not making myself clear. I am suggesting that cxgb3 fail
  the ib_get_dma_mr() verb, regardless of installed memory.

  I am not suggesting it fail to load, or fail other memreg requests. It
  should work normally in all other respects.

  Even with its limitation, doesn't it have utility for someone using cxgb3 
  in an embedded 32b environment?

 Sure, do you mean making it conditional on #if sizeof(physaddr) <= 32?
 That would make sense I guess.

No, a runtime check.  x64 platforms will work too if the mem size takes <= 32b 
to describe.
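
For illustration, a minimal user-space sketch of the kind of runtime check described above. It assumes the driver can learn the highest physical address in use; `bits_needed` and `dma_mr_allowed` are hypothetical names, not existing cxgb3 or kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of bits needed to represent addr (0 for 0). */
static unsigned int bits_needed(uint64_t addr)
{
	unsigned int bits = 0;

	while (addr) {
		bits++;
		addr >>= 1;
	}
	return bits;
}

/*
 * Hypothetical policy: allow a DMA MR that covers all of memory only when
 * the highest physical address can be described in 32 bits.
 */
static bool dma_mr_allowed(uint64_t max_phys_addr)
{
	return bits_needed(max_phys_addr) <= 32;
}
```

With this policy a 64-bit platform whose highest address is 0xFFFFFFFF still gets a DMA MR, while one that reaches 0x100000000 does not.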


Re: deprecating amso1100

2015-07-21 Thread Steve Wise

On 7/21/2015 3:42 PM, Dalessandro, Dennis wrote:

-Original Message-
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
ow...@vger.kernel.org] On Behalf Of Doug Ledford
Sent: Tuesday, July 21, 2015 2:49 PM
To: Steve Wise
Cc: linux-rdma@vger.kernel.org
Subject: Re: deprecating amso1100

On Jul 21, 2015, at 2:48 PM, Steve Wise <sw...@opengridcomputing.com> wrote:

Hey Doug,

How should I submit changes to deprecate amso1100?  The HW hasn't been
sold since 2005, and the SW has definite bit rot.  It's time to remove it...

Thanks,

Steve.

Send me a git patch that uses git mv to move it to staging and we’ll leave it
there for a release or two and then remove it after that.

Steve, git format-patch -M does a nice job creating a patch that can actually 
be looked at via email.

http://marc.info/?l=linux-kernel&m=143593782206479&w=2

For the Ipath driver I was going to send directly to the staging 
maintainer/list. Doug, should that come to you through linux-rdma instead?

-Denny

Thanks.  Yes,  I originally sent the full patch and it didn't hit 
linux-rdma due to its size.  I just sent out a new patch via 
format-patch -M.

Steve.


RE: [PATCH v3 05/15] xprtrdma: Remove last ib_reg_phys_mr() call site

2015-07-21 Thread Steve Wise

   B) why bother to check? Are machines with <4GB interesting, and worth
   supporting a special optimization?
  
   No, but cxgb3 is still interesting to user applications, and perhaps 
   NFSRDMA using FRMRs.
  
   I'm obviously not making myself clear. I am suggesting that cxgb3 fail
   the ib_get_dma_mr() verb, regardless of installed memory.
  
   I am not suggesting it fail to load, or fail other memreg requests. It
   should work normally in all other respects.
  
   Even with its limitation, doesn't it have utility for someone using cxgb3 
   in an embedded 32b environment?
 
  Sure, do you mean making it conditional on #if sizeof(physaddr) <= 32?
  That would make sense I guess.
 
 No, a runtime check.  x64 platforms will work too if the mem size takes <= 
 32b to describe.

Jason/Doug, do you think it should allow dma mr allocation iff totalmem_pages < 
4GB?  Or do a check on the sizeof totalmem_pages and
only allow dma mrs if it is <= 4?
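
To make the two options concrete, a rough user-space sketch follows. Both function names are hypothetical, and the inputs (total memory in bytes, size of the counter type) are assumptions standing in for whatever the driver would actually query.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Option 1 (runtime): allow DMA MRs only if total memory is below 4GB. */
static bool allow_by_total_bytes(uint64_t total_bytes)
{
	return total_bytes < (UINT64_C(4) << 30);
}

/*
 * Option 2 (type width): allow DMA MRs only if the type used to count
 * memory is at most 4 bytes wide, whatever amount of RAM is installed.
 */
static bool allow_by_type_width(size_t type_size)
{
	return type_size <= 4;
}
```

The difference shows up on a 64-bit kernel with 2GB of RAM: option 1 allows the DMA MR, while option 2 refuses it because the counter type is 8 bytes wide.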

