RE: [PATCH V2 for-next 2/6] IB/core: Add RSS and TSS QP groups

2013-02-11 Thread Hefty, Sean
> RSS (Receive Side Scaling) and TSS (Transmit Side Scaling, better known as
> MQ/Multi-Queue) are common networking techniques that make use of
> contemporary NICs supporting multiple receive and transmit descriptor
> queues (multi-queue); see also Documentation/networking/scaling.txt

If TSS is better known as MQ, then why not use that term instead?

> - qp group type attribute for qp creation saying whether this is a parent QP
> or rx/tx (rss/tss) child QP or none of the above for non rss/tss QPs.

Can we either define this as a new QP type or some QP creation flag, so that 
every user who wants to create a QP doesn't need to figure out what a QP group 
is and if their QP needs to be part of one?

Then you wouldn't need to define IB_QPG_NONE.
 
> - per qp group type, another attribute is added, for parent QPs, the number
> of rx/tx child QPs and for child QPs pointer to the parent.

If I understand the interface correctly, the user calls ib_create_qp() to 
create a parent QP and reserve space for all of the children.  They then call 
ib_create_qp() to allocate the children.  Is this correct?

What restrictions does a child QP have based on the parent?  E.g. same PD, CQ, 
QP size <= parent, number SGEs <= parent, destroyed with parent, etc.  And how 
independent is a child QP?  E.g. joins own multicast groups, different CQs, 
transitions states independently, etc.

It's not clear to me if using the existing interfaces is the best approach, if 
MQ is best handled as different QPs, if MQ is better abstracted as a 'QP' that 
has multiple send/receive queues, if MQ should just be completely hidden 
beneath verbs, or what.

The XRC model of creating the parent and using open to associate related QPs 
still seems more appropriate, but it depends on how independent the parent and 
child QPs are.  We don't have a spec (formal or informal) that defines how the 
verbs function with these new QP types, which makes reviewing these changes 
difficult.

- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V2 for-next 1/6] IB/ipoib: Fix ipoib_neigh hashing to use the correct daddr octets

2013-02-11 Thread Hefty, Sean
> The hash function introduced in commit b63b70d877 "IPoIB: Use a private hash
> table for path lookup in xmit path" was designed to use the 3 octets of the
> IPoIB HW address that hold the remote QPN. However, this currently isn't
> the case on little endian machines, as the code there uses the flags part
> (octet[0]) and not the last octet of the QPN (octet[3]). Fix that.
> 
> The fix caused a checkpatch warning about a line over 80 characters; to
> solve that, the temp variable that holds the daddr was renamed.
> 
> Signed-off-by: Shlomo Pongratz 
> Signed-off-by: Or Gerlitz 
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_main.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 6fdc9e7..e459fa7 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -844,10 +844,10 @@ static u32 ipoib_addr_hash(struct ipoib_neigh_hash *htbl, u8 *daddr)
>* different subnets.
>*/
>/* qpn octets[1:4) & port GUID octets[12:20) */
> - u32 *daddr_32 = (u32 *) daddr;
> + u32 *d32 = (u32 *)daddr;
>   u32 hv;
> 
> - hv = jhash_3words(daddr_32[3], daddr_32[4], 0xFF & daddr_32[0], 0);
> + hv = jhash_3words(d32[3], d32[4], cpu_to_be32(0xFF) & d32[0], 0);

Should d32 be declared as __be32 *?
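
To make the endianness point concrete, here is a minimal userspace sketch of the two mask expressions from the hunk above. It is an illustration only, not the kernel code: htonl() stands in for cpu_to_be32(), and the helper names qpn_low_byte_old/qpn_low_byte_new are hypothetical. The patched form selects octet[3] on both little and big endian hosts, while the old form selects octet[0] on little endian. Typing the pointer as __be32 * would presumably let sparse flag a mix of CPU-order and big-endian operands like the old one.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htonl()/ntohl() stand in for cpu_to_be32()/be32_to_cpu() */

/* Old expression: mask the CPU-order low byte of the first 32-bit word.
 * On a little endian host this is octet[0] (the flags), not the QPN. */
static inline uint32_t qpn_low_byte_old(const uint8_t *daddr)
{
	uint32_t w0;

	memcpy(&w0, daddr, sizeof(w0));  /* avoid alignment/aliasing problems */
	return 0xFF & w0;
}

/* Patched expression: mask with a big-endian constant, so the byte kept
 * is always octet[3], the last octet of the QPN. The result is still in
 * big-endian representation. */
static inline uint32_t qpn_low_byte_new(const uint8_t *daddr)
{
	uint32_t w0;

	memcpy(&w0, daddr, sizeof(w0));
	return htonl(0xFF) & w0;
}
```

With daddr starting 0x80 0x12 0x34 0x56, ntohl(qpn_low_byte_new(daddr)) yields octet[3] regardless of host byte order; the old helper's result depends on the host.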


interest in single core rdma library

2013-02-11 Thread Hefty, Sean
I wanted to gauge if there was support or opposition in creating a single RDMA 
library, librdma.

The intent would be to construct a single library containing the core 
functionality -- data transfers, connection establishment, and subnet 
management -- required by most applications.  It could target combining 
libibverbs, librdmacm, and libibumad APIs.
 
- Sean 


Re: NFS over RDMA crashing

2013-02-11 Thread J. Bruce Fields
On Mon, Feb 11, 2013 at 03:19:42PM +, Yan Burman wrote:
> > -Original Message-
> > From: J. Bruce Fields [mailto:bfie...@fieldses.org]
> > Sent: Thursday, February 07, 2013 18:42
> > To: Yan Burman
> > Cc: linux-...@vger.kernel.org; sw...@opengridcomputing.com; linux-
> > r...@vger.kernel.org; Or Gerlitz
> > Subject: Re: NFS over RDMA crashing
> > 
> > On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote:
> > > On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
> > > > When killing mount command that got stuck:
> > > > ---
> > > >
> > > > BUG: unable to handle kernel paging request at 880324dc7ff8
> > > > IP: [] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD
> > > > 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 800324dc7161
> > > > Oops: 0003 [#1] PREEMPT SMP
> > > > Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm
> > iw_cm
> > > > ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables
> > > > iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables
> > > > nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock
> > > > target_core_file target_core_pscsi target_core_mod configfs 8021q
> > > > bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net
> > > > macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel
> > > > kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core
> > > > ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core
> > > > mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3
> > jbd
> > > > sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod CPU 6
> > > > Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro
> > > > X8DTH-i/6/iF/6F/X8DTH
> > > > RIP: 0010:[]  []
> > > > rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > > > RSP: 0018:880324c3dbf8  EFLAGS: 00010297
> > > > RAX: 880324dc8000 RBX: 0001 RCX: 880324dd8428
> > > > RDX: 880324dc7ff8 RSI: 880324dd8428 RDI: 81149618
> > > > RBP: 880324c3dd78 R08: 60f9c860 R09: 0001
> > > > R10: 880324dd8000 R11: 0001 R12: 8806299dcb10
> > > > R13: 0003 R14: 0001 R15: 0010
> > > > FS:  () GS:88063fc0()
> > > > knlGS:
> > > > CS:  0010 DS:  ES:  CR0: 8005003b
> > > > CR2: 880324dc7ff8 CR3: 01a0b000 CR4: 07e0
> > > > DR0:  DR1:  DR2: 
> > > > DR3:  DR6: 0ff0 DR7: 0400
> > > > Process nfsd (pid: 4744, threadinfo 880324c3c000, task
> > > > 88033055)
> > > > Stack:
> > > >  880324c3dc78 880324c3dcd8 0282 880631cec000
> > > >  880324dd8000 88062ed33040 000124c3dc48 880324dd8000
> > > >  88062ed33058 880630ce2b90 8806299e8000 0003
> > > > Call Trace:
> > > >  [] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
> > > > [] ? try_to_wake_up+0x2f0/0x2f0
> > > > [] svc_recv+0x3ef/0x4b0 [sunrpc]
> > > > [] ? nfsd_svc+0x740/0x740 [nfsd]
> > > > [] nfsd+0xad/0x130 [nfsd]  [] ?
> > > > nfsd_svc+0x740/0x740 [nfsd]  [] kthread+0xd6/0xe0
> > > > [] ? __init_kthread_worker+0x70/0x70
> > > > [] ret_from_fork+0x7c/0xb0  [] ?
> > > > __init_kthread_worker+0x70/0x70
> > > > Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00
> > > > 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00
> > > > <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00 RIP
> > > > [] rdma_read_xdr+0x8bb/0xd40 [svcrdma]  RSP
> > > > 
> > > > CR2: 880324dc7ff8
> > > > ---[ end trace 06d0384754e9609a ]---
> > > >
> > > >
> > > > It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e
> > > > "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer"
> > > > is responsible for the crash (it seems to be crashing in
> > > > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527)
> > > > It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
> > > > CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
> > > >
> > > > When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I
> > > > was no longer getting the server crashes, so the rest of my tests
> > > > were done using that point (it is somewhere in the middle of
> > > > 3.7.0-rc2).
> > >
> > > OK, so this part's clearly my fault--I'll work on a patch, but the
> > > rdma's use of the ->rq_pages array is pretty confusing.
> > 
> > Does this help?
> > 
> > They must have added this for some reason, but I'm not seeing how it could
> > have ever done anything
> > 
> > --b.
> > 
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > index 0ce7552..e8f25ec 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > @@ -520,13 +520,6 @@ next_sge:
> > for (ch_no = 0; &rqst

RE: [PATCH v4 1/9] rdma/cm: define native IB address

2013-02-11 Thread Hefty, Sean
> Define AF_IB and sockaddr_ib to allow the rdma_cm to use native IB
> addressing.
> 
> Signed-off-by: Sean Hefty 
> ---
>  include/linux/socket.h |2 +
>  include/rdma/ib.h  |   89 
> 
>  2 files changed, 91 insertions(+), 0 deletions(-)
>  create mode 100644 include/rdma/ib.h
> 
> diff --git a/include/linux/socket.h b/include/linux/socket.h
> index 9a546ff..17a33f7 100644
> --- a/include/linux/socket.h
> +++ b/include/linux/socket.h
> @@ -167,6 +167,7 @@ struct ucred {
>  #define AF_PPPOX	24	/* PPPoX sockets		*/
>  #define AF_WANPIPE	25	/* Wanpipe API Sockets */
>  #define AF_LLC		26	/* Linux LLC			*/
> +#define AF_IB		27	/* Native InfiniBand address	*/

...

> diff --git a/include/rdma/ib.h b/include/rdma/ib.h

...

> +struct sockaddr_ib {
> + unsigned short int  sib_family; /* AF_IB */
> + __be16  sib_pkey;
> + __be32  sib_flowinfo;
> + struct ib_addr  sib_addr;
> + __be64  sib_sid;
> + __be64  sib_sid_mask;
> + __u64   sib_scope_id;
> +};

Dave/Roland/anyone, is there any feedback on this approach?

If there's hesitation to add new address families to socket.h, I could instead 
use definitions local to the rdma_cm.  This has the potential to result in 
conflicts if the rdma_cm is expanded for other address families, though such 
conflicts seem unlikely.
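
For readers following along, a rough userspace sketch of how an address of this shape might be filled in. Everything prefixed ex_/EX_ is hypothetical and local to this example: the struct mirrors the quoted sockaddr_ib, struct ib_addr is assumed to carry a 128-bit GID, and an all-ones sib_sid_mask is assumed to mean "match every service ID bit". None of this is taken from the rdma_cm implementation.

```c
#include <stdint.h>
#include <string.h>

#define EX_AF_IB 27  /* value proposed in the quoted socket.h hunk */

/* Assumption: struct ib_addr holds a 16-byte GID. */
struct ex_ib_addr {
	uint8_t raw[16];
};

/* Local mirror of the quoted struct; the __be* fields are stored
 * already converted to big-endian byte order. */
struct ex_sockaddr_ib {
	unsigned short    sib_family;    /* EX_AF_IB */
	uint16_t          sib_pkey;      /* __be16 */
	uint32_t          sib_flowinfo;  /* __be32 */
	struct ex_ib_addr sib_addr;
	uint64_t          sib_sid;       /* __be64 */
	uint64_t          sib_sid_mask;  /* __be64 */
	uint64_t          sib_scope_id;
};

/* Portable host-to-big-endian helpers (no libc extensions needed). */
static inline uint16_t ex_cpu_to_be16(uint16_t v)
{
	uint8_t b[2] = { (uint8_t)(v >> 8), (uint8_t)(v & 0xFF) };
	uint16_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

static inline uint64_t ex_cpu_to_be64(uint64_t v)
{
	uint8_t b[8];
	uint64_t out;
	int i;

	for (i = 0; i < 8; i++)
		b[i] = (uint8_t)(v >> (56 - 8 * i));  /* most significant byte first */
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Fill an address from a GID, a P_Key and a service ID. */
static inline void ex_fill_sockaddr_ib(struct ex_sockaddr_ib *sib,
				       const uint8_t gid[16],
				       uint16_t pkey, uint64_t sid)
{
	memset(sib, 0, sizeof(*sib));
	sib->sib_family = EX_AF_IB;
	sib->sib_pkey = ex_cpu_to_be16(pkey);
	memcpy(sib->sib_addr.raw, gid, sizeof(sib->sib_addr.raw));
	sib->sib_sid = ex_cpu_to_be64(sid);
	sib->sib_sid_mask = ex_cpu_to_be64(~0ULL);  /* assumed: exact match */
}
```

If the definitions stayed local to the rdma_cm instead of going into socket.h, only the family constant in a sketch like this would need to change.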

- Sean


RE: NFS over RDMA crashing

2013-02-11 Thread Yan Burman
> -Original Message-
> From: J. Bruce Fields [mailto:bfie...@fieldses.org]
> Sent: Thursday, February 07, 2013 18:42
> To: Yan Burman
> Cc: linux-...@vger.kernel.org; sw...@opengridcomputing.com; linux-
> r...@vger.kernel.org; Or Gerlitz
> Subject: Re: NFS over RDMA crashing
> 
> On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote:
> > On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
> > > When killing mount command that got stuck:
> > > ---
> > >
> > > BUG: unable to handle kernel paging request at 880324dc7ff8
> > > IP: [] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD
> > > 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 800324dc7161
> > > Oops: 0003 [#1] PREEMPT SMP
> > > Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm
> iw_cm
> > > ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables
> > > iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables
> > > nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock
> > > target_core_file target_core_pscsi target_core_mod configfs 8021q
> > > bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net
> > > macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel
> > > kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core
> > > ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core
> > > mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3
> jbd
> > > sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod CPU 6
> > > Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro
> > > X8DTH-i/6/iF/6F/X8DTH
> > > RIP: 0010:[]  []
> > > rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > > RSP: 0018:880324c3dbf8  EFLAGS: 00010297
> > > RAX: 880324dc8000 RBX: 0001 RCX: 880324dd8428
> > > RDX: 880324dc7ff8 RSI: 880324dd8428 RDI: 81149618
> > > RBP: 880324c3dd78 R08: 60f9c860 R09: 0001
> > > R10: 880324dd8000 R11: 0001 R12: 8806299dcb10
> > > R13: 0003 R14: 0001 R15: 0010
> > > FS:  () GS:88063fc0()
> > > knlGS:
> > > CS:  0010 DS:  ES:  CR0: 8005003b
> > > CR2: 880324dc7ff8 CR3: 01a0b000 CR4: 07e0
> > > DR0:  DR1:  DR2: 
> > > DR3:  DR6: 0ff0 DR7: 0400
> > > Process nfsd (pid: 4744, threadinfo 880324c3c000, task
> > > 88033055)
> > > Stack:
> > >  880324c3dc78 880324c3dcd8 0282 880631cec000
> > >  880324dd8000 88062ed33040 000124c3dc48 880324dd8000
> > >  88062ed33058 880630ce2b90 8806299e8000 0003
> > > Call Trace:
> > >  [] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
> > > [] ? try_to_wake_up+0x2f0/0x2f0
> > > [] svc_recv+0x3ef/0x4b0 [sunrpc]
> > > [] ? nfsd_svc+0x740/0x740 [nfsd]
> > > [] nfsd+0xad/0x130 [nfsd]  [] ?
> > > nfsd_svc+0x740/0x740 [nfsd]  [] kthread+0xd6/0xe0
> > > [] ? __init_kthread_worker+0x70/0x70
> > > [] ret_from_fork+0x7c/0xb0  [] ?
> > > __init_kthread_worker+0x70/0x70
> > > Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00
> > > 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00
> > > <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00 RIP
> > > [] rdma_read_xdr+0x8bb/0xd40 [svcrdma]  RSP
> > > 
> > > CR2: 880324dc7ff8
> > > ---[ end trace 06d0384754e9609a ]---
> > >
> > >
> > > It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e
> > > "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer"
> > > is responsible for the crash (it seems to be crashing in
> > > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527)
> > > It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
> > > CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
> > >
> > > When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I
> > > was no longer getting the server crashes, so the rest of my tests
> > > were done using that point (it is somewhere in the middle of
> > > 3.7.0-rc2).
> >
> > OK, so this part's clearly my fault--I'll work on a patch, but the
> > rdma's use of the ->rq_pages array is pretty confusing.
> 
> Does this help?
> 
> They must have added this for some reason, but I'm not seeing how it could
> have ever done anything
> 
> --b.
> 
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index 0ce7552..e8f25ec 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -520,13 +520,6 @@ next_sge:
>   for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages; ch_no++)
>   rqstp->rq_pages[ch_no] = NULL;
> 
> - /*
> -  * Detach res pages. If svc_release sees any it will attempt to
> -  * put them.
> -  */
> - while (rqstp->rq_next_page != rq