Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-30 Thread Shlomo Pongratz

On 4/29/2013 11:36 PM, Jason Gunthorpe wrote:

On Mon, Apr 29, 2013 at 10:52:21PM +0300, Or Gerlitz wrote:

On Fri, Apr 26, 2013 at 12:40 AM, Jason Gunthorpe wrote:

But I don't follow why the send QPNs have to be sequential for
IPoIB. It looks like this is being motivated by RSS and RSS QPNs are
just being reused for TSS?

Go read "It turns out that there are IPoIB drivers used by some
operating-systems
and/or Hypervisors in a para-virtualization (PV) scheme which extract the
source QPN from the CQ WC associated with an incoming packets in order
to.." and what follows in the change-log of patch 4/5
http://marc.info/?l=linux-rdma&m=136412901621797&w=2

This is what I said in the first place, the RFC is premised on the
src.QPN to be set properly, you can't just mess with it, because stuff
needs it.

I think you should have split this patch up, there is lots going on
here.

- Add proper TSS that doesn't change the wire protocol
- Add fake TSS that does change the wire protocol, and
   properly document those changes so other people can
   follow/implement them
- Add RSS

And.. 'tss_qpn_mask_sz' seems unnecessarily limiting, using
  WC.srcQPN + ipoib_header.tss_qpn_offset == real QPN
  (ie use a signed offset, not a mask)
Seems much better than
  Wc.srcQPN & ~((1 << (ipoib_header.tss_qpn_mask_sz >> 12)) - 1) == real QPN
  (Did I even get that right?)

Specifically it means the requirements for alignment and
contiguous-ness are gone. This means you can implement it without
using the QP groups API and it will work immediately with every HCA
out there. I think if we are going to actually mess with the wire
protocol this sort of broad applicability is important.

As for the other two questions: seems reasonable to me. Without a
consensus among HW vendors how to do this it makes sense to move ahead
*in the kernel* with a minimal API. Userspace is a different question
of course..

Jason

Hi Jason,

Your suggestion could have been valid if the IPoIB header was larger.
Please note that a QPN occupies 3 octets and thus its value lies in
the range of [0..0xFFFFFF].
On the other hand the reserved field in the IPoIB header occupies only 2
octets, so given an arbitrary group of source QPNs it may not be possible
to recover the real QPN.
This is why the real QPN should be a power of two and the rest should
have consecutive numbers. And since the number of TSS QPs is
relatively small, that is, on the order of the number of cores, then
masking the lower bits of the Wc.srcQPN will recover the real QPN
number.
Also by sending only the mask length we don't use the entire reserved
field but only 4 bits, leaving 12 bits for future use.
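
A minimal sketch of the mask recovery described above. It assumes the
mask length sits in the top 4 bits of the 16-bit reserved field (taken
from the shift by 12 in the expression quoted earlier in the thread);
the function names are hypothetical and are not part of the patch set.

```c
#include <assert.h>
#include <stdint.h>

#define QPN_MASK 0xFFFFFFu /* a QPN is 3 octets (24 bits) wide */

/*
 * Recover the parent ("real") QPN from the source QPN reported in the
 * work completion and the 16-bit reserved field of the IPoIB header.
 * Assumption: the mask length is carried in the top 4 bits of the field.
 */
uint32_t tss_real_qpn(uint32_t wc_src_qpn, uint16_t reserved)
{
    unsigned int mask_sz = reserved >> 12;      /* 0..15 */
    uint32_t low_bits = (1u << mask_sz) - 1u;   /* bits that vary per TSS QP */

    return wc_src_qpn & ~low_bits & QPN_MASK;
}

/* Build the reserved field for a group spanning 2^mask_sz QPNs. */
uint16_t tss_encode_mask_sz(unsigned int mask_sz)
{
    assert(mask_sz < 16);
    return (uint16_t)(mask_sz << 12);
}
```

With a parent at 0xA40 and a mask length of 3, any sender QPN in
0xA40..0xA47 maps back to 0xA40. This only works if the parent QPN has
its low mask_sz bits clear, which is where the alignment and contiguity
requirements come from.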


Best regards,

S.P.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-30 Thread Jason Gunthorpe
On Tue, Apr 30, 2013 at 12:04:25PM +0300, Shlomo Pongratz wrote:

 And.. 'tss_qpn_mask_sz' seems unnecessarily limiting, using
   WC.srcQPN + ipoib_header.tss_qpn_offset == real QPN
   (ie use a signed offset, not a mask)
 Seems much better than
   Wc.srcQPN & ~((1 << (ipoib_header.tss_qpn_mask_sz >> 12)) - 1) == real QPN
   (Did I even get that right?)

 Your suggestion could have been valid if the IPoIB header was larger.
 Please note that a QPN occupies 3 octets and thus its value lies
 in the range of [0..0xFFFFFF].

I am aware of this, and it isn't really a problem. Adaptors that
allocate randomly across the entire QPN space would not be compatible
with this approach, but most adaptors allocate QPNs
quasi-contiguously.

Basically, at startup, IPoIB would allocate a TX QP, then allocate TSS
QPs, and throw away any that can't fit in the encoding, until it
reaches the target number or tries too long. No need for a special API
to the driver.
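
The startup procedure above could be sketched roughly as follows. This
is an illustration only: the allocator hook, the toy stride allocator
and all names are hypothetical, and "fits in the encoding" is assumed
to mean that the distance to the TX QPN fits a signed 16-bit offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook standing in for "create a QP and return its QPN". */
typedef uint32_t (*qpn_alloc_fn)(void *ctx);

/*
 * Allocate QPNs until `want` of them can be encoded as a signed 16-bit
 * offset from tx_qpn, or until max_tries allocations have been made.
 * Returns the number of usable QPNs written to out[].
 */
size_t tss_collect_qpns(uint32_t tx_qpn, qpn_alloc_fn alloc, void *ctx,
                        uint32_t *out, size_t want, size_t max_tries)
{
    size_t got = 0;

    for (size_t tries = 0; got < want && tries < max_tries; tries++) {
        uint32_t qpn = alloc(ctx);
        int32_t delta = (int32_t)tx_qpn - (int32_t)qpn;

        if (delta >= INT16_MIN && delta <= INT16_MAX)
            out[got++] = qpn;   /* encodable: keep it */
        /* otherwise a real driver would destroy the QP here */
    }
    return got;
}

/* Toy allocator for demonstration: hands out QPNs with a fixed stride. */
struct stride_alloc {
    uint32_t next;
    uint32_t stride;
};

uint32_t stride_alloc_next(void *ctx)
{
    struct stride_alloc *a = ctx;
    uint32_t qpn = a->next;

    a->next += a->stride;
    return qpn;
}
```

An adaptor that hands out QPNs far away from the TX QPN simply yields
fewer (possibly zero) TSS QPs after max_tries attempts, which matches
the "tries too long" exit in the description.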

Jason


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-30 Thread Or Gerlitz
Jason Gunthorpe jguntho...@obsidianresearch.com wrote:

 For the TSS case, I'd say just allocate normal QPs and provide
 something like ibv_override_ud_src_qpn(). This is very general and
 broadly useful for any application using UD QPs.


I've lost you: how do you suggest implementing ibv_override_ud_src_qpn()?
Is that for future HW, or do you have a method to get it working today?


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-30 Thread Jason Gunthorpe
On Tue, Apr 30, 2013 at 11:08:19PM +0300, Or Gerlitz wrote:
 Jason Gunthorpe jguntho...@obsidianresearch.com wrote:
 
 For the TSS case, I'd say just allocate normal QPs and provide
 something like ibv_override_ud_src_qpn(). This is very general and
 broadly useful for any application using UD QPs.
 
 I've lost you: how do you suggest implementing ibv_override_ud_src_qpn()?
 Is that for future HW, or do you have a method to get it working today?

I meant as a user space API alternative to the parent/child group API
for transmit. It would require some level of driver/FW/HW support of
course.

Jason


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-29 Thread Or Gerlitz
On Fri, Apr 26, 2013 at 12:40 AM, Jason Gunthorpe wrote:

 Also, I feel what happens inside the kernel is more flexible API
 wise, so dropping the uverbs component may also be something you want to look
 at.

We didn't submit any uverbs exporting of these verbs in this series. I
am OK if the series is accepted for kernel use only (as was
submitted) and later we open a discussion on the user space API; once
that converges, we can decide whether to port the kernel RSS/TSS bits to
the newly agreed API.


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-29 Thread Or Gerlitz
On Fri, Apr 26, 2013 at 12:40 AM, Jason Gunthorpe wrote:
 But I don't follow why the send QPNs have to be sequential for
 IPoIB. It looks like this is being motivated by RSS and RSS QPNs are
 just being reused for TSS?

Go read "It turns out that there are IPoIB drivers used by some
operating-systems
and/or Hypervisors in a para-virtualization (PV) scheme which extract the
source QPN from the CQ WC associated with an incoming packets in order
to.." and what follows in the change-log of patch 4/5
http://marc.info/?l=linux-rdma&m=136412901621797&w=2


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-29 Thread Or Gerlitz
On Fri, Apr 26, 2013 at 12:40 AM, Jason Gunthorpe wrote:

 As Sean said earlier, please think about a single QP, multiple RQ/SQ
 style API - that seems much more general to me and also could
 reasonably be defined for other transport types.

I find it to have too much abstraction for a kernel-level API,
since that single QP isn't really a HW construct but rather something
artificial. For UD/RAW PACKET QPs, RSS is natural and is done in maybe
100 Ethernet NIC drivers, where a special steering rule sends an RX
packet to a dispatcher QP which applies a hash and does a 2nd dispatch to
QPs/rings depending on the hash result. So now we want to bring that
to IPoIB too, and we allow specifying that parent, etc.
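
As a rough illustration of the two-step dispatch described above: the
sketch below hashes a flow key and picks one of the consecutive child
QPNs. FNV-1a is used purely as a stand-in for whatever hash the
hardware applies, and the function names and the power-of-two child
count are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over the flow key; only a stand-in for the hardware hash. */
uint32_t flow_hash(const uint8_t *key, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= key[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Second-level dispatch: map a hash result onto one of `nchildren`
 * consecutive child QPNs starting at first_child_qpn.
 * Assumption: nchildren is a power of two, so a mask can replace modulo.
 */
uint32_t rss_pick_child(uint32_t first_child_qpn, uint32_t nchildren,
                        uint32_t hash)
{
    assert(nchildren != 0 && (nchildren & (nchildren - 1)) == 0);
    return first_child_qpn + (hash & (nchildren - 1));
}
```

Packets of the same flow hash to the same value and therefore always
land on the same child QP/ring, regardless of that child's state.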

Or.



 For instance, someday supporting multiple RQ on a RC transport, with
 content-based steering, is a limited form of tag matching.. From a
 longer-term user space API design standpoint the concept seems to have
 more longevity.

 Also, I feel what happens inside the kernel is more flexible API
 wise, so dropping the uverbs component may also be something you want
 to look at.

 Jason


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-29 Thread Jason Gunthorpe
On Mon, Apr 29, 2013 at 10:52:21PM +0300, Or Gerlitz wrote:
 On Fri, Apr 26, 2013 at 12:40 AM, Jason Gunthorpe wrote:
  But I don't follow why the send QPNs have to be sequential for
  IPoIB. It looks like this is being motivated by RSS and RSS QPNs are
  just being reused for TSS?
 
 Go read "It turns out that there are IPoIB drivers used by some
 operating-systems
 and/or Hypervisors in a para-virtualization (PV) scheme which extract the
 source QPN from the CQ WC associated with an incoming packets in order
 to.." and what follows in the change-log of patch 4/5
 http://marc.info/?l=linux-rdma&m=136412901621797&w=2

This is what I said in the first place, the RFC is premised on the
src.QPN to be set properly, you can't just mess with it, because stuff
needs it.

I think you should have split this patch up, there is lots going on
here.

- Add proper TSS that doesn't change the wire protocol
- Add fake TSS that does change the wire protocol, and
  properly document those changes so other people can
  follow/implement them
- Add RSS

And.. 'tss_qpn_mask_sz' seems unnecessarily limiting, using
 WC.srcQPN + ipoib_header.tss_qpn_offset == real QPN
 (ie use a signed offset, not a mask)
Seems much better than
 Wc.srcQPN & ~((1 << (ipoib_header.tss_qpn_mask_sz >> 12)) - 1) == real QPN
 (Did I even get that right?)

Specifically it means the requirements for alignment and
contiguous-ness are gone. This means you can implement it without
using the QP groups API and it will work immediately with every HCA
out there. I think if we are going to actually mess with the wire
protocol this sort of broad applicability is important.
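
A minimal sketch of the signed-offset alternative, assuming the whole
16-bit reserved field of the IPoIB header carries the offset. The
function names are hypothetical; nothing here is taken from the patch
set.

```c
#include <assert.h>
#include <stdint.h>

#define QPN_MASK 0xFFFFFFu /* QPNs are 24 bits wide */

/* Receiver side: real QPN = WC source QPN + signed 16-bit offset. */
uint32_t tss_real_qpn_from_offset(uint32_t wc_src_qpn, int16_t offset)
{
    return (uint32_t)((int32_t)wc_src_qpn + offset) & QPN_MASK;
}

/*
 * Sender side: compute the offset to place in the header for a given
 * TSS QPN. Returns 1 and fills *offset when the distance to the real
 * QPN fits in 16 signed bits, 0 otherwise.
 */
int tss_offset_fits(uint32_t real_qpn, uint32_t tss_qpn, int16_t *offset)
{
    int32_t delta = (int32_t)real_qpn - (int32_t)tss_qpn;

    if (delta < INT16_MIN || delta > INT16_MAX)
        return 0;
    *offset = (int16_t)delta;
    return 1;
}
```

Unlike a mask, the offset places no alignment or contiguity constraint
on the QPNs; the only constraint is that each TSS QPN lies within
roughly 32K of the real one.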

As for the other two questions: seems reasonable to me. Without a
consensus among HW vendors how to do this it makes sense to move ahead
*in the kernel* with a minimal API. Userspace is a different question
of course..

Jason


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-25 Thread Jason Gunthorpe
On Wed, Apr 24, 2013 at 02:24:45AM +, Hefty, Sean wrote:

 Conceptually, RSS/TSS are a set of send/receive work queues all
 belonging to the same transport level address.  There's no
 parent-child relationship or needed pairing of send and receive
 queues together in order to form a group.

This view makes sense to me as well. Can someone also confirm that
using TSS doesn't affect the on-the-wire packets vs the non-TSS cases?
I heard a few comments that sounded like TSS users get a per-queue QPN
in the outgoing packet rather than a single QPN for the group, which
would be pretty ugly.

IMHO, this sort of stuff needs to have a very well defined on-the-wire
behaviour, even if it is just documented in the ibverbs man pages.

 Personally, I'd like to see a way that better captures the notion of
 a 'set of work queues with the same address'.  For example, it makes
 more sense to me if a user created/destroyed the work queues
 together, and if the WQs were viewed as being in a single state
 (INIT, RTR, RTS...).

Yah, an API that made work queues a sub object of the QP seems to make
much more sense than trying to manage an array of QPs.

Jason


RE: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-25 Thread Hefty, Sean
  Conceptually, RSS/TSS are a set of send/receive work queues all
  belonging to the same transport level address.  There's no
  parent-child relationship or needed pairing of send and receive
  queues together in order to form a group.
 
 This view makes sense to me as well. Can someone also confirm that
 using TSS doesn't affect the on-the-wire packets vs the non-TSS cases?
 I heard a few comments that sounded like TSS users get a per-queue QPN
 in the outgoing packet rather than a single QPN for the group, which
 would be pretty ugly.

After speaking with Tzahi, my understanding is that the receive work queues all 
receive on the same QPN, but the send work queues use different QPNs.  The 
on-wire packets are affected, specifically the ipoib header.  This is why the 
send QPNs must be sequential, so that a mask can be applied at the receiving 
side to determine a single source QPN.
 
- Sean


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-25 Thread Jason Gunthorpe
On Thu, Apr 25, 2013 at 08:26:45PM +, Hefty, Sean wrote:
   Conceptually, RSS/TSS are a set of send/receive work queues all
   belonging to the same transport level address.  There's no
   parent-child relationship or needed pairing of send and receive
   queues together in order to form a group.
  
  This view makes sense to me as well. Can someone also confirm that
  using TSS doesn't affect the on-the-wire packets vs the non-TSS cases?
  I heard a few comments that sounded like TSS users get a per-queue QPN
  in the outgoing packet rather than a single QPN for the group, which
  would be pretty ugly.
 
 After speaking with Tzahi, my understanding is that the receive work
 queues all receive on the same QPN, but the send work queues use
 different QPNs.  The on-wire packets are affected, specifically the
 ipoib header.  This is why the send QPNs must be sequential, so that
 a mask can be applied at the receiving side to determine a single
 source QPN.

Ah, this seems contrary to the IPoIB specification? Someone should
probably talk about how sending from the wrong QPN is acceptable..

As I said, that is ugly. 'TSS' that changes the on-the-wire packet is
not TSS. It is just ganging QPs together.

Allocating sequential TSS QPNs is an awful hack, what we really need
is a way to force a UD QP's outgoing QPN.

Jason


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-25 Thread Or Gerlitz
Jason Gunthorpe jguntho...@obsidianresearch.com wrote:

 On Thu, Apr 25, 2013 at 08:26:45PM +, Hefty, Sean wrote:
  After speaking with Tzahi, my understanding is that the receive work
  queues all receive on the same QPN, but the send work queues use
  different QPNs.  The on-wire packets are affected, specifically the
  ipoib header.  This is why the send QPNs must be sequential, so that
  a mask can be applied at the receiving side to determine a single
  source QPN.

 Ah, this seems contrary to the IPoIB specification? Someone should probably 
 talk about how sending from the wrong QPN is acceptable..



AFAIK the IPoIB specification doesn't mandate the QPN of the sender


 As I said, that is ugly. 'TSS' that changes the on-the-wire packet is not 
 TSS. It is just ganging QPs together.

 Allocating sequential TSS QPNs is an awful hack, what we really need is a way 
 to force a UD QP's outgoing QPN.


INDEED, but this must be supported by the HW. The patch set already
supports the case of HW that knows how to do that forcing, quoting ---
IB_DEVICE_UD_TSS which is set to indicate that the device supports HW
TSS which means that the HW is capable of over-riding the source UD
QPN present in sent IB datagram header (DTH) with the parent's QPN
--- where over such HW the on-the-wire IPoIB header isn't touched.

BUT for the sake of improving performance and being competitive with
tons of Linux Ethernet drivers that support TSS/MQ we still need IPoIB
to support MQ/TSS before such HW is introduced, and as such the chosen
solution was to use reserved fields of the wire header.

How about we discuss RSS first? For RSS no wire change is introduced,
let's see if/how we can come to an agreement on how the RSS related verbs
should look and we'll take it from there to TSS.

Or.


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-25 Thread Jason Gunthorpe
On Thu, Apr 25, 2013 at 11:56:16PM +0300, Or Gerlitz wrote:

  Ah, this seems contrary to the IPoIB specification? Someone should
  probably talk about how sending from the wrong QPN is acceptable..

 AFAIK the IPoIB specification doesn't mandate the QPN of the sender

I'd have to read it again very carefully.. However, checking the src
QPN of every UD packet is the only way to detect if the packet was
generated by the authentic kernel or by an unprivileged user space
process, so the value is certainly important.

But I don't follow why the send QPNs have to be sequential for
IPoIB. It looks like this is being motivated by RSS and RSS QPNs are
just being reused for TSS?

  As I said, that is ugly. 'TSS' that changes the on-the-wire packet
  is not TSS. It is just ganging QPs together.
 
  Allocating sequential TSS QPNs is an awful hack, what we really
  need is a way to force a UD QP's outgoing QPN.
 
 INDEED, but this must be supported by the HW. The patch set already
 supports the case of HW that knows how to do that forcing, quoting ---
 IB_DEVICE_UD_TSS which is set to indicate that the device supports HW
 TSS which means that the HW is capable of over-riding the source UD
 QPN present in sent IB datagram header (DTH) with the parent's QPN
 --- where over such HW the on-the-wire IPoIB header isn't touched.

For the TSS case, I'd say just allocate normal QPs and provide
something like ibv_override_ud_src_qpn(). This is very general and
broadly useful for any application using UD QPs.

 BUT for the sake of improving performance and being competitive with
 tons of Linux Ethernet drivers that support TSS/MQ we still need IPoIB
 to support MQ/TSS before such HW is introduced, and as such the chosen
 solution was to use reserved fields of the wire header.

You've lost me again, what reserved bits?

If a new uverb is introduced the on-the-wire behaviour needs to be
fully documented..

 How about we discuss RSS first? For RSS no wire change is introduced,
 let's see if/how we can come to an agreement on how the RSS related verbs
 should look and we'll take it from there to TSS.

Well, to me, TSS is pretty simple. RSS is where things got really
complicated..

As Sean said earlier, please think about a single QP, multiple RQ/SQ
style API - that seems much more general to me and also could
reasonably be defined for other transport types.

For instance, someday supporting multiple RQ on a RC transport, with
content-based steering, is a limited form of tag matching.. From a
longer-term user space API design standpoint the concept seems to have
more longevity.

Also, I feel what happens inside the kernel is more flexible API
wise, so dropping the uverbs component may also be something you want
to look at.

Jason


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-23 Thread Or Gerlitz
On Mon, Apr 22, 2013 at 7:46 PM, Or Gerlitz or.gerl...@gmail.com wrote:
 Sean, Tzahi -- I understand now that there have been a few talks at
 the OFA meeting re this patch set. So what's the way to move forward,
 any concrete feedback that needs to be addressed here?  This series has
 been hanging since May 2012 and I'd like to see it get in for 3.10, so if
 indeed Sean is OK with the general framework, please suggest.

Sean,

I understand that following some conversations held at the OFA
meetings you kind of took back the concerns you raised regarding the
concept of the verbs level QP group which is used by this series to
implement RSS and TSS, can you acknowledge that?

Roland, this series has been around for about a year now, any feedback
or comments from your side that we need to address for it to get
accepted?

Or.


RE: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-23 Thread Hefty, Sean
 On Mon, Apr 22, 2013 at 7:46 PM, Or Gerlitz or.gerl...@gmail.com wrote:
  Sean, Tzahi -- I understand now that there have been a few talks at
  the OFA meeting re this patch set. So what's the way to move forward,
  any concrete feedback that needs to be addressed here?  This series has
  been hanging since May 2012 and I'd like to see it get in for 3.10, so if
  indeed Sean is OK with the general framework, please suggest.
 
 Sean,
 
 I understand that following some conversations held at the OFA
 meetings you kind of took back the concerns you raised regarding the
 concept of the verbs level QP group which is used by this series to
 implement RSS and TSS, can you acknowledge that?

No - I agree with the RSS/TSS concept.  That I've never had an issue with.  My 
issue is that the current verbs API appears to be a poor fit.  I don't have a 
good answer for an alternative.

Conceptually, RSS/TSS are a set of send/receive work queues all belonging to 
the same transport level address.  There's no parent-child relationship or 
needed pairing of send and receive queues together in order to form a group.

Personally, I'd like to see a way that better captures the notion of a 'set of 
work queues with the same address'.  For example, it makes more sense to me if 
a user created/destroyed the work queues together, and if the WQs were viewed 
as being in a single state (INIT, RTR, RTS...).

I'm just thinking out loud here, hoping that it spurs ideas, but if we added a 
call like:

struct ib_qp *ib_create_wq_array/set/group(...);

then added the ability to specify which WQ a specific send or receive should be 
posted to, it may do a better job of capturing RSS/TSS concepts, but still make 
use of the existing calls.  (Underneath this, the driver can allocate actual 
QPs  with sequential QPNs or whatever is required, but that's not exposed.)  
Obviously, I haven't thought through specifics.

I'll try to meet up with Diego and Tzahi tonight or tomorrow to discuss this 
further.

- Sean


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-22 Thread Or Gerlitz
On Mon, Apr 15, 2013 at 4:21 PM, Or Gerlitz or.gerl...@gmail.com wrote:
 Actually these comments and questions on the series come just a week
 before the annual OFA gathering. Personally, I will not be there, nor
 will Shlomo who is the author of the patches, but Tzahi Oved from Mellanox
 who led the architecture for the QP group concept plans to
 attend and same for Sean, Roland and I hope you (Ira) too, same for
 Ali Ayoub and Liran Liss from Mellanox who are attending too, all in
 all, a nice quorum to get into a room and do white boarding, open
 discussion, laughing, yelling and whatever is needed to get a consensus.
[...]

Sean, Tzahi -- I understand now that there have been a few talks at
the OFA meeting re this patch set. So what's the way to move forward,
any concrete feedback that needs to be addressed here?  This series has
been hanging since May 2012 and I'd like to see it get in for 3.10, so if
indeed Sean is OK with the general framework, please suggest.

Or.


Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups - suggesting BOF during OFA conf to further discuss that

2013-04-15 Thread Or Gerlitz
Weiny, Ira ira.we...@intel.com wrote:
 ow...@vger.kernel.org] On Behalf Of Or Gerlitz


 RSS child QPs are plain UD or RAW Packet QPs that only have consecutive
 QPNs which is common requirement of HW for configuring the RSS parent
 which in networking is called the RSS indirection or dispatching QP. You
 can send and receive on them.

 How do you ensure that the QPN's are consecutive?

Quoting from this patch change-log:

start A QP group is a set of QPs consisting of a parent QP and two disjoint sets
of RSS and TSS QPs. The creation of a QP group is a two stage process:

In the 1st stage, the parent QP is created.

In the 2nd stage the children QPs of the parent are created.

Each child QP indicates if it's an RSS or TSS QP. Both the TSS
and RSS sets of QPs should have contiguous QP numbers. end

When the parent is created we (the driver) are told by the
consumer (providing an instance of struct ib_qpg_init_attrib) how many
child QPs they will need, so we can internally act up front and make
sure there's a consecutive chain of QPNs reserved for that group.
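
One way a driver could reserve such a chain up front is sketched below.
This is not the actual driver code: the pool structure and function
names are hypothetical, and aligning the range to a power of two is an
assumption made so that a receiver could mask off the low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next power of two (valid for 1 <= n <= 2^23). */
uint32_t roundup_pow2(uint32_t n)
{
    uint32_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical bump allocator over the 24-bit QPN space. */
struct qpn_pool {
    uint32_t next_free; /* lowest QPN not yet handed out */
    uint32_t limit;     /* one past the highest usable QPN */
};

/*
 * Reserve `count` consecutive QPNs for one group. The range is padded
 * to a power-of-two span and aligned to that span. Returns the base
 * QPN, or 0 if the pool cannot satisfy the request (0 is used as the
 * failure value in this sketch only).
 */
uint32_t qpn_reserve_range(struct qpn_pool *pool, uint32_t count)
{
    assert(count >= 1 && count <= (1u << 23));

    uint32_t span = roundup_pow2(count);
    uint32_t base = (pool->next_free + span - 1) & ~(span - 1);

    if (base + span > pool->limit)
        return 0;
    pool->next_free = base + span;
    return base;
}
```

Padding to a power of two wastes some QPNs between groups; that is the
price of being able to recover the base with a simple mask.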

 If an RSS child goes to the error state it will not receive data.

 If you transition it back to RTS would it start working again?

YES

 Could you remove it and add a new one?  (I guess not because the new QPN
 would likely not be consecutive.)

NO, it's disallowed to destroy any of the child QPs as long as the
parent is there, quoting from the change log:

start It is forbidden to modify parent QP state before all RSS/TSS children
were created. In the same manner it is disallowed to destroy the parent
QP unless all RSS/TSS children were destroyed. end
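
The two ordering rules quoted from the change log can be modelled with
a few counters. The sketch below is only an illustration of those
rules: the structure and function names are hypothetical and the error
codes are an assumption, not what the patch returns.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bookkeeping for one QP group. */
struct qp_group {
    unsigned int expected; /* RSS + TSS children requested at creation */
    unsigned int created;  /* children created so far */
    unsigned int alive;    /* children not yet destroyed */
};

int qpg_create_child(struct qp_group *g)
{
    if (g->created == g->expected)
        return -EINVAL;    /* more children than were requested */
    g->created++;
    g->alive++;
    return 0;
}

int qpg_destroy_child(struct qp_group *g)
{
    if (g->alive == 0)
        return -EINVAL;
    g->alive--;
    return 0;
}

/* Rule 1: no parent state change before all children were created. */
int qpg_modify_parent(const struct qp_group *g)
{
    return g->created == g->expected ? 0 : -EBUSY;
}

/* Rule 2: no parent destruction while any child still exists. */
int qpg_destroy_parent(const struct qp_group *g)
{
    return g->alive == 0 ? 0 : -EBUSY;
}
```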

 Packets are routed to RSS children only per the hash function output, not per
 the state of that child.

 So if the QP chosen by the hash is in error state the packets get lost?
 Above you said they would not receive data.

Indeed, they get lost, which means data will not be received. I'm not
sure I follow what is inconsistent with what I said in the comment above.

Actually these comments and questions on the series come just a week
before the annual OFA gathering. Personally, I will not be there, nor
will Shlomo who is the author of the patches, but Tzahi Oved from Mellanox
who led the architecture for the QP group concept plans to
attend and same for Sean, Roland and I hope you (Ira) too, same for
Ali Ayoub and Liran Liss from Mellanox who are attending too, all in
all, a nice quorum to get into a room and do white boarding, open
discussion, laughing, yelling and whatever is needed to get a consensus.

It would be good if a BOF would be set to discuss the QP groups
concept and how to proceed with getting the verbs layer to support
RSS/TSS so we can finally

1. embed them within the verbs language
2. support MQ/RSS in the IPoIB network driver

Or.