[RFC] Generic InfiniBand transport done in software

2015-12-22 Thread Moni Shoua
Hi,

In the past months, the need for a kernel module that implements the InfiniBand
transport in software and unifies all the InfiniBand software drivers has
been raised. Since then, nobody has submitted a design proposal that satisfies
the initial goals and can serve various back-ends.

The following is an RFC that presents a solution made of a single generic
InfiniBand driver and many hardware-specific back-end drivers. The RFC defines
the requirements and the interfaces that have to be part of any generic
InfiniBand driver. A generic InfiniBand driver that is not compliant with this
RFC would not be able to serve different back-ends and would therefore miss
its target.



A. Introduction

In the Linux kernel, the responsibility for implementing the InfiniBand
protocol is roughly divided between two elements. The first is the core
drivers, which expose an abstract verbs interface to the upper layers as well
as interfaces to some common IB services like MAD, SA and CM. The second is
the vendor drivers and hardware, which implement the abstract verbs interface.

A high-level view of the model is shown in Figure A1

 +-------------+
 |             |
 |  IB core    |
 |  drivers    |
 |             |
 +------+------+
        | Common
        |
        | Vendor
 +------+------+
 |             |
 |  Hardware   |
 |  drivers    |
 |             |
 +------+------+
 |             |
 |  Hardware   |
 |             |
 +-------------+

A1 - IB implementation model in Linux kernel

In the vendor part of the model, the division of work between software and
hardware is undefined and is usually one of the two below

- Context and logic are managed in software. The hardware's role is limited to
  lower-layer protocols (depending on the link layer) and perhaps some offloads.
- Context and logic are managed in hardware, while the software's role is to
  create or destroy a context in the hardware and get notified when the
  hardware reports completed tasks.

The following examples demonstrate the difference between the two approaches.

- Send flow: the application calls post_send() with a QP and a WR. In the
  software-based approach the QP context is retrieved, the WR is parsed, and a
  proper IB packet is formed and placed in hardware buffers. In the
  hardware-based approach the driver puts the WR in hardware buffers with a
  handle to the QP, and the hardware does the rest.
- Receive flow: a data packet is received and placed in hardware buffers
  (assume the destination is an RC QP). In the software-based approach the
  packet is handed to the driver. The driver retrieves the context by QPN,
  decides whether an ACK/NACK packet is required (and if so, generates it),
  and decides whether a CQE is required on the QP's CQ. In the hardware-based
  approach the hardware compares the packet against the state in the context,
  generates the ACK/NACK, and raises an event to the driver that a CQE is
  ready for reading.
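The software-based send flow above can be sketched in a few lines. This is a
hedged illustration only: every name here (sw_qp, sw_wr, bth, sw_post_send) is
hypothetical and not a real kernel API; the point is that the driver, not the
hardware, parses the WR and advances the transport state (here, the PSN).

```c
#include <stdint.h>

struct sw_qp {              /* "deep" context kept entirely in software */
    uint32_t qpn;
    uint32_t dest_qpn;
    uint32_t send_psn;      /* next packet sequence number on the wire */
};

struct sw_wr {              /* simplified work request */
    int      opcode;        /* e.g. 0 = SEND */
    uint32_t length;
};

struct bth {                /* minimal Base Transport Header fields */
    uint8_t  opcode;
    uint32_t dest_qpn;
    uint32_t psn;
};

/* The driver forms the IB packet header from the WR and the QP context. */
static struct bth sw_post_send(struct sw_qp *qp, const struct sw_wr *wr)
{
    struct bth hdr;

    hdr.opcode   = (uint8_t)wr->opcode;
    hdr.dest_qpn = qp->dest_qpn;
    hdr.psn      = qp->send_psn++;  /* transport state advances in software */
    return hdr;
}
```

In the hardware-based approach the whole body of sw_post_send() would collapse
to ringing a doorbell with a QP handle.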

Figure A2 illustrates the hardware/software relationship in the implementation
of an IB transport solution in the software-based approach.

  +------------------------------+
  |  IB transport                |
  |  capable driver              |
  |                              |
  |  +------------------------+  |
  |  |  Context (deep)        |  |
  |  +------------------------+  |
  |                              |
  |  +------------------------+  |
  |  |  Logic (IB transport)  |  |
  |  +------------------------+  |
  |                              |
  |  +------------------------+  |
  |  |  Interface             |  |
  |  +------------------------+  |
  +--------------+---------------+
                 |
                 |
  +--------------+---------------+
  |  +------------------------+  |
  |  |  Interface             |  |
  |  +------------------------+  |
  |                              |
  |  +------------------------+  |
  |  |  Context (shallow)     |  |
  |  +------------------------+  |
  |                              |
  |  +------------------------+  |
  |  |  Logic (link layer)    |  |
  |  +------------------------+  |
  |                              |
  |  IB transport                |
  |  incapable 

Re: [RFC] Generic InfiniBand transport done in software

2015-12-22 Thread Dennis Dalessandro

On Tue, Dec 22, 2015 at 02:38:10PM +0200, Moni Shoua wrote:

Hi,

In the past months the need for a kernel module that implements the InfiniBand
transport in software and unify all the InfiniBand software drivers has
been raised. Since then, nobody has submitted any design proposal that satisfy
the initial thoughts and can serve various back-ends.

The following is a RFC that presents a solution made of a single
generic InfiniBand
driver and many hardware specific back-end drivers. The RFC defines
the requirements
and the interfaces that have to be part of any generic InfiniBand driver.
A generic InfiniBand driver that is not compliant with this RFC wouldn't be able
to serve different back-ends and therefore would miss its target.


This makes no mention of the already posted work which aims to consolidate 
the qib and hfi1 drivers' verbs implementation. However, it does seem to be 
mostly in line with what we have already presented for rdmavt and the 
direction the next set of patches is going in. Have you seen something in 
rdmavt that contradicts this? I have not seen the need to write a detailed 
RFC because it's pretty straightforward what needs to happen. Take the 
common parts of qib and hfi1 and move them to another kmod. Don't do anything 
that prevents soft-roce from using the same infrastructure. I think we are 
accomplishing that and have been very open during the process.


Specifically this RFC does not capture much more detail beyond what has 
already been posted. There are also a number of details regarding hardware 
specifics that are not dealt with here but are in the current rdmavt patch 
series, as well as patches that are yet to be posted. The bottom line is we 
can't sacrifice performance for a perfect API.


Previous discussion threads:
http://marc.info/?l=linux-rdma&m=144353141420065&w=2
http://marc.info/?l=linux-rdma&m=144563342718705&w=2
http://marc.info/?l=linux-rdma&m=144614769419528&w=2

Rdmavt & Qib code submissions:
http://marc.info/?l=linux-rdma&m=144968107508268&w=2
http://marc.info/?l=linux-rdma&m=144952133926970&w=2
Feedback has been received on these patches and unless otherwise noted in 
the relevant threads will be incorporated into the V2 of the patch sets.


hfi1 RFC sent to staging list:
http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2015-December/082896.html

-Denny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Generic InfiniBand transport done in software

2015-12-23 Thread Moni Shoua
>
>
> This makes no mention of the already posted work which aims to consolidate
> the qib and hfi1 drivers verbs implementation. However it does seem to be
> mostly in line with what we have already presented for rdmavt and the
> direction the next set of patches is going in. Have you seen something in
> rdmavt that contradicts this? I have not seen the need to write a detailed
> RFC because it's pretty straight forward what needs to happen. Take the
> common parts of qib and hfi and move to another kmod. Don't do anything that
> prevents soft-roce from using the same infrastructure. I think we are
> accomplishing that and have been very open during the process.
>
> Specifically this RFC does not capture much more detail beyond what has
> already been posted. There are also a number of details regarding hardware
> specifics that are not dealt with here but are in the current rdmavt patch
> series, as well as patches that are yet to be posted. The bottom line is we
> can't sacrifice performance for a perfect API.
>
> Previous discussion threads:
> http://marc.info/?l=linux-rdma&m=144353141420065&w=2
> http://marc.info/?l=linux-rdma&m=144563342718705&w=2
> http://marc.info/?l=linux-rdma&m=144614769419528&w=2
>
> Rdmavt & Qib code submissions:
> http://marc.info/?l=linux-rdma&m=144968107508268&w=2
> http://marc.info/?l=linux-rdma&m=144952133926970&w=2
> Feedback has been received on these patches and unless otherwise noted in
> the relevant threads will be incorporated into the V2 of the patch sets.
>

Hi Denny

http://marc.info/?l=linux-rdma&m=144614769419528&w=2 introduced a
concept (RVT) which was never followed by a discussion and never
agreed upon. Moreover,
http://marc.info/?l=linux-rdma&m=144952098726776&w=2 presents a work
that besides keeping the name RVT is far from the immature concept I
mentioned earlier and its scope was changed from general purpose
solution to Intel and HFI/QIB only. I therefore conclude that the
concept of RVT, as it was supposed to be, was abandoned.

Moni


Re: [RFC] Generic InfiniBand transport done in software

2015-12-23 Thread Dennis Dalessandro

On Wed, Dec 23, 2015 at 06:28:08PM +0200, Moni Shoua wrote:



This makes no mention of the already posted work which aims to consolidate
the qib and hfi1 drivers verbs implementation. However it does seem to be
mostly in line with what we have already presented for rdmavt and the
direction the next set of patches is going in. Have you seen something in
rdmavt that contradicts this? I have not seen the need to write a detailed
RFC because it's pretty straight forward what needs to happen. Take the
common parts of qib and hfi and move to another kmod. Don't do anything that
prevents soft-roce from using the same infrastructure. I think we are
accomplishing that and have been very open during the process.

Specifically this RFC does not capture much more detail beyond what has
already been posted. There are also a number of details regarding hardware
specifics that are not dealt with here but are in the current rdmavt patch
series, as well as patches that are yet to be posted. The bottom line is we
can't sacrifice performance for a perfect API.

Previous discussion threads:
http://marc.info/?l=linux-rdma&m=144353141420065&w=2
http://marc.info/?l=linux-rdma&m=144563342718705&w=2
http://marc.info/?l=linux-rdma&m=144614769419528&w=2

Rdmavt & Qib code submissions:
http://marc.info/?l=linux-rdma&m=144968107508268&w=2
http://marc.info/?l=linux-rdma&m=144952133926970&w=2
Feedback has been received on these patches and unless otherwise noted in
the relevant threads will be incorporated into the V2 of the patch sets.



Hi Denny

http://marc.info/?l=linux-rdma&m=144614769419528&w=2 introduced a
concept (RVT) which was never followed by a discussion and never
agreed upon. Moreover


There were discussions, and Mellanox even contributed code to the effort.  
See Kamal's patches in the patch set I provided.



http://marc.info/?l=linux-rdma&m=144952098726776&w=2 presents a work
that besides keeping the name RVT is far from the immature concept I
mentioned earlier and its scope was changed from general purpose
solution to Intel and HFI/QIB only. 


The scope has never changed. Our goal is, and has always been to remove the 
code duplication between qib and hfi1. We are doing that by way of rdmavt.  
It is limited in scope to Intel's drivers currently for what I hope are 
obvious reasons.


I think it makes sense that soft-roce be added as well and hope that 
Mellanox decides to contribute rather than reinventing the wheel.


Is there something in rdmavt that would not work for soft-roce, or is 
something fundamental missing? I have asked this a number of times and 
nothing has been raised, so I assume there are no issues. If there are, let's 
discuss them.


Reading through your RFC here, perhaps something like the multicast add
and delete is concerning?  This is something that is not really needed by 
qib and hfi1 but may be for soft-roce. All that means is soft-roce needs to 
provide it and it would be optional for qib and hfi1. The rdmavt 
architecture is flexible and allows exactly this.
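The "optional for qib and hfi1" flexibility described here can be sketched as
a function-pointer table in which a NULL entry means "not provided". This is
an illustration under assumed names only (rvt_like_ops, attach_mcast, and
generic_attach_mcast are invented for this sketch, not the actual rdmavt
interface):

```c
#include <stddef.h>

/* Hypothetical ops table: the generic layer calls a back-end hook only if
 * the driver registered it; otherwise the operation is a harmless no-op. */
struct rvt_like_ops {
    /* may be NULL: soft-roce would provide it, qib/hfi1 could leave it unset */
    int (*attach_mcast)(unsigned int qpn, unsigned int mgid);
};

static int generic_attach_mcast(const struct rvt_like_ops *ops,
                                unsigned int qpn, unsigned int mgid)
{
    if (ops->attach_mcast)
        return ops->attach_mcast(qpn, mgid);
    return 0;               /* optional op absent: succeed without doing anything */
}
```

The same pattern extends to any back-end-specific verb a particular driver
does not need.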



I therefore conclude that the
concept of RVT, as it was supposed to be, was abandoned.


This is absolutely incorrect. As mentioned above, nothing has changed.

-Denny


Re: [RFC] Generic InfiniBand transport done in software

2015-12-23 Thread ira.weiny
On Tue, Dec 22, 2015 at 02:38:10PM +0200, Moni Shoua wrote:
> Hi,
> 
> In the past months the need for a kernel module that implements the InfiniBand
> transport in software and unify all the InfiniBand software drivers has
> been raised. Since then, nobody has submitted any design proposal that satisfy
> the initial thoughts and can serve various back-ends.

I'm not sure I understand what you mean.  Denny has posted several high level
emails similar to this one and has asked for public feedback.  We have been
addressing all the feedback as it has come in and we continue to work toward
this common layer.

To that end we have been asking for you to identify any place you see an
incompatibility with the rdmavt layer for SoftRoCE.  Patches and comments to
rdmavt are encouraged and we have already incorporated patches from outside
Intel.

We are very sensitive to the fact that other drivers will want to use this
layer in the future.  But I don't think this layer is set in stone.  To
that end, any functionality which is specific to SoftRoCE should be added when
SoftRoCE is submitted to the kernel.  To date I have not seen any show stopper
objections except Sagi's comment on the lack of IB_WR_LOCAL_INV which we will
drop in the next submission.

Also, please be aware that we are being very careful with this work to not
sacrifice the performance of either qib or hfi1.  There are a couple of items
you mention below which seem to indicate you would like a more "pure"
separation of the driver.  I hope you understand that any work in this area
which affects our performance must be accounted for and may not result in as
"pure" a separation as you may like.  If that is a show stopper for SoftRoCE,
let's work through the specific examples.

More inline.

> 
> The following is a RFC that presents a solution made of a single
> generic InfiniBand
> driver and many hardware specific back-end drivers. The RFC defines
> the requirements
> and the interfaces that have to be part of any generic InfiniBand driver.
> A generic InfiniBand driver that is not compliant with this RFC wouldn't be 
> able
> to serve different back-ends and therefore would miss its target.
> 
> 
> 
> A. Introduction
> 
> In Linux kernel, the responsibility to implement the InfiniBand protocol is
> roughly divided between 2 elements. The first are the core drivers which 
> expose
> an abstract verbs interface to the upper layer as well as interfaces to some
> common IB services like MAD, SA and CM. The second are vendor drivers and
> hardware which implement the abstract verbs interface.
> 
> A high level view of the model is in Figure A1
> 
>  +-+
>  | |
>  |IB core  |
>  |drivers  |
>  | |
>  +++
>   | Common
> 
>   | Vendor
>  +++
>  | |
>  |  Hardware   |
>  |  drivers|
>  | |
>  +++
>  | |
>  |  Hardware   |
>  | |
>  +-+
> 
> A1 - IB implementation model in Linux kernel
> 
> In the vendor part of the model, the devision of work between software and
> hardware is undefined and is usually one of the two below
> 
> - Context and logic are  managed in software. Hardware role is limited to
>   lower layer protocols (depending on the link layer) and maybe some offloads
> - Context and logic are managed in hardware while software role is to create
>   or destroy a context in the hardware and gets notified when hardware reports
>   about a completions tasks.
> 
> The following examples demonstrates the difference between the approaches 
> above.
> 
> - Send flow: application calls post_send() with QP and a WR. In the software
>   based approach the QP context is retrieved, the WR is parsed and a proper IB
>   packet is formed to be put in hardware buffers. In hardware based approach 
> the
>   driver puts the WR in hardware buffers with a handle to the QP and hardware
>   does the rest.

I'm glad you brought this up but I'm a bit confused.  For anything that sends
more than a single packet I think this is an oversimplification.[1]  Do you
perhaps mean WQE rather than packet?

If you are referring to a WQE we were thinking the same thing but have not
gotten the send code posted yet.

[1] For example a 1MB RDMA write will require many packets.
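The arithmetic behind this footnote can be made concrete with a small sketch
(illustrative only; packets_for_wr is an invented helper, and a real driver
would additionally select first/middle/last packet opcodes per segment):

```c
#include <stdint.h>

/* A work request covering `len` bytes becomes ceil(len / mtu) wire packets;
 * a zero-length send still consumes one packet. */
static uint32_t packets_for_wr(uint64_t len, uint32_t mtu)
{
    if (len == 0)
        return 1;
    return (uint32_t)((len + mtu - 1) / mtu);
}
```

With a 4096-byte MTU, the 1MB RDMA write in the footnote segments into 256
packets, which is why a per-WQE (rather than per-packet) hand-off between the
generic layer and the driver matters.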


> - Receive flow: a data packet is received and put in hardware buffers (assume
>   that the destination is a RC QP). In software based approach the packet is
>   handed to the driver. The driver retrieves the context by QPN, 

Re: [RFC] Generic InfiniBand transport done in software

2015-12-24 Thread Moni Shoua
>
>
> There were discussions, and Mellanox even contributed code to the effort.
> See Kamal's patches in the patch set I provided.
>
As far as I see it, discussions were shallow and never produced an
agreement. Kamal's patches should not be considered as such.

>> http://marc.info/?l=linux-rdma&m=144952098726776&w=2 presents a work
>> that besides keeping the name RVT is far from the immature concept I
>> mentioned earlier and its scope was changed from general purpose
>> solution to Intel and HFI/QIB only.
>
>
> The scope has never changed. Our goal is, and has always been to remove the
> code duplication between qib and hfi1. We are doing that by way of rdmavt.
> It is limited in scope to Intel's drivers currently for what I hope are
> obvious reasons.
>
So you actually agree that rdmavt was intended to be a solution for
Intel's specific drivers.
Fair, but IMO this is not what we aimed for.
In fact, if this is an Intel-specific solution, then why put it in
drivers/infiniband/sw, and why publish it when it is not ready?

> I think it makes sense that soft-roce be added as well and hope that
> Mellanox decides to contribute rather than reinventing the wheel.
>
> Is there something in rdmavt that would not work for soft-roce, or is
> something fundamental missing? I have asked this a number of times and
> nothing has been raised so I assume there are no issues. If there are lets
> discuss them.
>
Interfaces between rdmavt and its back-ends are missing. I consider
this fundamental.
Concerns were raised but answers were not provided, at least not
satisfying ones.

> Reading through your RFC here, perhaps something like the multicast add
> and delete is concerning?  This is something that is not really needed by
> qib and hfi1 but may be for soft-roce. All that means is soft-roce needs to
> provide it and it would be optional for qib and hfi1. The rdmavt
> architecture is flexible and allows exactly this.
>
>> I therefore conclude that the
>> concept of RVT, as it was supposed to be, was abandoned.
>
>
> This is absolutely incorrect. As mentioned above, nothing has changed.
>
>
> -Denny


Re: [RFC] Generic InfiniBand transport done in software

2015-12-24 Thread Moni Shoua
> I'm not sure I understand what you mean.  Denny has posted several high level
> emails similar to this one and has asked for public feedback.  We have been
> addressing all the feedback as it has come in and we continue to work toward
> this common layer.
>
> To that end we have been asking for you to identify any place you see an
> incompatibility with the rdmavt layer for SoftRoCE.  Patches and comments to
> rdmavt are encouraged and we have already incorporated patches from outside
> Intel.
>
I really don't know how to answer that, because I don't expect the
stub interface to last, and if it does, I don't see how the problem of
code duplication is going to be solved. When I know what the interfaces
look like, plus some other documentation, I will be able to answer this.

> We are very sensitive to the fact that other drivers will want to use this
> layer in the future.  But I don't think this layer is set in stone.  To
> that end, any functionality which is specific to SoftRoCE should be added when
> SoftRoCE is submitted to the kernel.  To date I have not seen any show stopper
> objections except Sagi's comment on the lack of IB_WR_LOCAL_INV which we will
> drop in the next submission.
>
SoftRoCE was submitted today to staging. Now, if I want to take it out
of staging by using rdmavt, I actually can't, unless I use the stubs
and watch for rdmavt changes. This is unacceptable. Missing final
interfaces and documentation of how to use them is a show stopper, IMO.

> Also, please be aware that we are being very careful with this work to not
> sacrifice the performance of either qib or hfi1.  There are a couple of items
> you mention below which seem to indicate you would like a more "pure"
> separation of the driver.  I hope you understand that any work in this area
> which affects our performance must be accounted for and may not result in as
> "pure" a separation as you may like.  If that is a show stopper for SoftRoCE
> lets work through the specific examples.
Eventually you'll have to break a WR into packets of MTU size, won't you?
Anyway, this is what we had to finalize in the early discussion and
not after code posting.

Ira,
I've read your comments on the RFC and I find them worth a thought
before I answer.
In general, finding a good abstraction of the back-end is the hardest
thing in this project and affects how the interface looks.
I think that most, if not all, of your comments fall in the category of
abstraction.


>
> More inline.
>
>>
>> The following is a RFC that presents a solution made of a single
>> generic InfiniBand
>> driver and many hardware specific back-end drivers. The RFC defines
>> the requirements
>> and the interfaces that have to be part of any generic InfiniBand driver.
>> A generic InfiniBand driver that is not compliant with this RFC wouldn't be 
>> able
>> to serve different back-ends and therefore would miss its target.
>>
>> 
>>
>> A. Introduction
>> 
>> In Linux kernel, the responsibility to implement the InfiniBand protocol is
>> roughly divided between 2 elements. The first are the core drivers which 
>> expose
>> an abstract verbs interface to the upper layer as well as interfaces to some
>> common IB services like MAD, SA and CM. The second are vendor drivers and
>> hardware which implement the abstract verbs interface.
>>
>> A high level view of the model is in Figure A1
>>
>>  +-+
>>  | |
>>  |IB core  |
>>  |drivers  |
>>  | |
>>  +++
>>   | Common
>> 
>>   | Vendor
>>  +++
>>  | |
>>  |  Hardware   |
>>  |  drivers|
>>  | |
>>  +++
>>  | |
>>  |  Hardware   |
>>  | |
>>  +-+
>>
>> A1 - IB implementation model in Linux kernel
>>
>> In the vendor part of the model, the devision of work between software and
>> hardware is undefined and is usually one of the two below
>>
>> - Context and logic are  managed in software. Hardware role is limited to
>>   lower layer protocols (depending on the link layer) and maybe some offloads
>> - Context and logic are managed in hardware while software role is to create
>>   or destroy a context in the hardware and gets notified when hardware 
>> reports
>>   about a completions tasks.
>>
>> The following examples demonstrates the difference between the approaches 
>> above.
>>
>> - Send flow: application calls post_send() with QP and a WR. In the software
>>   based approach the QP context is retrieved, the WR is parsed and a proper 
>> IB
>>   packet 

Re: [RFC] Generic InfiniBand transport done in software

2015-12-24 Thread Dennis Dalessandro

On Thu, Dec 24, 2015 at 05:43:11PM +0200, Moni Shoua wrote:



There were discussions, and Mellanox even contributed code to the effort.
See Kamal's patches in the patch set I provided.


As far as I see it discussions were shallow and never produced an
agreement. Kamal's patches should not be considered as as such.


Point is others have looked at the code. No issues have been called out to 
date as to why what is there won't work for everyone.



http://marc.info/?l=linux-rdma&m=144952098726776&w=2 presents a work
that besides keeping the name RVT is far from the immature concept I
mentioned earlier and its scope was changed from general purpose
solution to Intel and HFI/QIB only.



The scope has never changed. Our goal is, and has always been to remove the
code duplication between qib and hfi1. We are doing that by way of rdmavt.
It is limited in scope to Intel's drivers currently for what I hope are
obvious reasons.


So you actually agree that rdmavt was intended to be a solution to
Intel's specific drivers.
Fair, but IMO this is not what we aimed for.
In fact, if this is an Intel specific solution then why put it in
drivers/infiniband/sw and why publish it when it is not ready?


Yes it is specific to Intel *now*, that doesn't mean it should stay that 
way. Rdmavt could, and in my opinion should, be extended to support 
soft-roce. I don't think replicating the same thing is a great idea.


As to the location, where do you think it should go? drivers/infiniband/sw 
makes the most sense to me, but I'm open to suggestions.


And for the question of why publish when it's not ready, the better question 
is why not?  Is it not good to see the work in progress as it evolves so the 
community can provide feedback?



I think it makes sense that soft-roce be added as well and hope that
Mellanox decides to contribute rather than reinventing the wheel.

Is there something in rdmavt that would not work for soft-roce, or is
something fundamental missing? I have asked this a number of times and
nothing has been raised so I assume there are no issues. If there are lets
discuss them.


Interfaces between rdmavt and its backends are missing. I consider
this as fundamental.
Concerns were raised but answers were not provided, at least not
satisfying answers.


No one is arguing that. It is a work in progress and will get there. More 
details are in Ira's response.


http://marc.info/?l=linux-rdma&m=145091253118395&w=2

-Denny


Re: [RFC] Generic InfiniBand transport done in software

2015-12-27 Thread Moni Shoua
>
> Point is others have looked at the code. No issues have been called out to
> date as to why what is there won't work for everyone.
>
http://marc.info/?l=linux-rdma&m=144968107508268&w=2
Your answer to the send() issue first agrees with the comment and
later says that it can't be done because of how PIO and SDMA
(an Intel-specific implementation) work.
This is an example of a discussion that never ended with an agreement.


>
> Yes it is specific to Intel *now*, that doesn't mean it should stay that
> way. Rdmavt could, and in my opinion should, be extended to support
> soft-roce. I don't think replicating the same thing is a great idea.
>
But you post *now* a so-called generic driver, so it must now fit any
possible driver (including Soft RoCE).
> As to the location, where do you think it should go. drivers/infiniband/sw
> makes the most sense to me, but open to suggestions.
>
> And for the question of why publish when it's not ready, the better question
> is why not?  Is it not good to see the work in progress as it evolves so the
> community can provide feedback?
>
What kind of feedback do you expect when I don't have any idea about
your plans for rdmavt?
Interfaces, flows, data structures... all are missing from the
rdmavt documentation.


Re: [RFC] Generic InfiniBand transport done in software

2015-12-28 Thread Dennis Dalessandro

On Sun, Dec 27, 2015 at 07:54:46PM +0200, Moni Shoua wrote:


Point is others have looked at the code. No issues have been called out to
date as to why what is there won't work for everyone.


http://marc.info/?l=linux-rdma&m=144968107508268&w=2
Your answer to the send() issue is first an agreement with the comment
and later says that it can't be done because of how PIO and SDMA
(Intel specific implementation)
This is an example for a discussion that never ended with an agreement.


No. PIO and SDMA are driver specific and live in the driver. Rdmavt has no 
concept of this. I'm agreeing that the send will be generic and have no hw 
specific stuff.



Yes it is specific to Intel *now*, that doesn't mean it should stay that
way. Rdmavt could, and in my opinion should, be extended to support
soft-roce. I don't think replicating the same thing is a great idea.


But you post *now* a so called generic driver so it must now fit any
possible driver (including Soft RoCE)


As I've stated a number of times across multiple threads: It must not do 
anything that would prevent another driver from using it.



As to the location, where do you think it should go. drivers/infiniband/sw
makes the most sense to me, but open to suggestions.

And for the question of why publish when it's not ready, the better question
is why not?  Is it not good to see the work in progress as it evolves so the
community can provide feedback?


What kind of a feedback you expect when I don't have an idea about
your plans for rdmavt
Interfaces, flows, data structures... all is missing from the
documentation to rdmavt.


I expect feedback based on the code submissions. More will be coming 
shortly. I have taken all the feedback from the first post and will be 
sending a v2 shortly.


-Denny


Re: [RFC] Generic InfiniBand transport done in software

2015-12-29 Thread Christoph Hellwig
Hi Moni,

On Sun, Dec 27, 2015 at 07:54:46PM +0200, Moni Shoua wrote:
> But you post *now* a so called generic driver so it must now fit any
> possible driver (including Soft RoCE)

it's never going to fit any possible future driver.  Dennis and folks
have done great work to move code outside the drivers into a shared
library.  So far it's been driven just by the Intel drivers as that's
the only thing they were interested in.

If you are interested in supporting SoftROCE please work with them
by adjusting the code towards your requirements.  In Linux we have
great results with iterative appoaches and I'd suggest you try it
as well.

> What kind of a feedback you expect when I don't have an idea about
> your plans for rdmavt
> Interfaces, flows, data structures... all is missing from the
> documentation to rdmavt.

You've got the code, so let's work based on that.


Re: [RFC] Generic InfiniBand transport done in software

2015-12-29 Thread Moni Shoua
> it's never going to fit any possible future driver.  Dennis and folks
> have done great work to move code outside the drivers into a shared
> library.  So far it's been driven just by the Intel drivers as that's
> the only thing they were interested in.
>

If it's not going to be a solution for anything other than Intel, then
why declare it as such?
Where is that shared library? The amount of code in rdmavt that can be
considered shared is very little.

> If you are interested in supporting SoftROCE please work with them
> by adjusting the code towards your requirements.  In Linux we have
> great results with iterative appoaches and I'd suggest you try it
> as well.
>
Exactly. All you asked for is in the RFC I posted.

>
> You've got the code, so let's work based on that.
> --
I say let's agree on the interfaces and start writing code.


Re: [RFC] Generic InfiniBand transport done in software

2015-12-29 Thread Moni Shoua
> No. PIO and SDMA is driver specific and lives in the driver. Rdmavt has no
> concept of this. I'm agreeing that the send will be generic and have no hw
> specific stuff.
>
I understand that PIO/SDMA are not a concept of RVT. However, making
the send interface from RVT to the driver identical to the interface
from ib_core to RVT raises the question: what exactly do we achieve by this?
>
>
> As I've stated a number of times across multiple threads: It must not do
> anything that would prevent another driver from using it.
>
The question is not whether Soft RoCE fits into this framework but how
the framework achieves its goals.

>
>
> I expect feedback based on the code submissions. More will be coming
> shortly. I have taken all the feedback from the first post and will be
> sending a v2 shortly.
>
Again, I have no idea what the complete interfaces between the two
pieces of the suggested solution are.
- If you have them, then please publish them
- If you don't but plan to have them, then why did you submit a half-baked idea?
- If you say that the final interface is what we see now, then I say that
the problem of code duplication isn't going to be resolved
So, which of the three is it?

>
> -Denny


Re: [RFC] Generic InfiniBand transport done in software

2015-12-29 Thread Doug Ledford
On 12/27/2015 12:54 PM, Moni Shoua wrote:

>> Yes it is specific to Intel *now*, that doesn't mean it should stay that
>> way. Rdmavt could, and in my opinion should, be extended to support
>> soft-roce. I don't think replicating the same thing is a great idea.
>>
> But you post a so-called generic driver *now*, so it must now fit any
> possible driver (including Soft RoCE)

This is incorrect.  This isn't some public API that we are exporting to
user space.  Nor is it an API that out of tree drivers are using.  This
is a purely kernel internal API for use by a limited number of drivers.
 As such, it need not be finalized before it is submitted or used.  It
can be taken one piece at a time, and if, at some point, it is
determined that there are shortcomings to the API, it can be updated in
place with all of the drivers that use it in a single patch or patch
series.  So a finalized design prior to putting code in place is
specifically *not* needed.

>> As to the location, where do you think it should go. drivers/infiniband/sw
>> makes the most sense to me, but open to suggestions.
>>
>> And for the question of why publish when it's not ready, the better question
>> is why not?  Is it not good to see the work in progress as it evolves so the
>> community can provide feedback?
>>
> What kind of feedback do you expect when I have no idea about your
> plans for rdmavt?
> Interfaces, flows, data structures... all are missing from the
> rdmavt documentation.

They released it so that you can start hooking SoftRoCE into it.  As you
hook it in, if it needs changes to work with SoftRoCE, simply make the
changes needed and move on.

I think Dennis' point, and I agree with him, is that you are over
complicating the issue here.  This need not be a highly designed item,
it needs to be a functional item, and we can build it as we go.  If you
have to make changes to rdmavt in order to hook up SoftRoCE, that's
fine, post them to the list, they will get reviewed.  As long as the
change doesn't break or otherwise negatively impact qib and/or hfi1,
then it should be fine.  If it does, then I'm sure Intel will work with
you to find a solution that doesn't negatively impact them.


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD






Re: [RFC] Generic InfiniBand transport done in software

2015-12-29 Thread Moni Shoua
I think my point has been missed. See my answers inline.


> This is incorrect.  This isn't some public API that we are exporting to
> user space.  Nor is it an API that out of tree drivers are using.  This
> is a purely kernel internal API for use by a limited number of drivers.
>  As such, it need not be finalized before it is submitted or used.  It
> can be taken one piece at a time, and if, at some point, it is
> determined that there are shortcomings to the API, it can be updated in
> place with all of the drivers that use it in a single patch or patch
> series.  So a finalized design prior to putting code in place is
> specifically *not* needed.
>
This is not a question of backward compatibility, where interfaces must
be kept forever. I agree that kernel interfaces may be changed as the
kernel moves forward. However, that is not what I'm arguing against.

When one submits an RFC for generic infrastructure, one must state what
the interfaces between the blocks of the design are.
Work on the Soft RoCE block can't start until I know what the final
interfaces look like. This is an unacceptable method of work.

>
> They released it so that you can start hooking SoftRoCE into it.  As you
> hook it in, if it needs changes to work with SoftRoCE, simply make the
> changes needed and move on.

This is not a question of whether I can hook the Soft RoCE driver into this
framework. In fact, I can't think of an IB driver that couldn't use it. What
this framework offers is just another hop from ib_core to the real driver.
Where is the removal of duplicated code? The following is the list of
functions that, for now, must still be implemented in the low-level driver.

create_cq
destroy_cq
poll_cq
req_notify_cq
resize_cq
create_srq
modify_srq
destroy_srq
query_srq
create_qp
query_device
query_gid
alloc_ucontext
modify_device
modify_qp
dealloc_ucontext
query_port
destroy_qp
get_port_immutable
modify_port
query_qp
post_send
post_recv
post_srq_recv

Most, if not all, of them have a common part across all drivers.
What are the plans to eliminate that duplication, and when?
Don't you think this should be known in advance?

I already asked, and have never been answered seriously: what was the
purpose of submitting the code in such a premature state?
It can't be for feedback, because what kind of feedback can you provide
on just a skeleton? Moreover, today they submitted v2 with a changelog
that is almost 100% cosmetic changes. I really don't understand this way
of working.




>
> I think Dennis' point, and I agree with him, is that you are over
> complicating the issue here.  This need not be a highly designed item,
> it needs to be a functional item, and we can build it as we go.  If you
> have to make changes to rdmavt in order to hook up SoftRoCE, that's
> fine, post them to the list, they will get reviewed.  As long as the
> change doesn't break or otherwise negatively impact qib and/or hfi1,
> then it should be fine.  If it does, then I'm sure Intel will work with
> you to find a solution that doesn't negatively impact them.

A reminder of what the initial goal was: remove duplicated code between
all IB transport drivers. This goal is complicated, and in my RFC I explained
why. So, for a start, I am not complicating anything that was simple before.

Second, what you are saying here is effectively: "this is a project that
serves Intel's needs". So why treat it as generic infrastructure? I'm not
aiming to hurt performance, but Intel should aim to achieve the goals we
agreed on at the beginning.


Re: [RFC] Generic InfiniBand transport done in software

2015-12-29 Thread Dennis Dalessandro

On Tue, Dec 29, 2015 at 07:38:30PM +0200, Moni Shoua wrote:

This is not a question if I can hook Soft RoCE driver into this framework.
In fact, I can't think of an IB driver that can't use this framework. What this
framework offers is just another hop from ib_core the real driver.
Where is the removal of duplicated code? This is a list of functions
that for now
must be implemented in the low level driver.

create_cq
destroy_cq
poll_cq
req_notify_cq
resize_cq
create_srq
modify_srq
destroy_srq
query_srq
create_qp
query_device
query_gid
alloc_ucontext
modify_device
modify_qp
dealloc_ucontext
query_port
destroy_qp
get_port_immutable
modify_port
query_qp
post_send
post_recv
post_srq_recv

Most if not all of them have common part in all drivers.
What are the plans to get rid of them? When?
Don't you think that this should be known in advance?


We have patch sets that implement all of these which will be posted soon.  
With the holidays things have just been a bit slow to come out.


-Denny