[RFC] Generic InfiniBand transport done in software
Hi,

In the past months, the need for a kernel module that implements the InfiniBand transport in software and unifies all the InfiniBand software drivers has been raised. Since then, nobody has submitted a design proposal that satisfies the initial thoughts and can serve various back-ends.

The following is an RFC that presents a solution made of a single generic InfiniBand driver and many hardware-specific back-end drivers. The RFC defines the requirements and the interfaces that have to be part of any generic InfiniBand driver. A generic InfiniBand driver that is not compliant with this RFC would not be able to serve different back-ends and would therefore miss its target.

A. Introduction

In the Linux kernel, the responsibility for implementing the InfiniBand protocol is roughly divided between two elements. The first is the core drivers, which expose an abstract verbs interface to the upper layer as well as interfaces to some common IB services such as MAD, SA and CM. The second is the vendor drivers and hardware, which implement the abstract verbs interface.

A high-level view of the model is in Figure A1:

    +------------------+
    |                  |
    |  IB core         |
    |  drivers         |
    |                  |
    +------------------+  Common
    --------------------  ------
    +------------------+  Vendor
    |                  |
    |  Hardware        |
    |  drivers         |
    |                  |
    +------------------+
    |                  |
    |  Hardware        |
    |                  |
    +------------------+

A1 - IB implementation model in the Linux kernel

In the vendor part of the model, the division of work between software and hardware is undefined and is usually one of the two below:

- Context and logic are managed in software. The hardware's role is limited to lower-layer protocols (depending on the link layer) and perhaps some offloads.
- Context and logic are managed in hardware, while the software's role is to create or destroy a context in the hardware and to be notified when the hardware reports completed tasks.

The following examples demonstrate the difference between the approaches above.

- Send flow: the application calls post_send() with a QP and a WR. In the software-based approach, the QP context is retrieved, the WR is parsed, and a proper IB packet is formed to be put in hardware buffers. In the hardware-based approach, the driver puts the WR in hardware buffers with a handle to the QP, and the hardware does the rest.
- Receive flow: a data packet is received and put in hardware buffers (assume that the destination is an RC QP). In the software-based approach, the packet is handed to the driver. The driver retrieves the context by QPN, decides whether an ACK/NACK packet is required (and if so, generates it), and decides whether a CQE is required on the CQ of the QP. In the other approach, the hardware compares the packet to the state in the context, generates the ACK/NACK, and raises an event to the driver that a CQE is ready for reading.

Figure A2 illustrates the hardware/software relationship in the implementation of an IB transport solution in the software-based approach:

    +------------------------------+
    | IB transport                 |
    | capable driver               |
    |  +------------------------+  |
    |  | Context (deep)         |  |
    |  +------------------------+  |
    |  +------------------------+  |
    |  | Logic (IB transport)   |  |
    |  +------------------------+  |
    |  +------------------------+  |
    |  | Interface              |  |
    |  +------------------------+  |
    +------------------------------+
    +------------------------------+
    |  +------------------------+  |
    |  | Interface              |  |
    |  +------------------------+  |
    |  +------------------------+  |
    |  | Context (shallow)      |  |
    |  +------------------------+  |
    |  +------------------------+  |
    |  | Logic (link layer)     |  |
    |  +------------------------+  |
    | IB transport incapable
Re: [RFC] Generic InfiniBand transport done in software
On Tue, Dec 22, 2015 at 02:38:10PM +0200, Moni Shoua wrote:
> Hi,
>
> In the past months the need for a kernel module that implements the
> InfiniBand transport in software and unify all the InfiniBand software
> drivers has been raised. Since then, nobody has submitted any design
> proposal that satisfy the initial thoughts and can serve various
> back-ends. The following is a RFC that presents a solution made of a
> single generic InfiniBand driver and many hardware specific back-end
> drivers. The RFC defines the requirements and the interfaces that have
> to be part of any generic InfiniBand driver. A generic InfiniBand
> driver that is not compliant with this RFC wouldn't be able to serve
> different back-ends and therefore would miss its target.

This makes no mention of the already posted work which aims to consolidate the qib and hfi1 drivers' verbs implementation. However, it does seem to be mostly in line with what we have already presented for rdmavt and the direction the next set of patches is going in. Have you seen something in rdmavt that contradicts this? I have not seen the need to write a detailed RFC because it's pretty straightforward what needs to happen: take the common parts of qib and hfi1 and move them to another kmod, and don't do anything that prevents soft-roce from using the same infrastructure. I think we are accomplishing that and have been very open during the process.

Specifically, this RFC does not capture much more detail beyond what has already been posted. There are also a number of details regarding hardware specifics that are not dealt with here but are in the current rdmavt patch series, as well as in patches that are yet to be posted. The bottom line is we can't sacrifice performance for a perfect API.
Previous discussion threads:
http://marc.info/?l=linux-rdma&m=144353141420065&w=2
http://marc.info/?l=linux-rdma&m=144563342718705&w=2
http://marc.info/?l=linux-rdma&m=144614769419528&w=2

Rdmavt & qib code submissions:
http://marc.info/?l=linux-rdma&m=144968107508268&w=2
http://marc.info/?l=linux-rdma&m=144952133926970&w=2

Feedback has been received on these patches and, unless otherwise noted in the relevant threads, will be incorporated into the v2 of the patch sets.

hfi1 RFC sent to staging list:
http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2015-December/082896.html

-Denny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Generic InfiniBand transport done in software
> This makes no mention of the already posted work which aims to consolidate
> the qib and hfi1 drivers verbs implementation. However it does seem to be
> mostly in line with what we have already presented for rdmavt and the
> direction the next set of patches is going in. Have you seen something in
> rdmavt that contradicts this? I have not seen the need to write a detailed
> RFC because it's pretty straight forward what needs to happen. Take the
> common parts of qib and hfi and move to another kmod. Don't do anything that
> prevents soft-roce from using the same infrastructure. I think we are
> accomplishing that and have been very open during the process.
>
> Specifically this RFC does not capture much more detail beyond what has
> already been posted. There are also a number of details regarding hardware
> specifics that are not dealt with here but are in the current rdmavt patch
> series, as well as patches that are yet to be posted. The bottom line is we
> can't sacrifice performance for a perfect API.
>
> Previous discussion threads:
> http://marc.info/?l=linux-rdma&m=144353141420065&w=2
> http://marc.info/?l=linux-rdma&m=144563342718705&w=2
> http://marc.info/?l=linux-rdma&m=144614769419528&w=2
>
> Rdmavt & Qib code submissions:
> http://marc.info/?l=linux-rdma&m=144968107508268&w=2
> http://marc.info/?l=linux-rdma&m=144952133926970&w=2
> Feedback has been received on these patches and unless otherwise noted in
> the relevant threads will be incorporated into the V2 of the patch sets.

Hi Denny,

http://marc.info/?l=linux-rdma&m=144614769419528&w=2 introduced a concept (RVT) which was never followed by a discussion and never agreed upon. Moreover, http://marc.info/?l=linux-rdma&m=144952098726776&w=2 presents work that, apart from keeping the name RVT, is far from the immature concept I mentioned earlier, and its scope was changed from a general-purpose solution to Intel's HFI/QIB only. I therefore conclude that the concept of RVT, as it was supposed to be, was abandoned.

Moni
Re: [RFC] Generic InfiniBand transport done in software
On Wed, Dec 23, 2015 at 06:28:08PM +0200, Moni Shoua wrote:
>> [...]
>
> Hi Denny
>
> http://marc.info/?l=linux-rdma&m=144614769419528&w=2 introduced a
> concept (RVT) which was never followed by a discussion and never
> agreed upon. Moreover

There were discussions, and Mellanox even contributed code to the effort. See Kamal's patches in the patch set I provided.

> http://marc.info/?l=linux-rdma&m=144952098726776&w=2 presents a work
> that besides keeping the name RVT is far from the immature concept I
> mentioned earlier and its scope was changed from general purpose
> solution to Intel and HFI/QIB only.

The scope has never changed. Our goal is, and has always been, to remove the code duplication between qib and hfi1. We are doing that by way of rdmavt. It is limited in scope to Intel's drivers currently, for what I hope are obvious reasons.

I think it makes sense that soft-roce be added as well, and I hope that Mellanox decides to contribute rather than reinventing the wheel.

Is there something in rdmavt that would not work for soft-roce, or is something fundamental missing? I have asked this a number of times and nothing has been raised, so I assume there are no issues. If there are, let's discuss them.

Reading through your RFC here, perhaps something like the multicast add and delete is concerning? This is something that is not really needed by qib and hfi1 but may be needed for soft-roce. All that means is that soft-roce needs to provide it and it would be optional for qib and hfi1. The rdmavt architecture is flexible and allows exactly this.

> I therefore conclude that the concept of RVT, as it was supposed to
> be, was abandoned.

This is absolutely incorrect. As mentioned above, nothing has changed.

-Denny
Re: [RFC] Generic InfiniBand transport done in software
On Tue, Dec 22, 2015 at 02:38:10PM +0200, Moni Shoua wrote:
> Hi,
>
> In the past months the need for a kernel module that implements the
> InfiniBand transport in software and unify all the InfiniBand software
> drivers has been raised. Since then, nobody has submitted any design
> proposal that satisfy the initial thoughts and can serve various
> back-ends.

I'm not sure I understand what you mean. Denny has posted several high-level emails similar to this one and has asked for public feedback. We have been addressing all the feedback as it has come in, and we continue to work toward this common layer.

To that end, we have been asking you to identify any place where you see an incompatibility between the rdmavt layer and SoftRoCE. Patches and comments to rdmavt are encouraged, and we have already incorporated patches from outside Intel. We are very sensitive to the fact that other drivers will want to use this layer in the future. But I don't think this layer is set in stone. To that end, any functionality which is specific to SoftRoCE should be added when SoftRoCE is submitted to the kernel. To date I have not seen any show-stopper objections except Sagi's comment on the lack of IB_WR_LOCAL_INV, which we will drop in the next submission.

Also, please be aware that we are being very careful with this work not to sacrifice the performance of either qib or hfi1. There are a couple of items you mention below which seem to indicate you would like a more "pure" separation of the driver. I hope you understand that any work in this area which affects our performance must be accounted for and may not result in as "pure" a separation as you may like. If that is a show stopper for SoftRoCE, let's work through the specific examples.

More inline.

> The following is a RFC that presents a solution made of a single
> generic InfiniBand driver and many hardware specific back-end drivers.
> The RFC defines the requirements and the interfaces that have to be
> part of any generic InfiniBand driver. A generic InfiniBand driver
> that is not compliant with this RFC wouldn't be able to serve
> different back-ends and therefore would miss its target.
>
> A. Introduction
>
> In Linux kernel, the responsibility to implement the InfiniBand
> protocol is roughly divided between 2 elements. The first are the core
> drivers which expose an abstract verbs interface to the upper layer as
> well as interfaces to some common IB services like MAD, SA and CM. The
> second are vendor drivers and hardware which implement the abstract
> verbs interface.
>
> [Figure A1 - IB implementation model in Linux kernel: the IB core
> drivers (common part) sit above the vendor hardware drivers and the
> hardware itself (vendor part).]
>
> In the vendor part of the model, the devision of work between software
> and hardware is undefined and is usually one of the two below
>
> - Context and logic are managed in software. Hardware role is limited
>   to lower layer protocols (depending on the link layer) and maybe
>   some offloads
> - Context and logic are managed in hardware while software role is to
>   create or destroy a context in the hardware and gets notified when
>   hardware reports about a completions tasks.
>
> The following examples demonstrates the difference between the
> approaches above.
>
> - Send flow: application calls post_send() with QP and a WR. In the
>   software based approach the QP context is retrieved, the WR is
>   parsed and a proper IB packet is formed to be put in hardware
>   buffers. In hardware based approach the driver puts the WR in
>   hardware buffers with a handle to the QP and hardware does the rest.

I'm glad you brought this up, but I'm a bit confused. For anything that sends more than a single packet I think this is an oversimplification.[1] Do you perhaps mean WQE rather than packet? If you are referring to a WQE, we were thinking the same thing but have not gotten the send code posted yet.

[1] For example, a 1MB RDMA write will require many packets.

> - Receive flow: a data packet is received and put in hardware buffers
>   (assume that the destination is a RC QP). In software based approach
>   the packet is handed to the driver. The driver retrieves the context
>   by QPN,
Re: [RFC] Generic InfiniBand transport done in software
> There were discussions, and Mellanox even contributed code to the
> effort. See Kamal's patches in the patch set I provided.

As far as I see it, the discussions were shallow and never produced an agreement. Kamal's patches should not be considered as such.

>> http://marc.info/?l=linux-rdma&m=144952098726776&w=2 presents a work
>> that besides keeping the name RVT is far from the immature concept I
>> mentioned earlier and its scope was changed from general purpose
>> solution to Intel and HFI/QIB only.
>
> The scope has never changed. Our goal is, and has always been to remove
> the code duplication between qib and hfi1. We are doing that by way of
> rdmavt. It is limited in scope to Intel's drivers currently for what I
> hope are obvious reasons.

So you actually agree that rdmavt was intended to be a solution for Intel's specific drivers. Fair, but IMO this is not what we aimed for. In fact, if this is an Intel-specific solution, then why put it in drivers/infiniband/sw, and why publish it when it is not ready?

> I think it makes sense that soft-roce be added as well and hope that
> Mellanox decides to contribute rather than reinventing the wheel.
>
> Is there something in rdmavt that would not work for soft-roce, or is
> something fundamental missing? I have asked this a number of times and
> nothing has been raised so I assume there are no issues. If there are
> lets discuss them.

Interfaces between rdmavt and its back-ends are missing. I consider this fundamental. Concerns were raised, but answers were not provided; at least, not satisfying answers.

> Reading through your RFC here, perhaps something like the multicast add
> and delete is concerning? This is something that is not really needed by
> qib and hfi1 but may be for soft-roce. All that means is soft-roce needs
> to provide it and it would be optional for qib and hfi1. The rdmavt
> architecture is flexible and allows exactly this.
>
>> I therefore conclude that the concept of RVT, as it was supposed to
>> be, was abandoned.
>
> This is absolutely incorrect. As mentioned above, nothing has changed.
Re: [RFC] Generic InfiniBand transport done in software
> I'm not sure I understand what you mean. Denny has posted several high
> level emails similar to this one and has asked for public feedback. We
> have been addressing all the feedback as it has come in and we continue
> to work toward this common layer.
>
> To that end we have been asking for you to identify any place you see an
> incompatibility with the rdmavt layer for SoftRoCE. Patches and comments
> to rdmavt are encouraged and we have already incorporated patches from
> outside Intel.

I really don't know how to answer that, because I don't expect the stub interface to last, and if it does, I don't see how the problem of code duplication is going to be solved. When I know what the interfaces look like, plus some other documentation, I will be able to answer this.

> We are very sensitive to the fact that other drivers will want to use
> this layer in the future. But I don't think this layer is set in stone.
> To that end, any functionality which is specific to SoftRoCE should be
> added when SoftRoCE is submitted to the kernel. To date I have not seen
> any show stopper objections except Sagi's comment on the lack of
> IB_WR_LOCAL_INV which we will drop in the next submission.

SoftRoCE was submitted today to staging. Now, if I want to take it out of staging by using rdmavt, I actually can't, unless I use the stubs and watch for rdmavt changes. This is unacceptable. Missing final interfaces, and documentation of how to use them, is a show stopper IMO.

> Also, please be aware that we are being very careful with this work to
> not sacrifice the performance of either qib or hfi1. There are a couple
> of items you mention below which seem to indicate you would like a more
> "pure" separation of the driver. I hope you understand that any work in
> this area which affects our performance must be accounted for and may
> not result in as "pure" a separation as you may like. If that is a show
> stopper for SoftRoCE lets work through the specific examples.

Eventually you'll have to break a WR into packets of MTU size, won't you? Anyway, this is what we had to finalize in the early discussion, not after code posting.

Ira, I've read your comments on the RFC and I find them worth a thought before I answer. In general, finding a good abstraction of the back-end is the hardest thing in this project and affects how the interface looks. I think that most, if not all, of your comments fall into the category of abstraction.

> More inline.
>
>> [The remainder of the original RFC, including Figure A1 and the send
>> flow example, was quoted here.]
Re: [RFC] Generic InfiniBand transport done in software
On Thu, Dec 24, 2015 at 05:43:11PM +0200, Moni Shoua wrote:
>> There were discussions, and Mellanox even contributed code to the
>> effort. See Kamal's patches in the patch set I provided.
>
> As far as I see it discussions were shallow and never produced an
> agreement. Kamal's patches should not be considered as such.

Point is, others have looked at the code. No issues have been called out to date as to why what is there won't work for everyone.

>>> http://marc.info/?l=linux-rdma&m=144952098726776&w=2 presents a work
>>> that besides keeping the name RVT is far from the immature concept I
>>> mentioned earlier and its scope was changed from general purpose
>>> solution to Intel and HFI/QIB only.
>>
>> The scope has never changed. Our goal is, and has always been to remove
>> the code duplication between qib and hfi1. We are doing that by way of
>> rdmavt. It is limited in scope to Intel's drivers currently for what I
>> hope are obvious reasons.
>
> So you actually agree that rdmavt was intended to be a solution to
> Intel's specific drivers. Fair, but IMO this is not what we aimed for.
> In fact, if this is an Intel specific solution then why put it in
> drivers/infiniband/sw and why publish it when it is not ready?

Yes, it is specific to Intel *now*; that doesn't mean it should stay that way. Rdmavt could, and in my opinion should, be extended to support soft-roce. I don't think replicating the same thing is a great idea.

As to the location, where do you think it should go? drivers/infiniband/sw makes the most sense to me, but I am open to suggestions.

And for the question of why publish when it's not ready, the better question is why not? Is it not good to see the work in progress as it evolves so the community can provide feedback?

>> I think it makes sense that soft-roce be added as well and hope that
>> Mellanox decides to contribute rather than reinventing the wheel.
>>
>> Is there something in rdmavt that would not work for soft-roce, or is
>> something fundamental missing? I have asked this a number of times and
>> nothing has been raised so I assume there are no issues. If there are
>> lets discuss them.
>
> Interfaces between rdmavt and its backends are missing. I consider this
> as fundamental. Concerns were raised but answers were not provided, at
> least not satisfying answers.

No one is arguing that. It is a work in progress and will get there. More details are in Ira's response: http://marc.info/?l=linux-rdma&m=145091253118395&w=2

-Denny
Re: [RFC] Generic InfiniBand transport done in software
> Point is others have looked at the code. No issues have been called out
> to date as to why what is there won't work for everyone.

http://marc.info/?l=linux-rdma&m=144968107508268&w=2
Your answer to the send() issue first agrees with the comment and later says that it can't be done because of how PIO and SDMA (an Intel-specific implementation) work. This is an example of a discussion that never ended with an agreement.

> Yes it is specific to Intel *now*, that doesn't mean it should stay that
> way. Rdmavt could, and in my opinion should, be extended to support
> soft-roce. I don't think replicating the same thing is a great idea.

But you post *now* a so-called generic driver, so it must *now* fit any possible driver (including Soft RoCE).

> As to the location, where do you think it should go. drivers/infiniband/sw
> makes the most sense to me, but open to suggestions.
>
> And for the question of why publish when it's not ready, the better
> question is why not? Is it not good to see the work in progress as it
> evolves so the community can provide feedback?

What kind of feedback do you expect when I have no idea about your plans for rdmavt? Interfaces, flows, data structures... all of this is missing from the rdmavt documentation.
Re: [RFC] Generic InfiniBand transport done in software
On Sun, Dec 27, 2015 at 07:54:46PM +0200, Moni Shoua wrote:
>> Point is others have looked at the code. No issues have been called out
>> to date as to why what is there won't work for everyone.
>
> http://marc.info/?l=linux-rdma&m=144968107508268&w=2
> Your answer to the send() issue is first an agreement with the comment
> and later says that it can't be done because of how PIO and SDMA (Intel
> specific implementation). This is an example for a discussion that never
> ended with an agreement.

No. PIO and SDMA are driver-specific and live in the driver. Rdmavt has no concept of them. I'm agreeing that the send will be generic and have no hardware-specific stuff.

>> Yes it is specific to Intel *now*, that doesn't mean it should stay that
>> way. Rdmavt could, and in my opinion should, be extended to support
>> soft-roce. I don't think replicating the same thing is a great idea.
>
> But you post *now* a so called generic driver so it must now fit any
> possible driver (including Soft RoCE)

As I've stated a number of times across multiple threads: it must not do anything that would prevent another driver from using it.

>> As to the location, where do you think it should go. drivers/infiniband/sw
>> makes the most sense to me, but open to suggestions.
>>
>> And for the question of why publish when it's not ready, the better
>> question is why not? Is it not good to see the work in progress as it
>> evolves so the community can provide feedback?
>
> What kind of a feedback you expect when I don't have an idea about
> your plans for rdmavt
> Interfaces, flows, data structures... all is missing from the
> documentation to rdmavt.

I expect feedback based on the code submissions. More will be coming shortly. I have taken all the feedback from the first post and will be sending a v2 shortly.

-Denny
Re: [RFC] Generic InfiniBand transport done in software
Hi Moni,

On Sun, Dec 27, 2015 at 07:54:46PM +0200, Moni Shoua wrote:
> But you post *now* a so called generic driver so it must now fit any
> possible driver (including Soft RoCE)

It's never going to fit any possible future driver. Dennis and folks have done great work to move code outside the drivers into a shared library. So far it's been driven just by the Intel drivers, as that's the only thing they were interested in. If you are interested in supporting SoftROCE, please work with them by adjusting the code towards your requirements. In Linux we have great results with iterative approaches, and I'd suggest you try it as well.

> What kind of a feedback you expect when I don't have an idea about
> your plans for rdmavt
> Interfaces, flows, data structures... all is missing from the
> documentation to rdmavt.

You've got the code, so let's work based on that.
Re: [RFC] Generic InfiniBand transport done in software
> it's never going to fit any possible future driver. Dennis and folks
> have done great work to move code outside the drivers into a shared
> library. So far it's been driven just by the Intel drivers as that's
> the only thing they were interested in.

If it's not going to be a solution for anything but Intel, then why declare it as such? And where is that shared library? The amount of code in rdmavt that can be considered shared is very little.

> If you are interested in supporting SoftROCE please work with them
> by adjusting the code towards your requirements. In Linux we have
> great results with iterative approaches and I'd suggest you try it
> as well.

Exactly. All you asked for is in the RFC I posted.

> You've got the code, so let's work based on that.

I say let's agree on the interfaces and start writing code.
Re: [RFC] Generic InfiniBand transport done in software
> No. PIO and SDMA is driver specific and lives in the driver. Rdmavt has
> no concept of this. I'm agreeing that the send will be generic and have
> no hw specific stuff.

I understand that PIO/SDMA are not a concept of RVT. However, making the send interface from RVT to the driver exactly the same as the interface from ib_core to RVT raises the question: what exactly do we achieve by this?

> As I've stated a number of times across multiple threads: It must not do
> anything that would prevent another driver from using it.

The question is not how Soft RoCE fits into this framework but how this framework achieves its goals.

> I expect feedback based on the code submissions. More will be coming
> shortly. I have taken all the feedback from the first post and will be
> sending a v2 shortly.

Again, I have no idea about the complete interfaces between the two pieces of the suggested solution.
- If you have them, then please publish them.
- If you don't, but plan to have them, then why did you submit a half-baked idea?
- If you say that the final interface is what we see now, then I say that the problem of code duplication isn't going to be resolved.
So, which of the three is it?
Re: [RFC] Generic InfiniBand transport done in software
On 12/27/2015 12:54 PM, Moni Shoua wrote:
>> Yes it is specific to Intel *now*, that doesn't mean it should stay that
>> way. Rdmavt could, and in my opinion should, be extended to support
>> soft-roce. I don't think replicating the same thing is a great idea.
>>
> But you post a so-called generic driver *now*, so it must fit any
> possible driver (including Soft RoCE) now.

This is incorrect. This isn't some public API that we are exporting to user space. Nor is it an API that out-of-tree drivers are using. This is a purely kernel-internal API for use by a limited number of drivers. As such, it need not be finalized before it is submitted or used. It can be taken one piece at a time, and if, at some point, it is determined that there are shortcomings in the API, it can be updated in place, together with all of the drivers that use it, in a single patch or patch series. So a finalized design prior to putting code in place is specifically *not* needed.

>> As to the location, where do you think it should go? drivers/infiniband/sw
>> makes the most sense to me, but I'm open to suggestions.
>>
>> And for the question of why publish when it's not ready, the better question
>> is why not? Is it not good to see the work in progress as it evolves so the
>> community can provide feedback?
>>
> What kind of feedback do you expect when I don't have any idea about
> your plans for rdmavt? Interfaces, flows, data structures... all are
> missing from the rdmavt documentation.

They released it so that you can start hooking SoftRoCE into it. As you hook it in, if it needs changes to work with SoftRoCE, simply make the changes needed and move on.

I think Dennis' point, and I agree with him, is that you are overcomplicating the issue here. This need not be a highly designed item; it needs to be a functional item, and we can build it as we go. If you have to make changes to rdmavt in order to hook up SoftRoCE, that's fine: post them to the list and they will get reviewed.
As long as the change doesn't break or otherwise negatively impact qib and/or hfi1, then it should be fine. If it does, then I'm sure Intel will work with you to find a solution that doesn't negatively impact them.

--
Doug Ledford
GPG KeyID: 0E572FDD
Re: [RFC] Generic InfiniBand transport done in software
I think that my point is missed. See my answers inline.

> This is incorrect. This isn't some public API that we are exporting to
> user space. Nor is it an API that out-of-tree drivers are using. This
> is a purely kernel-internal API for use by a limited number of drivers.
> As such, it need not be finalized before it is submitted or used. It
> can be taken one piece at a time, and if, at some point, it is
> determined that there are shortcomings in the API, it can be updated in
> place, together with all of the drivers that use it, in a single patch
> or patch series. So a finalized design prior to putting code in place
> is specifically *not* needed.

This is not a question of future backward compatibility, where interfaces must be kept forever. I agree that kernel interfaces may change as the kernel moves forward. However, that is not what I'm arguing against. When one submits an RFC for a generic infrastructure, one must state what the interfaces between the blocks of the design are. Work on the Soft RoCE block can't start until I know what the final interfaces look like. This is an unacceptable method of work.

> They released it so that you can start hooking SoftRoCE into it. As you
> hook it in, if it needs changes to work with SoftRoCE, simply make the
> changes needed and move on.

This is not a question of whether I can hook the Soft RoCE driver into this framework. In fact, I can't think of an IB driver that couldn't use this framework. What this framework offers is just another hop from ib_core to the real driver. Where is the removal of duplicated code? This is the list of functions that, for now, must still be implemented in the low-level driver:

  query_device, modify_device, query_port, modify_port,
  query_gid, get_port_immutable,
  alloc_ucontext, dealloc_ucontext,
  create_cq, resize_cq, poll_cq, req_notify_cq, destroy_cq,
  create_srq, modify_srq, query_srq, destroy_srq,
  create_qp, modify_qp, query_qp, destroy_qp,
  post_send, post_recv, post_srq_recv

Most if not all of them have a common part in all drivers.
What are the plans to get rid of them? When? Don't you think that this should be known in advance?

I already asked and have never been answered seriously: what was the purpose of the submission in this premature state of the code? It can't be for feedback, because what kind of feedback can you provide for just a skeleton? Moreover, today they submitted v2 with a changelog that is almost 100% cosmetic changes. I really don't understand this kind of work.

> I think Dennis' point, and I agree with him, is that you are over
> complicating the issue here. This need not be a highly designed item,
> it needs to be a functional item, and we can build it as we go. If you
> have to make changes to rdmavt in order to hook up SoftRoCE, that's
> fine, post them to the list, they will get reviewed. As long as the
> change doesn't break or otherwise negatively impact qib and/or hfi1,
> then it should be fine. If it does, then I'm sure Intel will work with
> you to find a solution that doesn't negatively impact them.

A reminder of what the initial goal was: to remove code duplication between all IB transport drivers. This goal is complicated, and in my RFC I explained why. So, for a start, I am not complicating anything that was simple before. Second, what you are saying here actually amounts to: "this is a project that serves Intel's needs". So why treat it as a generic infrastructure? I'm not aiming to hurt performance, but Intel should aim to achieve the goals we agreed on at the beginning.
Re: [RFC] Generic InfiniBand transport done in software
On Tue, Dec 29, 2015 at 07:38:30PM +0200, Moni Shoua wrote:
> This is not a question of whether I can hook the Soft RoCE driver into
> this framework. In fact, I can't think of an IB driver that couldn't
> use this framework. What this framework offers is just another hop from
> ib_core to the real driver. Where is the removal of duplicated code?
> This is the list of functions that, for now, must still be implemented
> in the low-level driver:
>
>   query_device, modify_device, query_port, modify_port,
>   query_gid, get_port_immutable,
>   alloc_ucontext, dealloc_ucontext,
>   create_cq, resize_cq, poll_cq, req_notify_cq, destroy_cq,
>   create_srq, modify_srq, query_srq, destroy_srq,
>   create_qp, modify_qp, query_qp, destroy_qp,
>   post_send, post_recv, post_srq_recv
>
> Most if not all of them have a common part in all drivers.
>
> What are the plans to get rid of them? When? Don't you think that this
> should be known in advance?

We have patch sets that implement all of these which will be posted soon. With the holidays things have just been a bit slow to come out.

-Denny