Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Doug, Leon has finished review as well in [7]. Christoph Acked too in [8]. Can you please advise whether (1) I should rebase and resend PatchV12? (2) If so for which branch - master/4.9 or? Tejun and Christoph mentioned that it might be late for 4.9. Can we atleast merge to linux-rdma tree, so that more features/changes can be done on top of it? How can we avoid merge conflict to Linus since this patchset is applicable to two git trees. (cgroup and linux-rdma). I was thinking to push through linux-rdma as it is currently going through more changes, so resolving merge conflict would be simpler if that happens. Please provide the direction. [7] https://lkml.org/lkml/2016/10/5/134 [8] https://lkml.org/lkml/2016/10/5/30 Regards, Parav Pandit On Tue, Oct 4, 2016 at 11:49 PM, Parav Panditwrote: > Hi Doug, > > I am still waiting for Leon to provide his comments if any on rdma cgroup. > From other email context, he was on vacation last week. > While we wait for his comments, I wanted to know your view of this > patchset in 4.9 merge window. > > To summarize the discussion that happened in two threads. > > [1] Ack by Tejun, asking for review from rdma list > [2] quick review by Christoph on patch-v11 (patch 12 has only typo > corrections) > [3] Christoph's ack on architecture of rdma cgroup and fitting it with ABI > [4] My response on Matan's query on RSS indirection table > [5] Response from Intel on their driver support for Matan's query > [6] Christoph's point on architecture, which we are following in new > ABI and current ABI > > I have reviewed recent patch [7] from Matan where I see IB verbs > objects are still handled through common path as suggested by > Christoph. > > I do not see any issues with rdma cgroup patchset other than it requires > rebase. > Am I missing something? > Can you please help me - What would be required to merge it to 4.9? > > [1] https://lkml.org/lkml/2016/8/31/494 > [2] https://lkml.org/lkml/2016/8/25/146 > [3] https://lkml.org/lkml/2016/9/10/175 > [4] https://lkml.org/lkml/2016/9/14/221 > [5] https://lkml.org/lkml/2016/9/19/571 > [6] http://www.spinics.net/lists/linux-rdma/msg40337.html > [7] email subject: [RFC ABI V4 0/7] SG-based RDMA ABI Proposal > > Regards, > Parav Pandit > > On Wed, Sep 21, 2016 at 9:32 PM, Parav Pandit wrote: >> Hi Tejun, >> >> On Wed, Sep 21, 2016 at 7:56 PM, Tejun Heo wrote: >>> Hello, Parav. >>> >>> On Wed, Sep 21, 2016 at 10:13:38AM +0530, Parav Pandit wrote: We have completed review from Tejun, Christoph. HFI driver folks also provided feedback for Intel drivers. Matan's also doesn't have any more comments. If possible, if you can also review, it will be helpful. I have some more changes unrelated to cgroup in same files in both the git tree. Pushing them now either results into merge conflict later on for Doug/Tejun, or requires rebase and resending patch. If you can review, we can avoid such rework. >>> >>> My impression of the thread was that there doesn't seem to be enough >>> of consensus around how rdma resources should be defined. Is that >>> part agreed upon now? >>> >> >> We ended up discussing few points on different thread [1]. >> >> There was confusion on how some non-rdma/non-IB drivers would work >> with rdma cgroup from Matan. >> Christoph explained how they don't fit in the rdma subsystem and >> therefore its not prime target to addess. >> Intel driver maintainer Denny also acknowledged same on [2]. >> IB compliant drivers of Intel support rdma cgroup as explained in [2]. >> With that usnic and Intel psm drivers falls out of rdma cgroup support >> as they don't fit very well in the verbs definition. >> >> [1] https://www.spinics.net/lists/linux-rdma/msg40340.html >> [2] http://www.spinics.net/lists/linux-rdma/msg40717.html >> >> I will wait for Leon's review comments if he has different view on >> architecture. >> Back in April when I met face-to-face to Leon and Haggai, Leon was in >> support to have kernel defined the rdma resources as suggested by >> Christoph and Tejun instead of IB/RDMA subsystem. >> I will wait for his comments if his views have changed with new uAPI >> taking shape.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Doug, Leon has finished review as well in [7]. Christoph Acked too in [8]. Can you please advise whether (1) I should rebase and resend PatchV12? (2) If so for which branch - master/4.9 or? Tejun and Christoph mentioned that it might be late for 4.9. Can we atleast merge to linux-rdma tree, so that more features/changes can be done on top of it? How can we avoid merge conflict to Linus since this patchset is applicable to two git trees. (cgroup and linux-rdma). I was thinking to push through linux-rdma as it is currently going through more changes, so resolving merge conflict would be simpler if that happens. Please provide the direction. [7] https://lkml.org/lkml/2016/10/5/134 [8] https://lkml.org/lkml/2016/10/5/30 Regards, Parav Pandit On Tue, Oct 4, 2016 at 11:49 PM, Parav Pandit wrote: > Hi Doug, > > I am still waiting for Leon to provide his comments if any on rdma cgroup. > From other email context, he was on vacation last week. > While we wait for his comments, I wanted to know your view of this > patchset in 4.9 merge window. > > To summarize the discussion that happened in two threads. > > [1] Ack by Tejun, asking for review from rdma list > [2] quick review by Christoph on patch-v11 (patch 12 has only typo > corrections) > [3] Christoph's ack on architecture of rdma cgroup and fitting it with ABI > [4] My response on Matan's query on RSS indirection table > [5] Response from Intel on their driver support for Matan's query > [6] Christoph's point on architecture, which we are following in new > ABI and current ABI > > I have reviewed recent patch [7] from Matan where I see IB verbs > objects are still handled through common path as suggested by > Christoph. > > I do not see any issues with rdma cgroup patchset other than it requires > rebase. > Am I missing something? > Can you please help me - What would be required to merge it to 4.9? > > [1] https://lkml.org/lkml/2016/8/31/494 > [2] https://lkml.org/lkml/2016/8/25/146 > [3] https://lkml.org/lkml/2016/9/10/175 > [4] https://lkml.org/lkml/2016/9/14/221 > [5] https://lkml.org/lkml/2016/9/19/571 > [6] http://www.spinics.net/lists/linux-rdma/msg40337.html > [7] email subject: [RFC ABI V4 0/7] SG-based RDMA ABI Proposal > > Regards, > Parav Pandit > > On Wed, Sep 21, 2016 at 9:32 PM, Parav Pandit wrote: >> Hi Tejun, >> >> On Wed, Sep 21, 2016 at 7:56 PM, Tejun Heo wrote: >>> Hello, Parav. >>> >>> On Wed, Sep 21, 2016 at 10:13:38AM +0530, Parav Pandit wrote: We have completed review from Tejun, Christoph. HFI driver folks also provided feedback for Intel drivers. Matan's also doesn't have any more comments. If possible, if you can also review, it will be helpful. I have some more changes unrelated to cgroup in same files in both the git tree. Pushing them now either results into merge conflict later on for Doug/Tejun, or requires rebase and resending patch. If you can review, we can avoid such rework. >>> >>> My impression of the thread was that there doesn't seem to be enough >>> of consensus around how rdma resources should be defined. Is that >>> part agreed upon now? >>> >> >> We ended up discussing few points on different thread [1]. >> >> There was confusion on how some non-rdma/non-IB drivers would work >> with rdma cgroup from Matan. >> Christoph explained how they don't fit in the rdma subsystem and >> therefore its not prime target to addess. >> Intel driver maintainer Denny also acknowledged same on [2]. >> IB compliant drivers of Intel support rdma cgroup as explained in [2]. >> With that usnic and Intel psm drivers falls out of rdma cgroup support >> as they don't fit very well in the verbs definition. >> >> [1] https://www.spinics.net/lists/linux-rdma/msg40340.html >> [2] http://www.spinics.net/lists/linux-rdma/msg40717.html >> >> I will wait for Leon's review comments if he has different view on >> architecture. >> Back in April when I met face-to-face to Leon and Haggai, Leon was in >> support to have kernel defined the rdma resources as suggested by >> Christoph and Tejun instead of IB/RDMA subsystem. >> I will wait for his comments if his views have changed with new uAPI >> taking shape.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Christoph, On Wed, Oct 5, 2016 at 12:07 PM, Christoph Hellwigwrote: > FYI, the patches look fine to me: > > Acked-by: Christoph Hellwig > Thanks a lot for review. Parav
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Christoph, On Wed, Oct 5, 2016 at 12:07 PM, Christoph Hellwig wrote: > FYI, the patches look fine to me: > > Acked-by: Christoph Hellwig > Thanks a lot for review. Parav
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hello, On Wed, Oct 05, 2016 at 02:22:57PM +0300, Leon Romanovsky wrote: > On Wed, Oct 05, 2016 at 08:37:35AM +0200, Christoph Hellwig wrote: > > FYI, the patches look fine to me: > > > > Acked-by: Christoph Hellwig> > > > but we're past the merge window for 4.9 now unfortunately. > > IMHO, it still can make it. Most likely, we only have three / four days till rc1 opens, I think it's too late. Let's target the next one. Thanks. -- tejun
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hello, On Wed, Oct 05, 2016 at 02:22:57PM +0300, Leon Romanovsky wrote: > On Wed, Oct 05, 2016 at 08:37:35AM +0200, Christoph Hellwig wrote: > > FYI, the patches look fine to me: > > > > Acked-by: Christoph Hellwig > > > > but we're past the merge window for 4.9 now unfortunately. > > IMHO, it still can make it. Most likely, we only have three / four days till rc1 opens, I think it's too late. Let's target the next one. Thanks. -- tejun
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Oct 05, 2016 at 08:37:35AM +0200, Christoph Hellwig wrote: > FYI, the patches look fine to me: > > Acked-by: Christoph Hellwig> > but we're past the merge window for 4.9 now unfortunately. IMHO, it still can make it. Thanks signature.asc Description: PGP signature
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Oct 05, 2016 at 08:37:35AM +0200, Christoph Hellwig wrote: > FYI, the patches look fine to me: > > Acked-by: Christoph Hellwig > > but we're past the merge window for 4.9 now unfortunately. IMHO, it still can make it. Thanks signature.asc Description: PGP signature
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
FYI, the patches look fine to me: Acked-by: Christoph Hellwigbut we're past the merge window for 4.9 now unfortunately.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
FYI, the patches look fine to me: Acked-by: Christoph Hellwig but we're past the merge window for 4.9 now unfortunately.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Doug, I am still waiting for Leon to provide his comments if any on rdma cgroup. >From other email context, he was on vacation last week. While we wait for his comments, I wanted to know your view of this patchset in 4.9 merge window. To summarize the discussion that happened in two threads. [1] Ack by Tejun, asking for review from rdma list [2] quick review by Christoph on patch-v11 (patch 12 has only typo corrections) [3] Christoph's ack on architecture of rdma cgroup and fitting it with ABI [4] My response on Matan's query on RSS indirection table [5] Response from Intel on their driver support for Matan's query [6] Christoph's point on architecture, which we are following in new ABI and current ABI I have reviewed recent patch [7] from Matan where I see IB verbs objects are still handled through common path as suggested by Christoph. I do not see any issues with rdma cgroup patchset other than it requires rebase. Am I missing something? Can you please help me - What would be required to merge it to 4.9? [1] https://lkml.org/lkml/2016/8/31/494 [2] https://lkml.org/lkml/2016/8/25/146 [3] https://lkml.org/lkml/2016/9/10/175 [4] https://lkml.org/lkml/2016/9/14/221 [5] https://lkml.org/lkml/2016/9/19/571 [6] http://www.spinics.net/lists/linux-rdma/msg40337.html [7] email subject: [RFC ABI V4 0/7] SG-based RDMA ABI Proposal Regards, Parav Pandit On Wed, Sep 21, 2016 at 9:32 PM, Parav Panditwrote: > Hi Tejun, > > On Wed, Sep 21, 2016 at 7:56 PM, Tejun Heo wrote: >> Hello, Parav. >> >> On Wed, Sep 21, 2016 at 10:13:38AM +0530, Parav Pandit wrote: >>> We have completed review from Tejun, Christoph. >>> HFI driver folks also provided feedback for Intel drivers. >>> Matan's also doesn't have any more comments. >>> >>> If possible, if you can also review, it will be helpful. >>> >>> I have some more changes unrelated to cgroup in same files in both the git >>> tree. >>> Pushing them now either results into merge conflict later on for >>> Doug/Tejun, or requires rebase and resending patch. >>> If you can review, we can avoid such rework. >> >> My impression of the thread was that there doesn't seem to be enough >> of consensus around how rdma resources should be defined. Is that >> part agreed upon now? >> > > We ended up discussing few points on different thread [1]. > > There was confusion on how some non-rdma/non-IB drivers would work > with rdma cgroup from Matan. > Christoph explained how they don't fit in the rdma subsystem and > therefore its not prime target to addess. > Intel driver maintainer Denny also acknowledged same on [2]. > IB compliant drivers of Intel support rdma cgroup as explained in [2]. > With that usnic and Intel psm drivers falls out of rdma cgroup support > as they don't fit very well in the verbs definition. > > [1] https://www.spinics.net/lists/linux-rdma/msg40340.html > [2] http://www.spinics.net/lists/linux-rdma/msg40717.html > > I will wait for Leon's review comments if he has different view on > architecture. > Back in April when I met face-to-face to Leon and Haggai, Leon was in > support to have kernel defined the rdma resources as suggested by > Christoph and Tejun instead of IB/RDMA subsystem. > I will wait for his comments if his views have changed with new uAPI > taking shape.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Doug, I am still waiting for Leon to provide his comments if any on rdma cgroup. >From other email context, he was on vacation last week. While we wait for his comments, I wanted to know your view of this patchset in 4.9 merge window. To summarize the discussion that happened in two threads. [1] Ack by Tejun, asking for review from rdma list [2] quick review by Christoph on patch-v11 (patch 12 has only typo corrections) [3] Christoph's ack on architecture of rdma cgroup and fitting it with ABI [4] My response on Matan's query on RSS indirection table [5] Response from Intel on their driver support for Matan's query [6] Christoph's point on architecture, which we are following in new ABI and current ABI I have reviewed recent patch [7] from Matan where I see IB verbs objects are still handled through common path as suggested by Christoph. I do not see any issues with rdma cgroup patchset other than it requires rebase. Am I missing something? Can you please help me - What would be required to merge it to 4.9? [1] https://lkml.org/lkml/2016/8/31/494 [2] https://lkml.org/lkml/2016/8/25/146 [3] https://lkml.org/lkml/2016/9/10/175 [4] https://lkml.org/lkml/2016/9/14/221 [5] https://lkml.org/lkml/2016/9/19/571 [6] http://www.spinics.net/lists/linux-rdma/msg40337.html [7] email subject: [RFC ABI V4 0/7] SG-based RDMA ABI Proposal Regards, Parav Pandit On Wed, Sep 21, 2016 at 9:32 PM, Parav Pandit wrote: > Hi Tejun, > > On Wed, Sep 21, 2016 at 7:56 PM, Tejun Heo wrote: >> Hello, Parav. >> >> On Wed, Sep 21, 2016 at 10:13:38AM +0530, Parav Pandit wrote: >>> We have completed review from Tejun, Christoph. >>> HFI driver folks also provided feedback for Intel drivers. >>> Matan's also doesn't have any more comments. >>> >>> If possible, if you can also review, it will be helpful. >>> >>> I have some more changes unrelated to cgroup in same files in both the git >>> tree. >>> Pushing them now either results into merge conflict later on for >>> Doug/Tejun, or requires rebase and resending patch. >>> If you can review, we can avoid such rework. >> >> My impression of the thread was that there doesn't seem to be enough >> of consensus around how rdma resources should be defined. Is that >> part agreed upon now? >> > > We ended up discussing few points on different thread [1]. > > There was confusion on how some non-rdma/non-IB drivers would work > with rdma cgroup from Matan. > Christoph explained how they don't fit in the rdma subsystem and > therefore its not prime target to addess. > Intel driver maintainer Denny also acknowledged same on [2]. > IB compliant drivers of Intel support rdma cgroup as explained in [2]. > With that usnic and Intel psm drivers falls out of rdma cgroup support > as they don't fit very well in the verbs definition. > > [1] https://www.spinics.net/lists/linux-rdma/msg40340.html > [2] http://www.spinics.net/lists/linux-rdma/msg40717.html > > I will wait for Leon's review comments if he has different view on > architecture. > Back in April when I met face-to-face to Leon and Haggai, Leon was in > support to have kernel defined the rdma resources as suggested by > Christoph and Tejun instead of IB/RDMA subsystem. > I will wait for his comments if his views have changed with new uAPI > taking shape.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Tejun, On Wed, Sep 21, 2016 at 7:56 PM, Tejun Heowrote: > Hello, Parav. > > On Wed, Sep 21, 2016 at 10:13:38AM +0530, Parav Pandit wrote: >> We have completed review from Tejun, Christoph. >> HFI driver folks also provided feedback for Intel drivers. >> Matan's also doesn't have any more comments. >> >> If possible, if you can also review, it will be helpful. >> >> I have some more changes unrelated to cgroup in same files in both the git >> tree. >> Pushing them now either results into merge conflict later on for >> Doug/Tejun, or requires rebase and resending patch. >> If you can review, we can avoid such rework. > > My impression of the thread was that there doesn't seem to be enough > of consensus around how rdma resources should be defined. Is that > part agreed upon now? > We ended up discussing few points on different thread [1]. There was confusion on how some non-rdma/non-IB drivers would work with rdma cgroup from Matan. Christoph explained how they don't fit in the rdma subsystem and therefore its not prime target to addess. Intel driver maintainer Denny also acknowledged same on [2]. IB compliant drivers of Intel support rdma cgroup as explained in [2]. With that usnic and Intel psm drivers falls out of rdma cgroup support as they don't fit very well in the verbs definition. [1] https://www.spinics.net/lists/linux-rdma/msg40340.html [2] http://www.spinics.net/lists/linux-rdma/msg40717.html I will wait for Leon's review comments if he has different view on architecture. Back in April when I met face-to-face to Leon and Haggai, Leon was in support to have kernel defined the rdma resources as suggested by Christoph and Tejun instead of IB/RDMA subsystem. I will wait for his comments if his views have changed with new uAPI taking shape.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Tejun, On Wed, Sep 21, 2016 at 7:56 PM, Tejun Heo wrote: > Hello, Parav. > > On Wed, Sep 21, 2016 at 10:13:38AM +0530, Parav Pandit wrote: >> We have completed review from Tejun, Christoph. >> HFI driver folks also provided feedback for Intel drivers. >> Matan's also doesn't have any more comments. >> >> If possible, if you can also review, it will be helpful. >> >> I have some more changes unrelated to cgroup in same files in both the git >> tree. >> Pushing them now either results into merge conflict later on for >> Doug/Tejun, or requires rebase and resending patch. >> If you can review, we can avoid such rework. > > My impression of the thread was that there doesn't seem to be enough > of consensus around how rdma resources should be defined. Is that > part agreed upon now? > We ended up discussing few points on different thread [1]. There was confusion on how some non-rdma/non-IB drivers would work with rdma cgroup from Matan. Christoph explained how they don't fit in the rdma subsystem and therefore its not prime target to addess. Intel driver maintainer Denny also acknowledged same on [2]. IB compliant drivers of Intel support rdma cgroup as explained in [2]. With that usnic and Intel psm drivers falls out of rdma cgroup support as they don't fit very well in the verbs definition. [1] https://www.spinics.net/lists/linux-rdma/msg40340.html [2] http://www.spinics.net/lists/linux-rdma/msg40717.html I will wait for Leon's review comments if he has different view on architecture. Back in April when I met face-to-face to Leon and Haggai, Leon was in support to have kernel defined the rdma resources as suggested by Christoph and Tejun instead of IB/RDMA subsystem. I will wait for his comments if his views have changed with new uAPI taking shape.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hello, Parav. On Wed, Sep 21, 2016 at 10:13:38AM +0530, Parav Pandit wrote: > We have completed review from Tejun, Christoph. > HFI driver folks also provided feedback for Intel drivers. > Matan's also doesn't have any more comments. > > If possible, if you can also review, it will be helpful. > > I have some more changes unrelated to cgroup in same files in both the git > tree. > Pushing them now either results into merge conflict later on for > Doug/Tejun, or requires rebase and resending patch. > If you can review, we can avoid such rework. My impression of the thread was that there doesn't seem to be enough of consensus around how rdma resources should be defined. Is that part agreed upon now? Thanks. -- tejun
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hello, Parav. On Wed, Sep 21, 2016 at 10:13:38AM +0530, Parav Pandit wrote: > We have completed review from Tejun, Christoph. > HFI driver folks also provided feedback for Intel drivers. > Matan's also doesn't have any more comments. > > If possible, if you can also review, it will be helpful. > > I have some more changes unrelated to cgroup in same files in both the git > tree. > Pushing them now either results into merge conflict later on for > Doug/Tejun, or requires rebase and resending patch. > If you can review, we can avoid such rework. My impression of the thread was that there doesn't seem to be enough of consensus around how rdma resources should be defined. Is that part agreed upon now? Thanks. -- tejun
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Leon, On Fri, Sep 16, 2016 at 12:26 AM, Leon Romanovskywrote: > On Wed, Sep 14, 2016 at 12:36:19PM +0530, Parav Pandit wrote: >> Hi Dennis, >> >> Do you know how would HFI1 driver would work along with rdma cgroup? >> >> Hi Matan, Leon, Jason, >> Apart from HFI1, is there any other concern? >> Or Patch is good to go? > > I didn't review it yet :(. > Sorry > We have completed review from Tejun, Christoph. HFI driver folks also provided feedback for Intel drivers. Matan's also doesn't have any more comments. If possible, if you can also review, it will be helpful. I have some more changes unrelated to cgroup in same files in both the git tree. Pushing them now either results into merge conflict later on for Doug/Tejun, or requires rebase and resending patch. If you can review, we can avoid such rework. >> >> 4.8 dates are close by (2 weeks) and there are two git trees involved >> (that might cause merge error to Linus) so if there are no issues, I >> would like to make request to Doug to consider it for 4.8 early on. >> >> Parav >> >> On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovsky wrote: >> > On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: >> >> On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: >> >> > > > > I've posted some initial work toward a) a while ago, and once we >> >> > > >> >> > > Did it get merged? Do you have a pointer? >> >> > >> >> > http://www.spinics.net/lists/linux-rdma/msg31958.html >> >> >> >> Right, I remember that. Certainly the right direction >> >> >> >> > > However, everything under verbs is not straightforward. The files in >> >> > > userspace are not copies... >> >> > > >> >> > > user: >> >> > > >> >> > > struct ibv_query_device { >> >> > >__u32 command; >> >> > >__u16 in_words; >> >> > >__u16 out_words; >> >> > >__u64 response; >> >> > >__u64 driver_data[0]; >> >> > > }; >> >> > > >> >> > > kernel: >> >> > > >> >> > > struct ib_uverbs_query_device { >> >> > > __u64 response; >> >> > > __u64 driver_data[0]; >> >> > > }; >> >> > >> >> > We'll obviously need different strutures for the libibvers API >> >> > and the kernel interface in this case, and we'll need to figure out >> >> > how to properly translate them. I think a cast, plus compile time >> >> > type checking ala BUILD_BUG_ON is the way to go. >> >> >> >> I'm not sure I follow, which would I cast? >> >> >> >> BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + >> >> sizeof(ib_uverbs_query_device)) >> >> >> >> ? >> >> >> >> > > I'm thinking the best way forward might be to use a script and >> >> > > transform userspace into: >> >> > > >> >> > > struct ibv_query_device { >> >> > > struct ib_uverbs_cmd_hdr hdr; >> >> > > struct ib_uverbs_query_device cmd; >> >> > > }; >> >> > >> >> > That would break the users of the interface. >> >> >> >> Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI >> >> identical the modified libibverbs would still be binary compatible >> >> with all providers but not source compatible. Since all kernel >> >> supported providers are in rdma-plumbing we can add the '.cmd.' at the >> >> same time. >> >> >> >> The kernel uapi header would stay the same. >> >> >> >> > However automatically generating the user ABI from the kernel one >> >> > might still be a good idea in the long run. >> >> >> >> My preference would be to try and use the kernel headers directly. >> > >> > I thought the same, especially after realizing that they are almost >> > copy/paste from the vendor *-abi.h files. >> > >> >> >> >> Jason
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Leon, On Fri, Sep 16, 2016 at 12:26 AM, Leon Romanovsky wrote: > On Wed, Sep 14, 2016 at 12:36:19PM +0530, Parav Pandit wrote: >> Hi Dennis, >> >> Do you know how would HFI1 driver would work along with rdma cgroup? >> >> Hi Matan, Leon, Jason, >> Apart from HFI1, is there any other concern? >> Or Patch is good to go? > > I didn't review it yet :(. > Sorry > We have completed review from Tejun, Christoph. HFI driver folks also provided feedback for Intel drivers. Matan's also doesn't have any more comments. If possible, if you can also review, it will be helpful. I have some more changes unrelated to cgroup in same files in both the git tree. Pushing them now either results into merge conflict later on for Doug/Tejun, or requires rebase and resending patch. If you can review, we can avoid such rework. >> >> 4.8 dates are close by (2 weeks) and there are two git trees involved >> (that might cause merge error to Linus) so if there are no issues, I >> would like to make request to Doug to consider it for 4.8 early on. >> >> Parav >> >> On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovsky wrote: >> > On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: >> >> On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: >> >> > > > > I've posted some initial work toward a) a while ago, and once we >> >> > > >> >> > > Did it get merged? Do you have a pointer? >> >> > >> >> > http://www.spinics.net/lists/linux-rdma/msg31958.html >> >> >> >> Right, I remember that. Certainly the right direction >> >> >> >> > > However, everything under verbs is not straightforward. The files in >> >> > > userspace are not copies... >> >> > > >> >> > > user: >> >> > > >> >> > > struct ibv_query_device { >> >> > >__u32 command; >> >> > >__u16 in_words; >> >> > >__u16 out_words; >> >> > >__u64 response; >> >> > >__u64 driver_data[0]; >> >> > > }; >> >> > > >> >> > > kernel: >> >> > > >> >> > > struct ib_uverbs_query_device { >> >> > > __u64 response; >> >> > > __u64 driver_data[0]; >> >> > > }; >> >> > >> >> > We'll obviously need different strutures for the libibvers API >> >> > and the kernel interface in this case, and we'll need to figure out >> >> > how to properly translate them. I think a cast, plus compile time >> >> > type checking ala BUILD_BUG_ON is the way to go. >> >> >> >> I'm not sure I follow, which would I cast? >> >> >> >> BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + >> >> sizeof(ib_uverbs_query_device)) >> >> >> >> ? >> >> >> >> > > I'm thinking the best way forward might be to use a script and >> >> > > transform userspace into: >> >> > > >> >> > > struct ibv_query_device { >> >> > > struct ib_uverbs_cmd_hdr hdr; >> >> > > struct ib_uverbs_query_device cmd; >> >> > > }; >> >> > >> >> > That would break the users of the interface. >> >> >> >> Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI >> >> identical the modified libibverbs would still be binary compatible >> >> with all providers but not source compatible. Since all kernel >> >> supported providers are in rdma-plumbing we can add the '.cmd.' at the >> >> same time. >> >> >> >> The kernel uapi header would stay the same. >> >> >> >> > However automatically generating the user ABI from the kernel one >> >> > might still be a good idea in the long run. >> >> >> >> My preference would be to try and use the kernel headers directly. >> > >> > I thought the same, especially after realizing that they are almost >> > copy/paste from the vendor *-abi.h files. >> > >> >> >> >> Jason
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Denny, On Mon, Sep 19, 2016 at 6:40 PM, Dalessandro, Denniswrote: > On Wed, 2016-09-14 at 12:36 +0530, Parav Pandit wrote: >> Hi Dennis, >> >> Do you know how would HFI1 driver would work along with rdma cgroup? > > Keep in mind HFI1 driver has two "modes" of operation. We support > verbs, and would surely fall in line with whatever cgroups do for IB > core. Thanks for the feedback. > For our psm interface, not sure how cgroups would come into play. > Psm is designed to expose the hw to user and avoid the kernel when > possible adding more kernel control is sort of contrary to that. > Yes, PSM is currently out of RDMA cgroup and in future we can take a look on how things shape as subsystem if it does. > Now that being said, Christoph recently made mention of maybe having a > drivers/psm [1]. I really haven't had a chance to think about the > implications of that, but maybe it's worth considering, after all we > have two implementations, qib and hfi1. So anyway I'm not sure we need > to be too concerned about cgroups right now as far as psm side of > things goes. > o.k.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Denny, On Mon, Sep 19, 2016 at 6:40 PM, Dalessandro, Dennis wrote: > On Wed, 2016-09-14 at 12:36 +0530, Parav Pandit wrote: >> Hi Dennis, >> >> Do you know how would HFI1 driver would work along with rdma cgroup? > > Keep in mind HFI1 driver has two "modes" of operation. We support > verbs, and would surely fall in line with whatever cgroups do for IB > core. Thanks for the feedback. > For our psm interface, not sure how cgroups would come into play. > Psm is designed to expose the hw to user and avoid the kernel when > possible adding more kernel control is sort of contrary to that. > Yes, PSM is currently out of RDMA cgroup and in future we can take a look on how things shape as subsystem if it does. > Now that being said, Christoph recently made mention of maybe having a > drivers/psm [1]. I really haven't had a chance to think about the > implications of that, but maybe it's worth considering, after all we > have two implementations, qib and hfi1. So anyway I'm not sure we need > to be too concerned about cgroups right now as far as psm side of > things goes. > o.k.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, 2016-09-14 at 12:36 +0530, Parav Pandit wrote: > Hi Dennis, > > Do you know how would HFI1 driver would work along with rdma cgroup? Keep in mind HFI1 driver has two "modes" of operation. We support verbs, and would surely fall in line with whatever cgroups do for IB core. For our psm interface, not sure how cgroups would come into play. Psm is designed to expose the hw to user and avoid the kernel when possible adding more kernel control is sort of contrary to that. Now that being said, Christoph recently made mention of maybe having a drivers/psm [1]. I really haven't had a chance to think about the implications of that, but maybe it's worth considering, after all we have two implementations, qib and hfi1. So anyway I'm not sure we need to be too concerned about cgroups right now as far as psm side of things goes. Depending how things shake out for the uAPI rewrite, or verbs 2.0 or whatever we are calling it today things may change. [1] http://marc.info/?l=linux-rdma=147401714313831=2 -Denny > Hi Matan, Leon, Jason, > Apart from HFI1, is there any other concern? > Or Patch is good to go? > > 4.8 dates are close by (2 weeks) and there are two git trees involved > (that might cause merge error to Linus) so if there are no issues, I > would like to make request to Doug to consider it for 4.8 early on. > > Parav > > On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovsky> wrote: > > On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: > > > On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig > > > wrote: > > > > > > > I've posted some initial work toward a) a while ago, and > > > > > > > once we > > > > > > > > > > Did it get merged? Do you have a pointer? > > > > > > > > http://www.spinics.net/lists/linux-rdma/msg31958.html > > > > > > Right, I remember that. Certainly the right direction > > > > > > > > However, everything under verbs is not straightforward. The > > > > > files in > > > > > userspace are not copies... > > > > > > > > > > user: > > > > > > > > > > struct ibv_query_device { > > > > > __u32 command; > > > > > __u16 in_words; > > > > > __u16 out_words; > > > > > __u64 response; > > > > > __u64 driver_data[0]; > > > > > }; > > > > > > > > > > kernel: > > > > > > > > > > struct ib_uverbs_query_device { > > > > > __u64 response; > > > > > __u64 driver_data[0]; > > > > > }; > > > > > > > > We'll obviously need different strutures for the libibvers API > > > > and the kernel interface in this case, and we'll need to figure > > > > out > > > > how to properly translate them. I think a cast, plus compile > > > > time > > > > type checking ala BUILD_BUG_ON is the way to go. > > > > > > I'm not sure I follow, which would I cast? > > > > > > BUILD_BUG_ON(sizeof(ibv_query_device) == > > > sizeof(ib_uverbs_cmd_hdr) + > > > sizeof(ib_uverbs_query_device)) > > > > > > ? > > > > > > > > I'm thinking the best way forward might be to use a script > > > > > and > > > > > transform userspace into: > > > > > > > > > > struct ibv_query_device { > > > > > struct ib_uverbs_cmd_hdr hdr; > > > > > struct ib_uverbs_query_device cmd; > > > > > }; > > > > > > > > That would break the users of the interface. > > > > > > Sorry, I mean doing this inside rdma-plumbing. Since the change > > > is ABI > > > identical the modified libibverbs would still be binary > > > compatible > > > with all providers but not source compatible. Since all kernel > > > supported providers are in rdma-plumbing we can add the '.cmd.' > > > at the > > > same time. > > > > > > The kernel uapi header would stay the same. > > > > > > > However automatically generating the user ABI from the kernel > > > > one > > > > might still be a good idea in the long run. > > > > > > My preference would be to try and use the kernel headers > > > directly. > > > > I thought the same, especially after realizing that they are almost > > copy/paste from the vendor *-abi.h files. > > > > > > > > Jason > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" > in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, 2016-09-14 at 12:36 +0530, Parav Pandit wrote: > Hi Dennis, > > Do you know how would HFI1 driver would work along with rdma cgroup? Keep in mind HFI1 driver has two "modes" of operation. We support verbs, and would surely fall in line with whatever cgroups do for IB core. For our psm interface, not sure how cgroups would come into play. Psm is designed to expose the hw to user and avoid the kernel when possible adding more kernel control is sort of contrary to that. Now that being said, Christoph recently made mention of maybe having a drivers/psm [1]. I really haven't had a chance to think about the implications of that, but maybe it's worth considering, after all we have two implementations, qib and hfi1. So anyway I'm not sure we need to be too concerned about cgroups right now as far as psm side of things goes. Depending how things shake out for the uAPI rewrite, or verbs 2.0 or whatever we are calling it today things may change. [1] http://marc.info/?l=linux-rdma=147401714313831=2 -Denny > Hi Matan, Leon, Jason, > Apart from HFI1, is there any other concern? > Or Patch is good to go? > > 4.8 dates are close by (2 weeks) and there are two git trees involved > (that might cause merge error to Linus) so if there are no issues, I > would like to make request to Doug to consider it for 4.8 early on. > > Parav > > On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovsky > wrote: > > On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: > > > On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig > > > wrote: > > > > > > > I've posted some initial work toward a) a while ago, and > > > > > > > once we > > > > > > > > > > Did it get merged? Do you have a pointer? > > > > > > > > http://www.spinics.net/lists/linux-rdma/msg31958.html > > > > > > Right, I remember that. Certainly the right direction > > > > > > > > However, everything under verbs is not straightforward. The > > > > > files in > > > > > userspace are not copies... > > > > > > > > > > user: > > > > > > > > > > struct ibv_query_device { > > > > > __u32 command; > > > > > __u16 in_words; > > > > > __u16 out_words; > > > > > __u64 response; > > > > > __u64 driver_data[0]; > > > > > }; > > > > > > > > > > kernel: > > > > > > > > > > struct ib_uverbs_query_device { > > > > > __u64 response; > > > > > __u64 driver_data[0]; > > > > > }; > > > > > > > > We'll obviously need different strutures for the libibvers API > > > > and the kernel interface in this case, and we'll need to figure > > > > out > > > > how to properly translate them. I think a cast, plus compile > > > > time > > > > type checking ala BUILD_BUG_ON is the way to go. > > > > > > I'm not sure I follow, which would I cast? > > > > > > BUILD_BUG_ON(sizeof(ibv_query_device) == > > > sizeof(ib_uverbs_cmd_hdr) + > > > sizeof(ib_uverbs_query_device)) > > > > > > ? > > > > > > > > I'm thinking the best way forward might be to use a script > > > > > and > > > > > transform userspace into: > > > > > > > > > > struct ibv_query_device { > > > > > struct ib_uverbs_cmd_hdr hdr; > > > > > struct ib_uverbs_query_device cmd; > > > > > }; > > > > > > > > That would break the users of the interface. > > > > > > Sorry, I mean doing this inside rdma-plumbing. Since the change > > > is ABI > > > identical the modified libibverbs would still be binary > > > compatible > > > with all providers but not source compatible. Since all kernel > > > supported providers are in rdma-plumbing we can add the '.cmd.' > > > at the > > > same time. > > > > > > The kernel uapi header would stay the same. > > > > > > > However automatically generating the user ABI from the kernel > > > > one > > > > might still be a good idea in the long run. > > > > > > My preference would be to try and use the kernel headers > > > directly. > > > > I thought the same, especially after realizing that they are almost > > copy/paste from the vendor *-abi.h files. > > > > > > > > Jason > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" > in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Sep 14, 2016 at 12:36:19PM +0530, Parav Pandit wrote: > Hi Dennis, > > Do you know how would HFI1 driver would work along with rdma cgroup? > > Hi Matan, Leon, Jason, > Apart from HFI1, is there any other concern? > Or Patch is good to go? I didn't review it yet :(. Sorry > > 4.8 dates are close by (2 weeks) and there are two git trees involved > (that might cause merge error to Linus) so if there are no issues, I > would like to make request to Doug to consider it for 4.8 early on. > > Parav > > On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovskywrote: > > On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: > >> On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: > >> > > > > I've posted some initial work toward a) a while ago, and once we > >> > > > >> > > Did it get merged? Do you have a pointer? > >> > > >> > http://www.spinics.net/lists/linux-rdma/msg31958.html > >> > >> Right, I remember that. Certainly the right direction > >> > >> > > However, everything under verbs is not straightforward. The files in > >> > > userspace are not copies... > >> > > > >> > > user: > >> > > > >> > > struct ibv_query_device { > >> > >__u32 command; > >> > >__u16 in_words; > >> > >__u16 out_words; > >> > >__u64 response; > >> > >__u64 driver_data[0]; > >> > > }; > >> > > > >> > > kernel: > >> > > > >> > > struct ib_uverbs_query_device { > >> > > __u64 response; > >> > > __u64 driver_data[0]; > >> > > }; > >> > > >> > We'll obviously need different strutures for the libibvers API > >> > and the kernel interface in this case, and we'll need to figure out > >> > how to properly translate them. I think a cast, plus compile time > >> > type checking ala BUILD_BUG_ON is the way to go. > >> > >> I'm not sure I follow, which would I cast? > >> > >> BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + > >> sizeof(ib_uverbs_query_device)) > >> > >> ? > >> > >> > > I'm thinking the best way forward might be to use a script and > >> > > transform userspace into: > >> > > > >> > > struct ibv_query_device { > >> > > struct ib_uverbs_cmd_hdr hdr; > >> > > struct ib_uverbs_query_device cmd; > >> > > }; > >> > > >> > That would break the users of the interface. > >> > >> Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI > >> identical the modified libibverbs would still be binary compatible > >> with all providers but not source compatible. Since all kernel > >> supported providers are in rdma-plumbing we can add the '.cmd.' at the > >> same time. > >> > >> The kernel uapi header would stay the same. > >> > >> > However automatically generating the user ABI from the kernel one > >> > might still be a good idea in the long run. > >> > >> My preference would be to try and use the kernel headers directly. > > > > I thought the same, especially after realizing that they are almost > > copy/paste from the vendor *-abi.h files. > > > >> > >> Jason signature.asc Description: PGP signature
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Sep 14, 2016 at 12:36:19PM +0530, Parav Pandit wrote: > Hi Dennis, > > Do you know how would HFI1 driver would work along with rdma cgroup? > > Hi Matan, Leon, Jason, > Apart from HFI1, is there any other concern? > Or Patch is good to go? I didn't review it yet :(. Sorry > > 4.8 dates are close by (2 weeks) and there are two git trees involved > (that might cause merge error to Linus) so if there are no issues, I > would like to make request to Doug to consider it for 4.8 early on. > > Parav > > On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovsky wrote: > > On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: > >> On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: > >> > > > > I've posted some initial work toward a) a while ago, and once we > >> > > > >> > > Did it get merged? Do you have a pointer? > >> > > >> > http://www.spinics.net/lists/linux-rdma/msg31958.html > >> > >> Right, I remember that. Certainly the right direction > >> > >> > > However, everything under verbs is not straightforward. The files in > >> > > userspace are not copies... > >> > > > >> > > user: > >> > > > >> > > struct ibv_query_device { > >> > >__u32 command; > >> > >__u16 in_words; > >> > >__u16 out_words; > >> > >__u64 response; > >> > >__u64 driver_data[0]; > >> > > }; > >> > > > >> > > kernel: > >> > > > >> > > struct ib_uverbs_query_device { > >> > > __u64 response; > >> > > __u64 driver_data[0]; > >> > > }; > >> > > >> > We'll obviously need different strutures for the libibvers API > >> > and the kernel interface in this case, and we'll need to figure out > >> > how to properly translate them. I think a cast, plus compile time > >> > type checking ala BUILD_BUG_ON is the way to go. > >> > >> I'm not sure I follow, which would I cast? > >> > >> BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + > >> sizeof(ib_uverbs_query_device)) > >> > >> ? > >> > >> > > I'm thinking the best way forward might be to use a script and > >> > > transform userspace into: > >> > > > >> > > struct ibv_query_device { > >> > > struct ib_uverbs_cmd_hdr hdr; > >> > > struct ib_uverbs_query_device cmd; > >> > > }; > >> > > >> > That would break the users of the interface. > >> > >> Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI > >> identical the modified libibverbs would still be binary compatible > >> with all providers but not source compatible. Since all kernel > >> supported providers are in rdma-plumbing we can add the '.cmd.' at the > >> same time. > >> > >> The kernel uapi header would stay the same. > >> > >> > However automatically generating the user ABI from the kernel one > >> > might still be a good idea in the long run. > >> > >> My preference would be to try and use the kernel headers directly. > > > > I thought the same, especially after realizing that they are almost > > copy/paste from the vendor *-abi.h files. > > > >> > >> Jason signature.asc Description: PGP signature
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Matan, On Wed, Sep 14, 2016 at 1:44 PM, Matan Barakwrote: > On 14/09/2016 10:06, Parav Pandit wrote: >> >> Hi Dennis, >> >> Do you know how would HFI1 driver would work along with rdma cgroup? >> >> Hi Matan, Leon, Jason, >> Apart from HFI1, is there any other concern? > > > I just wonder how things like RSS will work. For example, a RSS QP doesn't > really have a queue (if I recall, it's connected to work queues via an > indirection table). So, when a user creates such a QP, do you want to > account it as a regular QP? > How are work queues accounted? ib_create_rwq_ind_table verb allows creating indirection table. I assume it allows creating multiple such tables. If it is so, than number of tables should be a cgroup resource that we can add in follow on patch. By doing so, one container doesn't takeaway all the tables. > > >> Or Patch is good to go? >> >> 4.8 dates are close by (2 weeks) and there are two git trees involved >> (that might cause merge error to Linus) so if there are no issues, I >> would like to make request to Doug to consider it for 4.8 early on. >> >> Parav >> >> On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovsky wrote: >>> >>> On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: I've posted some initial work toward a) a while ago, and once we >> >> >> Did it get merged? Do you have a pointer? > > > http://www.spinics.net/lists/linux-rdma/msg31958.html Right, I remember that. Certainly the right direction >> However, everything under verbs is not straightforward. The files in >> userspace are not copies... >> >> user: >> >> struct ibv_query_device { >>__u32 command; >>__u16 in_words; >>__u16 out_words; >>__u64 response; >>__u64 driver_data[0]; >> }; >> >> kernel: >> >> struct ib_uverbs_query_device { >> __u64 response; >> __u64 driver_data[0]; >> }; > > > We'll obviously need different strutures for the libibvers API > and the kernel interface in this case, and we'll need to figure out > how to properly translate them. I think a cast, plus compile time > type checking ala BUILD_BUG_ON is the way to go. I'm not sure I follow, which would I cast? BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + sizeof(ib_uverbs_query_device)) ? >> I'm thinking the best way forward might be to use a script and >> transform userspace into: >> >> struct ibv_query_device { >> struct ib_uverbs_cmd_hdr hdr; >> struct ib_uverbs_query_device cmd; >> }; > > > That would break the users of the interface. Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI identical the modified libibverbs would still be binary compatible with all providers but not source compatible. Since all kernel supported providers are in rdma-plumbing we can add the '.cmd.' at the same time. The kernel uapi header would stay the same. > However automatically generating the user ABI from the kernel one > might still be a good idea in the long run. My preference would be to try and use the kernel headers directly. >>> >>> >>> I thought the same, especially after realizing that they are almost >>> copy/paste from the vendor *-abi.h files. >>> Jason > >
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Matan, On Wed, Sep 14, 2016 at 1:44 PM, Matan Barak wrote: > On 14/09/2016 10:06, Parav Pandit wrote: >> >> Hi Dennis, >> >> Do you know how would HFI1 driver would work along with rdma cgroup? >> >> Hi Matan, Leon, Jason, >> Apart from HFI1, is there any other concern? > > > I just wonder how things like RSS will work. For example, a RSS QP doesn't > really have a queue (if I recall, it's connected to work queues via an > indirection table). So, when a user creates such a QP, do you want to > account it as a regular QP? > How are work queues accounted? ib_create_rwq_ind_table verb allows creating indirection table. I assume it allows creating multiple such tables. If it is so, than number of tables should be a cgroup resource that we can add in follow on patch. By doing so, one container doesn't takeaway all the tables. > > >> Or Patch is good to go? >> >> 4.8 dates are close by (2 weeks) and there are two git trees involved >> (that might cause merge error to Linus) so if there are no issues, I >> would like to make request to Doug to consider it for 4.8 early on. >> >> Parav >> >> On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovsky wrote: >>> >>> On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: I've posted some initial work toward a) a while ago, and once we >> >> >> Did it get merged? Do you have a pointer? > > > http://www.spinics.net/lists/linux-rdma/msg31958.html Right, I remember that. Certainly the right direction >> However, everything under verbs is not straightforward. The files in >> userspace are not copies... >> >> user: >> >> struct ibv_query_device { >>__u32 command; >>__u16 in_words; >>__u16 out_words; >>__u64 response; >>__u64 driver_data[0]; >> }; >> >> kernel: >> >> struct ib_uverbs_query_device { >> __u64 response; >> __u64 driver_data[0]; >> }; > > > We'll obviously need different strutures for the libibvers API > and the kernel interface in this case, and we'll need to figure out > how to properly translate them. I think a cast, plus compile time > type checking ala BUILD_BUG_ON is the way to go. I'm not sure I follow, which would I cast? BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + sizeof(ib_uverbs_query_device)) ? >> I'm thinking the best way forward might be to use a script and >> transform userspace into: >> >> struct ibv_query_device { >> struct ib_uverbs_cmd_hdr hdr; >> struct ib_uverbs_query_device cmd; >> }; > > > That would break the users of the interface. Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI identical the modified libibverbs would still be binary compatible with all providers but not source compatible. Since all kernel supported providers are in rdma-plumbing we can add the '.cmd.' at the same time. The kernel uapi header would stay the same. > However automatically generating the user ABI from the kernel one > might still be a good idea in the long run. My preference would be to try and use the kernel headers directly. >>> >>> >>> I thought the same, especially after realizing that they are almost >>> copy/paste from the vendor *-abi.h files. >>> Jason > >
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On 14/09/2016 10:06, Parav Pandit wrote: Hi Dennis, Do you know how would HFI1 driver would work along with rdma cgroup? Hi Matan, Leon, Jason, Apart from HFI1, is there any other concern? I just wonder how things like RSS will work. For example, a RSS QP doesn't really have a queue (if I recall, it's connected to work queues via an indirection table). So, when a user creates such a QP, do you want to account it as a regular QP? How are work queues accounted? Or Patch is good to go? 4.8 dates are close by (2 weeks) and there are two git trees involved (that might cause merge error to Linus) so if there are no issues, I would like to make request to Doug to consider it for 4.8 early on. Parav On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovskywrote: On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: I've posted some initial work toward a) a while ago, and once we Did it get merged? Do you have a pointer? http://www.spinics.net/lists/linux-rdma/msg31958.html Right, I remember that. Certainly the right direction However, everything under verbs is not straightforward. The files in userspace are not copies... user: struct ibv_query_device { __u32 command; __u16 in_words; __u16 out_words; __u64 response; __u64 driver_data[0]; }; kernel: struct ib_uverbs_query_device { __u64 response; __u64 driver_data[0]; }; We'll obviously need different strutures for the libibvers API and the kernel interface in this case, and we'll need to figure out how to properly translate them. I think a cast, plus compile time type checking ala BUILD_BUG_ON is the way to go. I'm not sure I follow, which would I cast? BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + sizeof(ib_uverbs_query_device)) ? I'm thinking the best way forward might be to use a script and transform userspace into: struct ibv_query_device { struct ib_uverbs_cmd_hdr hdr; struct ib_uverbs_query_device cmd; }; That would break the users of the interface. Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI identical the modified libibverbs would still be binary compatible with all providers but not source compatible. Since all kernel supported providers are in rdma-plumbing we can add the '.cmd.' at the same time. The kernel uapi header would stay the same. However automatically generating the user ABI from the kernel one might still be a good idea in the long run. My preference would be to try and use the kernel headers directly. I thought the same, especially after realizing that they are almost copy/paste from the vendor *-abi.h files. Jason
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On 14/09/2016 10:06, Parav Pandit wrote: Hi Dennis, Do you know how would HFI1 driver would work along with rdma cgroup? Hi Matan, Leon, Jason, Apart from HFI1, is there any other concern? I just wonder how things like RSS will work. For example, a RSS QP doesn't really have a queue (if I recall, it's connected to work queues via an indirection table). So, when a user creates such a QP, do you want to account it as a regular QP? How are work queues accounted? Or Patch is good to go? 4.8 dates are close by (2 weeks) and there are two git trees involved (that might cause merge error to Linus) so if there are no issues, I would like to make request to Doug to consider it for 4.8 early on. Parav On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovsky wrote: On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: I've posted some initial work toward a) a while ago, and once we Did it get merged? Do you have a pointer? http://www.spinics.net/lists/linux-rdma/msg31958.html Right, I remember that. Certainly the right direction However, everything under verbs is not straightforward. The files in userspace are not copies... user: struct ibv_query_device { __u32 command; __u16 in_words; __u16 out_words; __u64 response; __u64 driver_data[0]; }; kernel: struct ib_uverbs_query_device { __u64 response; __u64 driver_data[0]; }; We'll obviously need different strutures for the libibvers API and the kernel interface in this case, and we'll need to figure out how to properly translate them. I think a cast, plus compile time type checking ala BUILD_BUG_ON is the way to go. I'm not sure I follow, which would I cast? BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + sizeof(ib_uverbs_query_device)) ? I'm thinking the best way forward might be to use a script and transform userspace into: struct ibv_query_device { struct ib_uverbs_cmd_hdr hdr; struct ib_uverbs_query_device cmd; }; That would break the users of the interface. Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI identical the modified libibverbs would still be binary compatible with all providers but not source compatible. Since all kernel supported providers are in rdma-plumbing we can add the '.cmd.' at the same time. The kernel uapi header would stay the same. However automatically generating the user ABI from the kernel one might still be a good idea in the long run. My preference would be to try and use the kernel headers directly. I thought the same, especially after realizing that they are almost copy/paste from the vendor *-abi.h files. Jason
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Dennis, Do you know how would HFI1 driver would work along with rdma cgroup? Hi Matan, Leon, Jason, Apart from HFI1, is there any other concern? Or Patch is good to go? 4.8 dates are close by (2 weeks) and there are two git trees involved (that might cause merge error to Linus) so if there are no issues, I would like to make request to Doug to consider it for 4.8 early on. Parav On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovskywrote: > On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: >> On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: >> > > > > I've posted some initial work toward a) a while ago, and once we >> > > >> > > Did it get merged? Do you have a pointer? >> > >> > http://www.spinics.net/lists/linux-rdma/msg31958.html >> >> Right, I remember that. Certainly the right direction >> >> > > However, everything under verbs is not straightforward. The files in >> > > userspace are not copies... >> > > >> > > user: >> > > >> > > struct ibv_query_device { >> > >__u32 command; >> > >__u16 in_words; >> > >__u16 out_words; >> > >__u64 response; >> > >__u64 driver_data[0]; >> > > }; >> > > >> > > kernel: >> > > >> > > struct ib_uverbs_query_device { >> > > __u64 response; >> > > __u64 driver_data[0]; >> > > }; >> > >> > We'll obviously need different strutures for the libibvers API >> > and the kernel interface in this case, and we'll need to figure out >> > how to properly translate them. I think a cast, plus compile time >> > type checking ala BUILD_BUG_ON is the way to go. >> >> I'm not sure I follow, which would I cast? >> >> BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + >> sizeof(ib_uverbs_query_device)) >> >> ? >> >> > > I'm thinking the best way forward might be to use a script and >> > > transform userspace into: >> > > >> > > struct ibv_query_device { >> > > struct ib_uverbs_cmd_hdr hdr; >> > > struct ib_uverbs_query_device cmd; >> > > }; >> > >> > That would break the users of the interface. >> >> Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI >> identical the modified libibverbs would still be binary compatible >> with all providers but not source compatible. Since all kernel >> supported providers are in rdma-plumbing we can add the '.cmd.' at the >> same time. >> >> The kernel uapi header would stay the same. >> >> > However automatically generating the user ABI from the kernel one >> > might still be a good idea in the long run. >> >> My preference would be to try and use the kernel headers directly. > > I thought the same, especially after realizing that they are almost > copy/paste from the vendor *-abi.h files. > >> >> Jason
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Dennis, Do you know how would HFI1 driver would work along with rdma cgroup? Hi Matan, Leon, Jason, Apart from HFI1, is there any other concern? Or Patch is good to go? 4.8 dates are close by (2 weeks) and there are two git trees involved (that might cause merge error to Linus) so if there are no issues, I would like to make request to Doug to consider it for 4.8 early on. Parav On Mon, Sep 12, 2016 at 10:37 AM, Leon Romanovsky wrote: > On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: >> On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: >> > > > > I've posted some initial work toward a) a while ago, and once we >> > > >> > > Did it get merged? Do you have a pointer? >> > >> > http://www.spinics.net/lists/linux-rdma/msg31958.html >> >> Right, I remember that. Certainly the right direction >> >> > > However, everything under verbs is not straightforward. The files in >> > > userspace are not copies... >> > > >> > > user: >> > > >> > > struct ibv_query_device { >> > >__u32 command; >> > >__u16 in_words; >> > >__u16 out_words; >> > >__u64 response; >> > >__u64 driver_data[0]; >> > > }; >> > > >> > > kernel: >> > > >> > > struct ib_uverbs_query_device { >> > > __u64 response; >> > > __u64 driver_data[0]; >> > > }; >> > >> > We'll obviously need different strutures for the libibvers API >> > and the kernel interface in this case, and we'll need to figure out >> > how to properly translate them. I think a cast, plus compile time >> > type checking ala BUILD_BUG_ON is the way to go. >> >> I'm not sure I follow, which would I cast? >> >> BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + >> sizeof(ib_uverbs_query_device)) >> >> ? >> >> > > I'm thinking the best way forward might be to use a script and >> > > transform userspace into: >> > > >> > > struct ibv_query_device { >> > > struct ib_uverbs_cmd_hdr hdr; >> > > struct ib_uverbs_query_device cmd; >> > > }; >> > >> > That would break the users of the interface. >> >> Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI >> identical the modified libibverbs would still be binary compatible >> with all providers but not source compatible. Since all kernel >> supported providers are in rdma-plumbing we can add the '.cmd.' at the >> same time. >> >> The kernel uapi header would stay the same. >> >> > However automatically generating the user ABI from the kernel one >> > might still be a good idea in the long run. >> >> My preference would be to try and use the kernel headers directly. > > I thought the same, especially after realizing that they are almost > copy/paste from the vendor *-abi.h files. > >> >> Jason
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: > On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: > > > > > I've posted some initial work toward a) a while ago, and once we > > > > > > Did it get merged? Do you have a pointer? > > > > http://www.spinics.net/lists/linux-rdma/msg31958.html > > Right, I remember that. Certainly the right direction > > > > However, everything under verbs is not straightforward. The files in > > > userspace are not copies... > > > > > > user: > > > > > > struct ibv_query_device { > > >__u32 command; > > >__u16 in_words; > > >__u16 out_words; > > >__u64 response; > > >__u64 driver_data[0]; > > > }; > > > > > > kernel: > > > > > > struct ib_uverbs_query_device { > > > __u64 response; > > > __u64 driver_data[0]; > > > }; > > > > We'll obviously need different strutures for the libibvers API > > and the kernel interface in this case, and we'll need to figure out > > how to properly translate them. I think a cast, plus compile time > > type checking ala BUILD_BUG_ON is the way to go. > > I'm not sure I follow, which would I cast? > > BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + > sizeof(ib_uverbs_query_device)) > > ? > > > > I'm thinking the best way forward might be to use a script and > > > transform userspace into: > > > > > > struct ibv_query_device { > > > struct ib_uverbs_cmd_hdr hdr; > > > struct ib_uverbs_query_device cmd; > > > }; > > > > That would break the users of the interface. > > Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI > identical the modified libibverbs would still be binary compatible > with all providers but not source compatible. Since all kernel > supported providers are in rdma-plumbing we can add the '.cmd.' at the > same time. > > The kernel uapi header would stay the same. > > > However automatically generating the user ABI from the kernel one > > might still be a good idea in the long run. > > My preference would be to try and use the kernel headers directly. I thought the same, especially after realizing that they are almost copy/paste from the vendor *-abi.h files. > > Jason signature.asc Description: PGP signature
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sun, Sep 11, 2016 at 11:52:35AM -0600, Jason Gunthorpe wrote: > On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: > > > > > I've posted some initial work toward a) a while ago, and once we > > > > > > Did it get merged? Do you have a pointer? > > > > http://www.spinics.net/lists/linux-rdma/msg31958.html > > Right, I remember that. Certainly the right direction > > > > However, everything under verbs is not straightforward. The files in > > > userspace are not copies... > > > > > > user: > > > > > > struct ibv_query_device { > > >__u32 command; > > >__u16 in_words; > > >__u16 out_words; > > >__u64 response; > > >__u64 driver_data[0]; > > > }; > > > > > > kernel: > > > > > > struct ib_uverbs_query_device { > > > __u64 response; > > > __u64 driver_data[0]; > > > }; > > > > We'll obviously need different strutures for the libibvers API > > and the kernel interface in this case, and we'll need to figure out > > how to properly translate them. I think a cast, plus compile time > > type checking ala BUILD_BUG_ON is the way to go. > > I'm not sure I follow, which would I cast? > > BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + > sizeof(ib_uverbs_query_device)) > > ? > > > > I'm thinking the best way forward might be to use a script and > > > transform userspace into: > > > > > > struct ibv_query_device { > > > struct ib_uverbs_cmd_hdr hdr; > > > struct ib_uverbs_query_device cmd; > > > }; > > > > That would break the users of the interface. > > Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI > identical the modified libibverbs would still be binary compatible > with all providers but not source compatible. Since all kernel > supported providers are in rdma-plumbing we can add the '.cmd.' at the > same time. > > The kernel uapi header would stay the same. > > > However automatically generating the user ABI from the kernel one > > might still be a good idea in the long run. > > My preference would be to try and use the kernel headers directly. I thought the same, especially after realizing that they are almost copy/paste from the vendor *-abi.h files. > > Jason signature.asc Description: PGP signature
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: > > > > I've posted some initial work toward a) a while ago, and once we > > > > Did it get merged? Do you have a pointer? > > http://www.spinics.net/lists/linux-rdma/msg31958.html Right, I remember that. Certainly the right direction > > However, everything under verbs is not straightforward. The files in > > userspace are not copies... > > > > user: > > > > struct ibv_query_device { > >__u32 command; > >__u16 in_words; > >__u16 out_words; > >__u64 response; > >__u64 driver_data[0]; > > }; > > > > kernel: > > > > struct ib_uverbs_query_device { > > __u64 response; > > __u64 driver_data[0]; > > }; > > We'll obviously need different strutures for the libibvers API > and the kernel interface in this case, and we'll need to figure out > how to properly translate them. I think a cast, plus compile time > type checking ala BUILD_BUG_ON is the way to go. I'm not sure I follow, which would I cast? BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + sizeof(ib_uverbs_query_device)) ? > > I'm thinking the best way forward might be to use a script and > > transform userspace into: > > > > struct ibv_query_device { > > struct ib_uverbs_cmd_hdr hdr; > > struct ib_uverbs_query_device cmd; > > }; > > That would break the users of the interface. Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI identical the modified libibverbs would still be binary compatible with all providers but not source compatible. Since all kernel supported providers are in rdma-plumbing we can add the '.cmd.' at the same time. The kernel uapi header would stay the same. > However automatically generating the user ABI from the kernel one > might still be a good idea in the long run. My preference would be to try and use the kernel headers directly. Jason
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sun, Sep 11, 2016 at 07:24:45PM +0200, Christoph Hellwig wrote: > > > > I've posted some initial work toward a) a while ago, and once we > > > > Did it get merged? Do you have a pointer? > > http://www.spinics.net/lists/linux-rdma/msg31958.html Right, I remember that. Certainly the right direction > > However, everything under verbs is not straightforward. The files in > > userspace are not copies... > > > > user: > > > > struct ibv_query_device { > >__u32 command; > >__u16 in_words; > >__u16 out_words; > >__u64 response; > >__u64 driver_data[0]; > > }; > > > > kernel: > > > > struct ib_uverbs_query_device { > > __u64 response; > > __u64 driver_data[0]; > > }; > > We'll obviously need different strutures for the libibvers API > and the kernel interface in this case, and we'll need to figure out > how to properly translate them. I think a cast, plus compile time > type checking ala BUILD_BUG_ON is the way to go. I'm not sure I follow, which would I cast? BUILD_BUG_ON(sizeof(ibv_query_device) == sizeof(ib_uverbs_cmd_hdr) + sizeof(ib_uverbs_query_device)) ? > > I'm thinking the best way forward might be to use a script and > > transform userspace into: > > > > struct ibv_query_device { > > struct ib_uverbs_cmd_hdr hdr; > > struct ib_uverbs_query_device cmd; > > }; > > That would break the users of the interface. Sorry, I mean doing this inside rdma-plumbing. Since the change is ABI identical the modified libibverbs would still be binary compatible with all providers but not source compatible. Since all kernel supported providers are in rdma-plumbing we can add the '.cmd.' at the same time. The kernel uapi header would stay the same. > However automatically generating the user ABI from the kernel one > might still be a good idea in the long run. My preference would be to try and use the kernel headers directly. Jason
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sun, Sep 11, 2016 at 11:14:09AM -0600, Jason Gunthorpe wrote: > > > We stil always have the common structure first. And at least for > > > cgroups supports that's what matters. > > > > > > Re the actual structures - we'll really need to make sure we > > > > > > a) expose proper userspace abi headers in the kernel for all code > > > in the RDMA subsystem > > > b) actually use that in the userspace components > > > > > > I've posted some initial work toward a) a while ago, and once we > > Did it get merged? Do you have a pointer? http://www.spinics.net/lists/linux-rdma/msg31958.html > this without it would be very hard, as everything is cross-linked, I > couldnn't unwind libibcm until I fixed a bit of verbs, and rdmacm can't > even include its uapi header until the duplicate definitions in the > verbs copy are delt with .. and I've also learned we are making > changing to the kernel uapi header and since nothing uses them we never even > compile test :( :( eg > https://github.com/torvalds/linux/commit/b493d91d333e867a043f7ff1397bcba6e2d0dda2] > However, everything under verbs is not straightforward. The files in > userspace are not copies... > > user: > > struct ibv_query_device { >__u32 command; >__u16 in_words; >__u16 out_words; >__u64 response; >__u64 driver_data[0]; > }; > > kernel: > > struct ib_uverbs_query_device { > __u64 response; > __u64 driver_data[0]; > }; We'll obviously need different strutures for the libibvers API and the kernel interface in this case, and we'll need to figure out how to properly translate them. I think a cast, plus compile time type checking ala BUILD_BUG_ON is the way to go. > eg the userspace version stuffs the header into the struct and the > kernel version does not. Presumably this is for efficiency so that no > copies are required when marshaling. This impacts everything :( > > I'm thinking the best way forward might be to use a script and > transform userspace into: > > struct ibv_query_device { > struct ib_uverbs_cmd_hdr hdr; > struct ib_uverbs_query_device cmd; > }; That would break the users of the interface. However automatically generating the user ABI from the kernel one might still be a good idea in the long run.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sun, Sep 11, 2016 at 11:14:09AM -0600, Jason Gunthorpe wrote: > > > We stil always have the common structure first. And at least for > > > cgroups supports that's what matters. > > > > > > Re the actual structures - we'll really need to make sure we > > > > > > a) expose proper userspace abi headers in the kernel for all code > > > in the RDMA subsystem > > > b) actually use that in the userspace components > > > > > > I've posted some initial work toward a) a while ago, and once we > > Did it get merged? Do you have a pointer? http://www.spinics.net/lists/linux-rdma/msg31958.html > this without it would be very hard, as everything is cross-linked, I > couldnn't unwind libibcm until I fixed a bit of verbs, and rdmacm can't > even include its uapi header until the duplicate definitions in the > verbs copy are delt with .. and I've also learned we are making > changing to the kernel uapi header and since nothing uses them we never even > compile test :( :( eg > https://github.com/torvalds/linux/commit/b493d91d333e867a043f7ff1397bcba6e2d0dda2] > However, everything under verbs is not straightforward. The files in > userspace are not copies... > > user: > > struct ibv_query_device { >__u32 command; >__u16 in_words; >__u16 out_words; >__u64 response; >__u64 driver_data[0]; > }; > > kernel: > > struct ib_uverbs_query_device { > __u64 response; > __u64 driver_data[0]; > }; We'll obviously need different strutures for the libibvers API and the kernel interface in this case, and we'll need to figure out how to properly translate them. I think a cast, plus compile time type checking ala BUILD_BUG_ON is the way to go. > eg the userspace version stuffs the header into the struct and the > kernel version does not. Presumably this is for efficiency so that no > copies are required when marshaling. This impacts everything :( > > I'm thinking the best way forward might be to use a script and > transform userspace into: > > struct ibv_query_device { > struct ib_uverbs_cmd_hdr hdr; > struct ib_uverbs_query_device cmd; > }; That would break the users of the interface. However automatically generating the user ABI from the kernel one might still be a good idea in the long run.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sun, Sep 11, 2016 at 05:35:22PM +0300, Leon Romanovsky wrote: > > We stil always have the common structure first. And at least for > > cgroups supports that's what matters. > > > > Re the actual structures - we'll really need to make sure we > > > > a) expose proper userspace abi headers in the kernel for all code > > in the RDMA subsystem > > b) actually use that in the userspace components > > > > I've posted some initial work toward a) a while ago, and once we Did it get merged? Do you have a pointer? > > agree on adopting your common repo I'd really like to start through > > with that work. I think it's a pre-requisite for any major new > > userspace ABI work. > > I started to work on it over weekend and it is worth do not do same work > twice. Yes, I also agree that it is important before we tackle the uapi conversion to get this fully sorted. I've already done several cases working with the existing uapi headers: https://github.com/jgunthorpe/rdma-plumbing/commit/f4f40689440dbc9c57b55548b04b15fe808a1767 https://github.com/jgunthorpe/rdma-plumbing/commit/0cf1893dce4791dafa035bcb6ee045a6ce0ff3c3 https://github.com/jgunthorpe/rdma-plumbing/commit/0522fc42aac4a5e8fc888dcca4341c9bc1dc58ca [.. and this is a strong argument why we need the common repo, doing this without it would be very hard, as everything is cross-linked, I couldnn't unwind libibcm until I fixed a bit of verbs, and rdmacm can't even include its uapi header until the duplicate definitions in the verbs copy are delt with .. and I've also learned we are making changing to the kernel uapi header and since nothing uses them we never even compile test :( :( eg https://github.com/torvalds/linux/commit/b493d91d333e867a043f7ff1397bcba6e2d0dda2] However, everything under verbs is not straightforward. The files in userspace are not copies... user: struct ibv_query_device { __u32 command; __u16 in_words; __u16 out_words; __u64 response; __u64 driver_data[0]; }; kernel: struct ib_uverbs_query_device { __u64 response; __u64 driver_data[0]; }; eg the userspace version stuffs the header into the struct and the kernel version does not. Presumably this is for efficiency so that no copies are required when marshaling. This impacts everything :( I'm thinking the best way forward might be to use a script and transform userspace into: struct ibv_query_device { struct ib_uverbs_cmd_hdr hdr; struct ib_uverbs_query_device cmd; }; Jason
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sun, Sep 11, 2016 at 05:35:22PM +0300, Leon Romanovsky wrote: > > We stil always have the common structure first. And at least for > > cgroups supports that's what matters. > > > > Re the actual structures - we'll really need to make sure we > > > > a) expose proper userspace abi headers in the kernel for all code > > in the RDMA subsystem > > b) actually use that in the userspace components > > > > I've posted some initial work toward a) a while ago, and once we Did it get merged? Do you have a pointer? > > agree on adopting your common repo I'd really like to start through > > with that work. I think it's a pre-requisite for any major new > > userspace ABI work. > > I started to work on it over weekend and it is worth do not do same work > twice. Yes, I also agree that it is important before we tackle the uapi conversion to get this fully sorted. I've already done several cases working with the existing uapi headers: https://github.com/jgunthorpe/rdma-plumbing/commit/f4f40689440dbc9c57b55548b04b15fe808a1767 https://github.com/jgunthorpe/rdma-plumbing/commit/0cf1893dce4791dafa035bcb6ee045a6ce0ff3c3 https://github.com/jgunthorpe/rdma-plumbing/commit/0522fc42aac4a5e8fc888dcca4341c9bc1dc58ca [.. and this is a strong argument why we need the common repo, doing this without it would be very hard, as everything is cross-linked, I couldnn't unwind libibcm until I fixed a bit of verbs, and rdmacm can't even include its uapi header until the duplicate definitions in the verbs copy are delt with .. and I've also learned we are making changing to the kernel uapi header and since nothing uses them we never even compile test :( :( eg https://github.com/torvalds/linux/commit/b493d91d333e867a043f7ff1397bcba6e2d0dda2] However, everything under verbs is not straightforward. The files in userspace are not copies... user: struct ibv_query_device { __u32 command; __u16 in_words; __u16 out_words; __u64 response; __u64 driver_data[0]; }; kernel: struct ib_uverbs_query_device { __u64 response; __u64 driver_data[0]; }; eg the userspace version stuffs the header into the struct and the kernel version does not. Presumably this is for efficiency so that no copies are required when marshaling. This impacts everything :( I'm thinking the best way forward might be to use a script and transform userspace into: struct ibv_query_device { struct ib_uverbs_cmd_hdr hdr; struct ib_uverbs_query_device cmd; }; Jason
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sun, Sep 11, 2016 at 03:34:21PM +0200, Christoph Hellwig wrote: > On Sat, Sep 10, 2016 at 11:01:51AM -0600, Jason Gunthorpe wrote: > > Sadly, it isn't a step backwards, it is status quo - at least as far > > as the uapi is concerned. > > Sort of, see below: > > > struct mlx5_create_cq { > > struct ibv_create_cqibv_cmd; > > __u64 buf_addr; > > __u64 db_addr; > > __u32 cqe_size; > > }; > > > > struct iwch_create_cq { > > struct ibv_create_cq ibv_cmd; > > uint64_t user_rptr_addr; > > }; > > > > Love to hear ideas on a way forward that doesn't involve rewriting > > everything :( > > We stil always have the common structure first. And at least for > cgroups supports that's what matters. > > Re the actual structures - we'll really need to make sure we > > a) expose proper userspace abi headers in the kernel for all code > in the RDMA subsystem > b) actually use that in the userspace components > > I've posted some initial work toward a) a while ago, and once we > agree on adopting your common repo I'd really like to start through > with that work. I think it's a pre-requisite for any major new > userspace ABI work. I started to work on it over weekend and it is worth do not do same work twice. > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html signature.asc Description: PGP signature
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sun, Sep 11, 2016 at 03:34:21PM +0200, Christoph Hellwig wrote: > On Sat, Sep 10, 2016 at 11:01:51AM -0600, Jason Gunthorpe wrote: > > Sadly, it isn't a step backwards, it is status quo - at least as far > > as the uapi is concerned. > > Sort of, see below: > > > struct mlx5_create_cq { > > struct ibv_create_cqibv_cmd; > > __u64 buf_addr; > > __u64 db_addr; > > __u32 cqe_size; > > }; > > > > struct iwch_create_cq { > > struct ibv_create_cq ibv_cmd; > > uint64_t user_rptr_addr; > > }; > > > > Love to hear ideas on a way forward that doesn't involve rewriting > > everything :( > > We stil always have the common structure first. And at least for > cgroups supports that's what matters. > > Re the actual structures - we'll really need to make sure we > > a) expose proper userspace abi headers in the kernel for all code > in the RDMA subsystem > b) actually use that in the userspace components > > I've posted some initial work toward a) a while ago, and once we > agree on adopting your common repo I'd really like to start through > with that work. I think it's a pre-requisite for any major new > userspace ABI work. I started to work on it over weekend and it is worth do not do same work twice. > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html signature.asc Description: PGP signature
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sat, Sep 10, 2016 at 11:01:51AM -0600, Jason Gunthorpe wrote: > Sadly, it isn't a step backwards, it is status quo - at least as far > as the uapi is concerned. Sort of, see below: > struct mlx5_create_cq { > struct ibv_create_cqibv_cmd; > __u64 buf_addr; > __u64 db_addr; > __u32 cqe_size; > }; > > struct iwch_create_cq { > struct ibv_create_cq ibv_cmd; > uint64_t user_rptr_addr; > }; > > Love to hear ideas on a way forward that doesn't involve rewriting > everything :( We stil always have the common structure first. And at least for cgroups supports that's what matters. Re the actual structures - we'll really need to make sure we a) expose proper userspace abi headers in the kernel for all code in the RDMA subsystem b) actually use that in the userspace components I've posted some initial work toward a) a while ago, and once we agree on adopting your common repo I'd really like to start through with that work. I think it's a pre-requisite for any major new userspace ABI work.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sat, Sep 10, 2016 at 11:01:51AM -0600, Jason Gunthorpe wrote: > Sadly, it isn't a step backwards, it is status quo - at least as far > as the uapi is concerned. Sort of, see below: > struct mlx5_create_cq { > struct ibv_create_cqibv_cmd; > __u64 buf_addr; > __u64 db_addr; > __u32 cqe_size; > }; > > struct iwch_create_cq { > struct ibv_create_cq ibv_cmd; > uint64_t user_rptr_addr; > }; > > Love to hear ideas on a way forward that doesn't involve rewriting > everything :( We stil always have the common structure first. And at least for cgroups supports that's what matters. Re the actual structures - we'll really need to make sure we a) expose proper userspace abi headers in the kernel for all code in the RDMA subsystem b) actually use that in the userspace components I've posted some initial work toward a) a while ago, and once we agree on adopting your common repo I'd really like to start through with that work. I think it's a pre-requisite for any major new userspace ABI work.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On 10/09/2016 20:01, Jason Gunthorpe wrote: On Sat, Sep 10, 2016 at 06:14:42PM +0200, Christoph Hellwig wrote: OFVWG meetings have absolutely zero relevance for Linux development. Well, to be fair there are a fair number of kernel developers on that particular call.. More "flexibility" for drivers just means giving up on designing a coherent API and leaving it to drivers authors to add crap to their own drivers. That's a major step backwards. Sadly, it isn't a step backwards, it is status quo - at least as far as the uapi is concerned. Every single user space driver has its own private abi file, carefully hidden in their driver, and dutifully copied over to user space: providers/cxgb3/iwch-abi.h providers/cxgb4/cxgb4-abi.h providers/hfi1verbs/hfi-abi.h providers/i40iw/i40iw-abi.h providers/ipathverbs/ipath-abi.h providers/mlx4/mlx4-abi.h providers/mlx5/mlx5-abi.h providers/mthca/mthca-abi.h providers/nes/nes-abi.h providers/ocrdma/ocrdma_abi.h providers/rxe/rxe-abi.h Just to pick two random examples: struct mlx5_create_cq { struct ibv_create_cqibv_cmd; __u64 buf_addr; __u64 db_addr; __u32 cqe_size; }; struct iwch_create_cq { struct ibv_create_cq ibv_cmd; uint64_t user_rptr_addr; }; Love to hear ideas on a way forward that doesn't involve rewriting everything :( Yeah, unfortunately, the RDMA ABI is more driver specific ABI than a common user-kernel ABI. I guess this will become even worse, as the RDMA subsystem is evolving to serve more drivers with different object types. For example, I would like to hear how hfi1 are going to define their user-kernel ABI (once they leave the custom ioctls). They should not be using the code in drivers/infiniband. usnic is such an example of a driver that should never have been added in it's current form. +1 Jason Matan
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On 10/09/2016 20:01, Jason Gunthorpe wrote: On Sat, Sep 10, 2016 at 06:14:42PM +0200, Christoph Hellwig wrote: OFVWG meetings have absolutely zero relevance for Linux development. Well, to be fair there are a fair number of kernel developers on that particular call.. More "flexibility" for drivers just means giving up on designing a coherent API and leaving it to drivers authors to add crap to their own drivers. That's a major step backwards. Sadly, it isn't a step backwards, it is status quo - at least as far as the uapi is concerned. Every single user space driver has its own private abi file, carefully hidden in their driver, and dutifully copied over to user space: providers/cxgb3/iwch-abi.h providers/cxgb4/cxgb4-abi.h providers/hfi1verbs/hfi-abi.h providers/i40iw/i40iw-abi.h providers/ipathverbs/ipath-abi.h providers/mlx4/mlx4-abi.h providers/mlx5/mlx5-abi.h providers/mthca/mthca-abi.h providers/nes/nes-abi.h providers/ocrdma/ocrdma_abi.h providers/rxe/rxe-abi.h Just to pick two random examples: struct mlx5_create_cq { struct ibv_create_cqibv_cmd; __u64 buf_addr; __u64 db_addr; __u32 cqe_size; }; struct iwch_create_cq { struct ibv_create_cq ibv_cmd; uint64_t user_rptr_addr; }; Love to hear ideas on a way forward that doesn't involve rewriting everything :( Yeah, unfortunately, the RDMA ABI is more driver specific ABI than a common user-kernel ABI. I guess this will become even worse, as the RDMA subsystem is evolving to serve more drivers with different object types. For example, I would like to hear how hfi1 are going to define their user-kernel ABI (once they leave the custom ioctls). They should not be using the code in drivers/infiniband. usnic is such an example of a driver that should never have been added in it's current form. +1 Jason Matan
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On 10/09/2016 19:12, Christoph Hellwig wrote: On Wed, Sep 07, 2016 at 01:25:13PM +0530, Parav Pandit wrote: a) delay cgroups support until the grand rewrite is done b) add it now and deal with the consequences later Can we do (b) now and differ adding any HW resources to cgroup until they are clearly called out. Architecture and APIs are already in place to support this. Sounds fine to me. The only thing I want to avoid is pie in the sky "future proofing" that leads to horrible architectures. And I assume that's what Matan proposed. NO, that not what I proposed. User-kernel API/ABI should be designed with drivers differences in mind. The internal design or internals APIs could ignore such things as they can be changed later.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On 10/09/2016 19:12, Christoph Hellwig wrote: On Wed, Sep 07, 2016 at 01:25:13PM +0530, Parav Pandit wrote: a) delay cgroups support until the grand rewrite is done b) add it now and deal with the consequences later Can we do (b) now and differ adding any HW resources to cgroup until they are clearly called out. Architecture and APIs are already in place to support this. Sounds fine to me. The only thing I want to avoid is pie in the sky "future proofing" that leads to horrible architectures. And I assume that's what Matan proposed. NO, that not what I proposed. User-kernel API/ABI should be designed with drivers differences in mind. The internal design or internals APIs could ignore such things as they can be changed later.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sat, Sep 10, 2016 at 06:14:42PM +0200, Christoph Hellwig wrote: > OFVWG meetings have absolutely zero relevance for Linux development. Well, to be fair there are a fair number of kernel developers on that particular call.. > More "flexibility" for drivers just means giving up on designing a > coherent API and leaving it to drivers authors to add crap to their > own drivers. That's a major step backwards. Sadly, it isn't a step backwards, it is status quo - at least as far as the uapi is concerned. Every single user space driver has its own private abi file, carefully hidden in their driver, and dutifully copied over to user space: providers/cxgb3/iwch-abi.h providers/cxgb4/cxgb4-abi.h providers/hfi1verbs/hfi-abi.h providers/i40iw/i40iw-abi.h providers/ipathverbs/ipath-abi.h providers/mlx4/mlx4-abi.h providers/mlx5/mlx5-abi.h providers/mthca/mthca-abi.h providers/nes/nes-abi.h providers/ocrdma/ocrdma_abi.h providers/rxe/rxe-abi.h Just to pick two random examples: struct mlx5_create_cq { struct ibv_create_cqibv_cmd; __u64 buf_addr; __u64 db_addr; __u32 cqe_size; }; struct iwch_create_cq { struct ibv_create_cq ibv_cmd; uint64_t user_rptr_addr; }; Love to hear ideas on a way forward that doesn't involve rewriting everything :( > They should not be using the code in drivers/infiniband. usnic is such > an example of a driver that should never have been added in it's current > form. +1 Jason
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Sat, Sep 10, 2016 at 06:14:42PM +0200, Christoph Hellwig wrote: > OFVWG meetings have absolutely zero relevance for Linux development. Well, to be fair there are a fair number of kernel developers on that particular call.. > More "flexibility" for drivers just means giving up on designing a > coherent API and leaving it to drivers authors to add crap to their > own drivers. That's a major step backwards. Sadly, it isn't a step backwards, it is status quo - at least as far as the uapi is concerned. Every single user space driver has its own private abi file, carefully hidden in their driver, and dutifully copied over to user space: providers/cxgb3/iwch-abi.h providers/cxgb4/cxgb4-abi.h providers/hfi1verbs/hfi-abi.h providers/i40iw/i40iw-abi.h providers/ipathverbs/ipath-abi.h providers/mlx4/mlx4-abi.h providers/mlx5/mlx5-abi.h providers/mthca/mthca-abi.h providers/nes/nes-abi.h providers/ocrdma/ocrdma_abi.h providers/rxe/rxe-abi.h Just to pick two random examples: struct mlx5_create_cq { struct ibv_create_cqibv_cmd; __u64 buf_addr; __u64 db_addr; __u32 cqe_size; }; struct iwch_create_cq { struct ibv_create_cq ibv_cmd; uint64_t user_rptr_addr; }; Love to hear ideas on a way forward that doesn't involve rewriting everything :( > They should not be using the code in drivers/infiniband. usnic is such > an example of a driver that should never have been added in it's current > form. +1 Jason
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Sep 07, 2016 at 11:51:42AM +0300, Matan Barak wrote: > All recent proposals of the new ABI schema deals with extending the > flexibility of the current schema by letting drivers define their specific > types, actions, attributes, etc. Even more than that, the dispatching > starts from the driver and it chooses if it wants to use the common RDMA > core layer or have it's own wise implementation instead. > Some drivers might even prefer not to implement the current verbs types. > These decisions were made in the OFVWG meetings. OFVWG meetings have absolutely zero relevance for Linux development. More "flexibility" for drivers just means giving up on designing a coherent API and leaving it to drivers authors to add crap to their own drivers. That's a major step backwards. > Sounds reasonable, but what about drivers which ignore the common code and > implement it in their own way? What about drivers which don't support the > standard RDMA types at all? They should not be using the code in drivers/infiniband. usnic is such an example of a driver that should never have been added in it's current form.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Sep 07, 2016 at 11:51:42AM +0300, Matan Barak wrote: > All recent proposals of the new ABI schema deals with extending the > flexibility of the current schema by letting drivers define their specific > types, actions, attributes, etc. Even more than that, the dispatching > starts from the driver and it chooses if it wants to use the common RDMA > core layer or have it's own wise implementation instead. > Some drivers might even prefer not to implement the current verbs types. > These decisions were made in the OFVWG meetings. OFVWG meetings have absolutely zero relevance for Linux development. More "flexibility" for drivers just means giving up on designing a coherent API and leaving it to drivers authors to add crap to their own drivers. That's a major step backwards. > Sounds reasonable, but what about drivers which ignore the common code and > implement it in their own way? What about drivers which don't support the > standard RDMA types at all? They should not be using the code in drivers/infiniband. usnic is such an example of a driver that should never have been added in it's current form.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Sep 07, 2016 at 01:25:13PM +0530, Parav Pandit wrote: > > a) delay cgroups support until the grand rewrite is done > > b) add it now and deal with the consequences later > > > Can we do (b) now and differ adding any HW resources to cgroup until > they are clearly called out. > Architecture and APIs are already in place to support this. Sounds fine to me. The only thing I want to avoid is pie in the sky "future proofing" that leads to horrible architectures. And I assume that's what Matan proposed.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Sep 07, 2016 at 01:25:13PM +0530, Parav Pandit wrote: > > a) delay cgroups support until the grand rewrite is done > > b) add it now and deal with the consequences later > > > Can we do (b) now and differ adding any HW resources to cgroup until > they are clearly called out. > Architecture and APIs are already in place to support this. Sounds fine to me. The only thing I want to avoid is pie in the sky "future proofing" that leads to horrible architectures. And I assume that's what Matan proposed.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Thu, Sep 8, 2016 at 11:42 AM, Leon Romanovskywrote: > On Wed, Sep 07, 2016 at 08:37:23PM +0530, Parav Pandit wrote: >> Did you get a chance to review the series? > > We need to decide on fundamental question before reviewing it, which is > "how to fit rdmacg to new ABI model". >From last discussion with Matan in this email thread, it appears that - only broken case are: (a) HW vendor driver specific resources (if they have crazy big list), which cannot be abstracted out well enough, won't be controlled by rdma cgroup. (b) Such resource objects are not well defined today with new ABI model. If such objects are well defined today, lets call them out and discuss with Doug, Tejun, Christoph and larger group, whether they qualify for inclusion or not. rdma cgroup currently supports including handful of HW resource that can be abstracted (at least at functionality level). Please include any other option issue, if any.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Thu, Sep 8, 2016 at 11:42 AM, Leon Romanovsky wrote: > On Wed, Sep 07, 2016 at 08:37:23PM +0530, Parav Pandit wrote: >> Did you get a chance to review the series? > > We need to decide on fundamental question before reviewing it, which is > "how to fit rdmacg to new ABI model". >From last discussion with Matan in this email thread, it appears that - only broken case are: (a) HW vendor driver specific resources (if they have crazy big list), which cannot be abstracted out well enough, won't be controlled by rdma cgroup. (b) Such resource objects are not well defined today with new ABI model. If such objects are well defined today, lets call them out and discuss with Doug, Tejun, Christoph and larger group, whether they qualify for inclusion or not. rdma cgroup currently supports including handful of HW resource that can be abstracted (at least at functionality level). Please include any other option issue, if any.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Sep 07, 2016 at 08:37:23PM +0530, Parav Pandit wrote: > Hi Leon, > > >> Signed-off-by: Parav Pandit> >> +static struct rdmacg_resource_pool * > >> +get_cg_rpool_locked(struct rdma_cgroup *cg, struct rdmacg_device *device) > >> +{ > >> + struct rdmacg_resource_pool *rpool; > >> + > >> + rpool = find_cg_rpool_locked(cg, device); > >> + if (rpool) > >> + return rpool; > >> + > >> + rpool = kzalloc(sizeof(*rpool), GFP_KERNEL); > >> + if (!rpool) > >> + return ERR_PTR(-ENOMEM); > >> + > >> + rpool->device = device; > >> + set_all_resource_max_limit(rpool); > >> + > >> + INIT_LIST_HEAD(>cg_node); > >> + INIT_LIST_HEAD(>dev_node); > >> + list_add_tail(>cg_node, >rpools); > >> + list_add_tail(>dev_node, >rpools); > >> + return rpool; > >> +} > > > > <...> > > > >> + for (p = cg; p; p = parent_rdmacg(p)) { > >> + rpool = get_cg_rpool_locked(p, device); > >> + if (IS_ERR_OR_NULL(rpool)) { > > > > get_cg_rpool_locked always returns !NULL (error, or pointer) > > Can this change go as incremental change after this patch, since this > patch is close to completion? > Or I need to revise v13? Sure, it is cleanup. It is not worth of respinning. > > > > >> + ret = PTR_ERR(rpool); > >> + goto err; > > > > I didn't review the whole series yet. > > Did you get a chance to review the series? We need to decide on fundamental question before reviewing it, which is "how to fit rdmacg to new ABI model". Thanks signature.asc Description: PGP signature
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Sep 07, 2016 at 08:37:23PM +0530, Parav Pandit wrote: > Hi Leon, > > >> Signed-off-by: Parav Pandit > >> +static struct rdmacg_resource_pool * > >> +get_cg_rpool_locked(struct rdma_cgroup *cg, struct rdmacg_device *device) > >> +{ > >> + struct rdmacg_resource_pool *rpool; > >> + > >> + rpool = find_cg_rpool_locked(cg, device); > >> + if (rpool) > >> + return rpool; > >> + > >> + rpool = kzalloc(sizeof(*rpool), GFP_KERNEL); > >> + if (!rpool) > >> + return ERR_PTR(-ENOMEM); > >> + > >> + rpool->device = device; > >> + set_all_resource_max_limit(rpool); > >> + > >> + INIT_LIST_HEAD(>cg_node); > >> + INIT_LIST_HEAD(>dev_node); > >> + list_add_tail(>cg_node, >rpools); > >> + list_add_tail(>dev_node, >rpools); > >> + return rpool; > >> +} > > > > <...> > > > >> + for (p = cg; p; p = parent_rdmacg(p)) { > >> + rpool = get_cg_rpool_locked(p, device); > >> + if (IS_ERR_OR_NULL(rpool)) { > > > > get_cg_rpool_locked always returns !NULL (error, or pointer) > > Can this change go as incremental change after this patch, since this > patch is close to completion? > Or I need to revise v13? Sure, it is cleanup. It is not worth of respinning. > > > > >> + ret = PTR_ERR(rpool); > >> + goto err; > > > > I didn't review the whole series yet. > > Did you get a chance to review the series? We need to decide on fundamental question before reviewing it, which is "how to fit rdmacg to new ABI model". Thanks signature.asc Description: PGP signature
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On 07/09/2016 10:55, Parav Pandit wrote: Hi Matan, On Thu, Sep 1, 2016 at 2:14 PM, Christoph Hellwigwrote: On Thu, Sep 01, 2016 at 10:25:40AM +0300, Matan Barak wrote: Well, if I recall, the reason doing so last time was in order to allow flexible updating of ib_core independently, which is obviously not a good reason (to say the least). Since the new ABI will probably define new object types (all recent proposals go this way), the current approach could lead to either trying to map new objects to existing cgroup resource types, which could lead to some weird non 1:1 mapping, or having a split definitions - such that each driver will declare its objects both in the cgroups mechanism and in its driver dispatch table. Even worse than that, drivers could simply ignore the cgroups support while implementing their own resource types and get a very broken containers support. If drivers are broken due to ignorance of not-calling cgroup APIs, that should be ok. That particular driver should fix it. If the resource creation using uverbs is using well defined rdma level resource, uverbs level will make sure to honor cgroup limits without involving hw drivers anyway. All recent proposals of the new ABI schema deals with extending the flexibility of the current schema by letting drivers define their specific types, actions, attributes, etc. Even more than that, the dispatching starts from the driver and it chooses if it wants to use the common RDMA core layer or have it's own wise implementation instead. Some drivers might even prefer not to implement the current verbs types. These decisions were made in the OFVWG meetings. Anyway, maybe we should consider using a more higher-level logic objects that could fit multiple drivers requirements. RDMA Verb level resource is charged/uncharged by RDMA core. RDMA HW level resource is charged/uncharged by RDMA HW driver using well defined API and resource by cgroup core. This scheme ensures that there is 1:1 mapping. Sounds reasonable, but what about drivers which ignore the common code and implement it in their own way? What about drivers which don't support the standard RDMA types at all? I guess we should find a balance between something abstract and common enough that will ease administrator configuration but be specific enough for the various models we have (or will have) in the RDMA stack. I don't think current definition of resource type is carved out on stone. They can be extended as we forward. As many of us agree that, they should be well defined and it should be agreed by cgroup and rdma community. Of course, but since the ABI and cgroups model are somehow related, they should be dealt with together and approved by Doug who participated in some of the OFVWG meetings. For example, today we have RDMA_VERB_xxx resources. New well defined RDMA HW resources can be defined in rdma_cgroup.h file as RDMA_HW_xx in same enum table. So a driver will change the cgroups file for every new type it adds? Will we just have a super set enum of all crazy types vendors added? Sorry guys, but arbitrary extensibility for something not finished is the worst idea ever. We have two options here: a) delay cgroups support until the grand rewrite is done b) add it now and deal with the consequences later Can we do (b) now and differ adding any HW resources to cgroup until they are clearly called out. Architecture and APIs are already in place to support this. Since this affect the user, it's better to think how it fits our "optional standard"/"vendor types" model first. Maybe we could force all standard types even if the driver we use doesn't support any of them. That being said, adding random non-Verbs hardwasre to the RDMA core is the second worst idea ever. We can differ adding HW resource to core and cgroup until they are clearly defined. In that case current architecture still holds good. Clearly we should differ adding the actual code until a driver could declare such objects, but we need to decide how to expose the standard optional RDMA types (basically, the types you've added) and how future driver specific types fit in. Guess I need to catch up with the discussion and start using the flame thrower. Matan, Can you please point us to the new RFC/ABI email thread which describes the design, partitioning of code between core vs hw drivers etc. One proposal is [1]. There's another one from Sean which aims for similar targets regards the driver specific types. [1] https://www.spinics.net/lists/linux-rdma/msg38997.html Matan
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On 07/09/2016 10:55, Parav Pandit wrote: Hi Matan, On Thu, Sep 1, 2016 at 2:14 PM, Christoph Hellwig wrote: On Thu, Sep 01, 2016 at 10:25:40AM +0300, Matan Barak wrote: Well, if I recall, the reason doing so last time was in order to allow flexible updating of ib_core independently, which is obviously not a good reason (to say the least). Since the new ABI will probably define new object types (all recent proposals go this way), the current approach could lead to either trying to map new objects to existing cgroup resource types, which could lead to some weird non 1:1 mapping, or having a split definitions - such that each driver will declare its objects both in the cgroups mechanism and in its driver dispatch table. Even worse than that, drivers could simply ignore the cgroups support while implementing their own resource types and get a very broken containers support. If drivers are broken due to ignorance of not-calling cgroup APIs, that should be ok. That particular driver should fix it. If the resource creation using uverbs is using well defined rdma level resource, uverbs level will make sure to honor cgroup limits without involving hw drivers anyway. All recent proposals of the new ABI schema deals with extending the flexibility of the current schema by letting drivers define their specific types, actions, attributes, etc. Even more than that, the dispatching starts from the driver and it chooses if it wants to use the common RDMA core layer or have it's own wise implementation instead. Some drivers might even prefer not to implement the current verbs types. These decisions were made in the OFVWG meetings. Anyway, maybe we should consider using a more higher-level logic objects that could fit multiple drivers requirements. RDMA Verb level resource is charged/uncharged by RDMA core. RDMA HW level resource is charged/uncharged by RDMA HW driver using well defined API and resource by cgroup core. This scheme ensures that there is 1:1 mapping. Sounds reasonable, but what about drivers which ignore the common code and implement it in their own way? What about drivers which don't support the standard RDMA types at all? I guess we should find a balance between something abstract and common enough that will ease administrator configuration but be specific enough for the various models we have (or will have) in the RDMA stack. I don't think current definition of resource type is carved out on stone. They can be extended as we forward. As many of us agree that, they should be well defined and it should be agreed by cgroup and rdma community. Of course, but since the ABI and cgroups model are somehow related, they should be dealt with together and approved by Doug who participated in some of the OFVWG meetings. For example, today we have RDMA_VERB_xxx resources. New well defined RDMA HW resources can be defined in rdma_cgroup.h file as RDMA_HW_xx in same enum table. So a driver will change the cgroups file for every new type it adds? Will we just have a super set enum of all crazy types vendors added? Sorry guys, but arbitrary extensibility for something not finished is the worst idea ever. We have two options here: a) delay cgroups support until the grand rewrite is done b) add it now and deal with the consequences later Can we do (b) now and differ adding any HW resources to cgroup until they are clearly called out. Architecture and APIs are already in place to support this. Since this affect the user, it's better to think how it fits our "optional standard"/"vendor types" model first. Maybe we could force all standard types even if the driver we use doesn't support any of them. That being said, adding random non-Verbs hardwasre to the RDMA core is the second worst idea ever. We can differ adding HW resource to core and cgroup until they are clearly defined. In that case current architecture still holds good. Clearly we should differ adding the actual code until a driver could declare such objects, but we need to decide how to expose the standard optional RDMA types (basically, the types you've added) and how future driver specific types fit in. Guess I need to catch up with the discussion and start using the flame thrower. Matan, Can you please point us to the new RFC/ABI email thread which describes the design, partitioning of code between core vs hw drivers etc. One proposal is [1]. There's another one from Sean which aims for similar targets regards the driver specific types. [1] https://www.spinics.net/lists/linux-rdma/msg38997.html Matan
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Leon, >> Signed-off-by: Parav Pandit>> +static struct rdmacg_resource_pool * >> +get_cg_rpool_locked(struct rdma_cgroup *cg, struct rdmacg_device *device) >> +{ >> + struct rdmacg_resource_pool *rpool; >> + >> + rpool = find_cg_rpool_locked(cg, device); >> + if (rpool) >> + return rpool; >> + >> + rpool = kzalloc(sizeof(*rpool), GFP_KERNEL); >> + if (!rpool) >> + return ERR_PTR(-ENOMEM); >> + >> + rpool->device = device; >> + set_all_resource_max_limit(rpool); >> + >> + INIT_LIST_HEAD(>cg_node); >> + INIT_LIST_HEAD(>dev_node); >> + list_add_tail(>cg_node, >rpools); >> + list_add_tail(>dev_node, >rpools); >> + return rpool; >> +} > > <...> > >> + for (p = cg; p; p = parent_rdmacg(p)) { >> + rpool = get_cg_rpool_locked(p, device); >> + if (IS_ERR_OR_NULL(rpool)) { > > get_cg_rpool_locked always returns !NULL (error, or pointer) Can this change go as incremental change after this patch, since this patch is close to completion? Or I need to revise v13? > >> + ret = PTR_ERR(rpool); >> + goto err; > > I didn't review the whole series yet. Did you get a chance to review the series?
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Leon, >> Signed-off-by: Parav Pandit >> +static struct rdmacg_resource_pool * >> +get_cg_rpool_locked(struct rdma_cgroup *cg, struct rdmacg_device *device) >> +{ >> + struct rdmacg_resource_pool *rpool; >> + >> + rpool = find_cg_rpool_locked(cg, device); >> + if (rpool) >> + return rpool; >> + >> + rpool = kzalloc(sizeof(*rpool), GFP_KERNEL); >> + if (!rpool) >> + return ERR_PTR(-ENOMEM); >> + >> + rpool->device = device; >> + set_all_resource_max_limit(rpool); >> + >> + INIT_LIST_HEAD(>cg_node); >> + INIT_LIST_HEAD(>dev_node); >> + list_add_tail(>cg_node, >rpools); >> + list_add_tail(>dev_node, >rpools); >> + return rpool; >> +} > > <...> > >> + for (p = cg; p; p = parent_rdmacg(p)) { >> + rpool = get_cg_rpool_locked(p, device); >> + if (IS_ERR_OR_NULL(rpool)) { > > get_cg_rpool_locked always returns !NULL (error, or pointer) Can this change go as incremental change after this patch, since this patch is close to completion? Or I need to revise v13? > >> + ret = PTR_ERR(rpool); >> + goto err; > > I didn't review the whole series yet. Did you get a chance to review the series?
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Sep 7, 2016 at 2:21 PM, Matan Barakwrote: > On 07/09/2016 10:55, Parav Pandit wrote: >> >> Hi Matan, >> >> On Thu, Sep 1, 2016 at 2:14 PM, Christoph Hellwig wrote: >>> >>> On Thu, Sep 01, 2016 at 10:25:40AM +0300, Matan Barak wrote: Well, if I recall, the reason doing so last time was in order to allow flexible updating of ib_core independently, which is obviously not a good reason (to say the least). Since the new ABI will probably define new object types (all recent proposals go this way), the current approach could lead to either trying to map new objects to existing cgroup resource types, which could lead to some weird non 1:1 mapping, or having a split definitions - such that each driver will declare its objects both in the cgroups mechanism and in its driver dispatch table. >> >> Even worse than that, drivers could simply ignore the cgroups support while implementing their own resource types and get a very broken containers support. >> >> If drivers are broken due to ignorance of not-calling cgroup APIs, >> that should be ok. >> That particular driver should fix it. >> If the resource creation using uverbs is using well defined rdma level >> resource, uverbs level will make sure to honor cgroup limits without >> involving hw drivers anyway. >> > > All recent proposals of the new ABI schema deals with extending the > flexibility of the current schema by letting drivers define their specific > types, actions, attributes, etc. Even more than that, the dispatching starts > from the driver and it chooses if it wants to use the common RDMA core layer > or have it's own wise implementation instead. > Some drivers might even prefer not to implement the current verbs types. > These decisions were made in the OFVWG meetings. o.k. If some drivers doesn't implement current verbs type, there is no point in controlling it either. In such case application space library won't even invoke resource allocation/free for unsupported resources. For resources (type in your word) which are not defined in cgroup, but exist in hw driver, cannot be controlled by cgroup. As you highlighted in your [1], "driver's specific attributes could someday become core's standard attributes", we should be able to add new resource type to existing rdma cgroup. > > Anyway, maybe we should consider using a more higher-level logic objects > that could fit multiple drivers requirements. I do not have any other objects other than QP, MR etc in mind currently. I can think of a RDMA specific socket that encompass one PD, AH, couple of MRs, and QP. But we don't have such resource abstraction and its data transport APIs in place. There is rsocket, various versions of MPI, libfabric etc in place. So I am not sure which high level objects to defined at this point. If you have such objects nailed down, we should be able to add them in incremental development. > >> RDMA Verb level resource is charged/uncharged by RDMA core. >> RDMA HW level resource is charged/uncharged by RDMA HW driver using >> well defined API and resource by cgroup core. >> This scheme ensures that there is 1:1 mapping. >> > > Sounds reasonable, but what about drivers which ignore the common code and > implement it in their own way? Shouldn't Doug ask them to use cgroup infrastructure instead of implementing resource charging/uncharing in their own way. It still unlikely or difficult for drivers to group processes them selves like cgroup to implement things in their own way. I agree, they can completely ignore RDMA verbs resources and implement their own HW resources. As you pointed below, we need to find balance between which hw resource to be defined and which not. For example, newly added XRQ object, which could be a pure buffer_tag with receive queue for other vendor. I am not sure how to abstract them into single object. > What about drivers which don't support the > standard RDMA types at all? > I guess we should find a balance between something abstract and common > enough that will ease administrator configuration but be specific enough for > the various models we have (or will have) in the RDMA stack. > >> I don't think current definition of resource type is carved out on stone. >> They can be extended as we forward. >> As many of us agree that, they should be well defined and it should be >> agreed by cgroup and rdma community. >> > > Of course, but since the ABI and cgroups model are somehow related, they > should be dealt with together and approved by Doug who participated in some > of the OFVWG meetings. Sure. > >> For example, today we have RDMA_VERB_xxx resources. >> New well defined RDMA HW resources can be defined in rdma_cgroup.h >> file as RDMA_HW_xx in same enum table. >> > > So a driver will change the cgroups file for every new type it adds? Well, we wanted to avoid that such churn in cgroup file, thats why v11 was
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Sep 7, 2016 at 2:21 PM, Matan Barak wrote: > On 07/09/2016 10:55, Parav Pandit wrote: >> >> Hi Matan, >> >> On Thu, Sep 1, 2016 at 2:14 PM, Christoph Hellwig wrote: >>> >>> On Thu, Sep 01, 2016 at 10:25:40AM +0300, Matan Barak wrote: Well, if I recall, the reason doing so last time was in order to allow flexible updating of ib_core independently, which is obviously not a good reason (to say the least). Since the new ABI will probably define new object types (all recent proposals go this way), the current approach could lead to either trying to map new objects to existing cgroup resource types, which could lead to some weird non 1:1 mapping, or having a split definitions - such that each driver will declare its objects both in the cgroups mechanism and in its driver dispatch table. >> >> Even worse than that, drivers could simply ignore the cgroups support while implementing their own resource types and get a very broken containers support. >> >> If drivers are broken due to ignorance of not-calling cgroup APIs, >> that should be ok. >> That particular driver should fix it. >> If the resource creation using uverbs is using well defined rdma level >> resource, uverbs level will make sure to honor cgroup limits without >> involving hw drivers anyway. >> > > All recent proposals of the new ABI schema deals with extending the > flexibility of the current schema by letting drivers define their specific > types, actions, attributes, etc. Even more than that, the dispatching starts > from the driver and it chooses if it wants to use the common RDMA core layer > or have it's own wise implementation instead. > Some drivers might even prefer not to implement the current verbs types. > These decisions were made in the OFVWG meetings. o.k. If some drivers doesn't implement current verbs type, there is no point in controlling it either. In such case application space library won't even invoke resource allocation/free for unsupported resources. For resources (type in your word) which are not defined in cgroup, but exist in hw driver, cannot be controlled by cgroup. As you highlighted in your [1], "driver's specific attributes could someday become core's standard attributes", we should be able to add new resource type to existing rdma cgroup. > > Anyway, maybe we should consider using a more higher-level logic objects > that could fit multiple drivers requirements. I do not have any other objects other than QP, MR etc in mind currently. I can think of a RDMA specific socket that encompass one PD, AH, couple of MRs, and QP. But we don't have such resource abstraction and its data transport APIs in place. There is rsocket, various versions of MPI, libfabric etc in place. So I am not sure which high level objects to defined at this point. If you have such objects nailed down, we should be able to add them in incremental development. > >> RDMA Verb level resource is charged/uncharged by RDMA core. >> RDMA HW level resource is charged/uncharged by RDMA HW driver using >> well defined API and resource by cgroup core. >> This scheme ensures that there is 1:1 mapping. >> > > Sounds reasonable, but what about drivers which ignore the common code and > implement it in their own way? Shouldn't Doug ask them to use cgroup infrastructure instead of implementing resource charging/uncharing in their own way. It still unlikely or difficult for drivers to group processes them selves like cgroup to implement things in their own way. I agree, they can completely ignore RDMA verbs resources and implement their own HW resources. As you pointed below, we need to find balance between which hw resource to be defined and which not. For example, newly added XRQ object, which could be a pure buffer_tag with receive queue for other vendor. I am not sure how to abstract them into single object. > What about drivers which don't support the > standard RDMA types at all? > I guess we should find a balance between something abstract and common > enough that will ease administrator configuration but be specific enough for > the various models we have (or will have) in the RDMA stack. > >> I don't think current definition of resource type is carved out on stone. >> They can be extended as we forward. >> As many of us agree that, they should be well defined and it should be >> agreed by cgroup and rdma community. >> > > Of course, but since the ABI and cgroups model are somehow related, they > should be dealt with together and approved by Doug who participated in some > of the OFVWG meetings. Sure. > >> For example, today we have RDMA_VERB_xxx resources. >> New well defined RDMA HW resources can be defined in rdma_cgroup.h >> file as RDMA_HW_xx in same enum table. >> > > So a driver will change the cgroups file for every new type it adds? Well, we wanted to avoid that such churn in cgroup file, thats why v11 was defining resources in IB core. But I
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Matan, On Thu, Sep 1, 2016 at 2:14 PM, Christoph Hellwigwrote: > On Thu, Sep 01, 2016 at 10:25:40AM +0300, Matan Barak wrote: >> Well, if I recall, the reason doing so last time was in order to allow >> flexible updating of ib_core independently, which is obviously not a good >> reason (to say the least). >> >> Since the new ABI will probably define new object types (all recent >> proposals go this way), the current approach could lead to either trying to >> map new objects to existing cgroup resource types, which could lead to some >> weird non 1:1 mapping, or having a split definitions - such that each >> driver will declare its objects both in the cgroups mechanism and in its >> driver dispatch table. >> Even worse than that, drivers could simply ignore the cgroups support while >> implementing their own resource types and get a very broken containers >> support. If drivers are broken due to ignorance of not-calling cgroup APIs, that should be ok. That particular driver should fix it. If the resource creation using uverbs is using well defined rdma level resource, uverbs level will make sure to honor cgroup limits without involving hw drivers anyway. RDMA Verb level resource is charged/uncharged by RDMA core. RDMA HW level resource is charged/uncharged by RDMA HW driver using well defined API and resource by cgroup core. This scheme ensures that there is 1:1 mapping. I don't think current definition of resource type is carved out on stone. They can be extended as we forward. As many of us agree that, they should be well defined and it should be agreed by cgroup and rdma community. For example, today we have RDMA_VERB_xxx resources. New well defined RDMA HW resources can be defined in rdma_cgroup.h file as RDMA_HW_xx in same enum table. > > Sorry guys, but arbitrary extensibility for something not finished is the > worst idea ever. We have two options here: > > a) delay cgroups support until the grand rewrite is done > b) add it now and deal with the consequences later > Can we do (b) now and differ adding any HW resources to cgroup until they are clearly called out. Architecture and APIs are already in place to support this. > That being said, adding random non-Verbs hardwasre to the RDMA core is > the second worst idea ever. We can differ adding HW resource to core and cgroup until they are clearly defined. In that case current architecture still holds good. > Guess I need to catch up with the > discussion and start using the flame thrower. Matan, Can you please point us to the new RFC/ABI email thread which describes the design, partitioning of code between core vs hw drivers etc.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hi Matan, On Thu, Sep 1, 2016 at 2:14 PM, Christoph Hellwig wrote: > On Thu, Sep 01, 2016 at 10:25:40AM +0300, Matan Barak wrote: >> Well, if I recall, the reason doing so last time was in order to allow >> flexible updating of ib_core independently, which is obviously not a good >> reason (to say the least). >> >> Since the new ABI will probably define new object types (all recent >> proposals go this way), the current approach could lead to either trying to >> map new objects to existing cgroup resource types, which could lead to some >> weird non 1:1 mapping, or having a split definitions - such that each >> driver will declare its objects both in the cgroups mechanism and in its >> driver dispatch table. >> Even worse than that, drivers could simply ignore the cgroups support while >> implementing their own resource types and get a very broken containers >> support. If drivers are broken due to ignorance of not-calling cgroup APIs, that should be ok. That particular driver should fix it. If the resource creation using uverbs is using well defined rdma level resource, uverbs level will make sure to honor cgroup limits without involving hw drivers anyway. RDMA Verb level resource is charged/uncharged by RDMA core. RDMA HW level resource is charged/uncharged by RDMA HW driver using well defined API and resource by cgroup core. This scheme ensures that there is 1:1 mapping. I don't think current definition of resource type is carved out on stone. They can be extended as we forward. As many of us agree that, they should be well defined and it should be agreed by cgroup and rdma community. For example, today we have RDMA_VERB_xxx resources. New well defined RDMA HW resources can be defined in rdma_cgroup.h file as RDMA_HW_xx in same enum table. > > Sorry guys, but arbitrary extensibility for something not finished is the > worst idea ever. We have two options here: > > a) delay cgroups support until the grand rewrite is done > b) add it now and deal with the consequences later > Can we do (b) now and differ adding any HW resources to cgroup until they are clearly called out. Architecture and APIs are already in place to support this. > That being said, adding random non-Verbs hardwasre to the RDMA core is > the second worst idea ever. We can differ adding HW resource to core and cgroup until they are clearly defined. In that case current architecture still holds good. > Guess I need to catch up with the > discussion and start using the flame thrower. Matan, Can you please point us to the new RFC/ABI email thread which describes the design, partitioning of code between core vs hw drivers etc.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On 01/09/2016 00:16, Tejun Heo wrote: Hello, On Wed, Aug 31, 2016 at 06:07:30PM +0300, Matan Barak wrote: Currently, there are some discussions regarding the RDMA ABI. The current proposed approach (after a lot of discussions in the OFVWG) is to have driver dependent object types rather than the fixed set of IB object types we have today. AFAIK, some vendors might want to use the RDMA subsystem for a different fabrics which has a different set of objects. You could see RFCs for such concepts both from Mellanox and Intel on the linux-rdma mailing list. Saying that, maybe we need to make the resource types a bit more flexible and dynamic. That'd be back to square one and Christoph was dead against it too, so... Well, if I recall, the reason doing so last time was in order to allow flexible updating of ib_core independently, which is obviously not a good reason (to say the least). Since the new ABI will probably define new object types (all recent proposals go this way), the current approach could lead to either trying to map new objects to existing cgroup resource types, which could lead to some weird non 1:1 mapping, or having a split definitions - such that each driver will declare its objects both in the cgroups mechanism and in its driver dispatch table. Even worse than that, drivers could simply ignore the cgroups support while implementing their own resource types and get a very broken containers support. Thanks. Matan
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On 01/09/2016 00:16, Tejun Heo wrote: Hello, On Wed, Aug 31, 2016 at 06:07:30PM +0300, Matan Barak wrote: Currently, there are some discussions regarding the RDMA ABI. The current proposed approach (after a lot of discussions in the OFVWG) is to have driver dependent object types rather than the fixed set of IB object types we have today. AFAIK, some vendors might want to use the RDMA subsystem for a different fabrics which has a different set of objects. You could see RFCs for such concepts both from Mellanox and Intel on the linux-rdma mailing list. Saying that, maybe we need to make the resource types a bit more flexible and dynamic. That'd be back to square one and Christoph was dead against it too, so... Well, if I recall, the reason doing so last time was in order to allow flexible updating of ib_core independently, which is obviously not a good reason (to say the least). Since the new ABI will probably define new object types (all recent proposals go this way), the current approach could lead to either trying to map new objects to existing cgroup resource types, which could lead to some weird non 1:1 mapping, or having a split definitions - such that each driver will declare its objects both in the cgroups mechanism and in its driver dispatch table. Even worse than that, drivers could simply ignore the cgroups support while implementing their own resource types and get a very broken containers support. Thanks. Matan
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Thu, Sep 01, 2016 at 10:25:40AM +0300, Matan Barak wrote: > Well, if I recall, the reason doing so last time was in order to allow > flexible updating of ib_core independently, which is obviously not a good > reason (to say the least). > > Since the new ABI will probably define new object types (all recent > proposals go this way), the current approach could lead to either trying to > map new objects to existing cgroup resource types, which could lead to some > weird non 1:1 mapping, or having a split definitions - such that each > driver will declare its objects both in the cgroups mechanism and in its > driver dispatch table. > Even worse than that, drivers could simply ignore the cgroups support while > implementing their own resource types and get a very broken containers > support. Sorry guys, but arbitrary extensibility for something not finished is the worst idea ever. We have two options here: a) delay cgroups support until the grand rewrite is done b) add it now and deal with the consequences later That being said, adding random non-Verbs hardwasre to the RDMA core is the second worst idea ever. Guess I need to catch up with the discussion and start using the flame thrower.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Thu, Sep 01, 2016 at 10:25:40AM +0300, Matan Barak wrote: > Well, if I recall, the reason doing so last time was in order to allow > flexible updating of ib_core independently, which is obviously not a good > reason (to say the least). > > Since the new ABI will probably define new object types (all recent > proposals go this way), the current approach could lead to either trying to > map new objects to existing cgroup resource types, which could lead to some > weird non 1:1 mapping, or having a split definitions - such that each > driver will declare its objects both in the cgroups mechanism and in its > driver dispatch table. > Even worse than that, drivers could simply ignore the cgroups support while > implementing their own resource types and get a very broken containers > support. Sorry guys, but arbitrary extensibility for something not finished is the worst idea ever. We have two options here: a) delay cgroups support until the grand rewrite is done b) add it now and deal with the consequences later That being said, adding random non-Verbs hardwasre to the RDMA core is the second worst idea ever. Guess I need to catch up with the discussion and start using the flame thrower.
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On 31/08/2016 11:37, Parav Pandit wrote: Added rdma cgroup controller that does accounting, limit enforcement on rdma/IB verbs and hw resources. Added rdma cgroup header file which defines its APIs to perform charing/uncharing functionality. It also defined APIs for RDMA/IB stack for device registration. Devices which are registered will participate in controller functions of accounting and limit enforcements. It define rdmacg_device structure to bind IB stack and RDMA cgroup controller. RDMA resources are tracked using resource pool. Resource pool is per device, per cgroup entity which allows setting up accounting limits on per device basis. Currently resources are defined by the RDMA cgroup. Resource pool is created/destroyed dynamically whenever charging/uncharging occurs respectively and whenever user configuration is done. Its a tradeoff of memory vs little more code space that creates resource pool object whenever necessary, instead of creating them during cgroup creation and device registration time. Signed-off-by: Parav Pandit--- include/linux/cgroup_rdma.h | 66 + include/linux/cgroup_subsys.h | 4 + init/Kconfig | 10 + kernel/Makefile | 1 + kernel/cgroup_rdma.c | 664 ++ 5 files changed, 745 insertions(+) create mode 100644 include/linux/cgroup_rdma.h create mode 100644 kernel/cgroup_rdma.c diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h new file mode 100644 index 000..6710e28 --- /dev/null +++ b/include/linux/cgroup_rdma.h @@ -0,0 +1,66 @@ +/* + * Copyright (C) 2016 Parav Pandit + * + * This file is subject to the terms and conditions of version 2 of the GNU + * General Public License. See the file COPYING in the main directory of the + * Linux distribution for more details. + */ + +#ifndef _CGROUP_RDMA_H +#define _CGROUP_RDMA_H + +#include + +enum rdmacg_resource_type { + RDMACG_VERB_RESOURCE_UCTX, + RDMACG_VERB_RESOURCE_AH, + RDMACG_VERB_RESOURCE_PD, + RDMACG_VERB_RESOURCE_CQ, + RDMACG_VERB_RESOURCE_MR, + RDMACG_VERB_RESOURCE_MW, + RDMACG_VERB_RESOURCE_SRQ, + RDMACG_VERB_RESOURCE_QP, + RDMACG_VERB_RESOURCE_FLOW, + /* +* add any hw specific resource here as RDMA_HW_RESOURCE_NAME +*/ + RDMACG_RESOURCE_MAX, +}; + +#ifdef CONFIG_CGROUP_RDMA + Currently, there are some discussions regarding the RDMA ABI. The current proposed approach (after a lot of discussions in the OFVWG) is to have driver dependent object types rather than the fixed set of IB object types we have today. AFAIK, some vendors might want to use the RDMA subsystem for a different fabrics which has a different set of objects. You could see RFCs for such concepts both from Mellanox and Intel on the linux-rdma mailing list. Saying that, maybe we need to make the resource types a bit more flexible and dynamic. Regards, Matan
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On 31/08/2016 11:37, Parav Pandit wrote: Added rdma cgroup controller that does accounting, limit enforcement on rdma/IB verbs and hw resources. Added rdma cgroup header file which defines its APIs to perform charing/uncharing functionality. It also defined APIs for RDMA/IB stack for device registration. Devices which are registered will participate in controller functions of accounting and limit enforcements. It define rdmacg_device structure to bind IB stack and RDMA cgroup controller. RDMA resources are tracked using resource pool. Resource pool is per device, per cgroup entity which allows setting up accounting limits on per device basis. Currently resources are defined by the RDMA cgroup. Resource pool is created/destroyed dynamically whenever charging/uncharging occurs respectively and whenever user configuration is done. Its a tradeoff of memory vs little more code space that creates resource pool object whenever necessary, instead of creating them during cgroup creation and device registration time. Signed-off-by: Parav Pandit --- include/linux/cgroup_rdma.h | 66 + include/linux/cgroup_subsys.h | 4 + init/Kconfig | 10 + kernel/Makefile | 1 + kernel/cgroup_rdma.c | 664 ++ 5 files changed, 745 insertions(+) create mode 100644 include/linux/cgroup_rdma.h create mode 100644 kernel/cgroup_rdma.c diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h new file mode 100644 index 000..6710e28 --- /dev/null +++ b/include/linux/cgroup_rdma.h @@ -0,0 +1,66 @@ +/* + * Copyright (C) 2016 Parav Pandit + * + * This file is subject to the terms and conditions of version 2 of the GNU + * General Public License. See the file COPYING in the main directory of the + * Linux distribution for more details. + */ + +#ifndef _CGROUP_RDMA_H +#define _CGROUP_RDMA_H + +#include + +enum rdmacg_resource_type { + RDMACG_VERB_RESOURCE_UCTX, + RDMACG_VERB_RESOURCE_AH, + RDMACG_VERB_RESOURCE_PD, + RDMACG_VERB_RESOURCE_CQ, + RDMACG_VERB_RESOURCE_MR, + RDMACG_VERB_RESOURCE_MW, + RDMACG_VERB_RESOURCE_SRQ, + RDMACG_VERB_RESOURCE_QP, + RDMACG_VERB_RESOURCE_FLOW, + /* +* add any hw specific resource here as RDMA_HW_RESOURCE_NAME +*/ + RDMACG_RESOURCE_MAX, +}; + +#ifdef CONFIG_CGROUP_RDMA + Currently, there are some discussions regarding the RDMA ABI. The current proposed approach (after a lot of discussions in the OFVWG) is to have driver dependent object types rather than the fixed set of IB object types we have today. AFAIK, some vendors might want to use the RDMA subsystem for a different fabrics which has a different set of objects. You could see RFCs for such concepts both from Mellanox and Intel on the linux-rdma mailing list. Saying that, maybe we need to make the resource types a bit more flexible and dynamic. Regards, Matan
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hello, On Wed, Aug 31, 2016 at 06:07:30PM +0300, Matan Barak wrote: > Currently, there are some discussions regarding the RDMA ABI. The current > proposed approach (after a lot of discussions in the OFVWG) is to have > driver dependent object types rather than the fixed set of IB object types > we have today. > AFAIK, some vendors might want to use the RDMA subsystem for a different > fabrics which has a different set of objects. > You could see RFCs for such concepts both from Mellanox and Intel on the > linux-rdma mailing list. > > Saying that, maybe we need to make the resource types a bit more flexible > and dynamic. That'd be back to square one and Christoph was dead against it too, so... Thanks. -- tejun
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
Hello, On Wed, Aug 31, 2016 at 06:07:30PM +0300, Matan Barak wrote: > Currently, there are some discussions regarding the RDMA ABI. The current > proposed approach (after a lot of discussions in the OFVWG) is to have > driver dependent object types rather than the fixed set of IB object types > we have today. > AFAIK, some vendors might want to use the RDMA subsystem for a different > fabrics which has a different set of objects. > You could see RFCs for such concepts both from Mellanox and Intel on the > linux-rdma mailing list. > > Saying that, maybe we need to make the resource types a bit more flexible > and dynamic. That'd be back to square one and Christoph was dead against it too, so... Thanks. -- tejun
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Aug 31, 2016 at 02:07:25PM +0530, Parav Pandit wrote: > Added rdma cgroup controller that does accounting, limit enforcement > on rdma/IB verbs and hw resources. > > Added rdma cgroup header file which defines its APIs to perform > charing/uncharing functionality. It also defined APIs for RDMA/IB > stack for device registration. Devices which are registered will > participate in controller functions of accounting and limit > enforcements. It define rdmacg_device structure to bind IB stack > and RDMA cgroup controller. > > RDMA resources are tracked using resource pool. Resource pool is per > device, per cgroup entity which allows setting up accounting limits > on per device basis. > > Currently resources are defined by the RDMA cgroup. > > Resource pool is created/destroyed dynamically whenever > charging/uncharging occurs respectively and whenever user > configuration is done. Its a tradeoff of memory vs little more code > space that creates resource pool object whenever necessary, instead of > creating them during cgroup creation and device registration time. > > Signed-off-by: Parav Pandit> --- <...> > + > +static struct rdmacg_resource_pool * > +get_cg_rpool_locked(struct rdma_cgroup *cg, struct rdmacg_device *device) > +{ > + struct rdmacg_resource_pool *rpool; > + > + rpool = find_cg_rpool_locked(cg, device); > + if (rpool) > + return rpool; > + > + rpool = kzalloc(sizeof(*rpool), GFP_KERNEL); > + if (!rpool) > + return ERR_PTR(-ENOMEM); > + > + rpool->device = device; > + set_all_resource_max_limit(rpool); > + > + INIT_LIST_HEAD(>cg_node); > + INIT_LIST_HEAD(>dev_node); > + list_add_tail(>cg_node, >rpools); > + list_add_tail(>dev_node, >rpools); > + return rpool; > +} <...> > + for (p = cg; p; p = parent_rdmacg(p)) { > + rpool = get_cg_rpool_locked(p, device); > + if (IS_ERR_OR_NULL(rpool)) { get_cg_rpool_locked always returns !NULL (error, or pointer) > + ret = PTR_ERR(rpool); > + goto err; I didn't review the whole series yet. signature.asc Description: PGP signature
Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller
On Wed, Aug 31, 2016 at 02:07:25PM +0530, Parav Pandit wrote: > Added rdma cgroup controller that does accounting, limit enforcement > on rdma/IB verbs and hw resources. > > Added rdma cgroup header file which defines its APIs to perform > charing/uncharing functionality. It also defined APIs for RDMA/IB > stack for device registration. Devices which are registered will > participate in controller functions of accounting and limit > enforcements. It define rdmacg_device structure to bind IB stack > and RDMA cgroup controller. > > RDMA resources are tracked using resource pool. Resource pool is per > device, per cgroup entity which allows setting up accounting limits > on per device basis. > > Currently resources are defined by the RDMA cgroup. > > Resource pool is created/destroyed dynamically whenever > charging/uncharging occurs respectively and whenever user > configuration is done. Its a tradeoff of memory vs little more code > space that creates resource pool object whenever necessary, instead of > creating them during cgroup creation and device registration time. > > Signed-off-by: Parav Pandit > --- <...> > + > +static struct rdmacg_resource_pool * > +get_cg_rpool_locked(struct rdma_cgroup *cg, struct rdmacg_device *device) > +{ > + struct rdmacg_resource_pool *rpool; > + > + rpool = find_cg_rpool_locked(cg, device); > + if (rpool) > + return rpool; > + > + rpool = kzalloc(sizeof(*rpool), GFP_KERNEL); > + if (!rpool) > + return ERR_PTR(-ENOMEM); > + > + rpool->device = device; > + set_all_resource_max_limit(rpool); > + > + INIT_LIST_HEAD(>cg_node); > + INIT_LIST_HEAD(>dev_node); > + list_add_tail(>cg_node, >rpools); > + list_add_tail(>dev_node, >rpools); > + return rpool; > +} <...> > + for (p = cg; p; p = parent_rdmacg(p)) { > + rpool = get_cg_rpool_locked(p, device); > + if (IS_ERR_OR_NULL(rpool)) { get_cg_rpool_locked always returns !NULL (error, or pointer) > + ret = PTR_ERR(rpool); > + goto err; I didn't review the whole series yet. signature.asc Description: PGP signature