Re: [PATCH 0/2] Adding support for RoCE V2 specification
On Tue, Dec 2, 2014 at 7:35 AM, Somnath Kotur wrote: > HI Moni, > Thank you for your comments , please find my response inline. > > >> > The overarching goal behind this patch is to keep RDMA-CM as the >> > central entity that decides the protocol (ROCEV2 /ROCEV1) and hides all the >> address resolution details from applications. >> I agree but with one comment. RDMA-CM, although preferred, is not >> mandatory to establish a connection. The solution needs to take care also for >> apps that don't use RDMA-CM > > Agreed and this patch takes care of that as well thanks to the solution that > was put in place > for IP-based GIDs anyway in `modify_qp` effectively by means of the > ib_resolve_l2_attrs(). > >> >> > This is just a continuation of the existing RDMA-CM design philosophy >> > for IP Based GIDs that lets applications operate over any form of RDMA >> Service in a completely transparent way without any changes to the >> applications themselves. >> > >> > Protocol Resolution(RoCEV2 or V1) must be done even before we send out >> > the 1st connection request packet, the corresponding MAD pkt must be V2 >> for a destination that is behind a router, correct? >> > I am not sure we want to try sending out RoCEv2 connection requests first >> and if that fails ,then fallback/retry with RoCEv1 request. >> I agree but when you send the first packet (request or response) you know >> the SGID index to be used. Knowing that and assuming you chose the SGID >> index correctly there is no need to guess. >> A suggestion for choosing the SGID index is: >> 1. First, populate GID table with GIDs matching the IP address list of the >> corresponding net devices. Each GID will appear twice, one for >> RoCEv1 and one for RoCEv2. >> 2. When connection is being established RDMA-CM will choose one of the >> 2 matching indices based on a preferred protocol attribute for the rmda_id >> (new attr.) 3. Proffered protocol is >> a) global for the node/fabric based on the administrator settings >> b) can be changed by applications that want to override the default >> (requires extra API) > > Agree to point 1 which is what we were thinking of doing as well as part of > the 'override' option. > Still feel that the 'default' protocol resolution should be based on help > from the IP/Networking stack > as that would straight away enable existing applications and majority of > use-case scenarios that use RDMA-CM to seamlessly > work without any change for RoCE V2. > What this means is now RDMA-CM will choose one of the 2 matching GID indices > based on help from the IP stack. > > For this to happen , the 'gid_cache' structure will have to undergo a change > to now incorporate a new 'gid_type' field. > Accordingly the helper functions that find a cached gid entry - > ib_find_cached_gid() will also undergo a change to take into account this new > field. > > I plan to resubmit the patch series making the following changes > a) Populating GID table with 2 entries per IP address > b) Changes to the GID cache structure along with it's helper functions > c) Make sure that RDMA-CM selects/finds the corresponding GID once a network > type is chosen based on help from the IP stack. > > > The only thing missing is the ability for end-nodes connected locally inside > same subnet to communicate using V2 and that > could be provided as an 'override' option using the 'preferred protocol' > attribute setting on a system-wide basis or on an application basis > as you have proposed. > This I propose as an additional patch after the above patch goes in first. > > Does this sound like a plan ? > > Thanks > Som > Hi Som Generally, I agree with the changes you suggested. To summarize (and make sure I got it right) your plan for V2 is to: 1. Associate gid_index with RoCE type in GID table and GID cache. Requires helper functions to search for gid_index according to GID value and type 2. Move the code that monitors IP address changes to populate GID table from device driver to IB/core. Requires each RoCE device driver to implement add_gid_entry/rem_gid_entry functions 3. Remove the RoCE type from address structure (e.g. ib_ah_attr). Leave it only in rdma_addr to keep the hint from the IP stack 4. Choose sgid_index (from all matching entries) in RDMA-CM based on hint from IP stack, user preference and admin policy If so, I guess that we have a plan. Thanks Moni -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 0/2] Adding support for RoCE V2 specification
HI Moni, Thank you for your comments , please find my response inline. > > The overarching goal behind this patch is to keep RDMA-CM as the > > central entity that decides the protocol (ROCEV2 /ROCEV1) and hides all the > address resolution details from applications. > I agree but with one comment. RDMA-CM, although preferred, is not > mandatory to establish a connection. The solution needs to take care also for > apps that don't use RDMA-CM Agreed and this patch takes care of that as well thanks to the solution that was put in place for IP-based GIDs anyway in `modify_qp` effectively by means of the ib_resolve_l2_attrs(). > > > This is just a continuation of the existing RDMA-CM design philosophy > > for IP Based GIDs that lets applications operate over any form of RDMA > Service in a completely transparent way without any changes to the > applications themselves. > > > > Protocol Resolution(RoCEV2 or V1) must be done even before we send out > > the 1st connection request packet, the corresponding MAD pkt must be V2 > for a destination that is behind a router, correct? > > I am not sure we want to try sending out RoCEv2 connection requests first > and if that fails ,then fallback/retry with RoCEv1 request. > I agree but when you send the first packet (request or response) you know > the SGID index to be used. Knowing that and assuming you chose the SGID > index correctly there is no need to guess. > A suggestion for choosing the SGID index is: > 1. First, populate GID table with GIDs matching the IP address list of the > corresponding net devices. Each GID will appear twice, one for > RoCEv1 and one for RoCEv2. > 2. When connection is being established RDMA-CM will choose one of the > 2 matching indices based on a preferred protocol attribute for the rmda_id > (new attr.) 3. Proffered protocol is > a) global for the node/fabric based on the administrator settings > b) can be changed by applications that want to override the default > (requires extra API) Agree to point 1 which is what we were thinking of doing as well as part of the 'override' option. Still feel that the 'default' protocol resolution should be based on help from the IP/Networking stack as that would straight away enable existing applications and majority of use-case scenarios that use RDMA-CM to seamlessly work without any change for RoCE V2. What this means is now RDMA-CM will choose one of the 2 matching GID indices based on help from the IP stack. For this to happen , the 'gid_cache' structure will have to undergo a change to now incorporate a new 'gid_type' field. Accordingly the helper functions that find a cached gid entry - ib_find_cached_gid() will also undergo a change to take into account this new field. I plan to resubmit the patch series making the following changes a) Populating GID table with 2 entries per IP address b) Changes to the GID cache structure along with it's helper functions c) Make sure that RDMA-CM selects/finds the corresponding GID once a network type is chosen based on help from the IP stack. The only thing missing is the ability for end-nodes connected locally inside same subnet to communicate using V2 and that could be provided as an 'override' option using the 'preferred protocol' attribute setting on a system-wide basis or on an application basis as you have proposed. This I propose as an additional patch after the above patch goes in first. Does this sound like a plan ? Thanks Som N�r��yb�X��ǧv�^�){.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w��� ���j:+v���w�j�mzZ+�ݢj"��!�i
RE: [RFC PATCH 00/16] ib_mad: Add support for Intel Omni-Path Architecture (OPA) MAD processing.
> > > >>>[RFC PATCH 07/16] ib/mad: create a jumbo MAD kmem_cache > >> why not use a single kmem-cache instance with a non hard coded > >> element size, 256B (or whatever we use today) or 2KB? > > I wanted to be able to adjust the element count of the caches separately to > better tune overall memory usage. However, I stopped short of adding > additional module parameters to adjust the 2K cache at this time. > > > I tend to think that the resulted code is too much of a special purpose one > under a (jumbo == 2K) assumption. See some more comments in the individual > patches and we'll take it from there. > Ok, I'll address those comments in the other email threads. > > > > > >> Also (nit), please change the prefix for all patches to be IB/mad: > >> and not > >> ib/mad: to comply with the existing habit of patch titles for the IB > >> subsystem > > I will thanks. > > Good. See below another easy-to-fix nitpicking comment, but before that, for > the sake of easier review and post-robustness of the code to future > bisections, > please do a re-ordering of the series such that all general refactoring and > pre- > patches come before the OPApatches. > > This goes to re-order the current series such tat patches 8/9/10 are located > after patch 14, as listed here: > >[RFC PATCH 01/16] ib/mad: rename is_data_mad to is_rmpp_data_mad >[RFC PATCH 02/16] ib/core: add IB_DEVICE_JUMBO_MAD_SUPPORT device > cap >[RFC PATCH 03/16] ib/mad: Add check for jumbo MADs support on a device >[RFC PATCH 04/16] ib/mad: add base version parameter to >[RFC PATCH 05/16] ib/mad: Add MAD size parameters to process_mad >[RFC PATCH 06/16] ib/mad: Create jumbo_mad data structures >[RFC PATCH 07/16] ib/mad: create a jumbo MAD kmem_cache >[RFC PATCH 11/16] ib/mad: create helper function for >[RFC PATCH 12/16] ib/mad: create helper function for >[RFC PATCH 13/16] ib/mad: create helper function for >[RFC PATCH 14/16] ib/mad: Create helper function for SMI processing >[RFC PATCH 08/16] ib/mad: Add Intel Omni-Path Architecture defines >[RFC PATCH 09/16] ib/mad: Implement support for Intel Omni-Path >[RFC PATCH 10/16] ib/mad: Add registration check for Intel Omni-Path >[RFC PATCH 15/16] ib/mad: Implement Intel Omni-Path Architecture SMP >[RFC PATCH 16/16] ib/mad: Implement Intel Omni-Path Architecture General > Done. > Another easy-to-fix nitpicking comment would be to have all the patches be > consistent w.r.t to the capitalization of the 1st letter in the 1st word > after the > IB/core: or IB/mad: prefix, e.g > > ib/mad: create helper function for smi_handle_dr_smp_send > > becomes > > IB/mad: Create helper function for smi_handle_dr_smp_send Done. > > BTW, here my personal preference is "Add helper" and not "Create helper" > > IB/mad: Add helper function for smi_handle_dr_smp_send Done. Ira N�r��yb�X��ǧv�^�){.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w��� ���j:+v���w�j�mzZ+�ݢj"��!�i
RE: [PATCH] iw_cxgb4: Wake up waiters after flushing the qp
Hey Roland, please reply and let me know if you will push these to 3.18 (would have to happen now) or if they are destined for 3.19. Thanks, Steve. > -Original Message- > From: Steve Wise [mailto:sw...@opengridcomputing.com] > Sent: Friday, November 21, 2014 9:42 AM > To: 'rol...@purestorage.com' > Cc: 'linux-rdma@vger.kernel.org' > Subject: RE: [PATCH] iw_cxgb4: Wake up waiters after flushing the qp > > I've pushed these to: > > git://git.openfabrics.org/~swise/linux.git for_roland > > Steve > > Hariprasad Shenai (2): > iw_cxgb4: Fixes locking issue in process_mpa_request > iw_cxgb4: limit MRs to < 8GB for T4/T5 devices > > Pramod Kumar (2): > cxgb4/rdma:Increase epd buff size for debug interface > cxgb4/rdma: configure 0B MRs to match HW implementation > > Steve Wise (1): > iw_cxgb4: Wake up waiters after flushing the qp > > > > -Original Message- > > From: Steve Wise [mailto:sw...@opengridcomputing.com] > > Sent: Friday, November 21, 2014 9:26 AM > > To: rol...@purestorage.com > > Cc: linux-rdma@vger.kernel.org > > Subject: Re: [PATCH] iw_cxgb4: Wake up waiters after flushing the qp > > > > Hey Roland, > > > > Hope you can get this bug fix into 3.18. Plus these others that have > > been submitted a while ago. Here is the entire list: > > > > https://patchwork.kernel.org/patch/5106541/ > > https://patchwork.kernel.org/patch/5249511/ > > https://patchwork.kernel.org/patch/5249501/ > > https://patchwork.kernel.org/patch/5089511/ > > https://patchwork.kernel.org/patch/5089501/ > > > > > > Thanks, > > > > Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html