RE: [PATCH 2/2] ipoib mcast sendonly join: Move multicast specific code out of ipoib_main.c.
> > On Fri, 11 Dec 2015, ira.weiny wrote: > > > I think I would rather see this called something like > > > > ipoib_add_to_list_sendonly > > > > Or something... > > > > Calling it iboib_check* sounds like it should return a bool. > > Hmm... It only adds the multicast group if the check was successful. > > How about > > ipoib_check_and_add_mcast_sendonly() Better. > > > > > +void ipoib_check_mcast_sendonly(struct ipoib_dev_priv *priv, u8 > *mgid, > > > + struct list_head *remove_list) > > > +{ > > > + /* Is this multicast ? */ > > > + if (*mgid == 0xff) { > > > > Odd to see a mgid variable which is only u8? > > > > How about "gid_prefix"? > > That is only used in the qib driver and there it is a field. > > mgid is a pointer to the seres of bytes of the MGID and the first byte of that > signifies multicast if 0xff Understood, I misread the code at first. Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v2] IB/mad: In validate_mad, validate CM send method for attributes other than ClassPortInfo
> > > Receipt of CM MAD with other than the Send method for an attribute other > than the ClassPortInfo attribute is invalid. > > CM attributes other than ClassPortInfo only use the send method. > > The SRP initiator does not maintain a timeout policy for CM connect requests > relies on the CM layer to do that. The result was that the SRP initiator hung > as > the connect request never completed. > > A new SRP target has been observed to respond to Send CM REQ with GetResp > of CM REQ with bad status. This is non conformant with IBA spec but exposes a > vulnerability in the current MAD/CM code which will respond to the incoming > GetResp of CM REQ as if it was a valid incoming Send of CM REQ rather than > tossing this on the floor. It also causes the MAD layer not to retransmit the > original REQ even though it has not received a REP. > > Reviewed-by: Sagi GrimbergReviewed-by: Ira Weiny > Signed-off-by: Hal Rosenstock > --- > Changes since v1: > Removed ClassPortInfo method validation > > drivers/infiniband/core/mad.c |5 + > include/rdma/ib_mad.h |2 ++ > 2 files changed, 7 insertions(+), 0 deletions(-) > > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c > index 8d8af7a..2281de1 100644 > --- a/drivers/infiniband/core/mad.c > +++ b/drivers/infiniband/core/mad.c > @@ -1811,6 +1811,11 @@ static int validate_mad(const struct ib_mad_hdr > *mad_hdr, > if (qp_num == 0) > valid = 1; > } else { > + /* CM attributes other than ClassPortInfo only use Send > method */ > + if ((mad_hdr->mgmt_class == IB_MGMT_CLASS_CM) && > + (mad_hdr->attr_id != IB_MGMT_CLASSPORTINFO_ATTR_ID) > && > + (mad_hdr->method != IB_MGMT_METHOD_SEND)) > + goto out; > /* Filter GSI packets sent to QP0 */ > if (qp_num != 0) > valid = 1; > diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h index > 188df91..ec9b44d 100644 > --- a/include/rdma/ib_mad.h > +++ b/include/rdma/ib_mad.h > @@ -237,6 +237,8 @@ struct ib_vendor_mad { > u8 data[IB_MGMT_VENDOR_DATA]; > }; > > +#define IB_MGMT_CLASSPORTINFO_ATTR_IDcpu_to_be16(0x0001) > + > struct ib_class_port_info { > u8 base_version; > u8 class_version; > -- > 1.7.8.2 >
RE: [PATCH] IB/sa: replace GFP_KERNEL with GFP_ATOMIC
> > On Tue, Oct 27, 2015 at 06:56:50PM +, Wan, Kaike wrote: > > > > I do wonder if it is a good idea to call ib_nl_send_msg with a > > > spinlock held though.. Would be nice to see that go away. > > > > We have to hold the lock to protect against a race condition that a > > quick response will try to free the request from the > > ib_nl_request_list before we even put it on the list. > > Put is on the list first? Use a kref? Doesn't look like a big deal to clean > this up. > > Jason Not sure what "Put is on the list first?" means. I think it is valid to build the request, if success, add to the list, then send it. That would solve the problem you mention above. Was that what you hand in mind, Jason? I don't have time to work on this right now, not sure about Kaike. Until we can remove the spinlock the current proposed patch should be applied in the interim. Sorry for the noise before. Reviewed-By: Ira Weiny-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v2 07/22] staging/rdma/hfi1: Fix sparse error in sdma.h file
> > On Mon, Oct 19, 2015 at 10:11:22PM -0400, ira.we...@intel.com wrote: > > From: Niranjana Vishwanathapura> > > > Use NULL instead of 0 for pointer argument to fix the sparse error. > > > > Reviewed-by: Mike Marciniszyn > > Reviewed-by: Mitko Haralanov > > Reviewed-by: Dennis Dalessandro > > Signed-off-by: Niranjana Vishwanathapura > > > > Signed-off-by: Ira Weiny > > This should have just been folded in with the previous patch. Don't introduce > problems and fix them in the patchset. My bad, I will fix in v3. Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 00/23] Update driver to 0.9-294
> > On Mon, Oct 19, 2015 at 12:43:24PM -0400, ira.we...@intel.com wrote: > > From: Ira Weiny> > > > The following series has bug fixes and updates to the staging hfi1 driver. > > Why are you adding new functionality to this driver before it is moved out of > drivers/staging/ ? I _REALLY_ don't like to see that, because it implies > that you > are spending time and energy on new stuff, not fixing up the known issues > instead. Perhaps I did not chose my words carefully enough. The largest issue on the TODO list is the refactoring of the code to be shared between the hfi1 and qib driver. While the hardware between hfi1 and qib is similar and thus the initial code looked similar, our performance tuning on hfi1 has revealed that some changes will be required to the hfi1 code to fully utilize the hardware. We would like to get these updates in to make sure that the refactoring work is done properly for the 2 hardware types. > > Please redo this patch series and don't add new stuff. > I can do this but it will cause us to do our refactoring work 2 times in order for it to be done correctly (properly support both hardware types). Of the 23 patches I sent only one has significant updates which don't conflict with this refactoring work. [PATCH 15/23] staging/rdma/hfi1: Implement Expected Receive TID caching However, without this patch users performance will be impacted. In the interest of full disclosure we still have at least 2 more series, possibly 3 which are similar to this series, bug fixes and performance improvements. Given the above explanation, would you find it acceptable to take these "updates"? I can certainly rework patch 15 to separate out the code clean up as per your other email. Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH RFC] IB/mad: remove obsolete snoop interface
> > > I have a series of about 5 patches which implement tracing in the MAD > > stack which I was working through and was going to submit > > Does your MAD stack tracing include the dumping and decode of sent and > received MADs ? > Yes with a bit more details in some places and probably less in others. It also traces MADs throughout the stack as well. So there is some indication of where they are in the stack umad vs mad for example. I'll post them later today for RFC. The most recent work I did on them involved a panic (which I fixed). But this is the main reason I have not submitted this series. (I wanted to get more testing on it before submission.) Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] infiniband:Remove unneeded function definitions and declarations in the file, mad.c
> > This removes the definitions and declarations of the functions > ib_register_mad_snoop and register_snoop_agent in the file mad.c due to have > no more callers and thus can be removed to reduce code size of this particular > file. I don't disagree with removing the snoop code, however, I don't believe this goes far enough. I have a similar patch which removes the snoop interface completely. It has a medium level of testing. I'll send it to the list for review. If you concur with the larger removal, I'd be happy to add your signed-off by line. Thanks, Ira > > Signed-off-by: Nicholas Krause> --- > drivers/infiniband/core/mad.c | 98 > --- > 1 file changed, 98 deletions(-) > > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c > index 4b5c723..e5978e1 100644 > --- a/drivers/infiniband/core/mad.c > +++ b/drivers/infiniband/core/mad.c > @@ -444,104 +444,6 @@ static inline int is_snooping_recvs(int > mad_snoop_flags) >IB_MAD_SNOOP_RMPP_RECVS*/)); > } > > -static int register_snoop_agent(struct ib_mad_qp_info *qp_info, > - struct ib_mad_snoop_private > *mad_snoop_priv) > -{ > - struct ib_mad_snoop_private **new_snoop_table; > - unsigned long flags; > - int i; > - > - spin_lock_irqsave(_info->snoop_lock, flags); > - /* Check for empty slot in array. */ > - for (i = 0; i < qp_info->snoop_table_size; i++) > - if (!qp_info->snoop_table[i]) > - break; > - > - if (i == qp_info->snoop_table_size) { > - /* Grow table. */ > - new_snoop_table = krealloc(qp_info->snoop_table, > -sizeof mad_snoop_priv * > -(qp_info->snoop_table_size + 1), > -GFP_ATOMIC); > - if (!new_snoop_table) { > - i = -ENOMEM; > - goto out; > - } > - > - qp_info->snoop_table = new_snoop_table; > - qp_info->snoop_table_size++; > - } > - qp_info->snoop_table[i] = mad_snoop_priv; > - atomic_inc(_info->snoop_count); > -out: > - spin_unlock_irqrestore(_info->snoop_lock, flags); > - return i; > -} > - > -struct ib_mad_agent *ib_register_mad_snoop(struct ib_device *device, > -u8 port_num, > -enum ib_qp_type qp_type, > -int mad_snoop_flags, > -ib_mad_snoop_handler > snoop_handler, > -ib_mad_recv_handler recv_handler, > -void *context) > -{ > - struct ib_mad_port_private *port_priv; > - struct ib_mad_agent *ret; > - struct ib_mad_snoop_private *mad_snoop_priv; > - int qpn; > - > - /* Validate parameters */ > - if ((is_snooping_sends(mad_snoop_flags) && !snoop_handler) || > - (is_snooping_recvs(mad_snoop_flags) && !recv_handler)) { > - ret = ERR_PTR(-EINVAL); > - goto error1; > - } > - qpn = get_spl_qp_index(qp_type); > - if (qpn == -1) { > - ret = ERR_PTR(-EINVAL); > - goto error1; > - } > - port_priv = ib_get_mad_port(device, port_num); > - if (!port_priv) { > - ret = ERR_PTR(-ENODEV); > - goto error1; > - } > - /* Allocate structures */ > - mad_snoop_priv = kzalloc(sizeof *mad_snoop_priv, GFP_KERNEL); > - if (!mad_snoop_priv) { > - ret = ERR_PTR(-ENOMEM); > - goto error1; > - } > - > - /* Now, fill in the various structures */ > - mad_snoop_priv->qp_info = _priv->qp_info[qpn]; > - mad_snoop_priv->agent.device = device; > - mad_snoop_priv->agent.recv_handler = recv_handler; > - mad_snoop_priv->agent.snoop_handler = snoop_handler; > - mad_snoop_priv->agent.context = context; > - mad_snoop_priv->agent.qp = port_priv->qp_info[qpn].qp; > - mad_snoop_priv->agent.port_num = port_num; > - mad_snoop_priv->mad_snoop_flags = mad_snoop_flags; > - init_completion(_snoop_priv->comp); > - mad_snoop_priv->snoop_index = register_snoop_agent( > - _priv->qp_info[qpn], > - mad_snoop_priv); > - if (mad_snoop_priv->snoop_index < 0) { > - ret = ERR_PTR(mad_snoop_priv->snoop_index); > - goto error2; > - } > - > - atomic_set(_snoop_priv->refcount, 1); > - return _snoop_priv->agent; > - > -error2: > - kfree(mad_snoop_priv); > -error1: > - return ret; > -} > -EXPORT_SYMBOL(ib_register_mad_snoop); > - > static inline void deref_mad_agent(struct ib_mad_agent_private > *mad_agent_priv) { > if (atomic_dec_and_test(_agent_priv->refcount)) > -- >
RE: [PATCH RFC] IB/mad: remove obsolete snoop interface
> > On 9/30/2015 2:01 AM, ira.we...@intel.com wrote: > > From: Ira Weiny> > > > This interface has no current users and is obsolete. > > There are no in tree users of this but there is Sean's madeye tool (which is > out > of tree). This is still a useful debug tool for MADs. Where is that? I did not think it was supported any longer? I have a series of about 5 patches which implement tracing in the MAD stack which I was working through and was going to submit along with this patch (those are still being tested in my spare time). I just wanted put this out here because of the patch Nicholas posted yesterday. > > > Signed-off-by: Ira Weiny > > --- > > > > This has undergone a medium level of testing. I have run it against > > mlx4, qib, and OPA hardware and it does not seem to cause any regressions. > > > > drivers/infiniband/core/mad.c | 226 > > + > > drivers/infiniband/core/mad_priv.h | 13 --- > > 2 files changed, 5 insertions(+), 234 deletions(-) > > There are also snoop changes needed in include/rdma/ib_mad.h. I'll look into it if there are no other objections to removing the snoop interface. > > Is it correct to assume that OPA_CAP_MASK3_IsSnoopSupported in > opa_port_info.h is not related to this ? Nope, not related. Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: 0-day build issues
> > What is being done in the parent directory has a bug, you're right that far, > but > this is the wrong fix. If you check the 0day failures, it isn't because the > InfiniBand core isn't enabled, it's because the current menuconfig setup > allows > the IB core to be a module and the staging drivers to be compiled into the > kernel. That obviously is an invalid condition and it applies to all of the > drivers > in staging right now. The correct fix is to resolve that issue. That is our conclusion as well. But how do we do that globally in staging/rdma? The module setting of INFINIBAND does not seem to carry to the subdirs while the following does work. It seems like this is the easiest fix since this is a temp home for these drivers. Ira 13:54:05 > git di diff --git a/drivers/staging/rdma/amso1100/Kconfig b/drivers/staging/rdma/amso1100/Kconfig index e6ce5f209e47..809cb14ac6de 100644 --- a/drivers/staging/rdma/amso1100/Kconfig +++ b/drivers/staging/rdma/amso1100/Kconfig @@ -1,6 +1,6 @@ config INFINIBAND_AMSO1100 tristate "Ammasso 1100 HCA support" - depends on PCI && INET + depends on PCI && INET && INFINIBAND ---help--- This is a low-level driver for the Ammasso 1100 host channel adapter (HCA). diff --git a/drivers/staging/rdma/hfi1/Kconfig b/drivers/staging/rdma/hfi1/Kconfig index fd25078ee923..97372103e947 100644 --- a/drivers/staging/rdma/hfi1/Kconfig +++ b/drivers/staging/rdma/hfi1/Kconfig @@ -1,6 +1,6 @@ config INFINIBAND_HFI1 tristate "Intel OPA Gen1 support" - depends on X86_64 + depends on X86_64 && INFINIBAND default m ---help--- This is a low-level driver for Intel OPA Gen1 adapter. diff --git a/drivers/staging/rdma/ipath/Kconfig b/drivers/staging/rdma/ipath/Kconfig index 041ce0634968..12dcd4010a18 100644 --- a/drivers/staging/rdma/ipath/Kconfig +++ b/drivers/staging/rdma/ipath/Kconfig @@ -1,6 +1,6 @@ config INFINIBAND_IPATH tristate "QLogic HTX HCA support" - depends on 64BIT && NET && HT_IRQ + depends on 64BIT && NET && HT_IRQ && INFINIBAND ---help--- This is a driver for the deprecated QLogic Hyper-Transport IB host channel adapter (model QHT7140), -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: 0-day build issues
> > > > What is being done in the parent directory has a bug, you're right > > that far, but this is the wrong fix. If you check the 0day failures, > > it isn't because the InfiniBand core isn't enabled, it's because the > > current menuconfig setup allows the IB core to be a module and the > > staging drivers to be compiled into the kernel. That obviously is an > > invalid condition and it applies to all of the drivers in staging right > > now. The > correct fix is to resolve that issue. > > That is our conclusion as well. > > But how do we do that globally in staging/rdma? > > The module setting of INFINIBAND does not seem to carry to the subdirs while > the following does work. > > It seems like this is the easiest fix since this is a temp home for these > drivers. Ok looks like this will fix it... Still testing. 14:18:37 > git di diff --git a/drivers/staging/rdma/Kconfig b/drivers/staging/rdma/Kconfig index d7f62359d743..46ddc03c6839 100644 --- a/drivers/staging/rdma/Kconfig +++ b/drivers/staging/rdma/Kconfig @@ -1,10 +1,11 @@ menuconfig STAGING_RDMA -bool "RDMA staging drivers" +tristate "RDMA staging drivers" depends on INFINIBAND depends on PCI || BROKEN depends on HAS_IOMEM depends on NET depends on INET + depends on m default n ---help--- This option allows you to select a number of RDMA drivers that -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [patch] IB/hfi1: checking for NULL instead of IS_ERR
> > On Fri, Sep 18, 2015 at 11:51:09AM -0400, Doug Ledford wrote: > > On 09/16/2015 02:22 AM, Dan Carpenter wrote: > > > __get_txreq() returns an ERR_PTR() but this checks for NULL so it > > > would oops on failure. > > > > > > Signed-off-by: Dan Carpenter> > > > Thanks, applied. > > Applied to what? Should I just ignore these types of patches and not take > them > in my tree and you will send them on later on? I don't remember what we > agreed to do, sorry. My recollection was that Doug was going to handle the staging/rdma sub-tree because of the large churn he expected between the rdma subsystem and those drivers. Therefore, I have been submitting my patches against staging/rdma/hfi directly to Doug and Linux-rdma. Ira > > thanks, > > greg k-h > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the > body > of a message to majord...@vger.kernel.org More majordomo info at > http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] IB/hfi: Properly set permissions for user device files
> > On Thu, Sep 17, 2015 at 06:18:15PM +0200, Michal Schmidt wrote: > > On 09/16/2015 11:41 PM, ira.we...@intel.com wrote: > > > @@ -125,7 +151,20 @@ int __init dev_init(void) > > > ret = PTR_ERR(class); > > > pr_err("Could not create device class (err %d)\n", -ret); > > > unregister_chrdev_region(hfi1_dev, HFI1_NMINORS); > > > + goto done; > > > } > > > + class->devnode = hfi1_devnode; > > > + > > > + user_class = class_create(THIS_MODULE, class_name_user()); > > > + if (IS_ERR(user_class)) { > > > + ret = PTR_ERR(user_class); > > > + pr_err("Could not create device class for user accisble files > > > +(err %d)\n", > > > > Typo in error message. > > And what is the deal with all these pr_err's? This is a driver, it needs to > use > dev_err and related always. Does thatneed to go in the todo list? > > I'm also skeptical we need a print on every error case :| > This is very early in the driver code and we don't have a struct device at this point. The bulk of the driver uses macros which use dev_*. So no I don't think we need to add anything to the todo list. Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [GIT PULL] pull request v2: linux-firmware: Add Intel OPA hfi1 firmware
> > On 09/03/2015 04:26 PM, Kyle McMartin wrote: > > On Wed, Sep 02, 2015 at 10:15:15PM +0000, Weiny, Ira wrote: > >>> > >>> On 08/24/2015 11:28 AM, ira.weiny wrote: > >>>> git://github.com/weiny2/linux-firmware.git master-hfi1-firmware-v2 > >>> > >>> Should this go through me or though the linux-firmware tree? > >> > >> I don't know. My apologizes for being ignorant in this case. > >> > >> I was thinking that as the rdma kernel maintainer you would have access to > merge it into the Linux-firmware tree. The README in the Linux-firmware > project only says to email linux-firmw...@kernel.org. I have not heard from > anyone on that list. > >> > >> Does anyone on Linux-firmware have any objections to merging my branch? > > > > No objections, I've just been on vacation most of August. Let me look > > over the license text and then I'll be right with you. > > > > (linux-firmware@ isn't a list because of vger rules about attachment > > size and such, it just goes to dwmw2, ben hutchings and myself.) > > > > cheers, --Kyle > > Thanks Kyle :-) Thanks! :-D Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [GIT PULL] pull request v2: linux-firmware: Add Intel OPA hfi1 firmware
> > On 08/24/2015 11:28 AM, ira.weiny wrote: > > git://github.com/weiny2/linux-firmware.git master-hfi1-firmware-v2 > > Should this go through me or though the linux-firmware tree? I don't know. My apologizes for being ignorant in this case. I was thinking that as the rdma kernel maintainer you would have access to merge it into the Linux-firmware tree. The README in the Linux-firmware project only says to email linux-firmw...@kernel.org. I have not heard from anyone on that list. Does anyone on Linux-firmware have any objections to merging my branch? Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCHv3 infiniband-diags] iblinkinfo.c: Close additional file descriptor in advance
Additional file descriptor for SMP MADs should be closed before running ibnd_discover_fabric() to avoid parallel usage of two SMP file descriptors Signed-off-by: Vladimir Koushnir vladim...@mellanox.com Signed-off-by: Hal Rosenstock h...@mellanox.com Thanks applied, Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 0/2] update ocrdma to dual license
Christoph, Apologies, I misspoke in my response to you. There was a study of the code and we thought it was reasonable to post. However, in retrospect we should have used more due diligence. We're going back to seek explicit consent from key contributors. I'm no legal expert, but don't you need consent from _all_ contributors? Ira
RE: [GIT] Networking
Please accept my apologies. The original patch used WARN_ON but I was advised to use BUG_ON in a review and I should have thought about it more rather than blindly make the change. Ira, Can you please point me to the review thread where this advise was made? I can't track it. In internal reviews I always fight with developers that put BUG_ON assertions whenever something goes wrong, I'd like to see on what context this feedback was provided to you. This was recommended for 2 patches but I did not change the first. http://www.spinics.net/lists/linux-rdma/msg25514.html This is an earlier version of the patch Linus objected to http://www.spinics.net/lists/linux-rdma/msg25498.html Ira
RE: [PATCH V2] IB/mad: Remove improper use of BUG_ON
On 6/25/2015 4:52 PM, ira.we...@intel.com wrote: From: Ira Weinyira.we...@intel.com We recently added BUG_ON's which were inappropriate for a condition which should never happen. Change these to be WARN_ON_ONCE as a debugging aid. Fixes: 4cd7c9479aff ('IB/mad: Add support for additional MAD info to/from drivers') Signed-off-by: Ira Weinyira.we...@intel.com remove the blank line, Is there some documentation on this stuff? This does not mention anything about white space. https://www.kernel.org/doc/Documentation/SubmittingPatches I don't mind following the proper conventions if I knew what they were. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH RFC] IB: Add rdma_cap_ib_switch helper and use where appropriate
On 6/18/2015 4:25 PM, Jason Gunthorpe wrote: Ira had patch that made some functions related to this public, not sure if it is applied yet.. Which patch from Ira are you referring to ? This patch which is in Dougs for 4.2 tree: commit ca0369e31d1794a4f4e39c52fe39b0406617e2b4 Author: Ira Weiny ira.we...@intel.com Date: Wed May 13 20:02:55 2015 -0400 IB/core: Create common start/end port functions Previously start_port and end_port were defined in 2 places, cache.c and device.c and this prevented their use in other modules. Make these common functions, change the name to reflect the rdma name space, and update existing users. Signed-off-by: Ira Weiny ira.we...@intel.com Signed-off-by: Doug Ledford dledf...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing
ib_verbs define an *extensive* direct HW access API, which is constantly evolving. This is the problem with verbs... You cannot describe the intricate object relations and semantics through an API. In addition, you can't abstract anything or fix stuff in SW. The only way to *truly* know what to expect when performing Verbs calls is to check the node type. How can you say this? mthca, mlx4, mlx5, and qib all have different sets of functionality... all with the same node type. OPA has the same set as qib... same node type. ib_verbs was never only an API. It started as the Linux implementation of the IBTA standard, with guaranteed semantics and wire protocol. Later, the interface was reused to support additional RDMA devices. However, you could *always* check the node type if you wanted to, thereby retaining the standard guarantees. Win-win situation... Not true at all. For example, Qib does not support XRC and yet has the same node type as mlx4 (5)... This is a very strong property; we should not give up on it. On the contrary the property is weak and implies functionality or lack of functionality rather than being explicit. This was done because getting changes to kernel ABIs was hard and we took a shortcut with node type which we should not have. OPA attempts to stop this madness and supports the functionality of verbs _As_ _Defined_ rather than creating yet another set of things which applications need to check against. You're right that apps can be coded to other CA types, like RNICs and USNICs. However, those are all very different from an IB_CA due to limited queue pair types or limited primitives. If OPA had that same limitation then I would agree it needs a different node type. How do you know that it doesn't? Up until now you have had to take my word for it. Now that the driver has been posted it should be clear what verbs we support (same as qib). Ira
RE: [PATCH v5 1/4] IB/netlink: Add defines for local service requests through netlink
On Fri, Jun 12, 2015 at 05:24:32AM +, Weiny, Ira wrote: PATH_USE_* is a better name. But I think the defines should be: PATH_USE_ANY (or ALL?) How about PATH_USE_CONNECTED? I don't want to bikeshed this too much as I'm not really good on names but... How does Connected relate to wanting all available Path options? Can't a connected transport use a reversible GMP path? I guess that is where I am getting hung up. Ira Why not treat unknown type as ANY/ALL and return all 6 if possible. If not possible/supported (ie like ibacm) return GMP which is always valid. Make sense to me.. Jason -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 4/4] IB/sa: Route SA pathrecord query through netlink
If user space PR capabilities (unicast, multicast, both) is supported, it affects the API. How? If we can query SA for multicast PRs without joining the multicast groups, what additional changes in netlink API do we need to support both? Nothing to support both but if we wanted to disable one or other based on user space we would but it sounds like we don't need/want this but would use user space rejection/no data for this. Yes I think we should allow userspace to decide if this is supported or not. That allows for more flexibility. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v5 1/1] ibacm: Add support for pathrecord query through netlink
On Thu, Jun 11, 2015 at 01:04:25PM -0400, kaike@intel.com wrote: +static int acm_nl_parse_path_attr(struct nlattr *attr, + struct acm_ep_addr_data *data) + switch (attr-nla_type NLA_TYPE_MASK) { + default: + acm_log(1, WARN: unknown attr %x\n, attr-nla_type); + ret = -1; + break; I would like to see a mandatory/optional scheme here. For instance if we add SL it would be mandatory, but policy information like requesting net_device would be optional. Why would SL be mandatory? Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v5 1/1] ibacm: Add support for pathrecord query through netlink
On Thu, Jun 11, 2015 at 08:16:41PM +, Weiny, Ira wrote: For instance if we add SL it would be mandatory, but policy information like requesting net_device would be optional. Why would SL be mandatory? If the kernel asks for a specific SL then user space must respect that and return a path with that exact SL. If userspace does not know how to do that, then it must not answer the query. It cannot ignore the SL requirement and return a random SL. Oh, right, I misunderstood. I thought that you were saying that SL would be mandatory as a netlink attribute. Hypothetically, assuming we add SL.. Right... ;-) Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v5 1/4] IB/netlink: Add defines for local service requests through netlink
From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com] Sent: Thursday, June 11, 2015 8:22 PM To: Wan, Kaike Cc: linux-rdma@vger.kernel.org; Fleck, John; Weiny, Ira Subject: Re: [PATCH v5 1/4] IB/netlink: Add defines for local service requests through netlink On Thu, Jun 11, 2015 at 11:45:46PM +, Wan, Kaike wrote: Apparently, to achieve this goal, we need the following changes: I don't expect anything more than a proper hole in the UAPI to make this work. I already have patches that implement full kernel support for the 6-path format, and globally enable APM. With this netlink interface it becomes feasible to mainline them. That will require some reworking of what you've done, but I don't expect you to do that. The only remaining thing we need here is that tiny bit of policy information for the kernel to specify what the intended use of the path is: - For connected CM operation (6 path records) - For unidirectional UD (1 path record) - For 'misc' GMP like operation (at least 1 reversible path record) I'm happy if you call it PATH_USE_XX Testing the reversible bit should be enough to rough this in, if not reversible it is PATH_USE_UNIDIRECTIONAL_UD, otherwise it is PATH_USE_GMP. PATH_USE_* is a better name. But I think the defines should be: PATH_USE_ANY (or ALL?) PATH_USE_UNIDIRECTIONAL PATH_USE_GMP Recommend that userspace treat any unknown usage type as GMP (it is the most general). Why not treat unknown type as ANY/ALL and return all 6 if possible. If not possible/supported (ie like ibacm) return GMP which is always valid. That would be cool. If we add this attribute, should we remove reversible and numb_path attributes? Yes. Kaike Ira: Do you like 'path use' better as a name? I think I do.. Yea that is a much better name. Thanks, Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] IB/core: Don't warn on no SA support in event handler
Registering an event handler is done for a device. This device may have one RoCE port (no SA cap) and one InfiniBand port (has SA cap). Therefore, warning from the event handler about a specific port that doesn't have SA cap is correct but pollutes the kernel log without a need. Maybe we should think about register event handler per port :-P I agree, though I think that can be added separately from this change. Also agreed. We identified many places where we should split support to be per port while we did this work. We need to start working up to that. For the time being we should not break existing users (or in this case annoy them... ;-) Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 4/4] IB/sa: Route SA pathrecord query through netlink
On 6/10/2015 3:10 PM, Jason Gunthorpe wrote: On Wed, Jun 10, 2015 at 01:47:36PM -0400, Hal Rosenstock wrote: On 6/9/2015 10:57 AM, kaike@intel.com wrote: From: Kaike Wan kaike@intel.com This patch routes a SA pathrecord query to netlink first Should only unicast PRs be done in this manner or should API support enabling for unicast and/or multicast ? AFAIK kernel doesn't query multicast PRs now (queries MCMRs) but this seems like it would help make it future proof and not have to take timeout on local query unless app supports it. It is a good question. We can clearly extend toward that, using a MGID as the DGID and adding additional nested netlink fields. However, does it make sense? If it doesn't make sense, then should MGIDs as DGIDs never be requested via the PR netlink API ? Why should we prevent it? What does it hurt? Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 0/4] Sending kernel pathrecord query to user cache server
This series does not attempt to optimize the kernel needing to know that a PR has been updated. There are existing mechanisms for that. Does this exist in the kernel? At least some support, yes. For example client reregister marks all IPoIB paths as invalid. Reregister indicates that all PRs are now invalid? Not directly. IPoIB treats it that way. I guess to be safe. Officially one should register for UnPath/RePath traps. But no one has ever implemented that. To be clear I am agreeing with Hal that having some sort of update signal would be nice. But I don't think that must be done before this series goes in. I think a PR update signal should be an extension to this interface. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 0/4] Sending kernel pathrecord query to user cache server
This series does not attempt to optimize the kernel needing to know that a PR has been updated. There are existing mechanisms for that. Does this exist in the kernel? At least some support, yes. For example client reregister marks all IPoIB paths as invalid. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] IB/core: Change rdma_protcol_iboe to roce
Hi, Le jeudi 14 mai 2015 à 15:01 -0400, ira.we...@intel.com a écrit : From: Ira Weiny ira.we...@intel.com It has been decided that ROCE should be used within the kernel rather than IBOE as we used before. Change iboe to roce on the new rdma_protocol_* functions. Erk ... What's the usefulness of such patch ? Currently there is inconsistency in the naming of the RoCE technology. 12:27:35 grep -r roce drivers/infiniband | grep -v Binary | grep -i -v proce | wc -l 85 All in the driver code. 12:27:48 grep -r iboe drivers/infiniband | grep -v Binary | wc -l 130 Mainly in the core and mlx4 driver. This patch was to clean up the management helper function as a general move toward standardizing on roce rather than iboe. Most people in the community refer to this as RoCE so that name was chosen to move to. IBoE is used throughout the IB/RDMA subsystem. Changing only these occurences is rather inconsistent. I asked about changing all the references and Doug mentioned he would make a patch to change the other references. Personally I don't want to see a massive rename patch but this could be done. Ira
RE: [RFC PATCH 1/5] IB/core: Add Core Capability flags to ib_device
On May 10, 2015, at 11:42 PM, ira.weiny ira.we...@intel.com wrote: On Tue, May 05, 2015 at 08:22:21PM +, Dave Goodell (dgoodell) wrote: In the case that usNIC is operating in UDP mode (which is the overwhelming majority of the cases), there is absolutely no additional protocol that ends up on the wire or headers in the user buffers besides UDP/IP/Ethernet. They are 100% plain UDP packets, they just happen to be sent via OS-bypass queues instead of traveling through the kernel networking stack. [^ there continues to be confusion about this for some reason, but I don't know why] So what is this patch for? Does my earlier email clarify the situation any? http://marc.info/?l=linux- rdmam=142972178630720w=2 Somewhat, is there any reason applications need to distinguish between the The legacy RDMA_TRANSPORT_USNIC type and The current RDMA_TRANSPORT_USNIC_UDP type? Or does the former no longer exist? commit 248567f79304b953ea492fb92ade097b62ed09b2 Author: Upinder Malhi uma...@cisco.com Date: Thu Jan 9 14:48:19 2014 -0800 IB/core: Add RDMA_TRANSPORT_USNIC_UDP Add RDMA_TRANSPORT_USNIC_UDP which will be used by usNIC. Signed-off-by: Upinder Malhi uma...@cisco.com Signed-off-by: Roland Dreier rol...@purestorage.com This is probably where a lot of the confusion is coming from. Arguably RDMA_TRANSPORT_USNIC_UDP could/should have simply been named RDMA_TRANSPORT_UDP. I guess I'm wondering if there needs to be an RDMA_TRANSPORT_USNIC to represent the The legacy RDMA_TRANSPORT_USNIC type you mention in the link above. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v3 07/28] IB/Verbs: Reform IB-ulp ipoib
On Wed, Apr 15, 2015 at 09:58:18AM +0200, Michael Wang wrote: We can give client-add() callback a return value and make ib_register_device() return -ENOMEM when it failed, just wondering why we don't do this at first, any special reason? No idea, but having ib_register_device fail and unwind if a client fails to attach makes sense to me. Yes that is what we should do _but_ I think we should tackle that in a different series. As you said in another email, this series is getting very long and hard to review/prove is correct. This is why I was advocating keeping a check at the top of cm_add_one which verified all Ports supported the CM. This is the current logic today and is proven to work for the devices/use cases out there. We can clean up the initialization code and implement support for individual ports in follow on patches. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 10/27] IB/Verbs: Reform cm related part in IB-core cma/ucm
On 4/16/2015 11:58 AM, Jason Gunthorpe wrote: It also looks like hardwired 1 won't work on switch ports, so it is no-go. AFAIK enhanced switch port 0 is not supported by CM/RDMA CM in the current code. There is no need for CM/RDMA CM on base switch port 0. I concur and I thought I mentioned that before. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Stepping down as maintainer (was Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management)
On Tue, Apr 7, 2015 at 10:12 AM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: I don't think you understand how deep the problem Or is describing goes. [...Appropriate and correct critique...] This thread has made me realize that even as I am able to carve out more time to work on things like IB maintaintership, I no longer have the desire to spend my time maintaining the IB subsystem. Since my current level of activity is clearly hurting the community, I've decided to step down as maintainer. I am sad to see you leave. Without you InfiniBand would have never been accepted upstream at all. Thank you for all the hard work, Ira N�r��yb�X��ǧv�^�){.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w��� ���j:+v���w�j�mzZ+�ݢj��!�i
RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
On 3/4/2015 11:41 AM, Weiny, Ira wrote: InfiniBandInfiniBand InfiniBand Verbs iWARP InfiniBand iWARP Verbs (subset of IBV, with specific connection establishment requirements that don't exist with IBV) RoCE EthernetInfiniBand Verbs (but with different addressing because of the different link layer) OPA OPA InfiniBand Verbs Verbs is an interface definition to hardware that has been twisted to be a software API and extended to expose vendor-specific implementation 'features' as extensions. It is not a transport. The device capability bits seems to have evolved to mean: vendor A implemented some random 'feature' in their hardware and wants all applications to now check for this 'feature' and change their code to use it. Basically, what gets defined as a device cap is rather arbitrary. This was the point I was trying to make and the reason the OPA MAD support was implemented as a device capability. The proposed device capability stands for a change that is way more drastic than just a vendor extension. This is a device running a completely different wire protocol which does not interoperate with the IB device that is impersonating. Also, it does not come under the same jurisdication as IB. It is entirely possible that IBTA could make some change in the future where OPA can no longer masquerade as IB. OPA must also be identifiable by any verbs I disagree. Software applications running IP don't need to know they are running over IB vs Ethernet? Why would this have to be true for Verbs? perftest, libibverbs (example apps), mvapich (verbs), openmpi (verbs), ibacm (librdmacm based applications), srp, and ipoib all run without any knowledge of the link being OPA. While not an exhaustive list of verbs applications, this is a pretty good sampling. We have specifically designed OPA to support these applications without modifications. or management application. Agreed, but _only_ when talking to the hardware. Other MAD interfaces are IB compatible. The idea of the original series was to check the device capability bits (in kernel and in userspace) for SM diagnostic and diagnostic tool support. As currently submitted I would need to add a kernel ABI to get the extended capability bits (because the originals were exhausted). I have been waiting for this discussion to settle before going through that effort. To do that, it should be properly represented in node type, transport type, and link layer as all the other RDMA technologies have been regardless of the changes needed. All the other RDMA technologies have gone through this process. As I suggested above: Would setting the Link Layer to a new value (ie. IB_LINK_LAYER_OMNI_PATH_ARCH) while maintaining the Transport as InfiniBand Verbs be satisfactory? While this breaks at least some of the examples I list above I believe I have worked out a way to phase in this support through our provider library. To address your comments from the other fork of this thread: The 32 bit LIDs in the SMP are designed for future expansion. Currently OPA does not support 16 bit LIDs. It's not just verbs (structures and APIs) but also CM and any other place where LID is used which are numerous. Is a new verbs coming for this ? I can't guarantee that no changes will be required in the future. But right now the answer is No because we specifically implemented OPA to be InfiniBand Verbs. How will compatibility be dealt with ? Compatibility is provided by presenting identical InfiniBand Verbs interfaces (kernel abi, SA protocols, etc) to the user. This is also another example why a more complete picture of OPA is needed. -- Hal At this moment OPA provides ULP compatibility for all IB Verbs applications with the exception of management software which talks directly to the SMA and PMA. All ULP interactions with management (Path Record, CM, Multicast joins, etc) are still IB formatted and compatible. We have specifically designed this interoperability with OFA. To help illustrate my point I have included 2 patches below. The first defines a new Link Layer; IB_LINK_LAYER_OMNI_PATH_ARCH and modifies all the kernel interfaces which look at link layer. This has been minimally tested with IPoIB and the perftest tools (the modification of which is included as the 2nd patch). The entire patch boils down to changing if (InfiniBand) to if (InfiniBand || OPA). IMO this is rather inefficient but if this is more acceptable to the community I am willing to investigate this further. Ira From 9f09be92576204b3ead71f714fc231110f03bff6 Mon Sep 17 00:00:00 2001 From: Ira Weiny ira.we...@intel.com Date: Wed, 3 Dec 2014 20:01:09 -0500 Subject: [PATCH] WIP: IB/core: Add
RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
On Wed, Mar 04, 2015 at 07:21:48AM +, Weiny, Ira wrote: I think this is going to break quite a bit. I have prototyped setting OPA devices to OPA Link Layer and the perftest tools just fall over. Any changes to the Link layer or the transport types will require a transition period for ULPs. How do the perftest tools work with OPA in the first place? OPA seems to have 32 bit lids. Do you mean it 'works' as long as the lid is 16 bits? Same general point about all of verbs, lots of 'uint16_t lid' in the interfaces? The 32 bit LIDs in the SMP are designed for future expansion. Currently OPA does not support 16 bit LIDs. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
InfiniBand InfiniBand InfiniBand Verbs iWARP InfiniBand iWARP Verbs (subset of IBV, with specific connection establishment requirements that don't exist with IBV) RoCEEthernetInfiniBand Verbs (but with different addressing because of the different link layer) OPA OPA InfiniBand Verbs Verbs is an interface definition to hardware that has been twisted to be a software API and extended to expose vendor-specific implementation 'features' as extensions. It is not a transport. The device capability bits seems to have evolved to mean: vendor A implemented some random 'feature' in their hardware and wants all applications to now check for this 'feature' and change their code to use it. Basically, what gets defined as a device cap is rather arbitrary. This was the point I was trying to make and the reason the OPA MAD support was implemented as a device capability. Ira
RE: [PATCH libibumad] umad.h: Remove umad_reg_flags from enum declaration
-Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of Hal Rosenstock Sent: Tuesday, March 03, 2015 12:23 PM To: linux-rdma (linux-rdma@vger.kernel.org) Cc: Weiny, Ira; ra...@mellanox.com Subject: [PATCH libibumad] umad.h: Remove umad_reg_flags from enum declaration This causes it to be a global variable which causes linking issues when used by multiple files linked together. It results in multiple definition of `umad_reg_flags'. Found-by: Rafi Weiner ra...@mellanox.com Reviewed-by: Ira Weiny ira.we...@intel.com Signed-off-by: Hal Rosenstock h...@mellanox.com --- -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
Doug, You have given me a lot to think about... Comments below... While it is a different type of technology, standard verbs[*] remains 100% compatible. Unlike other verbs technologies user space software does not need any knowledge that the underlying device is not IB. For example, PR (and SA) queries, CM, rdmacm, and verbs calls themselves are all 100% IB compatible. Even if OPA is 100% standard verbs compatible which it does not appear to be, that does not make OPA an extra capability of an IBA device. I don't want to make it an extra capability of an IBA device. I want to make it an extra capability of a verbs device. And this, friends, is why it's bad to make both a link layer and an user space API with the exact same name ;-). Anyway, I get your point Ira and it makes sense to me. However, I also get Hal's point. Our track record on this particular issue is a bit wonky though. Thanks for laying this out. I too understand Hals point. First we had InfiniBand. Then came iWARP, and we used the transport type to differentiate it from an actual InfiniBand device, but left the underlying link layer listed as InfiniBand. Then came RoCE, and we listed its transport type as InfiniBand, but changed the link layer to Ethernet. Which left us in the oxymoronic position that even though iWARP was over Ethernet, the tools said it was over InfiniBand, while RoCE was the only thing that listed Ethernet as the link layer. We later fixed that up with some hacks in tools to keep users from being confused and filing bugs. Maybe this represents an opportunity to straighten some of this mess out. If I remember correctly, this is the matrix of technologies today: TechnologyLinkLayer Transport InfiniBandInfiniBand InfiniBand Verbs iWARP InfiniBand iWARP Verbs (subset of IBV, with specific connection establishment requirements that don't exist with IBV) RoCE EthernetInfiniBand Verbs (but with different addressing because of the different link layer) OPA ? InfiniBand Verbs I think this is _relatively_ accurate. The one exception is with the various IB verbs extensions which have been introduced. While most are being pushed into the spec not all of them are in the spec prior to being in the kernel. It makes keeping up with what IB Verbs really is difficult. Mind you I'm not opposing having IB Verbs be flexible. But I think we can accurately have multiple underlying technologies which support IB Verbs with various extensions. It makes me wonder if we shouldn't make this matrix more accurate: TechnologyLinkLayer Transport InfiniBandInfiniBand InfiniBand Verbs iWARP EthernetiWARP Verbs RoCE EthernetRoCE-v1 or RoCE-v2 OPA ? OPA Verbs With this sort of setup, the core ib_mad/ib_umad code would simply check the verbs type to see what support it can enable. For IBV it would be the existing support, for OPAV it would be the additional jumbo support. OPA, to be compatible with IB Verbs, uses the same node types as InfiniBand verbs (1 == CA, 2 == Switch). As such it returns the same Transport type. I'm not sure how much we might expect a change like this to break existing software though, so maybe staightening this mess out is a non-starter. I think this is going to break quite a bit. I have prototyped setting OPA devices to OPA Link Layer and the perftest tools just fall over. Any changes to the Link layer or the transport types will require a transition period for ULPs. While it is a primary goal of the RDMA stack to have a common verbs API for various RDMA interconnects, each one is properly represented to allow it's unique characteristics to be exposed. The difference here is that we have maintained IB Verbs compatibility where other RDMA technologies did not. We have tested many Verbs applications (both kernel and user space) and they function _without_ _modification_. Despite this compatibility we are still having this discussion. I can think of no other way to signal the MAD capability to the MAD stack which will preserve the verbs compatibility in the same way. See above. Define a new transport type, OPAVerbs, that is a superset of IBV and enable jumbo support when OPAV is the transport on the link. But the transport type is not changing. Once again we are attempting to be completely verbs compatible. From the MAD stack POV the verbs calls in the kernel are not different. Would it be acceptable if the result of my patch series was: InfiniBand InfiniBand InfiniBand Verbs iWARP InfiniBand iWARP Verbs (subset of IBV, with specific
RE: [PATCH v4 06/19] IB/core: Add max_mad_size to ib_device_attr
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 819b794..a6a33cf 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -2924,6 +2924,12 @@ static int ib_mad_port_open(struct ib_device *device, char name[sizeof ib_mad123]; int has_smi; + if (device-cached_dev_attrs.max_mad_size IB_MGMT_MAD_SIZE) { + dev_err(device-dev, Min MAD size for device is %u\n, + IB_MGMT_MAD_SIZE); + return -EFAULT; + } + The printk message here is not very informative and it qualifies as an error. Someone reading that for the first time in the dmesg output and wondering why their device isn't working will be confused if they don't know about the mad size changes you are making here. Something like max supported MAD size (%u) min required by ib_mad (%u), ignoring dev \n Good suggestion. Fixed for v5 with this message. + dev_err(device-dev, + Max supported MAD size (%u) min required by ib_mad (%u), ignoring device (%s)\n, + device-cached_dev_attrs.max_mad_size, + IB_MGMT_MAD_SIZE, device-name); Ira
RE: [PATCH v4 06/19] IB/core: Add max_mad_size to ib_device_attr
Fixed for v5 with this message. + dev_err(device-dev, + Max supported MAD size (%u) min required by ib_mad (%u), ignoring device (%s)\n, + device-cached_dev_attrs.max_mad_size, + IB_MGMT_MAD_SIZE, device-name); It also seems redundant since the only call to ib_mad_port_open is: if (ib_mad_port_open(device, i)) { printk(KERN_ERR PFX Couldn't open %s port %d\n, device-name, i); So, why does this particular error deserve a special double error print? I assume it is basically impossible to hit? This does indicate a coding error. Generally I prefer details of why the device could not open the port. But if the community feels this is redundant or not possible I can drop the hunk. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 07/19] IB/mad: Convert ib_mad_private allocations from kmem_cache to kmalloc
+static struct ib_mad_private *alloc_mad_priv(struct ib_device *dev) { + return (kmalloc(sizeof(struct ib_mad_private_header) + + sizeof(struct ib_grh) + + dev-cached_dev_attrs.max_mad_size, GFP_ATOMIC)); Ouch! GFP_ATOMIC? I thought that generally all of the mad processing was done from workqueue context where sleeping is allowed? In the two places where you removed kmem_cache_alloc() calls and replaced it with calls to this code, they both used GFP_KERNEL and now you have switched it to GFP_ATOMIC. If there isn't a good reason for this, it should be switched back to GFP_KERNEL. The original kmem_cache_allocs are actually both GFP_ATOMIC (1 usage, see below) and GFP_KERNEL (the 2 usages you reference). My bad for not making this specific to the allocation. I will research the original GFP_ATOMIC usage and if it is necessary have this function take gfp_t. Otherwise if we can get away with GFP_KERNEL I agree that would be best. +} + /* * Return 0 if SMP is to be sent * Return 1 if SMP was consumed locally (whether or not solicited) @@ -771,7 +776,8 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv, } local-mad_priv = NULL; local-recv_mad_agent = NULL; - mad_priv = kmem_cache_alloc(ib_mad_cache, GFP_ATOMIC); Original usage here... Thanks, Ira + + mad_priv = alloc_mad_priv(mad_agent_priv-agent.device); if (!mad_priv) { ret = -ENOMEM; dev_err(device-dev, No memory for local response MAD\n); @@
RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
Do you have a suggestion for alternatives? The desire to leverage the IB MAD infrastructure for OPA is understood but the current approach represents OPA as a device capability which does not seem appropriate because OPA is clearly a different type of RDMA technology than IB. While it is a different type of technology, standard verbs[*] remains 100% compatible. Unlike other verbs technologies user space software does not need any knowledge that the underlying device is not IB. For example, PR (and SA) queries, CM, rdmacm, and verbs calls themselves are all 100% IB compatible. Even if OPA is 100% standard verbs compatible which it does not appear to be, that does not make OPA an extra capability of an IBA device. I don't want to make it an extra capability of an IBA device. I want to make it an extra capability of a verbs device. While it is a primary goal of the RDMA stack to have a common verbs API for various RDMA interconnects, each one is properly represented to allow it's unique characteristics to be exposed. The difference here is that we have maintained IB Verbs compatibility where other RDMA technologies did not. We have tested many Verbs applications (both kernel and user space) and they function _without_ _modification_. Despite this compatibility we are still having this discussion. I can think of no other way to signal the MAD capability to the MAD stack which will preserve the verbs compatibility in the same way. Therefore, to address your initial question regarding tradeoffs I believe this method is the least invasive to the code as well as removing any potential performance penalties to core verbs. Ira [*] We don't support some of the extensions particularly those which have been most recently introduced. And we would like to make our own extensions in the form of higher MTU availability, but the patch is not yet ready to be submitted upstream. There appear to be a number of things that are not exposed by the current patch set which will be needed in subsequent patches. It would be better to see the complete picture so it can be reviewed as a whole. Is there something in particular you would like to see? There are no other patches required in the core modules for verbs applications to function. The MTU patch only improves verbs performance. Ira -- Hal -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH RE-RESEND V2 for-next 4/5] IB/core: Make sure that the PSN does not overflow
0xff could be made a #define Question: userspace is allowed to ask for a PSN on 32bits, but it will be silently truncated, is it going to puzzle applications ? Anyway, it would have been better to return an error in the first place ... not sure if we can do it now ... Yes I'm still not sure what bug this fixes. Or did you ever find out? Ira
RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
On 2/4/2015 6:29 PM, ira.we...@intel.com wrote: From: Ira Weiny ira.we...@intel.com OPA MADs share a common header with IBTA MADs but with a different base version and an extended length. These jumbo MADs increase the performance of management traffic. Sharing a common header with IBTA MADs allows us to share most of the MAD processing code when dealing with OPA MADs in addition to supporting some IBTA MADs on OPA devices. Add a device capability flag to indicate OPA MAD support on the device. Signed-off-by: Ira Weiny ira.we...@intel.com --- include/rdma/ib_verbs.h | 5 + 1 file changed, 5 insertions(+) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 3ab4033..2614233 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -128,6 +128,10 @@ enum ib_device_cap_flags { IB_DEVICE_ON_DEMAND_PAGING = (131), }; +enum ib_device_cap_flags2 { + IB_DEVICE_OPA_MAD_SUPPORT = 1 +}; + enum ib_signature_prot_cap { IB_PROT_T10DIF_TYPE_1 = 1, IB_PROT_T10DIF_TYPE_2 = 1 1, @@ -210,6 +214,7 @@ struct ib_device_attr { int sig_prot_cap; int sig_guard_cap; struct ib_odp_caps odp_caps; + u64 device_cap_flags2; u32 max_mad_size; }; Why is OPA support determined via a device capability flag ? What are the tradeoffs for doing it this way versus the other choices that have been used in the past for other RDMA technologies like RoCE, iWARP, usNIC, ... ? None of those technologies use the MAD stack for Subnet Management. Other MAD support is very limited (ie IB compatible PMA queries on the local port only). Do you have a suggestion for alternatives? Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ANNOUNCE] libibmad 1.3.12 release
There is a new release of libibmad available at: https://www.openfabrics.org/downloads/management/ md5sum: 7aabf50569c4d30a9a0689fd96097e4f libibmad-1.3.12.tar.gz Dependencies: 1) libibumad = 1.3.7 2) opensm-libs = 3.3.10 3) ib_umad kernel module Release notes for 1.3.11 to 1.3.12 are: 1) bug fixes 2) new Fields for Port Extended Speeds Updates since 1.3.11 Hal Rosenstock : fields.c: Fix commentary typo Al Chu : libibmad/src/rpc.c: Remove superfluous packet dump Hal Rosenstock : fields.c: Print SwitchCongestionSetting threshold fields in hex rather than decimal Dan Ben Yosef : libibmad: Add new fields for SM:PortInfoExtended and for PM:PortExtendedSpeedsCounters Hal Rosenstock : dump.c: In mad_dump_linkwidth, output undefined widths -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ANNOUNCE] infiniband-diags 1.6.5 release
There is a new release of infiniband-diags at: https://www.openfabrics.org/downloads/management/ md5sum: 2a587bda5fc8287643c83363e4b6ec21 infiniband-diags-1.6.5.tar.gz Dependencies: 1) libibmad = 1.3.12 2) libibumad = 1.3.7 3) opensm-libs = 3.3.10 4) ib_umad kernel module 5) glib2 Release notes v1.6.4 = 1.6.5 1) bug fixes 2) updates for new hardware device IDs 3) rdma-ndd daemon Authors since 1.6.4 Dan Ben Yosef da...@mellanox.com ibtracert.c: Remove checking the port 0 state for base switch port 0 Add new configure flag to enable/disable the rdma-ndd build New MAD SM:PortInfoExtended and changing PM:PortExtendedSpeedsCounters Fix missing source dir in configure.ac Hal Rosenstock h...@dev.mellanox.co.il Align infiniband-diags MLNX device IDs with OpenSM vendstat.c: Add additional vendor IDs to ext_fw_info_device ibnetdisc.c: Add additional supported device IDs to is_mlnx_ext_port_info_supported ibdiag_common.c: Add more supported device IDs in is_mlnx_ext_port_info_supported ibccconfig.c: Account for enhanced switch port 0 in SwitchPortCongestionSetting handling ibccquery.c: Account for enhanced switch port 0 in SwitchPortCongestionSetting handling saquery.c: Return proper status from query_sa_cpi on bad result status saquery.c: Fix handling of cpi and ClassPortInfo options ibping.c: Fix support of OUIs other than IB_OPENIB_OUI ibportstate.c: Cosmetic changes to MLNX Extended PortInfo saquery.c: In query_sa_cpi, make error message consistent saquery.c: In print_node_record, change LID print format Ira Weiny ira.we...@intel.com infiniband-diags: update for udev changes. infiniband-diags/test-code: remove unused variables infiniband-diags: add rdma-ndd daemon infiniband-diags: update smpquery man page infiniband-diags: If rst2man is available fix missing source dir in configure -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 00/19] IB/mad: Add support for Intel Omni-Path Architecture (OPA) MAD processing.
On 2/4/2015 6:29 PM, ira.we...@intel.com wrote: From: Ira Weiny ira.we...@intel.com The following patch series modifies the kernel MAD processing (ib_mad/ib_umad) and related interfaces to send and receive Intel Omni-Path Architecture MADs on devices which support them. In addition to supporting some IBTA management classes, OPA devices use MADs with lengths up to 2K. These jumbo MADs increase the performance of management traffic. To distinguish IBTA MADs from OPA MADs a new Base Version is introduced. With your recent changes, I don't think that statement above is strictly true any longer. While OPA does use a different base version for it's jumbo MADs, aren't OPA MADs distinguished from IBTA MADs by the new OPA MAD device capability bit ? True. However, OPA MADs with a base version of 0x1 are compatible with and therefore can be processed by the same code as IBTA MADs. If I need to respin the series I will update the comment. What performance tests were run in terms of IBTA MADs ? Sorry for the delay. As there have been a couple of versions of this series since I ran those tests in December I took the time to re-run these tests. OpenSM (sweep time) and infiniband-diag (iblinkinfo and saquery) were timed on my small cluster with no noticeable change in performance. But this is not the best test as I only have 6 or so nodes on 2 switches. For example iblinkinfo runs very quickly: [root@phcppriv12 OPENIB_FF]# time iblinkinfo /dev/null real0m0.072s user0m0.002s sys 0m0.041s The better test we have at this small scale are a couple of tools (closed source) which send SMA and PMA packets as rapidly as possible. Those showed no difference in performance. For example I ran these tools with 3 different kernels 1) stock roland, 2) the series in question up to the kmalloc patch, and final 3) then the full OPA series. Here is a summary of the results: Roland for-next (ecb7b12) up to kmalloc patch full OPA patch set SMA2107221324 21381 SMA rcv 1713917329 17303 PMA24159.4 24401.2 24166.5 PMA rcv 24159.4 24401.2 24166.5 NOTE: The results shown above are specifically shown without units as I am not allowed to publish performance numbers. However larger numbers equal better performance. The numbers are quite repeatable and as you can see are all close to each other from kernel to kernel. The remote node for these tests was running the stock RHEL7 kernel. All software and hardware was held constant except for the kernel patches in question. Ira -- Hal -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask
On 05/02/2015 04:54, Weiny, Ira wrote: On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote: Anyway, I recognize that uverb way of abusing write() syscall is borderline (at best) regarding other Linux subsystems and Unix paradigm in general. But it's not enough to screw it more. Then we must return the correct output size explicitly in the struct. I was thinking this very same thing as I read through this thread. I too would like to avoid the use of comp_masks if at all possible. The query call seems to be a verb where the structure size is all you really need to know the set of values returned. As Jason says, other calls may require the comp_mask where 0 is not sufficient to indicate missing. Would it be okay to return it in the ib_uverbs_cmd_hdr.out_words? That would further abuse the write() syscall by writing to the input buffer. I don't think that is such a great idea. However, the only other alternative I see is to add it explicitly to every uverb response struct. I think this is the best solution. There is a 32 bit reserved field in ib_uverbs_ex_query_device_resp. Could we use all or part of that to be the size? For other extended commands I'm not sure what to do. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCHv2 infiniband-diags] ibtracert.c: Remove checking the port 0 state for base switch port 0
From: Dan Ben Yosef da...@mellanox.com The port 0 state check needed to be only for HCA/Routers and EnhancedPort0 switches. For BasePort0 switches, the PortState field, in PortInfo arrtibute, for port0 is undefined. Signed-off-by: Dan Ben Yosef da...@mellanox.com Thanks applied, Ira --- Changes since v1: Changed is_route_inactive_port0 to is_port_inactive Moved description from above use of routine to above routine (is_port_inactive) diff --git a/src/ibtracert.c b/src/ibtracert.c index d32968b..9414618 100644 --- a/src/ibtracert.c +++ b/src/ibtracert.c @@ -92,6 +92,7 @@ struct Switch { int mccap; int linearFDBtop; int fdb_base; + int enhsp0; int8_t fdb[64]; char switchinfo[64]; }; @@ -114,6 +115,24 @@ struct Node { Node *nodesdist[MAXHOPS]; uint64_t target_portguid; +/* + * is_port_inactive + * Checks whether or not the port state is other than active. + * The sw argument is only relevant when the port is on a + * switch; for HCAs and routers, this argument is ignored. + * Returns 1 when port is not active and 0 when active. + * Base switch port 0 is considered always active. + */ +static int is_port_inactive(Node * node, Port * port, Switch * sw) { + int res = 0; + if (port-state != 4 + (node-type != IB_NODE_SWITCH || + (node-type == IB_NODE_SWITCH sw-enhsp0))) + res = 1; + return res; +} + static int get_node(Node * node, Port * port, ib_portid_t * portid) { void *pi = port-portinfo, *ni = node-nodeinfo, *nd = node-nodedesc; @@ -164,6 +183,7 @@ static int switch_lookup(Switch * sw, ib_portid_t * portid, int lid) mad_decode_field(si, IB_SW_LINEAR_FDB_CAP_F, sw-linearcap); mad_decode_field(si, IB_SW_LINEAR_FDB_TOP_F, sw- linearFDBtop); + mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, sw-enhsp0); if (lid = sw-linearcap lid sw-linearFDBtop) return -1; @@ -248,7 +268,7 @@ static int find_route(ib_portid_t * from, ib_portid_t * to, int dump) Port *port, fromport, toport, nextport; Switch sw; int maxhops = MAXHOPS; - int portnum, outport; + int portnum, outport = 255, next_sw_outport = 255; memset(fromnode,0,sizeof(Node)); memset(tonode,0,sizeof(Node)); @@ -274,20 +294,25 @@ static int find_route(ib_portid_t * from, ib_portid_t * to, int dump) portnum = port-portnum; dump_endnode(dump, From, node, port); + if (node-type == IB_NODE_SWITCH) { + next_sw_outport = switch_lookup(sw, from, to-lid); + if (next_sw_outport 0 || next_sw_outport node-numports) { + /* needed to print the port in badtbl */ + outport = next_sw_outport; + goto badtbl; + } + } while (maxhops--) { - if (port-state != 4) + if (is_port_inactive(node, port, sw)) goto badport; if (sameport(port, toport)) break; /* found */ - outport = portnum; if (node-type == IB_NODE_SWITCH) { DEBUG(switch node); - if ((outport = switch_lookup(sw, from, to-lid)) 0 || - outport node-numports) - goto badtbl; + outport = next_sw_outport; if (extend_dpath(from-drpath, outport) 0) goto badpath; @@ -307,6 +332,7 @@ static int find_route(ib_portid_t * from, ib_portid_t * to, int dump) (node-type == IB_NODE_ROUTER)) { int ca_src = 0; + outport = portnum; DEBUG(ca or router node); if (!sameport(port, fromport)) { IBWARN @@ -335,8 +361,19 @@ static int find_route(ib_portid_t * from, ib_portid_t * to, int dump) nextport.portnum = from-drpath.p[from-drpath.cnt + 1]; } + /* only if the next node is a switch, get switch info */ + if (nextnode.type == IB_NODE_SWITCH) { + next_sw_outport = switch_lookup(sw, from, to-lid); + if (next_sw_outport 0 || + next_sw_outport nextnode.numports) { + /* needed to print the port in badtbl */ + outport = next_sw_outport; + goto badtbl; + } + } + port = nextport; - if (port-state != 4) + if (is_port_inactive(nextnode, port, sw)) goto badoutport; node = nextnode; portnum = port-portnum; -- To unsubscribe from this list: send the line
RE: [PATCH infiniband-diags] Add new configure flag to enable/disable the rdma-ndd build
From: Dan Ben Yosef da...@mellanox.com Add new flag --enable-rdma-ndd (default=yes) Signed-off-by: Dan Ben Yosef da...@mellanox.com Thanks applied, Ira --- diff --git a/Makefile.am b/Makefile.am index 4e08c2b..63c4b48 100644 --- a/Makefile.am +++ b/Makefile.am @@ -15,9 +15,11 @@ sbin_PROGRAMS = src/ibaddr src/ibnetdiscover src/ibping src/ibportstate \ src/perfquery src/sminfo src/smpdump src/smpquery \ src/saquery src/vendstat src/iblinkinfo \ src/ibqueryerrors src/ibcacheedit src/ibccquery \ - src/ibccconfig \ - src/dump_fts \ - src/rdma-ndd + src/ibccconfig src/dump_fts + +if ENABLE_RDMA_NDD +sbin_PROGRAMS += src/rdma-ndd +endif if ENABLE_TEST_UTILS sbin_PROGRAMS += src/ibsendtrap src/mcm_rereg_test @@ -71,10 +73,13 @@ man_MANS = doc/man/ibaddr.8 \ doc/man/smpquery.8 \ doc/man/vendstat.8 \ doc/man/infiniband-diags.8 \ - doc/man/rdma-ndd.8 \ man/dump_lfts.8 \ man/dump_mfts.8 +if ENABLE_RDMA_NDD +man_MANS += doc/man/rdma-ndd.8 +endif + # define this for the dist target compat_man_pages = man/ibdiscover.8 man/ibcheckerrors.8 man/ibcheckerrs.8 \ man/ibchecknet.8 man/ibchecknode.8 man/ibcheckport.8 \ @@ -161,6 +166,8 @@ install-data-hook: $(top_srcdir)/config/install-sh -c -m 444 $(top_srcdir)/etc/error_thresholds $(DESTDIR)/$(sysconfdir)/infiniband-diags $(top_srcdir)/config/install-sh -c -m 400 $(top_srcdir)/etc/ibdiag.conf $(DESTDIR)/$(sysconfdir)/infiniband-diags +if ENABLE_RDMA_NDD install-exec-hook: $(top_srcdir)/config/install-sh -m 755 -d $(DESTDIR)/$(sysconfdir)/init.d $(top_srcdir)/config/install-sh -m 755 $(top_srcdir)/etc/rdma-ndd.init $(DESTDIR)/$(sysconfdir)/init.d/rdma-ndd +endif diff --git a/configure.ac b/configure.ac index ed1b834..11f47f7 100644 --- a/configure.ac +++ b/configure.ac @@ -180,6 +180,18 @@ fi AC_SUBST(ibnetdisc_api_version) dnl End libibnetdisc stuff +dnl with or with rdma-ndd +AC_MSG_CHECKING(for --enable-rdma-ndd) +AC_ARG_ENABLE(rdma-ndd, +[ --enable-rdma-ndd build rdma-ndd (default=yes)], [case +${enableval} in + yes) rdmandd=yes ;; + no) rdmandd=no ;; + *) AC_MSG_ERROR(bad value ${enableval} for --enable-rdma-ndd) ;; +esac],[rdmandd=yes]) +AM_CONDITIONAL(ENABLE_RDMA_NDD, test x$rdmandd = xyes) +AC_MSG_RESULT(${rdmandd=no}) + dnl configures for rdma-ndd startup script default_rdma_service=openibd AC_ARG_WITH([rdma_service], @@ -198,9 +210,12 @@ default_start=null AC_SUBST(DEFAULT_START, $default_start) AC_SUBST(DEFAULT_STOP, $default_stop) -AC_CHECK_HEADER(libudev.h, with_udev=yes, with_udev=no) - AC_CHECK_LIB(udev, udev_monitor_ref, [], AC_MSG_ERROR(libudev is required for rdma-ndd...)) -AC_CHECK_FUNCS_ONCE(udev_get_sys_path) +if test x$rdmandd = xyes; then +AC_CHECK_HEADER(libudev.h, with_udev=yes, with_udev=no) +AC_CHECK_LIB(udev, udev_monitor_ref, [], AC_MSG_ERROR(libudev is required for rdma-ndd...)) +AC_CONFIG_FILES([doc/man/rdma-ndd.8 etc/rdma-ndd.init]) +AC_CHECK_FUNCS_ONCE(udev_get_sys_path) +fi dnl Generate doc/man/*.in files if possible DOC_DATE=`date +%Y-%m-%d` @@ -275,8 +290,6 @@ AC_CONFIG_FILES([\ doc/man/smpquery.8 \ doc/man/vendstat.8 \ doc/man/infiniband-diags.8 \ - doc/man/rdma-ndd.8 \ - etc/rdma-ndd.init \ libibnetdisc/Makefile \ ]) AC_OUTPUT -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH infiniband-diags] Align infiniband-diags MLNX device IDs with OpenSM
Same device IDs as in opensm/osm_subnet.c:is_mlnx_ext_port_info_supported Switch-IB is device ID 0xcb20 rather than 0xcb84 Signed-off-by: Hal Rosenstock h...@mellanox.com Thanks applied, Ira --- diff --git a/libibnetdisc/src/ibnetdisc.c b/libibnetdisc/src/ibnetdisc.c index f3c6000..e346905 100644 --- a/libibnetdisc/src/ibnetdisc.c +++ b/libibnetdisc/src/ibnetdisc.c @@ -203,7 +203,7 @@ static int is_mlnx_ext_port_info_supported(ibnd_port_t * port) { uint16_t devid = (uint16_t) mad_get_field(port-node-info, 0, IB_NODE_DEVID_F); - if (devid == 0xc738 || devid == 0xcb84) + if ((devid = 0xc738 devid = 0xc73b) || devid == 0xcb20) return 1; if (devid = 0x1003 devid = 0x1016) return 1; diff --git a/src/ibdiag_common.c b/src/ibdiag_common.c index 666edbc..e09623d 100644 --- a/src/ibdiag_common.c +++ b/src/ibdiag_common.c @@ -530,7 +530,7 @@ int is_port_info_extended_supported(ib_portid_t * dest, int port, int is_mlnx_ext_port_info_supported(uint32_t devid) { if (ibd_ibnetdisc_flags IBND_CONFIG_MLX_EPI) { - if (devid == 0xc738 || devid == 0xcb84) + if ((devid = 0xc738 devid = 0xc73b) || devid == 0xcb20) return 1; if (devid = 0x1003 devid = 0x1016) return 1; diff --git a/src/vendstat.c b/src/vendstat.c index 7fc4a11..11ee73e 100644 --- a/src/vendstat.c +++ b/src/vendstat.c @@ -144,8 +144,8 @@ typedef struct { static uint16_t ext_fw_info_device[][2] = { {0x0245, 0x0245}, /* Switch-X */ - {0xc738, 0xc738}, /* Switch-X */ - {0xcb84, 0xcb84}, /* Switch-IB */ + {0xc738, 0xc73b}, /* Switch-X */ + {0xcb20, 0xcb20}, /* Switch-IB */ {0x01b3, 0x01b3}, /* IS-4 */ {0x1003, 0x1016}, /* Connect-X */ {0x, 0x}}; -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH RE-RESEND V2 for-next 4/5] IB/core: Make sure that the PSN does not overflow
Subject: [PATCH RE-RESEND V2 for-next 4/5] IB/core: Make sure that the PSN does not overflow Is there a particular bug which this fixes? Ira From: Majd Dibbiny m...@mellanox.com The rq/sq-psn is 24 bits as defined in the IB spec, therefore ULPs and User space applications shouldn't use the 8 most significant bits in the 32 bits variables to avoid overflow in modify_qp. Fixed the PSN generation by the RDMA-CM to mask out the 8 most significant bits, also mask out these bits in uverbs for attributes provided by user-space. Signed-off-by: Majd Dibbiny m...@mellanox.com Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/core/cma.c|1 + drivers/infiniband/core/uverbs_cmd.c |4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d570030..fab0ee5 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -512,6 +512,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, INIT_LIST_HEAD(id_priv-listen_list); INIT_LIST_HEAD(id_priv-mc_list); get_random_bytes(id_priv-seq_num, sizeof id_priv-seq_num); + id_priv-seq_num = 0xff; return id_priv-id; } diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 532d8eb..ecb6430 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -2053,8 +2053,8 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file, attr-path_mtu= cmd.path_mtu; attr-path_mig_state = cmd.path_mig_state; attr-qkey= cmd.qkey; - attr-rq_psn = cmd.rq_psn; - attr-sq_psn = cmd.sq_psn; + attr-rq_psn = cmd.rq_psn 0xff; + attr-sq_psn = cmd.sq_psn 0xff; attr-dest_qp_num = cmd.dest_qp_num; attr-qp_access_flags = cmd.qp_access_flags; attr-pkey_index = cmd.pkey_index; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask
On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote: Anyway, I recognize that uverb way of abusing write() syscall is borderline (at best) regarding other Linux subsystems and Unix paradigm in general. But it's not enough to screw it more. Then we must return the correct output size explicitly in the struct. I was thinking this very same thing as I read through this thread. I too would like to avoid the use of comp_masks if at all possible. The query call seems to be a verb where the structure size is all you really need to know the set of values returned. As Jason says, other calls may require the comp_mask where 0 is not sufficient to indicate missing. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] IB/mad: fix ifnullfree.cocci warnings
-Original Message- From: Wu, Fengguang Sent: Tuesday, January 27, 2015 1:36 PM To: Weiny, Ira Cc: kbuild-...@01.org; Roland Dreier; Hefty, Sean; Hal Rosenstock; Or Gerlitz; Yan Burman; linux-rdma@vger.kernel.org; linux-ker...@vger.kernel.org Subject: [PATCH] IB/mad: fix ifnullfree.cocci warnings drivers/infiniband/core/mad.c:2088:3-8: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. NULL check before some freeing functions is not needed. Based on checkpatch warning kfree(NULL) is safe this check is probably not required and kfreeaddr.cocci by Julia Lawall. Generated by: scripts/coccinelle/free/ifnullfree.cocci Signed-off-by: Fengguang Wu fengguang...@intel.com Signed-off-by: Ira Weiny ira.we...@intel.com --- mad.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -2084,8 +2084,7 @@ out: /* Post another receive request for this QP */ if (response) { ib_mad_post_receive_mads(qp_info, response); - if (recv) - kfree(recv); + kfree(recv); } else ib_mad_post_receive_mads(qp_info, recv); } -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] IB/mad: fix ifnullfree.cocci warnings
drivers/infiniband/core/mad.c:2088:3-8: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. NULL check before some freeing functions is not needed. Based on checkpatch warning kfree(NULL) is safe this check is probably not required and kfreeaddr.cocci by Julia Lawall. Generated by: scripts/coccinelle/free/ifnullfree.cocci Signed-off-by: Fengguang Wu fengguang...@intel.com Signed-off-by: Ira Weiny ira.we...@intel.com Sorry this should have been: Reviewed-by: Ira Weiny ira.we...@intel.com Tested-by: Ira Weiny ira.we...@intel.com --- mad.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -2084,8 +2084,7 @@ out: /* Post another receive request for this QP */ if (response) { ib_mad_post_receive_mads(qp_info, response); - if (recv) - kfree(recv); + kfree(recv); } else ib_mad_post_receive_mads(qp_info, recv); } -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH infiniband-diags] ibtracert.c: Remove checking the port 0 state for BasePort0 switches
Sorry for the delay. -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of Hal Rosenstock Sent: Tuesday, January 20, 2015 5:00 AM To: Weiny, Ira Cc: Dan Ben-Yosef; linux-rdma (linux-rdma@vger.kernel.org) Subject: [PATCH infiniband-diags] ibtracert.c: Remove checking the port 0 state for BasePort0 switches From: Dan Ben Yosef da...@mellanox.com The port 0 state check is needed only for HCA/Routers and EnhancedPort0 switches. For BasePort0 switches, the PortState field in the PortInfo attribute for port0 is undefined (not used). Signed-off-by: Dan Ben Yosef da...@mellanox.com --- src/ibtracert.c | 47 --- 1 files changed, 40 insertions(+), 7 deletions(-) diff --git a/src/ibtracert.c b/src/ibtracert.c index d32968b..326200b 100644 --- a/src/ibtracert.c +++ b/src/ibtracert.c @@ -92,6 +92,7 @@ struct Switch { int mccap; int linearFDBtop; int fdb_base; + int enhsp0; int8_t fdb[64]; char switchinfo[64]; }; @@ -114,6 +115,17 @@ struct Node { Node *nodesdist[MAXHOPS]; uint64_t target_portguid; +static int is_route_inactive_port0(Node * node, Port * port, Switch * +sw) { I would prefer this function be named something like: port_inactive_not_bsp0 The current name is confusing as to what the logic is. + int res = 0; + /* not active for HCA and for enhanced port0 Switches */ + if (port-state != 4 Please use #defines here even though the original code did not. + (node-type != IB_NODE_SWITCH || + (node-type == IB_NODE_SWITCH sw-enhsp0))) + res = 1; + return res; +} + static int get_node(Node * node, Port * port, ib_portid_t * portid) { void *pi = port-portinfo, *ni = node-nodeinfo, *nd = node-nodedesc; @@ -164,6 +176,7 @@ static int switch_lookup(Switch * sw, ib_portid_t * portid, int lid) mad_decode_field(si, IB_SW_LINEAR_FDB_CAP_F, sw-linearcap); mad_decode_field(si, IB_SW_LINEAR_FDB_TOP_F, sw- linearFDBtop); + mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, sw-enhsp0); if (lid = sw-linearcap lid sw-linearFDBtop) return -1; @@ -248,7 +261,7 @@ static int find_route(ib_portid_t * from, ib_portid_t * to, int dump) Port *port, fromport, toport, nextport; Switch sw; int maxhops = MAXHOPS; - int portnum, outport; + int portnum, outport = 255, next_sw_outport = 255; memset(fromnode,0,sizeof(Node)); memset(tonode,0,sizeof(Node)); @@ -274,20 +287,28 @@ static int find_route(ib_portid_t * from, ib_portid_t * to, int dump) portnum = port-portnum; dump_endnode(dump, From, node, port); + if (node-type == IB_NODE_SWITCH) { + next_sw_outport = switch_lookup(sw, from, to-lid); + if (next_sw_outport 0 || next_sw_outport node-numports) { + /* needed to print the port in badtbl */ + outport = next_sw_outport; + goto badtbl; + } + } while (maxhops--) { - if (port-state != 4) + /* checking if the port state isn't active. + * The sw argument is only relevant when the port is a + * switch port for HCAs this argument is ignored */ This comment should go above the function signature and be enhanced to clarify that it is checking for port state inactive except for BSP0 which has no state. Ira + if (is_route_inactive_port0(node, port, sw)) goto badport; if (sameport(port, toport)) break; /* found */ - outport = portnum; if (node-type == IB_NODE_SWITCH) { DEBUG(switch node); - if ((outport = switch_lookup(sw, from, to-lid)) 0 || - outport node-numports) - goto badtbl; + outport = next_sw_outport; if (extend_dpath(from-drpath, outport) 0) goto badpath; @@ -307,6 +328,7 @@ static int find_route(ib_portid_t * from, ib_portid_t * to, int dump) (node-type == IB_NODE_ROUTER)) { int ca_src = 0; + outport = portnum; DEBUG(ca or router node); if (!sameport(port, fromport)) { IBWARN @@ -335,8 +357,19 @@ static int find_route(ib_portid_t * from, ib_portid_t * to, int dump) nextport.portnum = from-drpath.p[from-drpath.cnt + 1]; } + /* only if the next node is a switch, get switch info */ + if (nextnode.type == IB_NODE_SWITCH
RE: [PATCH infiniband-diags] ibtracert.c: Remove checking the port 0 state for BasePort0 switches
+static int is_route_inactive_port0(Node * node, Port * port, Switch +* +sw) { I would prefer this function be named something like: port_inactive_not_bsp0 The current name is confusing as to what the logic is. How about is_port_inactive ? That is fine. + int res = 0; + /* not active for HCA and for enhanced port0 Switches */ + if (port-state != 4 Please use #defines here even though the original code did not. This is an issue throughout infiniband-diags. I think it should be addressed by separate patch(es) for this. I would rather see us do the right thing going forward. This saves work trying to find all these majic numbers later. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 16/20] IB/mad: Add Intel Omni-Path Architecture defines
On Mon, Jan 19, 2015 at 10:50:38PM +, Weiny, Ira wrote: On Mon, Jan 19, 2015 at 07:57:22PM +, Weiny, Ira wrote: Perhaps you can give us an example of where the current code would work without modifications if the IBTA were to define any new base or smp class version? I think the point here is that every check for OPA_MIN_CLASS_VERSION must also be combined with a check if the device in question is OPA or not. This contradicts what you say below How so? if (mad_reg_req-mgmt_class_version = OPA_MIN_CLASS_VERSION !port_priv-qp_info[qpn].supports_jumbo_mads) { Sorry, I got a bit confused because this is not the latest version of the patch. Should be: if (port_is_opa mad_reg_req-mgmt_class_version = OPA_MIN_CLASS_VERSION !port_priv-qp_info[qpn].supports_jumbo_mads) { And since port_is_opa = true implies port_priv-qp_info[qpn].supports_jumbo_mads = true the above if can never evaluate to true and can just be removed. Right there is no need to check for supports_jumbo_mads. The latest version was: + if (mad_reg_req-mgmt_class_version = OPA_MIN_CLASS_VERSION + !(port_priv-device-attributes.device_cap_flags2 IB_DEVICE_OPA_MAD_SUPPORT)) { The bug in the patch is that any class version should be allowed on IB devices. This check prevented class versions = 0x80 for IB devices. NOTE however that the current code already limits users to a class version 8. (due to the implementation.) -#define MAX_MGMT_VERSION 8 +#define MAX_MGMT_VERSION 0x83 ... if (mad_reg_req-mgmt_class_version = MAX_MGMT_VERSION) { dev_notice(device-dev, ib_register_mad_agent: invalid Class Version %u\n, mad_reg_req-mgmt_class_version); goto error1; } Therefore I did not see this as limiting the IB implementation we already have. I will remove the patch as I agree with Jason and Hal it is technically more correct. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 16/20] IB/mad: Add Intel Omni-Path Architecture defines
On Mon, Jan 19, 2015 at 07:57:22PM +, Weiny, Ira wrote: Perhaps you can give us an example of where the current code would work without modifications if the IBTA were to define any new base or smp class version? I think the point here is that every check for OPA_MIN_CLASS_VERSION must also be combined with a check if the device in question is OPA or not. This contradicts what you say below I don't see a problem with sharing the kernel structs or MAD processing, as long as all OPA specific behavior is conditionalized on the device in question being OPA - it should never be conditionalized on the class version of the MAD. commit a70a5af286b2985a0a95e9733d1e3166845a8be8 Author: Ira Weiny ira.we...@intel.com Date: Tue Sep 23 20:04:49 2014 -0400 IB/mad: Add registration check for Intel Omni-Path Architecture MADs For instance: + if (mad_reg_req-mgmt_class_version = OPA_MIN_CLASS_VERSION + !port_priv-qp_info[qpn].supports_jumbo_mads) { Is wrong because it means that no IB software can register for those MADs anymore, which is cross polluting the two architectures in the kernel. Presumably since supports_jumbo_mads is always true for OPA the above test is not required at all and this patch should be dropped. But I expect if we keep looking at uses of OPA_MIN_CLASS_VERSION we will see other cases where they need to be conditionalized? That is the only use of OPA_MIN_CLASS_VERSION. I agree with your second assessment and I'll remove the patch from the next series. Hal was this your assessment as well? Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCHv2 infiniband-diags] vendstat.c: Add additional vendor IDs to ext_fw_info_device
-Original Message- From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il] Sent: Wednesday, January 14, 2015 6:07 AM To: Weiny, Ira Cc: linux-rdma (linux-rdma@vger.kernel.org) Subject: [PATCHv2 infiniband-diags] vendstat.c: Add additional vendor IDs to ext_fw_info_device Add additional supported device IDs to ext_fw_info_device. These IDs are ConnectX-4, ConnectX-IB, and Switch-IB. This affects is_ext_fw_info_supported which determine which Mellanox specific vendor MADs are issued to a device. Signed-off-by: Hal Rosenstock h...@mellanox.com Thanks applied, Ira --- Change since v1: Changed ConnectX-3 to Connect-IB in patch description diff --git a/src/vendstat.c b/src/vendstat.c index f28ff02..7fc4a11 100644 --- a/src/vendstat.c +++ b/src/vendstat.c @@ -145,8 +145,9 @@ typedef struct { static uint16_t ext_fw_info_device[][2] = { {0x0245, 0x0245}, /* Switch-X */ {0xc738, 0xc738}, /* Switch-X */ + {0xcb84, 0xcb84}, /* Switch-IB */ {0x01b3, 0x01b3}, /* IS-4 */ - {0x1003, 0x1011}, /* Connect-X */ + {0x1003, 0x1016}, /* Connect-X */ {0x, 0x}}; static int is_ext_fw_info_supported(uint16_t device_id) { -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v3 infiniband-diags] ibdiag_common.c: Add more supported device IDs in is_mlnx_ext_port_info_supported
-Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of Hal Rosenstock Sent: Wednesday, January 14, 2015 6:07 AM To: Weiny, Ira Cc: linux-rdma (linux-rdma@vger.kernel.org); Or Gerlitz Subject: [PATCH v3 infiniband-diags] ibdiag_common.c: Add more supported device IDs in is_mlnx_ext_port_info_supported Add support for some ConnectX-IB, ConnectX-4, and Switch-IB in is_mlnx_ext_port_info_supported. This only affects tool(s) which invoke this routine (is_mlnx_ext_port_info_supported) which is currently just ibportstate. is_mlnx_ext_port_info_supported determines whether or not the MLNX vendor extended port info SMP is sent which is used to determine FDR10 speed. Not having the device in the list would only result in FDR10 being misinterpreted as QDR. Signed-off-by: Hal Rosenstock h...@mellanox.com Thanks, applied, Ira --- Change since v2: Changed ConnectX-3 to ConnectX-IB in patch description Change since v1: Improved patch description diff --git a/src/ibdiag_common.c b/src/ibdiag_common.c index 8c749c7..384d342 100644 --- a/src/ibdiag_common.c +++ b/src/ibdiag_common.c @@ -499,9 +499,9 @@ conv_cnt_human_readable(uint64_t val64, float *val, int data) int is_mlnx_ext_port_info_supported(uint32_t devid) { if (ibd_ibnetdisc_flags IBND_CONFIG_MLX_EPI) { - if (devid == 0xc738) + if (devid == 0xc738 || devid == 0xcb84) return 1; - if (devid = 0x1003 devid = 0x1011) + if (devid = 0x1003 devid = 0x1016) return 1; } return 0; -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCHv2 infiniband-diags] ibnetdisc.c: Add additional supported device IDs to is_mlnx_ext_port_info_supported
-Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of Hal Rosenstock Sent: Wednesday, January 14, 2015 6:07 AM To: Weiny, Ira Cc: linux-rdma (linux-rdma@vger.kernel.org) Subject: [PATCHv2 infiniband-diags] ibnetdisc.c: Add additional supported device IDs to is_mlnx_ext_port_info_supported Similar to previous ibdiag_common.c patch, this adds additional device IDs to be supported to the ibnetdiscover internal is_mlnx_ext_port_info routine. These IDs are ConnectX-4, ConnectX-IB, and Switch-IB. Signed-off-by: Hal Rosenstock h...@mellanox.com Thanks applied. Ira --- Change from v1: Changed ConnectX-3 to ConnectX-IB in patch description diff --git a/libibnetdisc/src/ibnetdisc.c b/libibnetdisc/src/ibnetdisc.c index 121fe35..f3c6000 100644 --- a/libibnetdisc/src/ibnetdisc.c +++ b/libibnetdisc/src/ibnetdisc.c @@ -203,9 +203,9 @@ static int is_mlnx_ext_port_info_supported(ibnd_port_t * port) { uint16_t devid = (uint16_t) mad_get_field(port-node-info, 0, IB_NODE_DEVID_F); - if (devid == 0xc738) + if (devid == 0xc738 || devid == 0xcb84) return 1; - if (devid = 0x1003 devid = 0x1011) + if (devid = 0x1003 devid = 0x1016) return 1; return 0; } -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH TRIVIAL libibmad] fields.c: Fix commentary typo
-Original Message- From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il] Sent: Monday, January 12, 2015 11:08 AM To: Weiny, Ira Cc: linux-rdma (linux-rdma@vger.kernel.org) Subject: [PATCH TRIVIAL libibmad] fields.c: Fix commentary typo Signed-off-by: Hal Rosenstock h...@mellanox.com Thanks applied, Ira --- diff --git a/src/fields.c b/src/fields.c index 33a6364..9965811 100644 --- a/src/fields.c +++ b/src/fields.c @@ -835,7 +835,7 @@ static const ib_field_t ib_mad_f[] = { {BITSOFFS(16, 16), ThresholdEventCounter, mad_dump_uint}, {BITSOFFS(32, 16), ThresholdCongestionEventMap, mad_dump_hex}, /* XXX: Q3/2010 errata lists offset 48, but that means field is not - * world aligned. Assume will be aligned to offset 64 later. + * word aligned. Assume will be aligned to offset 64 later. */ {BITSOFFS(64, 32), CurrentTimeStamp, mad_dump_uint}, {0, 0}, /* IB_CC_CONGESTION_LOG_CA_LAST_F */ -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 16/20] IB/mad: Add Intel Omni-Path Architecture defines
Currently, these patches are claiming values that are controlled by an established standards organization. It is the equivalent of just grabbing a currently unused Ethertype value for a new protocol and then claiming that whenever the IEEE allocates it for something else you will make changes to resolve the conflict. It would be equivalent if OPA was Infiniband, but it is not. It is a separate architecture, with an entirely separate name space. Conceptually, it is similar to the IB GRH and the IPv6 header. They have a similar structure, but they are not the same. The IBTA can define a new GRH field values, such as NxtHdr, without consulting or getting approval from the IETF. Ira could introduce patches to define OPA management packets, but the general structure of those packet headers would be exactly the same as IB management packets. There's no need to introduce duplicate definitions into the code base. The existing structures can be reused. However, OPA makes use of different values for the fields. Perhaps you can give us an example of where the current code would work without modifications if the IBTA were to define any new base or smp class version? Would removing this patch satisfy your concerns? commit a70a5af286b2985a0a95e9733d1e3166845a8be8 Author: Ira Weiny ira.we...@intel.com Date: Tue Sep 23 20:04:49 2014 -0400 IB/mad: Add registration check for Intel Omni-Path Architecture MADs If the registration specifies an OPA MAD class version and the device does not support OPA MADs, fail the MAD registration. Signed-off-by: Ira Weiny ira.we...@intel.com Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 16/20] IB/mad: Add Intel Omni-Path Architecture defines
On 1/15/2015 6:30 PM, Weiny, Ira wrote: On 1/12/2015 12:11 PM, ira.we...@intel.com wrote: From: Ira Weiny ira.we...@intel.com OPA_MIN_CLASS_VERSION -- OPA Class versions are 0x80 OPA_SMP_CLASS_VERSION -- Defined at 0x80 OPA_MGMT_BASE_VERSION -- Defined at 0x80 Increase max management version to accommodate OPA Allocation of MAD base and class version numbers is owned by the IBTA. It doesn't seem appropriate to arbitrarily claim code points without proper approval. OPA is its own architecture space. While this space uses some of the same values as IB we are not claiming any IBTA values. You *are* claiming IBTA values. When the IBTA chooses to use those values, then there will be a conflict. There is no conflict. It is true that when the IBTA assigns meaning to those values the code may have to be changed to interpret this new meaning. That has to happen regardless of these patches or their meaning on OPA devices. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 16/20] IB/mad: Add Intel Omni-Path Architecture defines
On 1/12/2015 12:11 PM, ira.we...@intel.com wrote: From: Ira Weiny ira.we...@intel.com OPA_MIN_CLASS_VERSION -- OPA Class versions are 0x80 OPA_SMP_CLASS_VERSION -- Defined at 0x80 OPA_MGMT_BASE_VERSION -- Defined at 0x80 Increase max management version to accommodate OPA Allocation of MAD base and class version numbers is owned by the IBTA. It doesn't seem appropriate to arbitrarily claim code points without proper approval. OPA is its own architecture space. While this space uses some of the same values as IB we are not claiming any IBTA values. This is similar to how IB GIDs and IPv6 addresses look the same but are in separate addressing domains. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 14/20] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
On 1/12/2015 7:11 PM, ira.we...@intel.com wrote: --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h enum ib_signature_prot_cap { IB_PROT_T10DIF_TYPE_1 = 1, IB_PROT_T10DIF_TYPE_2 = 1 1, @@ -210,6 +214,7 @@ struct ib_device_attr { int sig_prot_cap; int sig_guard_cap; struct ib_odp_caps odp_caps; + u64 device_cap_flags2; Just make the existing kernel size device_cap_flags field a u64, note it's not blankly copied to user space in uverbs as part of a chunk, so just go there and copy the lower 32 bits. It is an easy change but before changing this I'd like to hear what Roland or others think. I prefer to define an additional field. Eventually we are likely to have flags which need to be communicated to user space. I think it will be cleaner at that time to report the entire new field rather than go through some sort of upper 32 bit device capability logic. Thoughts? Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 14/20] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
On 1/12/2015 7:11 PM, ira.we...@intel.com wrote: Add a device capability flag to flag OPA MAD support on devices. You should put few words here telling what is OPA/OPA MADs and why supporting them fits the IB core. See for example the IB core patch [1] that added signature verbs API is states in a manner which is both generic across vendors and across different Mellanox HCAs. Also see [2] the IB core patch that added support for BMME API as another example. I'm not quite sure what should be covered in this message which is different from the cover letter or the subsequent patches. Would this be more acceptable? quote IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag OPA MADs share a common header with IBTA MADs but with a different base version and an extended length. These jumbo MADs increase the performance of management traffic. Sharing a common header with IBTA MADs allows us to share most of the MAD processing code when dealing with OPA MADs in addition to supporting some IBTA MADs on OPA devices. Add a device capability flag to indicate OPA MAD support on the device. /quote Ira [1] 1b01d33 IB/core: Introduce signature verbs API [2] 00f7ec3 RDMA/core: Add memory management extensions support -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 02/20] IB/core: Cache device attributes for use by upper level drivers
On 1/12/2015 7:10 PM, ira.we...@intel.com wrote: From: Ira Weiny ira.we...@intel.com Please avoid empty change-logs, e.g one liner will perfectly do here. Done. b/include/rdma/ib_verbs.h index 0d74f1d..86fc90f 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1675,6 +1675,7 @@ struct ib_device { u32 local_dma_lkey; u8 node_type; u8 phys_port_cnt; + struct ib_device_attrattributes; I think cached_dev_attrs will be better name. Done for the entire series. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 05/20] IB/mad: Change cast in rcv_has_same_class
On 1/12/2015 7:10 PM, ira.we...@intel.com wrote: From: Ira Weiny ira.we...@intel.com Save dereference and clarifies that rcv_has_same_class can process both IB and OPA MADs. I don't see any clarification below... something missing here? Nothing missing. I was trying to indicate that this change simply indicates that this function can operate on any buffer which has an ib_mad_hdr. Ira Signed-off-by: Ira Weiny ira.we...@intel.com --- drivers/infiniband/core/mad.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 66b3940..819b794 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -1750,7 +1750,7 @@ static int is_rmpp_data_mad(struct ib_mad_agent_private *mad_agent_priv, static inline int rcv_has_same_class(struct ib_mad_send_wr_private *wr, struct ib_mad_recv_wc *rwc) { - return ((struct ib_mad *)(wr-send_buf.mad))-mad_hdr.mgmt_class == + return ((struct ib_mad_hdr *)(wr-send_buf.mad))-mgmt_class == rwc-recv_buf.mad-mad_hdr.mgmt_class; } -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 06/20] IB/core: Add mad_size to ib_device_attr
On 1/12/2015 7:10 PM, ira.we...@intel.com wrote: Change all IB drivers to report the IB management size. Do you mean the maximal size of IB MADs they support? if this is the case, please reflect it in the field name. Done. Changed to max_mad_size. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 15/20] IB/mad: Create jumbo_mad data structures
On 1/12/2015 7:11 PM, ira.we...@intel.com wrote: Define jumbo_mad and jumbo_rmpp_mad For the sake of review and maintenance, please add few more words here on what are these creatures... Proposed new text: Define jumbo_mad and jumbo_rmpp_mad. Jumbo MAD structures are 2K versions of ib_mad and ib_rmpp_mad structures. Currently only OPA base version MADs are of this type. Sound reasonable? Ira Create an RMPP Base header to share between ib_rmpp_mad and jumbo_rmpp_mad -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH infiniband-diags] ibccconfig.c: Account for enhanced switch port 0 in SwitchPortCongestionSetting handling
Enhanced switch port 0 can support CC (indicated in CC ClassPortInfo:CapabilityMask.EnhancedPort0CC). portnum 0 is a valid port number and offsets should be calculated based on port 0 rather than port 1. Also, fix some error messages to indicate switch port congestion setting rather then switch congestion setting failure. Signed-off-by: Hal Rosenstock h...@mellanox.com Thanks applied. Ira --- diff --git a/src/ibccconfig.c b/src/ibccconfig.c index 11dfc07..22b16db 100644 --- a/src/ibccconfig.c +++ b/src/ibccconfig.c @@ -371,9 +371,6 @@ static char *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int if ((errstr = parseint(argv[5], cong_parm_marking_rate, 0))) return errstr; - if (!portnum) - return invalid port number specified; - /* Figure out number of ports first */ if (!smp_query_via(data, dest, IB_ATTR_NODE_INFO, 0, 0, srcport)) return node info config failed; @@ -389,10 +386,10 @@ static char *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int /* We are modifying only 1 port, so get the current config */ if (!cc_query_status_via(payload, dest, IB_CC_ATTR_SWITCH_PORT_CONGESTION_SETTING, - (portnum - 1) / 32, 0, NULL, srcport, cckey)) - return switch congestion setting query failed; + portnum / 32, 0, NULL, srcport, cckey)) + return switch port congestion setting query failed; - ptr = payload + (((portnum % 32 - 1) * 4)); + ptr = payload + (((portnum % 32) * 4)); mad_encode_field(ptr, IB_CC_SWITCH_PORT_CONGESTION_SETTING_ELEMENT_VALID_F, @@ -415,8 +412,8 @@ static char *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int cong_parm_marking_rate); if (!cc_config_status_via(payload, rcv, dest, IB_CC_ATTR_SWITCH_PORT_CONGESTION_SETTING, - (portnum - 1) / 32, 0, NULL, srcport, cckey)) - return switch congestion setting config failed; + portnum / 32, 0, NULL, srcport, cckey)) + return switch port congestion setting config failed; return NULL; } -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH infiniband-diags] ibccquery.c: Account for enhanced switch port 0 in SwitchPortCongestionSetting handling
Enhanced switch port 0 can support CC (indicated in CC ClassPortInfo:CapabilityMask.EnhancedPort0CC). portnum 0 is a valid port number and offsets should be calculated based on port 0 rather than port 1. Also, fix some error messages to indicate switch port congestion setting rather then switch congestion setting failure. Signed-off-by: Hal Rosenstock h...@mellanox.com Thanks applied. Ira --- diff --git a/src/ibccquery.c b/src/ibccquery.c index ac8bca2..39e45b7 100644 --- a/src/ibccquery.c +++ b/src/ibccquery.c @@ -224,11 +224,11 @@ static char *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int memset(data, '\0', sizeof data); if (!cc_query_status_via(data, dest, IB_CC_ATTR_SWITCH_PORT_CONGESTION_SETTING, - (portnum - 1) / 32, 0, NULL, srcport, cckey)) - return switch congestion setting query failed; + portnum / 32, 0, NULL, srcport, cckey)) + return switch port congestion setting query failed; mad_dump_cc_switchportcongestionsettingelement(buf, sizeof buf, -data + (((portnum % 32) - 1) * 4), +data + ((portnum % 32) * 4), 4); printf(%s, buf); return NULL; @@ -236,16 +236,16 @@ static char *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int /* else get all port info */ - maxblocks = ((numports - 1) / 32) + 1; + maxblocks = numports / 32 + 1; for (i = 0; i maxblocks; i++) { memset(data, '\0', sizeof data); if (!cc_query_status_via(data, dest, IB_CC_ATTR_SWITCH_PORT_CONGESTION_SETTING, i, 0, NULL, srcport, cckey)) - return switch congestion setting query failed; + return switch port congestion setting query failed; - for (j = 0; j 32 outputcount numports; j++) { - printf(Port:%u\n, i * 32 + j + 1); + for (j = 0; j 32 outputcount = numports; j++) { + printf(Port:%u\n, i * 32 + j); mad_dump_cc_switchportcongestionsettingelement(buf, sizeof buf, data + j * 4, 4); -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] libibmad/src/rpc.c: Remove superfluous packet dump
Any reason this should not be removed from mad_rpc_rmpp as well? Ira -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of Albert Chu Sent: Friday, December 19, 2014 11:08 AM To: linux-rdma@vger.kernel.org Subject: [PATCH] libibmad/src/rpc.c: Remove superfluous packet dump When idebug == 1, mad_rpc outputs only the mad data response in debug output. It does not output the request mad data. This is confusing since it's not clear if it's request or response output. When idebug 1, full mad send and receive buffers are output, which makes the mad data output with idebug == 1 superfluous. Remove it to remove confusion. Signed-off-by: Albert Chu ch...@llnl.gov --- src/rpc.c |5 - 1 files changed, 0 insertions(+), 5 deletions(-) diff --git a/src/rpc.c b/src/rpc.c index 8d961f2..202fac5 100644 --- a/src/rpc.c +++ b/src/rpc.c @@ -272,11 +272,6 @@ void *mad_rpc(const struct ibmad_port *port, ib_rpc_t * rpc, return NULL; } - if (ibdebug) { - IBWARN(data offs %d sz %d, rpc-dataoffs, rpc-datasz); - xdump(stderr, mad data\n, mad + rpc-dataoffs, rpc- datasz); - } - if (rcvdata) memcpy(rcvdata, mad + rpc-dataoffs, rpc-datasz); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCHv2 libibmad] fields.c: Print SwitchCongestionSetting threshold fields in hex rather than decimal
for consistency with CC Annex which shows meaning of threshold fields in hex not decimal Signed-off-by: Hal Rosenstock h...@mellanox.com Thanks applied. Ira --- Change since v1: Change fields to hex rather than SwitchPortCongestionSetting:Threshold to be consistent with CC Annex diff --git a/src/fields.c b/src/fields.c index 33a6364..9965811 100644 --- a/src/fields.c +++ b/src/fields.c @@ -858,9 +858,9 @@ static const ib_field_t ib_mad_f[] = { {0, 32, Control_Map, mad_dump_hex}, {32, 256, Victim_Mask, mad_dump_array}, {288, 256, Credit_Mask, mad_dump_array}, - {BITSOFFS(544, 4), Threshold, mad_dump_uint}, + {BITSOFFS(544, 4), Threshold, mad_dump_hex}, {BITSOFFS(552, 8), Packet_Size, mad_dump_uint}, - {BITSOFFS(560, 4), CS_Threshold, mad_dump_uint}, + {BITSOFFS(560, 4), CS_Threshold, mad_dump_hex}, {BITSOFFS(576, 16), CS_ReturnDelay, mad_dump_hex}, /* TODO: CCT dump */ {BITSOFFS(592, 16), Marking_Rate, mad_dump_uint}, {0, 0}, /* IB_CC_SWITCH_CONGESTION_SETTING_LAST_F */ -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH infiniband-diags] ibccconfig.c: Account for enhanced switch port 0 in SwitchPortCongestionSetting handling
[off-list] This replaces the first one? -Original Message- From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il] Sent: Friday, January 09, 2015 7:18 AM To: Weiny, Ira Cc: linux-rdma (linux-rdma@vger.kernel.org); Chu, Al Subject: [PATCH infiniband-diags] ibccconfig.c: Account for enhanced switch port 0 in SwitchPortCongestionSetting handling Enhanced switch port 0 can support CC (indicated in CC ClassPortInfo:CapabilityMask.EnhancedPort0CC). portnum 0 is a valid port number and offsets should be calculated based on port 0 rather than port 1. Also, fix some error messages to indicate switch port congestion setting rather then switch congestion setting failure. Signed-off-by: Hal Rosenstock h...@mellanox.com --- diff --git a/src/ibccconfig.c b/src/ibccconfig.c index 11dfc07..22b16db 100644 --- a/src/ibccconfig.c +++ b/src/ibccconfig.c @@ -371,9 +371,6 @@ static char *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int if ((errstr = parseint(argv[5], cong_parm_marking_rate, 0))) return errstr; - if (!portnum) - return invalid port number specified; - /* Figure out number of ports first */ if (!smp_query_via(data, dest, IB_ATTR_NODE_INFO, 0, 0, srcport)) return node info config failed; @@ -389,10 +386,10 @@ static char *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int /* We are modifying only 1 port, so get the current config */ if (!cc_query_status_via(payload, dest, IB_CC_ATTR_SWITCH_PORT_CONGESTION_SETTING, - (portnum - 1) / 32, 0, NULL, srcport, cckey)) - return switch congestion setting query failed; + portnum / 32, 0, NULL, srcport, cckey)) + return switch port congestion setting query failed; - ptr = payload + (((portnum % 32 - 1) * 4)); + ptr = payload + (((portnum % 32) * 4)); mad_encode_field(ptr, IB_CC_SWITCH_PORT_CONGESTION_SETTING_ELEMENT_VALID_F, @@ -415,8 +412,8 @@ static char *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int cong_parm_marking_rate); if (!cc_config_status_via(payload, rcv, dest, IB_CC_ATTR_SWITCH_PORT_CONGESTION_SETTING, - (portnum - 1) / 32, 0, NULL, srcport, cckey)) - return switch congestion setting config failed; + portnum / 32, 0, NULL, srcport, cckey)) + return switch port congestion setting config failed; return NULL; } -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC PATCH 03/16] ib/mad: Add check for jumbo MADs support on a device
On 11/27/2014 3:51 PM, Sagi Grimberg wrote: We're already short on bits in device_cap_flags no shortage @ the kernel... we can add more 32 bits if/when we need it Why is there a gap in the bits? enum ib_device_cap_flags { ... IB_DEVICE_MEM_WINDOW_TYPE_2B= (124), IB_DEVICE_MANAGED_FLOW_STEERING = (129), ... }; 25-28 are not used? Is there some legacy software out there which uses these? Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC PATCH 03/16] ib/mad: Add check for jumbo MADs support on a device
On Mon, Dec 8, 2014 at 2:23 AM, Weiny, Ira ira.we...@intel.com wrote: I find it very annoying that upper level drivers replicate in different ways elements from the IB device attributes returned by ib_query_device. I met that in multiple drivers and upcoming designs for which I do code review. Are you up to come up with a patch that caches the device attributes on the device structure? I don't follow what you are asking for. Could you give more details? 1. add a struct ib_device_attr field to struct ib_device 2. when the device registers itself with the IB core, go and run the query_device verb with the param being pointer to that field I see where you are going. Then the MAD stack does not have to cache a max_mad_size value but rather looks in the ib_device structure on the fly... So, something like the diff below? What are the chances we end up with attributes which are not constant? If Roland would like to go this way I can rework my series based on the attributes being cached. -- Ira 17:15:59 git di diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 18c1ece..db18795 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -322,6 +322,8 @@ int ib_register_device(struct ib_device *device, client-add(device); } + device-query_device(device, device-attributes); + out: mutex_unlock(device_mutex); return ret; diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 470a011..241a882 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1630,6 +1630,7 @@ struct ib_device { u32 local_dma_lkey; u8 node_type; u8 phys_port_cnt; + struct ib_device_attrattributes; }; struct ib_client {
RE: [RFC PATCH 03/16] ib/mad: Add check for jumbo MADs support on a device
On Sun, Dec 7, 2014 at 4:23 PM, Weiny, Ira ira.we...@intel.com wrote: I don't think this is a bad idea, however, the complication is determining the kmem_cache object size in that case. How about we limit this value to 2K until such time as there is a need for larger support? Can we just get rid of all the kmem_caches used in the MAD code? I don't think any of this is so performance-critical that we couldn't just use kmalloc... On an SM node the MAD code is _very_ performance-critical. I don’t think we should give that up. Another alternative is to have a cache per device based on the size reported. -- Ira N�r��yb�X��ǧv�^�){.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w��� ���j:+v���w�j�mzZ+�ݢj��!�i
RE: [RFC PATCH 03/16] ib/mad: Add check for jumbo MADs support on a device
On 11/13/2014 9:54 PM, ira.we...@intel.com wrote: Add a device capability flag for OPA devices to signal their support of jumbo MADs. Check for IB_DEVICE_JUMBO_MAD_SUPPORT in the device capabilities and if supported mark the special QPs created. Ira, Few comments here, the device capability is OK, specifically if it helps for the mad layer logic to be simpler and/or more robust. You should add device attribute telling what is the size of MADs supported by the device, I think the IBTA mandated 256Band in the OPA case, the driver will fill there 2k. I don't think this is a bad idea, however, the complication is determining the kmem_cache object size in that case. How about we limit this value to 2K until such time as there is a need for larger support? You can't just state that (jumbo == 2k) and go to complete design/changes which use this hard-coded assumption. You need to see how to re-spin this series w.o this assumption. I'm looking into it. I believe I will still need to have a capability bit, but it may end up having a different meaning. I find it very annoying that upper level drivers replicate in different ways elements from the IB device attributes returned by ib_query_device. I met that in multiple drivers and upcoming designs for which I do code review. Are you up to come up with a patch that caches the device attributes on the device structure? I don't follow what you are asking for. Could you give more details? if not, I can do that.. and have your code to see it. Also, I would go and unify (squash) this patch and the preceding one. I'll keep this in mind as I rework the patches. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC PATCH 07/16] ib/mad: create a jumbo MAD kmem_cache
On 11/13/2014 9:54 PM, ira.we...@intel.com wrote: @@ -773,7 +782,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv, } local-mad_priv = NULL; local-recv_mad_agent = NULL; - mad_priv = kmem_cache_alloc(ib_mad_cache, GFP_ATOMIC); + + if (mad_agent_priv-qp_info-supports_jumbo_mads) + mad_priv = kmem_cache_alloc(jumbo_mad_cache, GFP_ATOMIC); + else + mad_priv = kmem_cache_alloc(ib_mad_cache, GFP_ATOMIC); + @ minimum (if you really think that one kmem cache for both jumbo and non- jumbo mads isn't the way to get) lets have one pointer that is directed to the cache you want to use and this way all branch as the above one and it's such can be avoided, right? That is a good idea, however, I'm going to address your other comments before changing anything here. -- Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC PATCH 00/16] ib_mad: Add support for Intel Omni-Path Architecture (OPA) MAD processing.
[RFC PATCH 07/16] ib/mad: create a jumbo MAD kmem_cache why not use a single kmem-cache instance with a non hard coded element size, 256B (or whatever we use today) or 2KB? I wanted to be able to adjust the element count of the caches separately to better tune overall memory usage. However, I stopped short of adding additional module parameters to adjust the 2K cache at this time. I tend to think that the resulted code is too much of a special purpose one under a (jumbo == 2K) assumption. See some more comments in the individual patches and we'll take it from there. Ok, I'll address those comments in the other email threads. Also (nit), please change the prefix for all patches to be IB/mad: and not ib/mad: to comply with the existing habit of patch titles for the IB subsystem I will thanks. Good. See below another easy-to-fix nitpicking comment, but before that, for the sake of easier review and post-robustness of the code to future bisections, please do a re-ordering of the series such that all general refactoring and pre- patches come before the OPApatches. This goes to re-order the current series such tat patches 8/9/10 are located after patch 14, as listed here: [RFC PATCH 01/16] ib/mad: rename is_data_mad to is_rmpp_data_mad [RFC PATCH 02/16] ib/core: add IB_DEVICE_JUMBO_MAD_SUPPORT device cap [RFC PATCH 03/16] ib/mad: Add check for jumbo MADs support on a device [RFC PATCH 04/16] ib/mad: add base version parameter to [RFC PATCH 05/16] ib/mad: Add MAD size parameters to process_mad [RFC PATCH 06/16] ib/mad: Create jumbo_mad data structures [RFC PATCH 07/16] ib/mad: create a jumbo MAD kmem_cache [RFC PATCH 11/16] ib/mad: create helper function for [RFC PATCH 12/16] ib/mad: create helper function for [RFC PATCH 13/16] ib/mad: create helper function for [RFC PATCH 14/16] ib/mad: Create helper function for SMI processing [RFC PATCH 08/16] ib/mad: Add Intel Omni-Path Architecture defines [RFC PATCH 09/16] ib/mad: Implement support for Intel Omni-Path [RFC PATCH 10/16] ib/mad: Add registration check for Intel Omni-Path [RFC PATCH 15/16] ib/mad: Implement Intel Omni-Path Architecture SMP [RFC PATCH 16/16] ib/mad: Implement Intel Omni-Path Architecture General Done. Another easy-to-fix nitpicking comment would be to have all the patches be consistent w.r.t to the capitalization of the 1st letter in the 1st word after the IB/core: or IB/mad: prefix, e.g ib/mad: create helper function for smi_handle_dr_smp_send becomes IB/mad: Create helper function for smi_handle_dr_smp_send Done. BTW, here my personal preference is Add helper and not Create helper IB/mad: Add helper function for smi_handle_dr_smp_send Done. Ira N�r��yb�X��ǧv�^�){.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w��� ���j:+v���w�j�mzZ+�ݢj��!�i
RE: [RFC PATCH 00/16] ib_mad: Add support for Intel Omni-Path Architecture (OPA) MAD processing.
On Thu, Nov 13, 2014 at 9:54 PM, ira.we...@intel.com wrote: The following patch series modifies the kernel MAD processing (ib_mad/ib_umad) and related interfaces to process Intel Omni-Path Architecture MADs on devices which support them. So this series allows to process such mad when it arrives or goes beyond that and allows to send such mad too? Both send and receive is supported. My apologies for not being clear. In addition to supporting some IBTA management classes, OPA devices use MADs with lengths up to 2K. These jumbo MADs increase the performance of management traffic. Can you provide 1-2 use cases where such mads will be sent and by what entity? I recall 2KB mads were mentioned over our LWG talks 8 years ago on IB routers... The Intel Omni-Path driver and SM will be the entities using these MADs. The patch series is written such that other devices could use jumbo MAD's but there is no attempt to predict how other technologies would do so. [snip] [RFC PATCH 07/16] ib/mad: create a jumbo MAD kmem_cache why not use a single kmem-cache instance with a non hard coded element size, 256B (or whatever we use today) or 2KB? I wanted to be able to adjust the element count of the caches separately to better tune overall memory usage. However, I stopped short of adding additional module parameters to adjust the 2K cache at this time. Also (nit), please change the prefix for all patches to be IB/mad: and not ib/mad: to comply with the existing habit of patch titles for the IB subsystem I will thanks. And (another nit), generate patch 0/N using $ git format-patch --cover-letter so we have the exact subject line for each patch and the over-all diffstat in the cover-letter I will. My reason for not doing this previously was that I run the git send-email command on generated patches from a different machine than the one on which my repo is located due to Intel's firewall setup. I will clone the repo there and do this in the future. -- Ira
RE: [PATCHv3] libibmad: Add new fields for SM:PortInfoExtended and for PM:PortExtendedSpeedsCounters
-Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of da...@dev.mellanox.co.il Sent: Wednesday, November 12, 2014 5:11 AM To: Weiny, Ira Cc: da...@mellanox.com; linux-rdma@vger.kernel.org Subject: [PATCHv3] libibmad: Add new fields for SM:PortInfoExtended and for PM:PortExtendedSpeedsCounters This change includes: 1. Adding new AttributeID and fields for MAD SM:PortInfoExtended. 2. Adding new fields to support PM:PortExtendedSpeedsCounters when RSFEC mode is active. Signed-off-by: Dan Ben Yosef da...@mellanox.com Thanks applied, Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH infiniband-diags] New MAD SM:PortInfoExtended and changing PM:PortExtendedSpeedsCounters
-Original Message- From: Dan Ben Yosef [mailto:da...@dev.mellanox.co.il] Sent: Monday, November 17, 2014 6:57 AM To: Weiny, Ira Cc: da...@mellanox.com; linux-rdma@vger.kernel.org Subject: [PATCH infiniband-diags] New MAD SM:PortInfoExtended and changing PM:PortExtendedSpeedsCounters This change needed for EDR support. This change includes: 1. Adding new MAD option to smpquery: PortInfoExtended (PIE). 2. Changing perfquery to support PortExtendedSpeedsCounters with RSFEC mode active. Signed-off-by: Dan Ben Yosef da...@mellanox.com Applied with the following change: -static void extended_speeds_query(ib_portid_t * portid, int port, uint64_t ext_mask) +static uint8_t is_fec_mode_active(ib_portid_t * portid, int port, + uint16_t cap_mask) Changed to is_rsfec_mode_active... Thanks, Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] infiniband-diags: add rdma-ndd daemon
On Mon, Nov 10, 2014 at 12:28:27AM +, Weiny, Ira wrote: Sadly, I think the proper way to address this is the same way netdev addresses it - do not activate the interface on module load, wait for an explicit enablement so userspace can configure before it tries to link up. (Not to say that is even possible considering RDMA's history, but still..) I don't think this is really possible. When would you consider the interface to be up? Even before the port goes active the SM and other diags can query this value. This is why I propose that the default of the drivers, without user space involvement, should change. The user space daemon is only trying to react to other user space activity. Same as net dev, after the module load the physical layer is disabled. Nothing can see the NodeDescription because the link is kept down. The everything gets configured, then the physical layer is allowed to come up. I don't know if this is practical, but it is the only race free way to properly address all of this. I don't think it is practical at all. Assuming that we can hold the link from transitioning to Init how does any userspace software know if the current hostname is valid? If, at the time of module load, the hostname is localhost you end up with the same problem as the proposed kernel patches. In fact the proposed kernel patches actually aid in the setting of alternate policies in that a %h provides some dynamic lookup. (See below) TBH, I'd rather see just this daemon and drop the kernel side... If the daemon is started before the modules are loaded then it might be able to set the description while the port is still down. The problem is this sequence. 1) rdma-ndd daemon started (hostname == localhost, no rdma devices set) 2) dhcp started (or other hostname change, rdma-ndd triggered but no rdma devices set; not loaded yet) 3) rdma devices loaded, default node_desc 4) daemon detects new rdma device instantly and programs the admin desired node name in the 100ms before the physical link reaches up. Since the link is down when the change occurs no traps are scheduled. Or in the 10% case where the race is lost you get a trap. Oh well. I'm not against adding this to rdma-ndd but I would like it to be optional. With the kernel patches rdma-ndd itself is optional. What I have proposed will work no matter what order things are started/loaded. Well, not really - it works in the sense that the kernel default node name works properly, but it doesn't allow the admin to set a custom name without hitting all these same issues (well, except perhaps by module parameter..). I think changing the default is a worthwhile change. In addition, alternate admin policies are aided by the general use of the %h specifier. 1) SM's which periodically scan the Node Description always get up to date hostname info. 2) Up to date hostname info is provided even if a user space daemons fails. Note: OpenSM has an update node description feature for this condition. 3) Low level diag tools always get up to date hostname info Do you believe that hostname device is an unreasonable default? Roland, Hal, do you have any input? Finally, I contemplated the alternative of having the rdma-ndd daemon detect new rdma devices. Well, without that all it does is link two parts of the kernel together through user space with no way toset policy in userspace (policy in this case is the node description string). Seems very strange. Yes the daemon is somewhat strange and I envision it being updated for other policies in the future which may include some of the functionality you suggest. But it seemed appropriate to add it now as it enhances the SM knowing about hostname changes. -- Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCHv2] libibmad: Add new fields for SM:PortInfoExtended and for PM:PortExtendedSpeedsCounters
On 11/8/2014 9:29 AM, Weiny, Ira wrote: --- a/src/fields.c +++ b/src/fields.c @@ -949,8 +949,42 @@ static const ib_field_t ib_mad_f[] = { {480, 32, Counter14, mad_dump_uint}, {0, 0}, /* IB_PSR_LAST_F */ -{0, 0} /* IB_FIELD_LAST_ */ +/* + * PortInfoExtended fields + */ +{0, 32, CapMask, mad_dump_hex}, I'm not quite sure why the spec was written this way but technically CapMask is only 1 bit??? PortInfoExtended:CapMask is 32 bit field. Current 1.3 draft is misformatted. Thanks... That makes more sense. Sorry I only just got access to the new 1.3 draft. -- Ira The two rows are supposed to be combined into 1 and things are misaligned. IsFECModeSupported is 1 bit (first bit out of those 32 bits). Therefore I'm not sure we can define this field as above. Can we get some assurances from the management WG that the 31 bit reserved field following the CapMask are only going to extend the CapMask? -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v2] infiniband-diags: add rdma-ndd daemon
rdma-ndd is a system daemon which watches the procfs hostname file for updates. Upon detecting an update it will update the Node Descriptions of the RDMA devices in the system. What is the effect of this on RoCE and IWARP adapters ? While they populate node_desc in sysfs, I don't think they support IB_DEVICE_MODIFY_NODE_DESC for ib_modify_device. No they don't support it. Shouldn't such devices be skipped in terms of setting NodeDescription ? AFAIK all of the current user space software which write to the node_desc sysfs simply ignore any errors. Those devices which do not support ib_modify_device return EIO from sysfs. This is the same mechanism I used and should be safe. If we want to go down this path[*] I think the core sysfs code should be changed such that the node_desc is either A) not available on those devices or B) not writable if ib_modify_device is not supported. -- Ira [*] Changing this sysfs behavior was not the intent of this patch series and I think should be dealt with separately. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v2 1/6] ib/mad: Add function to support format specifiers for node description
On 11/6/2014 5:06 PM, ira.we...@intel.com wrote: From: Ira Weiny ira.we...@intel.com ib_build_node_desc - prints src node description into dest while mapping format specifiers Specifiers supported: %h system hostname %d device name Define a default Node Description format to be %h %d There are a lot of Node Description schemes in use in the field currently. New format needs to be made as unlikely as possible to interfere with those. So perhaps H:%h D:%d ? What I am proposing was generally agreed upon to be a reasonable default in this thread: http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg07295.html Furthermore RHEL 6.4 has started to use this naming scheme in their rdma start up script as was proposed and agreed upon by Doug Ledford in this thread: https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg11282.html Also, in order to support such installations, whether or not to use this format should be configurable. It is configurable. You can overwrite the node_desc field anyway you want just as before and no current user space software needs to change if a user has customizations there. What is nice about this addition is that _if_ the user does _not_ have customizations the defaults are, IMHO, much more usable by default. -- Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] infiniband-diags: add rdma-ndd daemon
Sadly, I think the proper way to address this is the same way netdev addresses it - do not activate the interface on module load, wait for an explicit enablement so userspace can configure before it tries to link up. (Not to say that is even possible considering RDMA's history, but still..) I don't think this is really possible. When would you consider the interface to be up? Even before the port goes active the SM and other diags can query this value. This is why I propose that the default of the drivers, without user space involvement, should change. The user space daemon is only trying to react to other user space activity. TBH, I'd rather see just this daemon and drop the kernel side... If the daemon is started before the modules are loaded then it might be able to set the description while the port is still down. The problem is this sequence. 1) rdma-ndd daemon started (hostname == localhost, no rdma devices set) 2) dhcp started (or other hostname change, rdma-ndd triggered but no rdma devices set; not loaded yet) 3) rdma devices loaded, default node_desc What I have proposed will work no matter what order things are started/loaded. Furthermore, it avoids races between the start up scripts and the SM because the current hostname is always mapped at the time the query is being answered. With parallel start up systems like systemd you could have the hostname set by DHCP while some of your rdma devices are loaded and others are not. Those which were loaded before the hostname change will get set by rdma-ndd and trigger a trap. The others will simply pick up the correct hostname in the initial SM query. Finally, I contemplated the alternative of having the rdma-ndd daemon detect new rdma devices. I don't like this solution because it results in additional unnecessary MAD traffic for those devices. (ie. initial ND query, followed by a trap/trap repress, and the final correct ND query.) With my proposed solution any devices loaded after the hostname change (or in the case of a static hostname) will be properly queried with one ND query. -- Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCHv2] libibmad: Add new fields for SM:PortInfoExtended and for PM:PortExtendedSpeedsCounters
+ + /* + * PortExtendedSpeedsCounters RSFEC active fields + */ Use IB_PESC_RSFEC_FIRST_F and then set IB_PESC_RSFEC_PORT_SELECT_F to that. + IB_PESC_RSFEC_PORT_SELECT_F, + IB_PESC_RSFEC_COUNTER_SELECT_F, + IB_PESC_RSFEC_SYNC_HDR_ERR_CTR_F, + IB_PESC_RSFEC_UNK_BLOCK_CTR_F, What about ErrorDetectionCounterLaneX? + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE0_F, I'm not quite sure of the name here because this can be either Symbol or Block counter... What if we just drop the SYMBOL and leave it as: IB_PESC_RSFEC_FEC_CORR_CTR_LANE(X)_F + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE1_F, + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE2_F, + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE3_F, + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE4_F, + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE5_F, + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE6_F, + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE7_F, + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE8_F, + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE9_F, + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE10_F, + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE11_F, What about FECUncorrectable*CounterLaneX? Again it seems this counter is likely to have multiple meanings so use IB_PESC_RSFEC_FEC_UNCORR_CTR_LANE(X)_F + IB_PESC_PORT_FEC_CORR_BLOCK_CTR_F, + IB_PESC_PORT_FEC_UNCORR_BLOCK_CTR_F, + IB_PESC_PORT_FEC_CORR_SYMBOL_CTR_F, + IB_PESC_RSFEC_LAST_F, + IB_FIELD_LAST_ /* must be last */ }; diff --git a/src/dump.c b/src/dump.c index efe4bc4..64577c0 100644 --- a/src/dump.c +++ b/src/dump.c @@ -857,6 +857,13 @@ void mad_dump_portsamples_result(char *buf, int bufsz, void *val, int valsz) _dump_fields(buf, bufsz, val, IB_PSR_TAG_F, IB_PSR_LAST_F); } +void mad_dump_port_ext_speeds_counters_rsfec_active(char *buf, int bufsz, + void *val, int valsz) +{ + _dump_fields(buf, bufsz, val, IB_PESC_RSFEC_PORT_SELECT_F, Based on the above change use the FIRST field here. + IB_PESC_RSFEC_LAST_F); +} + void mad_dump_port_ext_speeds_counters(char *buf, int bufsz, void *val, int valsz) { _dump_fields(buf, bufsz, val, IB_PESC_PORT_SELECT_F, IB_PESC_LAST_F); @@ -1115,6 +1122,12 @@ void mad_dump_classportinfo(char *buf, int bufsz, void *val, int valsz) _dump_fields(buf, bufsz, val, IB_CPI_BASEVER_F, IB_CPI_TRAP_QKEY_F + 1); } +void mad_dump_portinfo_ext(char *buf, int bufsz, void *val, int valsz) +{ + _dump_fields(buf, bufsz, val, IB_PORT_EXT_CAPMASK_F, Use IB_PORT_EXT_FIRST_F here. + IB_PORT_EXT_LAST_F); +} + void xdump(FILE * file, char *msg, void *p, int size) { #define HEX(x) ((x) 10 ? '0' + (x) : 'a' + ((x) -10)) diff --git a/src/fields.c b/src/fields.c index 33a6364..a848ebf 100644 --- a/src/fields.c +++ b/src/fields.c @@ -949,8 +949,42 @@ static const ib_field_t ib_mad_f[] = { {480, 32, Counter14, mad_dump_uint}, {0, 0}, /* IB_PSR_LAST_F */ - {0, 0} /* IB_FIELD_LAST_ */ + /* + * PortInfoExtended fields + */ + {0, 32, CapMask, mad_dump_hex}, I'm not quite sure why the spec was written this way but technically CapMask is only 1 bit??? Therefore I'm not sure we can define this field as above. Can we get some assurances from the management WG that the 31 bit reserved field following the CapMask are only going to extend the CapMask? + {BITSOFFS(32, 16), FECModeActive, mad_dump_uint}, + {BITSOFFS(48, 16), FDRFECModeSupported, mad_dump_uint}, As this is a bit field it is probably better to be printed as hex. + {BITSOFFS(64, 16), FDRFECModeEnabled, mad_dump_uint}, Print as hex. + {BITSOFFS(80, 16), EDRFECModeSupported, mad_dump_uint}, Print as hex. + {BITSOFFS(96, 16), EDRFECModeEnabled, mad_dump_uint}, Print as hex. + {0, 0}, /* IB_PORT_EXT_LAST_F */ + /* + * PortExtendedSpeedsCounters RSFEC Active fields + */ + {BITSOFFS(8, 8), PortSelect, mad_dump_uint}, + {64, 64, CounterSelect, mad_dump_hex}, + {BITSOFFS(128, 16), SyncHeaderErrorCounter, mad_dump_uint}, + {BITSOFFS(144, 16), UnknownBlockCounter, mad_dump_uint}, Add missing fields. + {352, 32, FECCorrectableSymbolCtrLane0, mad_dump_uint}, Same comment as above regarding the name here. It is unfortunate that the lib could not change the name based on RS-FEC. That will be up to other software to decode in a more user friendly way. For these raw dumps we should use the more generic name: FECCorrectableCtrLaneX + {384, 32, FECCorrectableSymbolCtrLane1, mad_dump_uint}, + {416, 32, FECCorrectableSymbolCtrLane2, mad_dump_uint}, + {448, 32, FECCorrectableSymbolCtrLane3, mad_dump_uint}, + {480, 32, FECCorrectableSymbolCtrLane4, mad_dump_uint}, + {512, 32, FECCorrectableSymbolCtrLane5, mad_dump_uint}, + {544,
RE: [PATCH infiniband-diags] saquery.c: Fix handling of cpi and ClassPortInfo options
-Original Message- From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il] Sent: Wednesday, November 05, 2014 12:22 PM To: Weiny, Ira Cc: linux-rdma (linux-rdma@vger.kernel.org); Doug Ledford; Dan Ben-Yosef Subject: [PATCH infiniband-diags] saquery.c: Fix handling of cpi and ClassPortInfo options Need to check if query_type is CLASS_PORT_INFO in order to handle saquery [cpi | ClassPortInfo]. Otherwise, it just dumps ClassPortInfo attribute as all 0s. Note that saquery -c works fine because it set command to SAQUERY_CMD_CLASS_PORT_INFO. Signed-off-by: Hal Rosenstock h...@mellanox.com Thanks applied, Ira --- diff --git a/src/saquery.c b/src/saquery.c index 0f39064..1af3401 100644 --- a/src/saquery.c +++ b/src/saquery.c @@ -1869,6 +1869,7 @@ int main(int argc, char **argv) params.dlid = get_lid(h, dst_lid); if (command == SAQUERY_CMD_CLASS_PORT_INFO || + query_type == CLASS_PORT_INFO || query_type == IB_SA_ATTR_SWITCHINFORECORD) sa_cpi_required = 1; -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH infiniband-diags] saquery.c: Return proper status from query_sa_cpi on bad result status
-Original Message- From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il] Sent: Wednesday, November 05, 2014 12:22 PM To: Weiny, Ira Cc: linux-rdma (linux-rdma@vger.kernel.org); Dan Ben-Yosef Subject: [PATCH infiniband-diags] saquery.c: Return proper status from query_sa_cpi on bad result status When result status was other than IB_SA_MAD_STATUS_SUCCESS, 0 was being returned rather than EIO as intended. Signed-off-by: Hal Rosenstock h...@mellanox.com Thanks applied, Ira --- diff --git a/src/saquery.c b/src/saquery.c index 0f39064..1af3401 100644 --- a/src/saquery.c +++ b/src/saquery.c @@ -1445,7 +1445,7 @@ static int query_sa_cpi(struct sa_handle *h, struct query_params *query_params) memcpy(query_params-cpi, cpi, sizeof(query_params-cpi)); Exit: sa_free_result_mad(result); - return (0); + return ret; } static const struct query_cmd query_cmds[] = { -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] infiniband-diags: add rdma-ndd daemon
On 10/31/14 00:54, ira.we...@intel.com wrote: +int set_rdma_device_names(const char *hostname) { + DIR *class_dir; + struct dirent *dent; + + class_dir = opendir(SYS_INFINIBAND); + if (!class_dir) { + syslog(LOG_INFO, Failed to open %s, SYS_INFINIBAND); + return -ENOSYS; + } + + while ((dent = readdir(class_dir))) { + int retry = set_retry_cnt; + if (dent-d_name[0] == '.') + continue; + + while (update_node_desc(dent-d_name, hostname) retry 0) { + syslog(LOG_ERR, retrying set Node Description on %s\n, + dent-d_name); + retry--; + } + } + + return 0; +} Has this code been tested with Valgrind and --track-fds=yes and --leak- check=full ? I see an opendir() call in the above code but no closedir(). Great catch thanks! I'll run with valgrind on v2. +int read_and_set_hostname(int fd) +{ + int rc; + char buf[128]; + if (read(fd, buf, 65) = 0) { + buf[65] = '\0'; + newline_to_null(buf); + strip_domain(buf); + syslog(LOG_INFO, new hostname detected: %s, buf); + set_rdma_device_names((char *)buf); + rc = 0; + } else { + syslog(LOG_ERR, Read %s Failed\n, SYS_HOSTNAME); + rc = -EIO; + } + return rc; +} The above code will trigger reading uninitialized data if the data read from fd is not newline-terminated and less than 65 characters are read. I'm not sure what you mean. Do you mean will trigger _using_ uninitialized data? How about the following? @@ -155,8 +156,8 @@ int read_and_set_hostname(int fd) { int rc; char buf[128]; - if (read(fd, buf, 65) = 0) { - buf[65] = '\0'; + memset(buf, 0, sizeof(buf)); + if (read(fd, buf, 64) = 0) { newline_to_null(buf); strip_domain(buf); syslog(LOG_INFO, new hostname detected: %s, buf); Ira Bart. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] infiniband-diags: add rdma-ndd daemon
How about the following? @@ -155,8 +156,8 @@ int read_and_set_hostname(int fd) { int rc; char buf[128]; - if (read(fd, buf, 65) = 0) { - buf[65] = '\0'; + memset(buf, 0, sizeof(buf)); + if (read(fd, buf, 64) = 0) { newline_to_null(buf); strip_domain(buf); syslog(LOG_INFO, new hostname detected: %s, buf); This change looks fine to me. A possible alternative is to use dup() + fdopen() + fgets() + fclose(). fgets() namely automatically appends a terminating '\0'. Agreed. But I'd like to avoid creating additional fd's. Thanks for the review, Ira Bart. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Are the PathRecord reserved components really reserved?
Hello, I am looking into the OFED code for the component mask bits of the PathRecord requests and I need some further explanation. The IB specification in section 15.2.5.16 PathRecord, table 207 PathRecord defines the first two components as Reserved. Each one of them is 32 bits long and the description of the first one is Reserved while the description of the second reserved component is Offset for alignment. The IBTA has published errata to the 1.2.1 specification which includes the use of many reserved fields. I believe these errata are public on their web site. Ira However, in the file include/rdma/ib_sa.h where the component mask bits for the Path Records are defined, the 1st and 2nd component are combined together under the name IB_SA_PATH_REC_SERVICE_ID. #define IB_SA_PATH_REC_SERVICE_ID (IB_SA_COMP_MASK( 0) |\ IB_SA_COMP_MASK( 1)) The same definition exists in the OpenSM code in file include/iba/ib_types.h /* Path Record Component Masks */ #define IB_PR_COMPMASK_SERVICEID_MSB (CL_HTON64(((uint64_t)1)0)) #define IB_PR_COMPMASK_SERVICEID_LSB (CL_HTON64(((uint64_t)1)1)) . . #define IB_PR_COMPMASK_SERVICEID (IB_PR_COMPMASK_SERVICEID_MSB | \ IB_PR_COMPMASK_SERVICEID_LSB) Why are these fields named as SERVICE_ID? I would expect something like IB_SA_PATH_REC_RESERVED_1 and IB_SA_PATH_REC_RESERVED_2 according to the specs. Are they really reserved or used, and if they are used, when does this happen? Vangelis
RE: Question on merge windows.
If that is the case would you like me to rebase the latest version of following patches onto your most recent for-next branch and resubmit? Or would you prefer I _not_ do so? I would like to see if we can get this series into 3.17 if possible. I'm guess the answer is no, but are there any changes to the patches if you rebase to 3.16-rc7? No changes. They all apply cleanly with git am to for-next. I don't see a list of patches staged for 3.17 in Roland's upstream tree yet, but he either may not have gotten to it yet, or just hasn't pushed them. Thanks for the reply, Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Question on merge windows and MAD submitted MAD patches.
Roland, I see the merge window closed on Sunday and Linus has released 3.16-rc1. Does that mean now is the time you will start accepting things destined for 3.17 while 3.16 bakes? If that is the case would you like me to rebase the latest version of following patches onto your most recent for-next branch and resubmit? Or would you prefer I _not_ do so? I would like to see if we can get this series into 3.17 if possible. Roland, I did not get a response to this email so I'm not sure if I should resubmit my patches or not. I have rejected the older versions of the patches on patchwork and marked you the Delegate for the following current versions of this series. [PATCH v2 1/5] ib/umad: update module to [pr|dev]_* style print messages [PATCH v2 2/5] ib/mad: update module to [pr|dev]_* style print messages [PATCH v2 3/5] ib/mad: Add dev_notice messages for various umad/mad [PATCH v3 4/5] ib/mad: add new ioctl to user space to support new [PATCH 5/5] ib/mad: Add user space RMPP support I hope this helps. I'm also more than happy to resend the patches which I have rebased on top of your for-next branch but there are no differences and the old patches cherry-pick cleanly. So I'm holding off on submitting to avoid clutter on the mailing list and patchwork. Thanks, Ira Thanks, Ira -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of ira.we...@intel.com Sent: Monday, May 05, 2014 6:45 PM To: rol...@purestorage.com Cc: linux-rdma@vger.kernel.org Subject: [PATCH RESEND 0/5] ib/mad: Add ability for user space applications to control RMPP directly My apologies if anyone receives duplicates of these emails. Apparently the series did not make it to linux-rdma the first time. The first 3 patches update the umad and mad modules to use pr_* style print messages in preparation for adding pr_notice messages to the existing and new registration functions. The final 2 patches add a new registration mechanism which allows for flags to be specified at ib_mad registration time. The first such flag is to allow user control of RMPP transactions. Existing registration and RMPP transactions are unaffected by this change. [PATCH 1/5] ib/umad: update module to pr_* style print messages [PATCH 2/5] ib/mad: update module to pr_* style print messages [PATCH 3/5] ib/mad: Add pr_notice messages for various umad/mad [PATCH 4/5] ib/mad: add new ioctl to user space to support new [PATCH 5/5] ib/mad: Add user space RMPP support -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html