RE: [PATCH 2/2] ipoib mcast sendonly join: Move multicast specific code out of ipoib_main.c.

2015-12-14 Thread Weiny, Ira
> 
> On Fri, 11 Dec 2015, ira.weiny wrote:
> 
> > I think I would rather see this called something like
> >
> > ipoib_add_to_list_sendonly
> >
> > Or something...
> >
> > Calling it iboib_check* sounds like it should return a bool.
> 
> Hmm... It only adds the multicast group if the check was successful.
> 
> How about
> 
>   ipoib_check_and_add_mcast_sendonly()

Better.

> 
> 
> > > +void ipoib_check_mcast_sendonly(struct ipoib_dev_priv *priv, u8
> *mgid,
> > > + struct list_head *remove_list)
> > > +{
> > > + /* Is this multicast ? */
> > > + if (*mgid == 0xff) {
> >
> > Odd to see a mgid variable which is only u8?
> >
> > How about "gid_prefix"?
> 
> That is only used in the qib driver and there it is a field.
> 
> mgid is a pointer to the seres of bytes of the MGID and the first byte of that
> signifies multicast if 0xff

Understood, I misread the code at first.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2] IB/mad: In validate_mad, validate CM send method for attributes other than ClassPortInfo

2015-11-13 Thread Weiny, Ira
> 
> 
> Receipt of CM MAD with other than the Send method for an attribute other
> than the ClassPortInfo attribute is invalid.
> 
> CM attributes other than ClassPortInfo only use the send method.
> 
> The SRP initiator does not maintain a timeout policy for CM connect requests
> relies on the CM layer to do that. The result was that the SRP initiator hung 
> as
> the connect request never completed.
> 
> A new SRP target has been observed to respond to Send CM REQ with GetResp
> of CM REQ with bad status. This is non conformant with IBA spec but exposes a
> vulnerability in the current MAD/CM code which will respond to the incoming
> GetResp of CM REQ as if it was a valid incoming Send of CM REQ rather than
> tossing this on the floor. It also causes the MAD layer not to retransmit the
> original REQ even though it has not received a REP.
> 
> Reviewed-by: Sagi Grimberg 

Reviewed-by: Ira Weiny 

> Signed-off-by: Hal Rosenstock 
> ---
> Changes since v1:
> Removed ClassPortInfo method validation
> 
>  drivers/infiniband/core/mad.c |5 +
>  include/rdma/ib_mad.h |2 ++
>  2 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 8d8af7a..2281de1 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -1811,6 +1811,11 @@ static int validate_mad(const struct ib_mad_hdr
> *mad_hdr,
>   if (qp_num == 0)
>   valid = 1;
>   } else {
> + /* CM attributes other than ClassPortInfo only use Send
> method */
> + if ((mad_hdr->mgmt_class == IB_MGMT_CLASS_CM) &&
> + (mad_hdr->attr_id != IB_MGMT_CLASSPORTINFO_ATTR_ID)
> &&
> + (mad_hdr->method != IB_MGMT_METHOD_SEND))
> + goto out;
>   /* Filter GSI packets sent to QP0 */
>   if (qp_num != 0)
>   valid = 1;
> diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h index
> 188df91..ec9b44d 100644
> --- a/include/rdma/ib_mad.h
> +++ b/include/rdma/ib_mad.h
> @@ -237,6 +237,8 @@ struct ib_vendor_mad {
>   u8  data[IB_MGMT_VENDOR_DATA];
>  };
> 
> +#define IB_MGMT_CLASSPORTINFO_ATTR_IDcpu_to_be16(0x0001)
> +
>  struct ib_class_port_info {
>   u8  base_version;
>   u8  class_version;
> --
> 1.7.8.2
> 



RE: [PATCH] IB/sa: replace GFP_KERNEL with GFP_ATOMIC

2015-10-27 Thread Weiny, Ira
> 
> On Tue, Oct 27, 2015 at 06:56:50PM +, Wan, Kaike wrote:
> 
> > > I do wonder if it is a good idea to call ib_nl_send_msg with a
> > > spinlock held though.. Would be nice to see that go away.
> >
> > We have to hold the lock to protect against a race condition that a
> > quick response will try to free the request from the
> > ib_nl_request_list before we even put it on the list.
> 
> Put is on the list first? Use a kref? Doesn't look like a big deal to clean 
> this up.
> 
> Jason

Not sure what "Put is on the list first?" means.  I think it is valid to build 
the request, 
if success, add to the list, then send it.  That would solve the problem you 
mention 
above.  Was that what you hand in mind, Jason?

I don't have time to work on this right now, not sure about Kaike.  Until we 
can remove 
the spinlock the current proposed patch should be applied in the interim.  
Sorry for the
noise before.

Reviewed-By: Ira Weiny 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 07/22] staging/rdma/hfi1: Fix sparse error in sdma.h file

2015-10-21 Thread Weiny, Ira
> 
> On Mon, Oct 19, 2015 at 10:11:22PM -0400, ira.we...@intel.com wrote:
> > From: Niranjana Vishwanathapura 
> >
> > Use NULL instead of 0 for pointer argument to fix the sparse error.
> >
> > Reviewed-by: Mike Marciniszyn 
> > Reviewed-by: Mitko Haralanov 
> > Reviewed-by: Dennis Dalessandro 
> > Signed-off-by: Niranjana Vishwanathapura
> > 
> > Signed-off-by: Ira Weiny 
> 
> This should have just been folded in with the previous patch.  Don't introduce
> problems and fix them in the patchset.

My bad, I will fix in v3.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 00/23] Update driver to 0.9-294

2015-10-19 Thread Weiny, Ira
> 
> On Mon, Oct 19, 2015 at 12:43:24PM -0400, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> >
> > The following series has bug fixes and updates to the staging hfi1 driver.
> 
> Why are you adding new functionality to this driver before it is moved out of
> drivers/staging/ ?  I _REALLY_ don't like to see that, because it implies 
> that you
> are spending time and energy on new stuff, not fixing up the known issues
> instead.

Perhaps I did not chose my words carefully enough.

The largest issue on the TODO list is the refactoring of the code to be shared 
between the hfi1 and qib driver.  While the hardware between hfi1 and qib is 
similar and thus the initial code looked similar, our performance tuning on 
hfi1 has revealed that some changes will be required to the hfi1 code to fully 
utilize the hardware.  We would like to get these updates in to make sure that 
the refactoring work is done properly for the 2 hardware types.

> 
> Please redo this patch series and don't add new stuff.
> 

I can do this but it will cause us to do our refactoring work 2 times in order 
for it to be done correctly (properly support both hardware types).  Of the 23 
patches I sent only one has significant updates which don't conflict with this 
refactoring work.

[PATCH 15/23] staging/rdma/hfi1: Implement Expected Receive TID caching

However, without this patch users performance will be impacted.

In the interest of full disclosure we still have at least 2 more series, 
possibly 3 which are similar to this series, bug fixes and performance 
improvements.

Given the above explanation, would you find it acceptable to take these 
"updates"?  I can certainly rework patch 15 to separate out the code clean up 
as per your other email.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RFC] IB/mad: remove obsolete snoop interface

2015-09-30 Thread Weiny, Ira
> 
> > I have a series of about 5 patches which implement tracing in the MAD
> > stack which I was working through and was going to submit
> 
> Does your MAD stack tracing include the dumping and decode of sent and
> received MADs ?
> 

Yes with a bit more details in some places and probably less in others.  It 
also traces MADs throughout the stack as well.  So there is some indication of 
where they are in the stack umad vs mad for example.

I'll post them later today for RFC.  The most recent work I did on them 
involved a panic (which I fixed).  But this is the main reason I have not 
submitted this series.  (I wanted to get more testing on it before submission.)

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] infiniband:Remove unneeded function definitions and declarations in the file, mad.c

2015-09-30 Thread Weiny, Ira
> 
> This removes the definitions and declarations of the functions
> ib_register_mad_snoop and register_snoop_agent in the file mad.c due to have
> no more callers and thus can be removed to reduce code size of this particular
> file.

I don't disagree with removing the snoop code, however, I don't believe this 
goes far enough.  I have a similar patch which removes the snoop interface 
completely.  It has a medium level of testing.

I'll send it to the list for review.  If you concur with the larger removal, 
I'd be happy to add your signed-off by line.

Thanks,
Ira

> 
> Signed-off-by: Nicholas Krause 
> ---
>  drivers/infiniband/core/mad.c | 98 
> ---
>  1 file changed, 98 deletions(-)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 4b5c723..e5978e1 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -444,104 +444,6 @@ static inline int is_snooping_recvs(int
> mad_snoop_flags)
>IB_MAD_SNOOP_RMPP_RECVS*/));
>  }
> 
> -static int register_snoop_agent(struct ib_mad_qp_info *qp_info,
> - struct ib_mad_snoop_private
> *mad_snoop_priv)
> -{
> - struct ib_mad_snoop_private **new_snoop_table;
> - unsigned long flags;
> - int i;
> -
> - spin_lock_irqsave(_info->snoop_lock, flags);
> - /* Check for empty slot in array. */
> - for (i = 0; i < qp_info->snoop_table_size; i++)
> - if (!qp_info->snoop_table[i])
> - break;
> -
> - if (i == qp_info->snoop_table_size) {
> - /* Grow table. */
> - new_snoop_table = krealloc(qp_info->snoop_table,
> -sizeof mad_snoop_priv *
> -(qp_info->snoop_table_size + 1),
> -GFP_ATOMIC);
> - if (!new_snoop_table) {
> - i = -ENOMEM;
> - goto out;
> - }
> -
> - qp_info->snoop_table = new_snoop_table;
> - qp_info->snoop_table_size++;
> - }
> - qp_info->snoop_table[i] = mad_snoop_priv;
> - atomic_inc(_info->snoop_count);
> -out:
> - spin_unlock_irqrestore(_info->snoop_lock, flags);
> - return i;
> -}
> -
> -struct ib_mad_agent *ib_register_mad_snoop(struct ib_device *device,
> -u8 port_num,
> -enum ib_qp_type qp_type,
> -int mad_snoop_flags,
> -ib_mad_snoop_handler
> snoop_handler,
> -ib_mad_recv_handler recv_handler,
> -void *context)
> -{
> - struct ib_mad_port_private *port_priv;
> - struct ib_mad_agent *ret;
> - struct ib_mad_snoop_private *mad_snoop_priv;
> - int qpn;
> -
> - /* Validate parameters */
> - if ((is_snooping_sends(mad_snoop_flags) && !snoop_handler) ||
> - (is_snooping_recvs(mad_snoop_flags) && !recv_handler)) {
> - ret = ERR_PTR(-EINVAL);
> - goto error1;
> - }
> - qpn = get_spl_qp_index(qp_type);
> - if (qpn == -1) {
> - ret = ERR_PTR(-EINVAL);
> - goto error1;
> - }
> - port_priv = ib_get_mad_port(device, port_num);
> - if (!port_priv) {
> - ret = ERR_PTR(-ENODEV);
> - goto error1;
> - }
> - /* Allocate structures */
> - mad_snoop_priv = kzalloc(sizeof *mad_snoop_priv, GFP_KERNEL);
> - if (!mad_snoop_priv) {
> - ret = ERR_PTR(-ENOMEM);
> - goto error1;
> - }
> -
> - /* Now, fill in the various structures */
> - mad_snoop_priv->qp_info = _priv->qp_info[qpn];
> - mad_snoop_priv->agent.device = device;
> - mad_snoop_priv->agent.recv_handler = recv_handler;
> - mad_snoop_priv->agent.snoop_handler = snoop_handler;
> - mad_snoop_priv->agent.context = context;
> - mad_snoop_priv->agent.qp = port_priv->qp_info[qpn].qp;
> - mad_snoop_priv->agent.port_num = port_num;
> - mad_snoop_priv->mad_snoop_flags = mad_snoop_flags;
> - init_completion(_snoop_priv->comp);
> - mad_snoop_priv->snoop_index = register_snoop_agent(
> - _priv->qp_info[qpn],
> - mad_snoop_priv);
> - if (mad_snoop_priv->snoop_index < 0) {
> - ret = ERR_PTR(mad_snoop_priv->snoop_index);
> - goto error2;
> - }
> -
> - atomic_set(_snoop_priv->refcount, 1);
> - return _snoop_priv->agent;
> -
> -error2:
> - kfree(mad_snoop_priv);
> -error1:
> - return ret;
> -}
> -EXPORT_SYMBOL(ib_register_mad_snoop);
> -
>  static inline void deref_mad_agent(struct ib_mad_agent_private
> *mad_agent_priv)  {
>   if (atomic_dec_and_test(_agent_priv->refcount))
> --
> 

RE: [PATCH RFC] IB/mad: remove obsolete snoop interface

2015-09-30 Thread Weiny, Ira
> 
> On 9/30/2015 2:01 AM, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> >
> > This interface has no current users and is obsolete.
> 
> There are no in tree users of this but there is Sean's madeye tool (which is 
> out
> of tree). This is still a useful debug tool for MADs.

Where is that?  I did not think it was supported any longer?

I have a series of about 5 patches which implement tracing in the MAD stack 
which I was working through and was going to submit along with this patch 
(those are still being tested in my spare time).  I just wanted put this out 
here because of the patch Nicholas posted yesterday.

> 
> > Signed-off-by: Ira Weiny 
> > ---
> >
> > This has undergone a medium level of testing.  I have run it against
> > mlx4, qib, and OPA hardware and it does not seem to cause any regressions.
> >
> >  drivers/infiniband/core/mad.c  | 226 
> > +
> >  drivers/infiniband/core/mad_priv.h |  13 ---
> >  2 files changed, 5 insertions(+), 234 deletions(-)
> 
> There are also snoop changes needed in include/rdma/ib_mad.h.

I'll look into it if there are no other objections to removing the snoop 
interface.

> 
> Is it correct to assume that OPA_CAP_MASK3_IsSnoopSupported in
> opa_port_info.h is not related to this ?

Nope, not related.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: 0-day build issues

2015-09-24 Thread Weiny, Ira
> 
> What is being done in the parent directory has a bug, you're right that far, 
> but
> this is the wrong fix. If you check the 0day failures, it isn't because the
> InfiniBand core isn't enabled, it's because the current menuconfig setup 
> allows
> the IB core to be a module and the staging drivers to be compiled into the
> kernel. That obviously is an invalid condition and it applies to all of the 
> drivers
> in staging right now. The correct fix is to resolve that issue.

That is our conclusion as well.

But how do we do that globally in staging/rdma?

The module setting of INFINIBAND does not seem to carry to the subdirs while 
the following does work.

It seems like this is the easiest fix since this is a temp home for these 
drivers.

Ira


13:54:05 > git di
diff --git a/drivers/staging/rdma/amso1100/Kconfig 
b/drivers/staging/rdma/amso1100/Kconfig
index e6ce5f209e47..809cb14ac6de 100644
--- a/drivers/staging/rdma/amso1100/Kconfig
+++ b/drivers/staging/rdma/amso1100/Kconfig
@@ -1,6 +1,6 @@
 config INFINIBAND_AMSO1100
tristate "Ammasso 1100 HCA support"
-   depends on PCI && INET
+   depends on PCI && INET && INFINIBAND
---help---
  This is a low-level driver for the Ammasso 1100 host
  channel adapter (HCA).
diff --git a/drivers/staging/rdma/hfi1/Kconfig 
b/drivers/staging/rdma/hfi1/Kconfig
index fd25078ee923..97372103e947 100644
--- a/drivers/staging/rdma/hfi1/Kconfig
+++ b/drivers/staging/rdma/hfi1/Kconfig
@@ -1,6 +1,6 @@
 config INFINIBAND_HFI1
tristate "Intel OPA Gen1 support"
-   depends on X86_64
+   depends on X86_64 && INFINIBAND
default m
---help---
This is a low-level driver for Intel OPA Gen1 adapter.
diff --git a/drivers/staging/rdma/ipath/Kconfig 
b/drivers/staging/rdma/ipath/Kconfig
index 041ce0634968..12dcd4010a18 100644
--- a/drivers/staging/rdma/ipath/Kconfig
+++ b/drivers/staging/rdma/ipath/Kconfig
@@ -1,6 +1,6 @@
 config INFINIBAND_IPATH
tristate "QLogic HTX HCA support"
-   depends on 64BIT && NET && HT_IRQ
+   depends on 64BIT && NET && HT_IRQ && INFINIBAND
---help---
This is a driver for the deprecated QLogic Hyper-Transport
IB host channel adapter (model QHT7140),

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: 0-day build issues

2015-09-24 Thread Weiny, Ira
 
> >
> > What is being done in the parent directory has a bug, you're right
> > that far, but this is the wrong fix. If you check the 0day failures,
> > it isn't because the InfiniBand core isn't enabled, it's because the
> > current menuconfig setup allows the IB core to be a module and the
> > staging drivers to be compiled into the kernel. That obviously is an
> > invalid condition and it applies to all of the drivers in staging right 
> > now. The
> correct fix is to resolve that issue.
> 
> That is our conclusion as well.
> 
> But how do we do that globally in staging/rdma?
> 
> The module setting of INFINIBAND does not seem to carry to the subdirs while
> the following does work.
> 
> It seems like this is the easiest fix since this is a temp home for these 
> drivers.

Ok looks like this will fix it...  Still testing.

14:18:37 > git di
diff --git a/drivers/staging/rdma/Kconfig b/drivers/staging/rdma/Kconfig
index d7f62359d743..46ddc03c6839 100644
--- a/drivers/staging/rdma/Kconfig
+++ b/drivers/staging/rdma/Kconfig
@@ -1,10 +1,11 @@
 menuconfig STAGING_RDMA
-bool "RDMA staging drivers"
+tristate "RDMA staging drivers"
depends on INFINIBAND
depends on PCI || BROKEN
depends on HAS_IOMEM
depends on NET
depends on INET
+   depends on m
 default n
 ---help---
   This option allows you to select a number of RDMA drivers that

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [patch] IB/hfi1: checking for NULL instead of IS_ERR

2015-09-19 Thread Weiny, Ira
> 
> On Fri, Sep 18, 2015 at 11:51:09AM -0400, Doug Ledford wrote:
> > On 09/16/2015 02:22 AM, Dan Carpenter wrote:
> > > __get_txreq() returns an ERR_PTR() but this checks for NULL so it
> > > would oops on failure.
> > >
> > > Signed-off-by: Dan Carpenter 
> >
> > Thanks, applied.
> 
> Applied to what?  Should I just ignore these types of patches and not take 
> them
> in my tree and you will send them on later on?  I don't remember what we
> agreed to do, sorry.

My recollection was that Doug was going to handle the staging/rdma sub-tree 
because of the large churn he expected between the rdma subsystem and those 
drivers.

Therefore, I have been submitting my patches against staging/rdma/hfi directly 
to Doug and Linux-rdma.

Ira

> 
> thanks,
> 
> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the 
> body
> of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] IB/hfi: Properly set permissions for user device files

2015-09-17 Thread Weiny, Ira
> 
> On Thu, Sep 17, 2015 at 06:18:15PM +0200, Michal Schmidt wrote:
> > On 09/16/2015 11:41 PM, ira.we...@intel.com wrote:
> > > @@ -125,7 +151,20 @@ int __init dev_init(void)
> > >   ret = PTR_ERR(class);
> > >   pr_err("Could not create device class (err %d)\n", -ret);
> > >   unregister_chrdev_region(hfi1_dev, HFI1_NMINORS);
> > > + goto done;
> > >   }
> > > + class->devnode = hfi1_devnode;
> > > +
> > > + user_class = class_create(THIS_MODULE, class_name_user());
> > > + if (IS_ERR(user_class)) {
> > > + ret = PTR_ERR(user_class);
> > > + pr_err("Could not create device class for user accisble files
> > > +(err %d)\n",
> >
> >  Typo in error message.
> 
> And what is the deal with all these pr_err's? This is a driver, it needs to 
> use
> dev_err and related always. Does thatneed to go in the todo list?
> 
> I'm also skeptical we need a print on every error case :|
> 

This is very early in the driver code and we don't have a struct device at this 
point.

The bulk of the driver uses macros which use dev_*.  So no I don't think we 
need to add anything to the todo list.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [GIT PULL] pull request v2: linux-firmware: Add Intel OPA hfi1 firmware

2015-09-03 Thread Weiny, Ira
> 
> On 09/03/2015 04:26 PM, Kyle McMartin wrote:
> > On Wed, Sep 02, 2015 at 10:15:15PM +0000, Weiny, Ira wrote:
> >>>
> >>> On 08/24/2015 11:28 AM, ira.weiny wrote:
> >>>> git://github.com/weiny2/linux-firmware.git  master-hfi1-firmware-v2
> >>>
> >>> Should this go through me or though the linux-firmware tree?
> >>
> >> I don't know.  My apologizes for being ignorant in this case.
> >>
> >> I was thinking that as the rdma kernel maintainer you would have access to
> merge it into the Linux-firmware tree.  The README in the Linux-firmware
> project only says to email linux-firmw...@kernel.org.  I have not heard from
> anyone on that list.
> >>
> >> Does anyone on Linux-firmware have any objections to merging my branch?
> >
> > No objections, I've just been on vacation most of August. Let me look
> > over the license text and then I'll be right with you.
> >
> > (linux-firmware@ isn't a list because of vger rules about attachment
> > size and such, it just goes to dwmw2, ben hutchings and myself.)
> >
> > cheers, --Kyle
> 
> Thanks Kyle :-)

Thanks!

:-D

Ira


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [GIT PULL] pull request v2: linux-firmware: Add Intel OPA hfi1 firmware

2015-09-02 Thread Weiny, Ira
> 
> On 08/24/2015 11:28 AM, ira.weiny wrote:
> > git://github.com/weiny2/linux-firmware.git  master-hfi1-firmware-v2
> 
> Should this go through me or though the linux-firmware tree?

I don't know.  My apologizes for being ignorant in this case.

I was thinking that as the rdma kernel maintainer you would have access to 
merge it into the Linux-firmware tree.  The README in the Linux-firmware 
project only says to email linux-firmw...@kernel.org.  I have not heard from 
anyone on that list.

Does anyone on Linux-firmware have any objections to merging my branch?

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCHv3 infiniband-diags] iblinkinfo.c: Close additional file descriptor in advance

2015-08-02 Thread Weiny, Ira
 
 
 Additional file descriptor for SMP MADs should be closed before running
 ibnd_discover_fabric() to avoid parallel usage of two SMP file descriptors
 
 Signed-off-by: Vladimir Koushnir vladim...@mellanox.com
 Signed-off-by: Hal Rosenstock h...@mellanox.com

Thanks applied,
Ira



--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/2] update ocrdma to dual license

2015-07-03 Thread Weiny, Ira
 
 Christoph,
 
 
 Apologies, I misspoke in my response to you.  There was a study of the code 
 and
 we thought it was reasonable to post.  However, in retrospect we should have
 used more due diligence.  We're going back to seek explicit consent from key
 contributors.

I'm no legal expert, but don't you need consent from _all_ contributors?

Ira



RE: [GIT] Networking

2015-06-25 Thread Weiny, Ira
 
  Please accept my apologies.  The original patch used WARN_ON but I was
 advised to use BUG_ON in a review and I should have thought about it more
 rather than blindly make the change.
 
 
 Ira,
 
 Can you please point me to the review thread where this advise was made? I
 can't track it. In internal reviews I always fight with developers that put
 BUG_ON assertions whenever something goes wrong, I'd like to see on what
 context this feedback was provided to you.

This was recommended for 2 patches but I did not change the first.

http://www.spinics.net/lists/linux-rdma/msg25514.html

This is an earlier version of the patch Linus objected to

http://www.spinics.net/lists/linux-rdma/msg25498.html

Ira



RE: [PATCH V2] IB/mad: Remove improper use of BUG_ON

2015-06-25 Thread Weiny, Ira
 On 6/25/2015 4:52 PM, ira.we...@intel.com wrote:
  From: Ira Weinyira.we...@intel.com
 
  We recently added BUG_ON's which were inappropriate for a condition
  which should never happen. Change these to be WARN_ON_ONCE as a
 debugging aid.
 
  Fixes: 4cd7c9479aff ('IB/mad: Add support for additional MAD info
  to/from drivers')
 
  Signed-off-by: Ira Weinyira.we...@intel.com
 
 remove the blank line,

Is there some documentation on this stuff?

This does not mention anything about white space.

https://www.kernel.org/doc/Documentation/SubmittingPatches

I don't mind following the proper conventions if I knew what they were.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RFC] IB: Add rdma_cap_ib_switch helper and use where appropriate

2015-06-18 Thread Weiny, Ira
 
 On 6/18/2015 4:25 PM, Jason Gunthorpe wrote:
  Ira had patch that made some functions related to this public, not
  sure if it is applied yet..
 
 Which patch from Ira are you referring to ?

This patch which is in  Dougs for 4.2 tree:


commit ca0369e31d1794a4f4e39c52fe39b0406617e2b4
Author: Ira Weiny ira.we...@intel.com
Date:   Wed May 13 20:02:55 2015 -0400

IB/core: Create common start/end port functions

Previously start_port and end_port were defined in 2 places, cache.c and
device.c and this prevented their use in other modules.

Make these common functions, change the name to reflect the rdma
name space, and update existing users.

Signed-off-by: Ira Weiny ira.we...@intel.com
Signed-off-by: Doug Ledford dledf...@redhat.com

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing

2015-06-17 Thread Weiny, Ira
 
 ib_verbs define an *extensive* direct HW access API, which is constantly
 evolving.

This is the problem with verbs...

 You cannot describe the intricate object relations and semantics through an
 API.
 In addition, you can't abstract anything or fix stuff in SW.
 The only way to *truly* know what to expect when performing Verbs calls is to
 check the node type.

How can you say this?

mthca, mlx4, mlx5, and qib all have different sets of functionality... all with 
the same node type.  OPA has the same set as qib...  same node type.

 
 ib_verbs was never only an API. It started as the Linux implementation of the
 IBTA standard, with guaranteed semantics and wire protocol.
 Later, the interface was reused to support additional RDMA devices. However,
 you could *always* check the node type if you wanted to, thereby retaining the
 standard guarantees. Win-win situation...

Not true at all.  For example, Qib does not support XRC and yet has the same 
node type as mlx4 (5)...

 
 This is a very strong property; we should not give up on it.

On the contrary the property is weak and implies functionality or lack of 
functionality rather than being explicit.  This was done because getting 
changes to kernel ABIs was hard and we took a shortcut with node type which we 
should not have.  OPA attempts to stop this madness and supports the 
functionality of verbs _As_ _Defined_ rather than creating yet another set of 
things which applications need to check against.

 
 
  You're right that apps can be coded to other CA types, like RNICs and
  USNICs.  However, those are all very different from an IB_CA due to
  limited queue pair types or limited primitives.  If OPA had that same
  limitation then I would agree it needs a different node type.
 
 
 How do you know that it doesn't?

Up until now you have had to take my word for it.  Now that the driver has been 
posted it should be clear what verbs we support (same as qib).

Ira



RE: [PATCH v5 1/4] IB/netlink: Add defines for local service requests through netlink

2015-06-15 Thread Weiny, Ira
 
 On Fri, Jun 12, 2015 at 05:24:32AM +, Weiny, Ira wrote:
 
  PATH_USE_* is a better name.  But I think the defines should be:
 
  PATH_USE_ANY (or ALL?)
 
 How about PATH_USE_CONNECTED?

I don't want to bikeshed this too much as I'm not really good on names but...

How does Connected relate to wanting all available Path options?

Can't a connected transport use a reversible GMP path?

I guess that is where I am getting hung up.

Ira

 
  Why not treat unknown type as ANY/ALL and return all 6 if possible.
  If not possible/supported (ie like ibacm) return GMP which is always
  valid.
 
 Make sense to me..
 
 Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 4/4] IB/sa: Route SA pathrecord query through netlink

2015-06-11 Thread Weiny, Ira
  If user space PR capabilities (unicast, multicast, both) is
  supported, it affects the API.
  How? If we can query SA for multicast PRs without joining the
  multicast groups, what additional changes in netlink API do we need to
 support both?
 
 Nothing to support both but if we wanted to disable one or other based on user
 space we would but it sounds like we don't need/want this but would use user
 space rejection/no data for this.
 

Yes I think we should allow userspace to decide if this is supported or not.  
That allows for more flexibility.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v5 1/1] ibacm: Add support for pathrecord query through netlink

2015-06-11 Thread Weiny, Ira
 On Thu, Jun 11, 2015 at 01:04:25PM -0400, kaike@intel.com wrote:
  +static int acm_nl_parse_path_attr(struct nlattr *attr,
  +  struct acm_ep_addr_data *data)
 
  +   switch (attr-nla_type  NLA_TYPE_MASK) {
  +   default:
  +   acm_log(1, WARN: unknown attr %x\n, attr-nla_type);
  +   ret = -1;
  +   break;
 
 I would like to see a mandatory/optional scheme here.
 
 For instance if we add SL it would be mandatory, but policy information like
 requesting net_device would be optional.

Why would SL be mandatory?

Ira
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v5 1/1] ibacm: Add support for pathrecord query through netlink

2015-06-11 Thread Weiny, Ira
 
 On Thu, Jun 11, 2015 at 08:16:41PM +, Weiny, Ira wrote:
   For instance if we add SL it would be mandatory, but policy
   information like requesting net_device would be optional.
 
  Why would SL be mandatory?
 
 If the kernel asks for a specific SL then user space must respect that and 
 return
 a path with that exact SL.
 
 If userspace does not know how to do that, then it must not answer the query. 
 It
 cannot ignore the SL requirement and return a random SL.

Oh, right, I misunderstood.  I thought that you were saying that SL would be 
mandatory as a netlink attribute.

 
 Hypothetically, assuming we add SL..

Right...  ;-)

Ira
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v5 1/4] IB/netlink: Add defines for local service requests through netlink

2015-06-11 Thread Weiny, Ira
 
  From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
  Sent: Thursday, June 11, 2015 8:22 PM
  To: Wan, Kaike
  Cc: linux-rdma@vger.kernel.org; Fleck, John; Weiny, Ira
  Subject: Re: [PATCH v5 1/4] IB/netlink: Add defines for local service
  requests through netlink
 
  On Thu, Jun 11, 2015 at 11:45:46PM +, Wan, Kaike wrote:
   Apparently, to achieve this goal, we need the following changes:
 
  I don't expect anything more than a proper hole in the UAPI to make
  this work.
 
  I already have patches that implement full kernel support for the
  6-path format, and globally enable APM. With this netlink interface it
  becomes feasible to mainline them. That will require some reworking of
  what you've done, but I don't expect you to do that.
 
  The only remaining thing we need here is that tiny bit of policy
  information for the kernel to specify what the intended use of the path is:
   - For connected CM operation (6 path records)
   - For unidirectional UD (1 path record)
   - For 'misc' GMP like operation (at least 1 reversible path record)
 
  I'm happy if you call it PATH_USE_XX
 
  Testing the reversible bit should be enough to rough this in, if not
  reversible it is PATH_USE_UNIDIRECTIONAL_UD, otherwise it is
 PATH_USE_GMP.
 

PATH_USE_* is a better name.  But I think the defines should be:

PATH_USE_ANY (or ALL?)
PATH_USE_UNIDIRECTIONAL
PATH_USE_GMP

  Recommend that userspace treat any unknown usage type as GMP (it is
  the
 
  most general).

Why not treat unknown type as ANY/ALL and return all 6 if possible.  If not 
possible/supported (ie like ibacm) return GMP which is always valid.

 
 That would be cool. If we add this attribute, should we remove reversible and
 numb_path attributes?

Yes.

 
 Kaike
 
 
  Ira: Do you like 'path use' better as a name? I think I do..

Yea that is a much better name.

Thanks,
Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] IB/core: Don't warn on no SA support in event handler

2015-06-10 Thread Weiny, Ira
   Registering an event handler is done for a device. This device may
   have one RoCE port (no SA cap) and one InfiniBand port (has SA cap).
   Therefore, warning from the event handler about a specific port that
   doesn't have SA cap is correct but pollutes the kernel log without a
   need.
 
  Maybe we should think about register event handler per port :-P
 
 I agree, though I think that can be added separately from this change.

Also agreed.  We identified many places where we should split support to be per 
port while we did this work.  We need to start working up to that.  For the 
time being we should not break existing users (or in this case annoy them... ;-)

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 4/4] IB/sa: Route SA pathrecord query through netlink

2015-06-10 Thread Weiny, Ira
 On 6/10/2015 3:10 PM, Jason Gunthorpe wrote:
  On Wed, Jun 10, 2015 at 01:47:36PM -0400, Hal Rosenstock wrote:
  On 6/9/2015 10:57 AM, kaike@intel.com wrote:
  From: Kaike Wan kaike@intel.com
 
  This patch routes a SA pathrecord query to netlink first
 
  Should only unicast PRs be done in this manner or should API support
  enabling for unicast and/or multicast ?
 
  AFAIK kernel doesn't query multicast PRs now (queries MCMRs) but this
  seems like it would help make it future proof and not have to take
  timeout on local query unless app supports it.
 
  It is a good question. We can clearly extend toward that, using a MGID
  as the DGID and adding additional nested netlink fields.
 
  However, does it make sense?
 
 If it doesn't make sense, then should MGIDs as DGIDs never be requested via
 the PR netlink API ?
 

Why should we prevent it?  What does it hurt?

Ira
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 0/4] Sending kernel pathrecord query to user cache server

2015-06-10 Thread Weiny, Ira
 
This series does not attempt to optimize the kernel needing to
know that a PR has been updated.  There are existing mechanisms for
 that.
  
   Does this exist in the kernel?
 
  At least some support, yes.  For example client reregister marks all
  IPoIB paths as invalid.
 
 Reregister indicates that all PRs are now invalid?

Not directly.  IPoIB treats it that way.  I guess to be safe.

Officially one should register for UnPath/RePath traps.  But no one has ever 
implemented that.

To be clear I am agreeing with Hal that having some sort of update signal 
would be nice.  But I don't think that must be done before this series goes in.

I think a PR update signal should be an extension to this interface.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 0/4] Sending kernel pathrecord query to user cache server

2015-06-10 Thread Weiny, Ira
 
  This series does not attempt to optimize the kernel needing to know
  that a PR has been updated.  There are existing mechanisms for that.
 
 Does this exist in the kernel?

At least some support, yes.  For example client reregister marks all IPoIB 
paths as invalid.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] IB/core: Change rdma_protcol_iboe to roce

2015-05-15 Thread Weiny, Ira
 
 Hi,
 
 Le jeudi 14 mai 2015 à 15:01 -0400, ira.we...@intel.com a écrit :
  From: Ira Weiny ira.we...@intel.com
 
  It has been decided that ROCE should be used within the kernel rather
  than IBOE as we used before.  Change iboe to roce on the new
 rdma_protocol_* functions.
 
 
 Erk ... What's the usefulness of such patch ?

Currently there is inconsistency in the naming of the RoCE technology.

12:27:35  grep -r roce drivers/infiniband | grep -v Binary | grep -i -v proce 
| wc -l
85

All in the driver code.

12:27:48  grep -r iboe drivers/infiniband | grep -v Binary | wc -l
130

Mainly in the core and mlx4 driver.

This patch was to clean up the management helper function as a general move 
toward standardizing on roce rather than iboe.  Most people in the community 
refer to this as RoCE so that name was chosen to move to.

 
 IBoE is used throughout the IB/RDMA subsystem.
 
 Changing only these occurences is rather inconsistent.
 

I asked about changing all the references and Doug mentioned he would make a 
patch to change the other references.  Personally I don't want to see a massive 
rename patch but this could be done.

Ira



RE: [RFC PATCH 1/5] IB/core: Add Core Capability flags to ib_device

2015-05-11 Thread Weiny, Ira
 
 On May 10, 2015, at 11:42 PM, ira.weiny ira.we...@intel.com wrote:
 
  On Tue, May 05, 2015 at 08:22:21PM +, Dave Goodell (dgoodell) wrote:
 
  In the case that usNIC is operating in UDP mode (which is the overwhelming
 majority of the cases), there is absolutely no additional protocol that ends 
 up on
 the wire or headers in the user buffers besides UDP/IP/Ethernet.  They are
 100% plain UDP packets, they just happen to be sent via OS-bypass queues
 instead of traveling through the kernel networking stack.
 
  [^ there continues to be confusion about this for some reason, but I
 don't know why]
 
  So what is this patch for?
 
 Does my earlier email clarify the situation any?  http://marc.info/?l=linux-
 rdmam=142972178630720w=2

Somewhat, is there any reason applications need to distinguish between the  
The legacy RDMA_TRANSPORT_USNIC type and  The current 
RDMA_TRANSPORT_USNIC_UDP type?

Or does the former no longer exist?

 
  commit 248567f79304b953ea492fb92ade097b62ed09b2
  Author: Upinder Malhi uma...@cisco.com
  Date:   Thu Jan 9 14:48:19 2014 -0800
 
  IB/core: Add RDMA_TRANSPORT_USNIC_UDP
 
  Add RDMA_TRANSPORT_USNIC_UDP which will be used by usNIC.
 
  Signed-off-by: Upinder Malhi uma...@cisco.com
  Signed-off-by: Roland Dreier rol...@purestorage.com
 
  This is probably where a lot of the confusion is coming from.
 
 Arguably RDMA_TRANSPORT_USNIC_UDP could/should have simply been
 named RDMA_TRANSPORT_UDP.

I guess I'm wondering if there needs to be an RDMA_TRANSPORT_USNIC to represent 
the  The legacy RDMA_TRANSPORT_USNIC type you mention in the link above.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 07/28] IB/Verbs: Reform IB-ulp ipoib

2015-04-16 Thread Weiny, Ira
 
 On Wed, Apr 15, 2015 at 09:58:18AM +0200, Michael Wang wrote:
 
  We can give client-add() callback a return value and make
  ib_register_device() return -ENOMEM when it failed, just wondering why
  we don't do this at first, any special reason?
 
 No idea, but having ib_register_device fail and unwind if a client fails to 
 attach
 makes sense to me.

Yes that is what we should do _but_ 

I think we should tackle that in a different series.

As you said in another email, this series is getting very long and hard to 
review/prove is correct.  This is why I was advocating keeping a check at the 
top of cm_add_one which verified all Ports supported the CM.  This is the 
current logic today and is proven to work for the devices/use cases out there.

We can clean up the initialization code and implement support for individual 
ports in follow on patches.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 10/27] IB/Verbs: Reform cm related part in IB-core cma/ucm

2015-04-16 Thread Weiny, Ira
 
 On 4/16/2015 11:58 AM, Jason Gunthorpe wrote:
  It also looks like hardwired 1 won't work on switch ports, so it is no-go.
 
 AFAIK enhanced switch port 0 is not supported by CM/RDMA CM in the current
 code. There is no need for CM/RDMA CM on base switch port 0.

I concur and I thought I mentioned that before.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Stepping down as maintainer (was Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management)

2015-04-09 Thread Weiny, Ira
 
 On Tue, Apr 7, 2015 at 10:12 AM, Jason Gunthorpe
 jguntho...@obsidianresearch.com wrote:
  I don't think you understand how deep the problem Or is describing
  goes.
 
  [...Appropriate and correct critique...]
 
 This thread has made me realize that even as I am able to carve out more time
 to work on things like IB maintaintership, I no longer have the desire to 
 spend
 my time maintaining the IB subsystem.  Since my current level of activity is
 clearly hurting the community, I've decided to step down as maintainer.
 

I am sad to see you leave.  Without you InfiniBand would have never been 
accepted upstream at all.

Thank you for all the hard work,
Ira

N�r��yb�X��ǧv�^�)޺{.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj��!�i

RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag

2015-03-11 Thread Weiny, Ira
 On 3/4/2015 11:41 AM, Weiny, Ira wrote:
 
  InfiniBandInfiniBand  InfiniBand Verbs
  iWARP InfiniBand  iWARP Verbs (subset of IBV, with
specific connection establishment
requirements that don't exist with IBV)
  RoCE  EthernetInfiniBand Verbs (but with different
addressing because of the different
link layer)
  OPA   OPA InfiniBand Verbs
 
  Verbs is an interface definition to hardware that has been twisted to
  be a software API and extended to expose vendor-specific implementation
 'features'
  as extensions.  It is not a transport.
 
  The device capability bits seems to have evolved to mean: vendor A
  implemented some random 'feature' in their hardware and wants all
  applications to now check for this 'feature' and change their code to use 
  it.
  Basically, what gets defined as a device cap is rather arbitrary.
 
 
  This was the point I was trying to make and the reason the OPA MAD support
 was implemented as a device capability.
 
 The proposed device capability stands for a change that is way more drastic
 than just a vendor extension. This is a device running a completely different
 wire protocol which does not interoperate with the IB device that is
 impersonating.
 
 Also, it does not come under the same jurisdication as IB. It is entirely 
 possible
 that IBTA could make some change in the future where OPA can no longer
 masquerade as IB.
 
 OPA must also be identifiable by any verbs 

I disagree.  Software applications running IP don't need to know they are 
running over IB vs Ethernet?  Why would this have to be true for Verbs?

perftest, libibverbs (example apps), mvapich (verbs), openmpi (verbs), ibacm 
(librdmacm based applications), srp, and ipoib all run without any knowledge of 
the link being OPA.  While not an exhaustive list of verbs applications, this 
is a pretty good sampling.  We have specifically designed OPA to support these 
applications without modifications.

 or management application. 

Agreed, but _only_ when talking to the hardware.  Other MAD interfaces are IB 
compatible.

The idea of the original series was to check the device capability bits (in 
kernel and in userspace) for SM diagnostic and diagnostic tool support.  As 
currently submitted I would need to add a kernel ABI to get the extended 
capability bits (because the originals were exhausted).  I have been waiting 
for this discussion to settle before going through that effort.

 To do
 that, it should be properly represented in node type, transport type, and link
 layer as all the other RDMA technologies have been regardless of the changes
 needed. All the other RDMA technologies have gone through this process.
 

As I suggested above: Would setting the Link Layer to a new value (ie. 
IB_LINK_LAYER_OMNI_PATH_ARCH) while maintaining the Transport as InfiniBand 
Verbs be satisfactory?

While this breaks at least some of the examples I list above I believe I have 
worked out a way to phase in this support through our provider library.


To address your comments from the other fork of this thread:

  The 32 bit LIDs in the SMP are designed for future expansion.  Currently OPA
 does not support  16 bit LIDs.
 
 It's not just verbs (structures and APIs) but also CM and any other place 
 where
 LID is used which are numerous.
 
 Is a new verbs coming for this ?

I can't guarantee that no changes will be required in the future.  But right 
now the answer is No because we specifically implemented OPA to be 
InfiniBand Verbs.

 How will compatibility be dealt with ?

Compatibility is provided by presenting identical InfiniBand Verbs interfaces 
(kernel abi, SA protocols, etc) to the user.

 
 This is also another example why a more complete picture of OPA is needed.
 
 -- Hal


At this moment OPA provides ULP compatibility for all IB Verbs applications 
with the exception of management software which talks directly to the SMA and 
PMA.  All ULP interactions with management (Path Record, CM, Multicast joins, 
etc) are still IB formatted and compatible.  We have specifically designed this 
interoperability with OFA.

To help illustrate my point I have included 2 patches below.  The first defines 
a new Link Layer; IB_LINK_LAYER_OMNI_PATH_ARCH and modifies all the kernel 
interfaces which look at link layer.  This has been minimally tested with IPoIB 
and the perftest tools (the modification of which is included as the 2nd 
patch).  The entire patch boils down to changing if (InfiniBand) to if 
(InfiniBand || OPA).  IMO this is rather inefficient but if this is more 
acceptable to the community I am willing to investigate this further.

Ira



From 9f09be92576204b3ead71f714fc231110f03bff6 Mon Sep 17 00:00:00 2001
From: Ira Weiny ira.we...@intel.com
Date: Wed, 3 Dec 2014 20:01:09 -0500
Subject: [PATCH] WIP: IB/core: Add

RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag

2015-03-06 Thread Weiny, Ira
 
 On Wed, Mar 04, 2015 at 07:21:48AM +, Weiny, Ira wrote:
 
  I think this is going to break quite a bit.  I have prototyped setting
  OPA devices to OPA Link Layer and the perftest tools just fall over.
  Any changes to the Link layer or the transport types will require a
  transition period for ULPs.
 
 How do the perftest tools work with OPA in the first place? OPA seems to have
 32 bit lids. Do you mean it 'works' as long as the lid is  16 bits? Same 
 general
 point about all of verbs, lots of 'uint16_t lid' in the interfaces?

The 32 bit LIDs in the SMP are designed for future expansion.  Currently OPA 
does not support  16 bit LIDs.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag

2015-03-04 Thread Weiny, Ira
 
  InfiniBand  InfiniBand  InfiniBand Verbs
  iWARP   InfiniBand  iWARP Verbs (subset of IBV, with
  specific connection establishment
  requirements that don't exist with IBV)
  RoCEEthernetInfiniBand Verbs (but with different
  addressing because of the different
  link layer)
  OPA OPA InfiniBand Verbs
 
 Verbs is an interface definition to hardware that has been twisted to be a
 software API and extended to expose vendor-specific implementation 'features'
 as extensions.  It is not a transport.
 
 The device capability bits seems to have evolved to mean: vendor A
 implemented some random 'feature' in their hardware and wants all
 applications to now check for this 'feature' and change their code to use it.
 Basically, what gets defined as a device cap is rather arbitrary.
 

This was the point I was trying to make and the reason the OPA MAD support was 
implemented as a device capability.

Ira



RE: [PATCH libibumad] umad.h: Remove umad_reg_flags from enum declaration

2015-03-03 Thread Weiny, Ira
 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Hal Rosenstock
 Sent: Tuesday, March 03, 2015 12:23 PM
 To: linux-rdma (linux-rdma@vger.kernel.org)
 Cc: Weiny, Ira; ra...@mellanox.com
 Subject: [PATCH libibumad] umad.h: Remove umad_reg_flags from enum
 declaration
 
 
 This causes it to be a global variable which causes linking issues when used 
 by
 multiple files linked together. It results in multiple definition of
 `umad_reg_flags'.
 
 Found-by: Rafi Weiner ra...@mellanox.com


Reviewed-by: Ira Weiny ira.we...@intel.com
 
 Signed-off-by: Hal Rosenstock h...@mellanox.com
 ---

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag

2015-03-03 Thread Weiny, Ira
Doug,

You have given me a lot to think about...  Comments below...

   
While it is a different type of technology, standard verbs[*]
remains 100%
   compatible.  Unlike other verbs technologies user space software
   does not need any knowledge that the underlying device is not IB.
   For example, PR (and SA) queries, CM, rdmacm, and verbs calls themselves
 are all 100% IB compatible.
  
   Even if OPA is 100% standard verbs compatible which it does not
   appear to be, that does not make OPA an extra capability of an IBA device.
 
  I don't want to make it an extra capability of an IBA device.  I want to 
  make it
 an extra capability of a verbs device.
 
 And this, friends, is why it's bad to make both a link layer and an user 
 space API
 with the exact same name ;-).  Anyway, I get your point Ira and it makes sense
 to me.  However, I also get Hal's point.  Our track record on this particular
 issue is a bit wonky though.

Thanks for laying this out.  I too understand Hals point.

 
 First we had InfiniBand.
 
 Then came iWARP, and we used the transport type to differentiate it from an
 actual InfiniBand device, but left the underlying link layer listed as 
 InfiniBand.
 Then came RoCE, and we listed its transport type as InfiniBand, but changed
 the link layer to Ethernet.  Which left us in the oxymoronic position that 
 even
 though iWARP was over Ethernet, the tools said it was over InfiniBand, while
 RoCE was the only thing that listed Ethernet as the link layer.  We later 
 fixed
 that up with some hacks in tools to keep users from being confused and filing
 bugs.
 
 Maybe this represents an opportunity to straighten some of this mess out.  If 
 I
 remember correctly, this is the matrix of technologies today:
 
 TechnologyLinkLayer   Transport
 
 InfiniBandInfiniBand  InfiniBand Verbs
 iWARP InfiniBand  iWARP Verbs (subset of IBV, with
   specific connection establishment
   requirements that don't exist with IBV)
 RoCE  EthernetInfiniBand Verbs (but with different
   addressing because of the different
   link layer)
 OPA   ?   InfiniBand Verbs

I think this is _relatively_ accurate.  The one exception is with the various 
IB verbs extensions which have been introduced.  While most are being pushed 
into the spec not all of them are in the spec prior to being in the kernel.  It 
makes keeping up with what IB Verbs really is difficult.

Mind you I'm not opposing having IB Verbs be flexible.  But I think we can 
accurately have multiple underlying technologies which support IB Verbs with 
various extensions.

 
 It makes me wonder if we shouldn't make this matrix more accurate:
 
 TechnologyLinkLayer   Transport
 
 InfiniBandInfiniBand  InfiniBand Verbs
 iWARP EthernetiWARP Verbs
 RoCE  EthernetRoCE-v1 or RoCE-v2
 OPA   ?   OPA Verbs
 
 With this sort of setup, the core ib_mad/ib_umad code would simply check the
 verbs type to see what support it can enable.  For IBV it would be the 
 existing
 support, for OPAV it would be the additional jumbo support.

OPA, to be compatible with IB Verbs, uses the same node types as InfiniBand 
verbs (1 == CA, 2 == Switch).  As such it returns the same Transport type.

 
 I'm not sure how much we might expect a change like this to break existing
 software though, so maybe staightening this mess out is a non-starter.

I think this is going to break quite a bit.  I have prototyped setting OPA 
devices to OPA Link Layer and the perftest tools just fall over.  Any changes 
to the Link layer or the transport types will require a transition period for 
ULPs.

 
   While it is a primary goal of the RDMA stack to have a common verbs
   API for various RDMA interconnects, each one is properly represented
   to allow it's unique characteristics to be exposed.
 
  The difference here is that we have maintained IB Verbs compatibility where
 other RDMA technologies did not.  We have tested many Verbs applications
 (both kernel and user space) and they function _without_ _modification_.
 
  Despite this compatibility we are still having this discussion.
 
  I can think of no other way to signal the MAD capability to the MAD stack
 which will preserve the verbs compatibility in the same way.
 
 See above.  Define a new transport type, OPAVerbs, that is a superset of IBV
 and enable jumbo support when OPAV is the transport on the link.

But the transport type is not changing.  Once again we are attempting to be 
completely verbs compatible.  From the MAD stack POV the verbs calls in the 
kernel are not different.

Would it be acceptable if the result of my patch series was:

InfiniBand  InfiniBand  InfiniBand Verbs
iWARP   InfiniBand  iWARP Verbs (subset of IBV, with
specific 

RE: [PATCH v4 06/19] IB/core: Add max_mad_size to ib_device_attr

2015-02-25 Thread Weiny, Ira
  diff --git a/drivers/infiniband/core/mad.c
  b/drivers/infiniband/core/mad.c index 819b794..a6a33cf 100644
  --- a/drivers/infiniband/core/mad.c
  +++ b/drivers/infiniband/core/mad.c
  @@ -2924,6 +2924,12 @@ static int ib_mad_port_open(struct ib_device
 *device,
  char name[sizeof ib_mad123];
  int has_smi;
 
  +   if (device-cached_dev_attrs.max_mad_size  IB_MGMT_MAD_SIZE) {
  +   dev_err(device-dev, Min MAD size for device is %u\n,
  +   IB_MGMT_MAD_SIZE);
  +   return -EFAULT;
  +   }
  +
 
 The printk message here is not very informative and it qualifies as an error.
 Someone reading that for the first time in the dmesg output and wondering
 why their device isn't working will be confused if they don't know about the
 mad size changes you are making here.  Something like max supported MAD
 size (%u)  min required by ib_mad (%u), ignoring dev \n

Good suggestion.

Fixed for v5 with this message.

+   dev_err(device-dev,
+   Max supported MAD size (%u)  min required by ib_mad 
(%u), ignoring device (%s)\n,
+   device-cached_dev_attrs.max_mad_size,
+   IB_MGMT_MAD_SIZE, device-name);


Ira



RE: [PATCH v4 06/19] IB/core: Add max_mad_size to ib_device_attr

2015-02-25 Thread Weiny, Ira
 
  Fixed for v5 with this message.
 
  +   dev_err(device-dev,
  +   Max supported MAD size (%u)  min required by 
  ib_mad
 (%u), ignoring device (%s)\n,
  +   device-cached_dev_attrs.max_mad_size,
  +   IB_MGMT_MAD_SIZE, device-name);
 
 It also seems redundant since the only call to ib_mad_port_open is:
 
 if (ib_mad_port_open(device, i)) {
 printk(KERN_ERR PFX Couldn't open %s port %d\n,
device-name, i);
 
 So, why does this particular error deserve a special double error print? I
 assume it is basically impossible to hit?

This does indicate a coding error.  Generally I prefer details of why the 
device could not open the port.  But if the community feels this is redundant 
or not possible I can drop the hunk.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 07/19] IB/mad: Convert ib_mad_private allocations from kmem_cache to kmalloc

2015-02-25 Thread Weiny, Ira
 
  +static struct ib_mad_private *alloc_mad_priv(struct ib_device *dev) {
  +   return (kmalloc(sizeof(struct ib_mad_private_header) +
  +   sizeof(struct ib_grh) +
  +   dev-cached_dev_attrs.max_mad_size,
 GFP_ATOMIC));
 
 Ouch!  GFP_ATOMIC?  I thought that generally all of the mad processing was
 done from workqueue context where sleeping is allowed?  In the two places
 where you removed kmem_cache_alloc() calls and replaced it with calls to this
 code, they both used GFP_KERNEL and now you have switched it to
 GFP_ATOMIC.  If there isn't a good reason for this, it should be switched back
 to GFP_KERNEL.

The original kmem_cache_allocs are actually both GFP_ATOMIC (1 usage, see 
below) and GFP_KERNEL (the 2 usages you reference).

My bad for not making this specific to the allocation.

I will research the original GFP_ATOMIC usage and if it is necessary have this 
function take gfp_t.  Otherwise if we can get away with GFP_KERNEL I agree that 
would be best.

 
  +}
  +
   /*
* Return 0 if SMP is to be sent
* Return 1 if SMP was consumed locally (whether or not solicited) @@
  -771,7 +776,8 @@ static int handle_outgoing_dr_smp(struct
 ib_mad_agent_private *mad_agent_priv,
  }
  local-mad_priv = NULL;
  local-recv_mad_agent = NULL;
  -   mad_priv = kmem_cache_alloc(ib_mad_cache, GFP_ATOMIC);

Original usage here... 

Thanks,
Ira

  +
  +   mad_priv = alloc_mad_priv(mad_agent_priv-agent.device);
  if (!mad_priv) {
  ret = -ENOMEM;
  dev_err(device-dev, No memory for local response
 MAD\n); @@



RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag

2015-02-24 Thread Weiny, Ira
 
  Do you have a suggestion for alternatives?
 
  The desire to leverage the IB MAD infrastructure for OPA is
  understood but the current approach represents OPA as a device
  capability which does not seem appropriate because OPA is clearly a
  different type of RDMA technology than IB.
 
 
  While it is a different type of technology, standard verbs[*] remains 100%
 compatible.  Unlike other verbs technologies user space software does not need
 any knowledge that the underlying device is not IB.  For example, PR (and SA)
 queries, CM, rdmacm, and verbs calls themselves are all 100% IB compatible.
 
 Even if OPA is 100% standard verbs compatible which it does not appear to be,
 that does not make OPA an extra capability of an IBA device.

I don't want to make it an extra capability of an IBA device.  I want to make 
it an extra capability of a verbs device.

 While it is a primary goal of the RDMA stack to have a common verbs API for
 various RDMA interconnects, each one is properly represented to allow it's
 unique characteristics to be exposed.

The difference here is that we have maintained IB Verbs compatibility where 
other RDMA technologies did not.  We have tested many Verbs applications (both 
kernel and user space) and they function _without_ _modification_.

Despite this compatibility we are still having this discussion.

I can think of no other way to signal the MAD capability to the MAD stack which 
will preserve the verbs compatibility in the same way.

 
  Therefore, to address your initial question regarding tradeoffs I believe 
  this
 method is the least invasive to the code as well as removing any potential
 performance penalties to core verbs.
 
  Ira
 
  [*] We don't support some of the extensions particularly those which have
 been most recently introduced.  And we would like to make our own extensions
 in the form of higher MTU availability, but the patch is not yet ready to be
 submitted upstream.
 
 There appear to be a number of things that are not exposed by the current
 patch set which will be needed in subsequent patches. It would be better to 
 see
 the complete picture so it can be reviewed as a whole.

Is there something in particular you would like to see?  There are no other 
patches required in the core modules for verbs applications to function.  The 
MTU patch only improves verbs performance.

Ira 

 
 -- Hal
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in the 
 body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RE-RESEND V2 for-next 4/5] IB/core: Make sure that the PSN does not overflow

2015-02-16 Thread Weiny, Ira

 
 0xff could be made a #define
 
 Question: userspace is allowed to ask for a PSN on 32bits, but it will be 
 silently
 truncated, is it going to puzzle applications ?
 
 Anyway, it would have been better to return an error in the first place ... 
 not
 sure if we can do it now ...
 

Yes I'm still not sure what bug this fixes.  Or did you ever find out?

Ira
 



RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag

2015-02-11 Thread Weiny, Ira
 On 2/4/2015 6:29 PM, ira.we...@intel.com wrote:
  From: Ira Weiny ira.we...@intel.com
 
  OPA MADs share a common header with IBTA MADs but with a different
  base version and an extended length.  These jumbo MADs increase the
  performance of management traffic.
 
  Sharing a common header with IBTA MADs allows us to share most of the
  MAD processing code when dealing with OPA MADs in addition to
  supporting some IBTA MADs on OPA devices.
 
  Add a device capability flag to indicate OPA MAD support on the device.
 
  Signed-off-by: Ira Weiny ira.we...@intel.com
 
  ---
   include/rdma/ib_verbs.h | 5 +
   1 file changed, 5 insertions(+)
 
  diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
  3ab4033..2614233 100644
  --- a/include/rdma/ib_verbs.h
  +++ b/include/rdma/ib_verbs.h
  @@ -128,6 +128,10 @@ enum ib_device_cap_flags {
  IB_DEVICE_ON_DEMAND_PAGING  = (131),
   };
 
  +enum ib_device_cap_flags2 {
  +   IB_DEVICE_OPA_MAD_SUPPORT   = 1
  +};
  +
   enum ib_signature_prot_cap {
  IB_PROT_T10DIF_TYPE_1 = 1,
  IB_PROT_T10DIF_TYPE_2 = 1  1,
  @@ -210,6 +214,7 @@ struct ib_device_attr {
  int sig_prot_cap;
  int sig_guard_cap;
  struct ib_odp_caps  odp_caps;
  +   u64 device_cap_flags2;
  u32 max_mad_size;
   };
 
 
 Why is OPA support determined via a device capability flag ? What are the
 tradeoffs for doing it this way versus the other choices that have been used 
 in
 the past for other RDMA technologies like RoCE, iWARP, usNIC, ... ?

None of those technologies use the MAD stack for Subnet Management.  Other MAD 
support is very limited (ie IB compatible PMA queries on the local port only).

Do you have a suggestion for alternatives?

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ANNOUNCE] libibmad 1.3.12 release

2015-02-10 Thread Weiny, Ira
There is a new release of libibmad available at:

https://www.openfabrics.org/downloads/management/

md5sum:
7aabf50569c4d30a9a0689fd96097e4f  libibmad-1.3.12.tar.gz

Dependencies:

1) libibumad = 1.3.7
2) opensm-libs = 3.3.10
3) ib_umad kernel module

Release notes for 1.3.11 to 1.3.12 are:

   1) bug fixes
   2) new Fields for Port Extended Speeds


Updates since 1.3.11


  Hal Rosenstock  :  fields.c: Fix commentary typo
  Al Chu  :  libibmad/src/rpc.c: Remove superfluous packet dump
  Hal Rosenstock  :  fields.c: Print SwitchCongestionSetting threshold fields 
in hex rather than decimal
  Dan Ben Yosef  :  libibmad: Add new fields for SM:PortInfoExtended and for 
PM:PortExtendedSpeedsCounters
  Hal Rosenstock  :  dump.c: In mad_dump_linkwidth, output undefined widths

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ANNOUNCE] infiniband-diags 1.6.5 release

2015-02-10 Thread Weiny, Ira
There is a new release of infiniband-diags at:

https://www.openfabrics.org/downloads/management/

md5sum:
2a587bda5fc8287643c83363e4b6ec21  infiniband-diags-1.6.5.tar.gz

Dependencies:

1) libibmad = 1.3.12
2) libibumad = 1.3.7
3) opensm-libs = 3.3.10
4) ib_umad kernel module
5) glib2

Release notes v1.6.4 = 1.6.5

   1) bug fixes
   2) updates for new hardware device IDs
   3) rdma-ndd daemon

Authors since 1.6.4

Dan Ben Yosef da...@mellanox.com
ibtracert.c: Remove checking the port 0 state for base switch port 0
Add new configure flag to enable/disable the rdma-ndd build
New MAD SM:PortInfoExtended and changing PM:PortExtendedSpeedsCounters
Fix missing source dir in configure.ac

Hal Rosenstock h...@dev.mellanox.co.il
Align infiniband-diags MLNX device IDs with OpenSM
vendstat.c: Add additional vendor IDs to ext_fw_info_device
ibnetdisc.c: Add additional supported device IDs to 
is_mlnx_ext_port_info_supported
ibdiag_common.c: Add more supported device IDs in 
is_mlnx_ext_port_info_supported
ibccconfig.c: Account for enhanced switch port 0 in 
SwitchPortCongestionSetting handling
ibccquery.c: Account for enhanced switch port 0 in 
SwitchPortCongestionSetting handling
saquery.c: Return proper status from query_sa_cpi on bad result status
saquery.c: Fix handling of cpi and ClassPortInfo options
ibping.c: Fix support of OUIs other than IB_OPENIB_OUI
ibportstate.c: Cosmetic changes to MLNX Extended PortInfo
saquery.c: In query_sa_cpi, make error message consistent
saquery.c: In print_node_record, change LID print format

Ira Weiny ira.we...@intel.com
infiniband-diags: update for udev changes.
infiniband-diags/test-code: remove unused variables
infiniband-diags: add rdma-ndd daemon
infiniband-diags: update smpquery man page
infiniband-diags: If rst2man is available fix missing source dir in 
configure

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 00/19] IB/mad: Add support for Intel Omni-Path Architecture (OPA) MAD processing.

2015-02-09 Thread Weiny, Ira
 
 On 2/4/2015 6:29 PM, ira.we...@intel.com wrote:
  From: Ira Weiny ira.we...@intel.com
 
  The following patch series modifies the kernel MAD processing
  (ib_mad/ib_umad) and related interfaces to send and receive Intel
  Omni-Path Architecture MADs on devices which support them.
 
  In addition to supporting some IBTA management classes, OPA devices
  use MADs with lengths up to 2K.  These jumbo MADs increase the
  performance of management traffic.
 
  To distinguish IBTA MADs from OPA MADs a new Base Version is introduced.
 
 With your recent changes, I don't think that statement above is strictly true 
 any
 longer. While OPA does use a different base version for it's jumbo MADs, 
 aren't
 OPA MADs distinguished from IBTA MADs by the new OPA MAD device
 capability bit ?
 

True.

However, OPA MADs with a base version of 0x1 are compatible with and therefore 
can be processed by the same code as IBTA MADs.

If I need to respin the series I will update the comment.

 
 What performance tests were run in terms of IBTA MADs ?

Sorry for the delay.  As there have been a couple of versions of this series 
since I ran those tests in December I took the time to re-run these tests.  

OpenSM (sweep time) and infiniband-diag (iblinkinfo and saquery) were timed on 
my small cluster with no noticeable change in performance.  But this is not the 
best test as I only have 6 or so nodes on 2 switches.

For example iblinkinfo runs very quickly:

[root@phcppriv12 OPENIB_FF]# time iblinkinfo  /dev/null

real0m0.072s
user0m0.002s
sys 0m0.041s


The better test we have at this small scale are a couple of tools (closed 
source) which send SMA and PMA packets as rapidly as possible.

Those showed no difference in performance.

For example I ran these tools with 3 different kernels 1) stock roland,  2) 
the series in question up to the kmalloc patch,  and  final 3) then the full 
OPA series.

Here is a summary of the results:

   Roland for-next (ecb7b12) up to kmalloc patch
   full OPA patch set

SMA2107221324   
 21381
SMA rcv  1713917329 
   17303

PMA24159.4 24401.2  
   24166.5
PMA rcv  24159.4 24401.2
 24166.5


NOTE: The results shown above are specifically shown without units as I am not 
allowed to publish performance numbers.  However larger numbers equal better 
performance.

The numbers are quite repeatable and as you can see are all close to each other 
from kernel to kernel.
 
The remote node for these tests was running the stock RHEL7 kernel.  All 
software and hardware was held constant except for the kernel patches in 
question.

Ira

 
 -- Hal
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask

2015-02-06 Thread Weiny, Ira
 
 On 05/02/2015 04:54, Weiny, Ira wrote:
 
  On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote:
 
  Anyway, I recognize that uverb way of abusing write() syscall is
  borderline (at best) regarding other Linux subsystems and Unix
  paradigm in general. But it's not enough to screw it more.
 
  Then we must return the correct output size explicitly in the struct.
 
  I was thinking this very same thing as I read through this thread.
 
  I too would like to avoid the use of comp_masks if at all possible.  The 
  query
 call seems to be a verb where the structure size is all you really need to 
 know
 the set of values returned.
 
  As Jason says, other calls may require the comp_mask where 0 is not
 sufficient to indicate missing.
 
 Would it be okay to return it in the ib_uverbs_cmd_hdr.out_words? That would
 further abuse the write() syscall by writing to the input buffer.

I don't think that is such a great idea.
 
 However, the only other alternative I see is to add it explicitly to every 
 uverb
 response struct.
 

I think this is the best solution.  There is a 32 bit reserved field in 
ib_uverbs_ex_query_device_resp.  Could we use all or part of that to be the 
size?

For other extended commands I'm not sure what to do.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCHv2 infiniband-diags] ibtracert.c: Remove checking the port 0 state for base switch port 0

2015-02-06 Thread Weiny, Ira
 
 From: Dan Ben Yosef da...@mellanox.com
 
 The port 0 state check needed to be only for HCA/Routers and
 EnhancedPort0 switches.
 For BasePort0 switches, the PortState field, in PortInfo arrtibute, for port0 
 is
 undefined.
 
 Signed-off-by: Dan Ben Yosef da...@mellanox.com

Thanks applied,
Ira 

 ---
 Changes since v1:
 Changed is_route_inactive_port0 to is_port_inactive Moved description from
 above use of routine to above routine (is_port_inactive)
 
 diff --git a/src/ibtracert.c b/src/ibtracert.c index d32968b..9414618 100644
 --- a/src/ibtracert.c
 +++ b/src/ibtracert.c
 @@ -92,6 +92,7 @@ struct Switch {
   int mccap;
   int linearFDBtop;
   int fdb_base;
 + int enhsp0;
   int8_t fdb[64];
   char switchinfo[64];
  };
 @@ -114,6 +115,24 @@ struct Node {
  Node *nodesdist[MAXHOPS];
  uint64_t target_portguid;
 
 +/*
 + * is_port_inactive
 + * Checks whether or not the port state is other than active.
 + * The sw argument is only relevant when the port is on a
 + * switch; for HCAs and routers, this argument is ignored.
 + * Returns 1 when port is not active and 0 when active.
 + * Base switch port 0 is considered always active.
 + */
 +static int is_port_inactive(Node * node, Port * port, Switch * sw) {
 + int res = 0;
 + if (port-state != 4 
 + (node-type != IB_NODE_SWITCH ||
 +  (node-type == IB_NODE_SWITCH  sw-enhsp0)))
 + res = 1;
 + return res;
 +}
 +
  static int get_node(Node * node, Port * port, ib_portid_t * portid)  {
   void *pi = port-portinfo, *ni = node-nodeinfo, *nd = node-nodedesc;
 @@ -164,6 +183,7 @@ static int switch_lookup(Switch * sw, ib_portid_t *
 portid, int lid)
 
   mad_decode_field(si, IB_SW_LINEAR_FDB_CAP_F, sw-linearcap);
   mad_decode_field(si, IB_SW_LINEAR_FDB_TOP_F, sw-
 linearFDBtop);
 + mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, sw-enhsp0);
 
   if (lid = sw-linearcap  lid  sw-linearFDBtop)
   return -1;
 @@ -248,7 +268,7 @@ static int find_route(ib_portid_t * from, ib_portid_t *
 to, int dump)
   Port *port, fromport, toport, nextport;
   Switch sw;
   int maxhops = MAXHOPS;
 - int portnum, outport;
 + int portnum, outport = 255, next_sw_outport = 255;
 
   memset(fromnode,0,sizeof(Node));
   memset(tonode,0,sizeof(Node));
 @@ -274,20 +294,25 @@ static int find_route(ib_portid_t * from, ib_portid_t *
 to, int dump)
   portnum = port-portnum;
 
   dump_endnode(dump, From, node, port);
 + if (node-type == IB_NODE_SWITCH) {
 + next_sw_outport = switch_lookup(sw, from, to-lid);
 + if (next_sw_outport  0 || next_sw_outport  node-numports)
 {
 + /* needed to print the port in badtbl */
 + outport = next_sw_outport;
 + goto badtbl;
 + }
 + }
 
   while (maxhops--) {
 - if (port-state != 4)
 + if (is_port_inactive(node, port, sw))
   goto badport;
 
   if (sameport(port, toport))
   break;  /* found */
 
 - outport = portnum;
   if (node-type == IB_NODE_SWITCH) {
   DEBUG(switch node);
 - if ((outport = switch_lookup(sw, from, to-lid))  0 ||
 - outport  node-numports)
 - goto badtbl;
 + outport = next_sw_outport;
 
   if (extend_dpath(from-drpath, outport)  0)
   goto badpath;
 @@ -307,6 +332,7 @@ static int find_route(ib_portid_t * from, ib_portid_t *
 to, int dump)
  (node-type == IB_NODE_ROUTER)) {
   int ca_src = 0;
 
 + outport = portnum;
   DEBUG(ca or router node);
   if (!sameport(port, fromport)) {
   IBWARN
 @@ -335,8 +361,19 @@ static int find_route(ib_portid_t * from, ib_portid_t *
 to, int dump)
   nextport.portnum =
   from-drpath.p[from-drpath.cnt + 1];
   }
 + /* only if the next node is a switch, get switch info */
 + if (nextnode.type == IB_NODE_SWITCH) {
 + next_sw_outport = switch_lookup(sw, from, to-lid);
 + if (next_sw_outport  0 ||
 + next_sw_outport  nextnode.numports) {
 + /* needed to print the port in badtbl */
 + outport = next_sw_outport;
 + goto badtbl;
 + }
 + }
 +
   port = nextport;
 - if (port-state != 4)
 + if (is_port_inactive(nextnode, port, sw))
   goto badoutport;
   node = nextnode;
   portnum = port-portnum;
--
To unsubscribe from this list: send the line 

RE: [PATCH infiniband-diags] Add new configure flag to enable/disable the rdma-ndd build

2015-02-06 Thread Weiny, Ira
 
 From: Dan Ben Yosef da...@mellanox.com
 
 Add new flag --enable-rdma-ndd (default=yes)
 
 Signed-off-by: Dan Ben Yosef da...@mellanox.com

Thanks applied,
Ira 

 ---
 diff --git a/Makefile.am b/Makefile.am
 index 4e08c2b..63c4b48 100644
 --- a/Makefile.am
 +++ b/Makefile.am
 @@ -15,9 +15,11 @@ sbin_PROGRAMS = src/ibaddr src/ibnetdiscover
 src/ibping src/ibportstate \
   src/perfquery src/sminfo src/smpdump src/smpquery \
   src/saquery src/vendstat src/iblinkinfo \
   src/ibqueryerrors src/ibcacheedit src/ibccquery \
 - src/ibccconfig \
 - src/dump_fts \
 - src/rdma-ndd
 + src/ibccconfig src/dump_fts
 +
 +if ENABLE_RDMA_NDD
 +sbin_PROGRAMS += src/rdma-ndd
 +endif
 
  if ENABLE_TEST_UTILS
  sbin_PROGRAMS += src/ibsendtrap src/mcm_rereg_test @@ -71,10 +73,13
 @@ man_MANS = doc/man/ibaddr.8 \
   doc/man/smpquery.8 \
   doc/man/vendstat.8 \
   doc/man/infiniband-diags.8 \
 - doc/man/rdma-ndd.8 \
   man/dump_lfts.8 \
   man/dump_mfts.8
 
 +if ENABLE_RDMA_NDD
 +man_MANS += doc/man/rdma-ndd.8
 +endif
 +
  # define this for the dist target
  compat_man_pages = man/ibdiscover.8 man/ibcheckerrors.8
 man/ibcheckerrs.8 \
   man/ibchecknet.8 man/ibchecknode.8 man/ibcheckport.8 \
 @@ -161,6 +166,8 @@ install-data-hook:
   $(top_srcdir)/config/install-sh -c -m 444
 $(top_srcdir)/etc/error_thresholds $(DESTDIR)/$(sysconfdir)/infiniband-diags
   $(top_srcdir)/config/install-sh -c -m 400 $(top_srcdir)/etc/ibdiag.conf
 $(DESTDIR)/$(sysconfdir)/infiniband-diags
 
 +if ENABLE_RDMA_NDD
  install-exec-hook:
   $(top_srcdir)/config/install-sh -m 755 -d
 $(DESTDIR)/$(sysconfdir)/init.d
   $(top_srcdir)/config/install-sh -m 755 $(top_srcdir)/etc/rdma-ndd.init
 $(DESTDIR)/$(sysconfdir)/init.d/rdma-ndd
 +endif
 diff --git a/configure.ac b/configure.ac index ed1b834..11f47f7 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -180,6 +180,18 @@ fi
  AC_SUBST(ibnetdisc_api_version)
  dnl End libibnetdisc stuff
 
 +dnl with or with rdma-ndd
 +AC_MSG_CHECKING(for --enable-rdma-ndd)
 +AC_ARG_ENABLE(rdma-ndd,
 +[  --enable-rdma-ndd build rdma-ndd (default=yes)], [case
 +${enableval} in
 +  yes) rdmandd=yes ;;
 +  no)  rdmandd=no ;;
 +  *) AC_MSG_ERROR(bad value ${enableval} for --enable-rdma-ndd) ;;
 +esac],[rdmandd=yes])
 +AM_CONDITIONAL(ENABLE_RDMA_NDD, test x$rdmandd = xyes)
 +AC_MSG_RESULT(${rdmandd=no})
 +
  dnl configures for rdma-ndd startup script  default_rdma_service=openibd
 AC_ARG_WITH([rdma_service], @@ -198,9 +210,12 @@ default_start=null
  AC_SUBST(DEFAULT_START, $default_start)  AC_SUBST(DEFAULT_STOP,
 $default_stop)
 
 -AC_CHECK_HEADER(libudev.h, with_udev=yes, with_udev=no) -
 AC_CHECK_LIB(udev, udev_monitor_ref, [], AC_MSG_ERROR(libudev is
 required for rdma-ndd...))
 -AC_CHECK_FUNCS_ONCE(udev_get_sys_path)
 +if test x$rdmandd = xyes; then
 +AC_CHECK_HEADER(libudev.h, with_udev=yes, with_udev=no)
 +AC_CHECK_LIB(udev, udev_monitor_ref, [], AC_MSG_ERROR(libudev is
 required for rdma-ndd...))
 +AC_CONFIG_FILES([doc/man/rdma-ndd.8 etc/rdma-ndd.init])
 +AC_CHECK_FUNCS_ONCE(udev_get_sys_path)
 +fi
 
  dnl Generate doc/man/*.in files if possible  DOC_DATE=`date +%Y-%m-%d`
 @@ -275,8 +290,6 @@ AC_CONFIG_FILES([\
   doc/man/smpquery.8 \
   doc/man/vendstat.8 \
   doc/man/infiniband-diags.8 \
 - doc/man/rdma-ndd.8 \
 - etc/rdma-ndd.init \
   libibnetdisc/Makefile \
  ])
  AC_OUTPUT
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH infiniband-diags] Align infiniband-diags MLNX device IDs with OpenSM

2015-02-06 Thread Weiny, Ira
 
 
 Same device IDs as in opensm/osm_subnet.c:is_mlnx_ext_port_info_supported
 
 Switch-IB is device ID 0xcb20 rather than 0xcb84
 
 Signed-off-by: Hal Rosenstock h...@mellanox.com

Thanks applied,
Ira

 ---
 diff --git a/libibnetdisc/src/ibnetdisc.c b/libibnetdisc/src/ibnetdisc.c index
 f3c6000..e346905 100644
 --- a/libibnetdisc/src/ibnetdisc.c
 +++ b/libibnetdisc/src/ibnetdisc.c
 @@ -203,7 +203,7 @@ static int is_mlnx_ext_port_info_supported(ibnd_port_t
 * port)  {
   uint16_t devid = (uint16_t) mad_get_field(port-node-info, 0,
 IB_NODE_DEVID_F);
 
 - if (devid == 0xc738 || devid == 0xcb84)
 + if ((devid = 0xc738  devid = 0xc73b) || devid == 0xcb20)
   return 1;
   if (devid = 0x1003  devid = 0x1016)
   return 1;
 diff --git a/src/ibdiag_common.c b/src/ibdiag_common.c index
 666edbc..e09623d 100644
 --- a/src/ibdiag_common.c
 +++ b/src/ibdiag_common.c
 @@ -530,7 +530,7 @@ int is_port_info_extended_supported(ib_portid_t *
 dest, int port,  int is_mlnx_ext_port_info_supported(uint32_t devid)  {
   if (ibd_ibnetdisc_flags  IBND_CONFIG_MLX_EPI) {
 - if (devid == 0xc738 || devid == 0xcb84)
 + if ((devid = 0xc738  devid = 0xc73b) || devid == 0xcb20)
   return 1;
   if (devid = 0x1003  devid = 0x1016)
   return 1;
 diff --git a/src/vendstat.c b/src/vendstat.c index 7fc4a11..11ee73e 100644
 --- a/src/vendstat.c
 +++ b/src/vendstat.c
 @@ -144,8 +144,8 @@ typedef struct {
 
  static uint16_t ext_fw_info_device[][2] = {
   {0x0245, 0x0245},   /* Switch-X */
 - {0xc738, 0xc738},   /* Switch-X */
 - {0xcb84, 0xcb84},   /* Switch-IB */
 + {0xc738, 0xc73b},   /* Switch-X */
 + {0xcb20, 0xcb20},   /* Switch-IB */
   {0x01b3, 0x01b3},   /* IS-4 */
   {0x1003, 0x1016},   /* Connect-X */
   {0x, 0x}};
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RE-RESEND V2 for-next 4/5] IB/core: Make sure that the PSN does not overflow

2015-02-05 Thread Weiny, Ira
 Subject: [PATCH RE-RESEND V2 for-next 4/5] IB/core: Make sure that the PSN
 does not overflow

Is there a particular bug  which this fixes?

Ira

 
 From: Majd Dibbiny m...@mellanox.com
 
 The rq/sq-psn is 24 bits as defined in the IB spec, therefore ULPs and User
 space applications shouldn't use the 8 most significant bits in the 32 bits
 variables to avoid overflow in modify_qp.
 
 Fixed the PSN generation by the RDMA-CM to mask out the 8 most significant
 bits, also mask out these bits in uverbs for attributes provided by 
 user-space.
 
 Signed-off-by: Majd Dibbiny m...@mellanox.com
 Signed-off-by: Or Gerlitz ogerl...@mellanox.com
 ---
  drivers/infiniband/core/cma.c|1 +
  drivers/infiniband/core/uverbs_cmd.c |4 ++--
  2 files changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
 index d570030..fab0ee5 100644
 --- a/drivers/infiniband/core/cma.c
 +++ b/drivers/infiniband/core/cma.c
 @@ -512,6 +512,7 @@ struct rdma_cm_id
 *rdma_create_id(rdma_cm_event_handler event_handler,
   INIT_LIST_HEAD(id_priv-listen_list);
   INIT_LIST_HEAD(id_priv-mc_list);
   get_random_bytes(id_priv-seq_num, sizeof id_priv-seq_num);
 + id_priv-seq_num = 0xff;
 
   return id_priv-id;
  }
 diff --git a/drivers/infiniband/core/uverbs_cmd.c
 b/drivers/infiniband/core/uverbs_cmd.c
 index 532d8eb..ecb6430 100644
 --- a/drivers/infiniband/core/uverbs_cmd.c
 +++ b/drivers/infiniband/core/uverbs_cmd.c
 @@ -2053,8 +2053,8 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file
 *file,
   attr-path_mtu= cmd.path_mtu;
   attr-path_mig_state  = cmd.path_mig_state;
   attr-qkey= cmd.qkey;
 - attr-rq_psn  = cmd.rq_psn;
 - attr-sq_psn  = cmd.sq_psn;
 + attr-rq_psn  = cmd.rq_psn  0xff;
 + attr-sq_psn  = cmd.sq_psn  0xff;
   attr-dest_qp_num = cmd.dest_qp_num;
   attr-qp_access_flags = cmd.qp_access_flags;
   attr-pkey_index  = cmd.pkey_index;
 --
 1.7.1
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in the 
 body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask

2015-02-04 Thread Weiny, Ira
 
 On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote:
 
  Anyway, I recognize that uverb way of abusing write() syscall is
  borderline (at best) regarding other Linux subsystems and Unix
  paradigm in general. But it's not enough to screw it more.
 
 Then we must return the correct output size explicitly in the struct.

I was thinking this very same thing as I read through this thread.

I too would like to avoid the use of comp_masks if at all possible.  The query 
call seems to be a verb where the structure size is all you really need to know 
the set of values returned.

As Jason says, other calls may require the comp_mask where 0 is not sufficient 
to indicate missing.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] IB/mad: fix ifnullfree.cocci warnings

2015-02-03 Thread Weiny, Ira


 -Original Message-
 From: Wu, Fengguang
 Sent: Tuesday, January 27, 2015 1:36 PM
 To: Weiny, Ira
 Cc: kbuild-...@01.org; Roland Dreier; Hefty, Sean; Hal Rosenstock; Or Gerlitz;
 Yan Burman; linux-rdma@vger.kernel.org; linux-ker...@vger.kernel.org
 Subject: [PATCH] IB/mad: fix ifnullfree.cocci warnings
 
 drivers/infiniband/core/mad.c:2088:3-8: WARNING: NULL check before freeing
 functions like kfree, debugfs_remove, debugfs_remove_recursive or
 usb_free_urb is not needed. Maybe consider reorganizing relevant code to
 avoid passing NULL values.
 
  NULL check before some freeing functions is not needed.
 
  Based on checkpatch warning
  kfree(NULL) is safe this check is probably not required
  and kfreeaddr.cocci by Julia Lawall.
 
 Generated by: scripts/coccinelle/free/ifnullfree.cocci
 
 Signed-off-by: Fengguang Wu fengguang...@intel.com


Signed-off-by: Ira Weiny ira.we...@intel.com


 ---
 
  mad.c |3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)
 
 --- a/drivers/infiniband/core/mad.c
 +++ b/drivers/infiniband/core/mad.c
 @@ -2084,8 +2084,7 @@ out:
   /* Post another receive request for this QP */
   if (response) {
   ib_mad_post_receive_mads(qp_info, response);
 - if (recv)
 - kfree(recv);
 + kfree(recv);
   } else
   ib_mad_post_receive_mads(qp_info, recv);  }
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] IB/mad: fix ifnullfree.cocci warnings

2015-02-03 Thread Weiny, Ira
 
  drivers/infiniband/core/mad.c:2088:3-8: WARNING: NULL check before
  freeing functions like kfree, debugfs_remove, debugfs_remove_recursive
  or usb_free_urb is not needed. Maybe consider reorganizing relevant
  code to avoid passing NULL values.
 
   NULL check before some freeing functions is not needed.
 
   Based on checkpatch warning
   kfree(NULL) is safe this check is probably not required
   and kfreeaddr.cocci by Julia Lawall.
 
  Generated by: scripts/coccinelle/free/ifnullfree.cocci
 
  Signed-off-by: Fengguang Wu fengguang...@intel.com
 
 
 Signed-off-by: Ira Weiny ira.we...@intel.com

Sorry this should have been:

Reviewed-by: Ira Weiny ira.we...@intel.com
Tested-by: Ira Weiny ira.we...@intel.com


 
 
  ---
 
   mad.c |3 +--
   1 file changed, 1 insertion(+), 2 deletions(-)
 
  --- a/drivers/infiniband/core/mad.c
  +++ b/drivers/infiniband/core/mad.c
  @@ -2084,8 +2084,7 @@ out:
  /* Post another receive request for this QP */
  if (response) {
  ib_mad_post_receive_mads(qp_info, response);
  -   if (recv)
  -   kfree(recv);
  +   kfree(recv);
  } else
  ib_mad_post_receive_mads(qp_info, recv);  }
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in the 
 body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH infiniband-diags] ibtracert.c: Remove checking the port 0 state for BasePort0 switches

2015-01-28 Thread Weiny, Ira
Sorry for the delay.

 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Hal Rosenstock
 Sent: Tuesday, January 20, 2015 5:00 AM
 To: Weiny, Ira
 Cc: Dan Ben-Yosef; linux-rdma (linux-rdma@vger.kernel.org)
 Subject: [PATCH infiniband-diags] ibtracert.c: Remove checking the port 0 
 state
 for BasePort0 switches
 
 From: Dan Ben Yosef da...@mellanox.com
 
 The port 0 state check is needed only for HCA/Routers and
 EnhancedPort0 switches.
 
 For BasePort0 switches, the PortState field in the PortInfo attribute for 
 port0 is
 undefined (not used).
 
 Signed-off-by: Dan Ben Yosef da...@mellanox.com
 ---
  src/ibtracert.c |   47 ---
  1 files changed, 40 insertions(+), 7 deletions(-)
 
 diff --git a/src/ibtracert.c b/src/ibtracert.c index d32968b..326200b 100644
 --- a/src/ibtracert.c
 +++ b/src/ibtracert.c
 @@ -92,6 +92,7 @@ struct Switch {
   int mccap;
   int linearFDBtop;
   int fdb_base;
 + int enhsp0;
   int8_t fdb[64];
   char switchinfo[64];
  };
 @@ -114,6 +115,17 @@ struct Node {
  Node *nodesdist[MAXHOPS];
  uint64_t target_portguid;
 
 +static int is_route_inactive_port0(Node * node, Port * port, Switch *
 +sw) {

I would prefer this function be named something like:

port_inactive_not_bsp0

The current name is confusing as to what the logic is.

 + int res = 0;
 + /* not active for HCA and for enhanced port0 Switches */
 + if (port-state != 4 

Please use #defines here even though the original code did not.

 + (node-type != IB_NODE_SWITCH ||
 +  (node-type == IB_NODE_SWITCH  sw-enhsp0)))
 + res = 1;
 + return res;
 +}
 +
  static int get_node(Node * node, Port * port, ib_portid_t * portid)  {
   void *pi = port-portinfo, *ni = node-nodeinfo, *nd = node-nodedesc;
 @@ -164,6 +176,7 @@ static int switch_lookup(Switch * sw, ib_portid_t *
 portid, int lid)
 
   mad_decode_field(si, IB_SW_LINEAR_FDB_CAP_F, sw-linearcap);
   mad_decode_field(si, IB_SW_LINEAR_FDB_TOP_F, sw-
 linearFDBtop);
 + mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, sw-enhsp0);
 
   if (lid = sw-linearcap  lid  sw-linearFDBtop)
   return -1;
 @@ -248,7 +261,7 @@ static int find_route(ib_portid_t * from, ib_portid_t *
 to, int dump)
   Port *port, fromport, toport, nextport;
   Switch sw;
   int maxhops = MAXHOPS;
 - int portnum, outport;
 + int portnum, outport = 255, next_sw_outport = 255;
 
   memset(fromnode,0,sizeof(Node));
   memset(tonode,0,sizeof(Node));
 @@ -274,20 +287,28 @@ static int find_route(ib_portid_t * from, ib_portid_t *
 to, int dump)
   portnum = port-portnum;
 
   dump_endnode(dump, From, node, port);
 + if (node-type == IB_NODE_SWITCH) {
 + next_sw_outport = switch_lookup(sw, from, to-lid);
 + if (next_sw_outport  0 || next_sw_outport  node-numports)
 {
 + /* needed to print the port in badtbl */
 + outport = next_sw_outport;
 + goto badtbl;
 + }
 + }
 
   while (maxhops--) {
 - if (port-state != 4)
 + /* checking if the port state isn't active.
 +  * The sw argument is only relevant when the port is a
 +  * switch port for HCAs this argument is ignored */

This comment should go above the function signature and be enhanced to clarify 
that it is checking for port state inactive except for BSP0 which has no state.


Ira


 + if (is_route_inactive_port0(node, port, sw))
   goto badport;
 
   if (sameport(port, toport))
   break;  /* found */
 
 - outport = portnum;
   if (node-type == IB_NODE_SWITCH) {
   DEBUG(switch node);
 - if ((outport = switch_lookup(sw, from, to-lid))  0 ||
 - outport  node-numports)
 - goto badtbl;
 + outport = next_sw_outport;
 
   if (extend_dpath(from-drpath, outport)  0)
   goto badpath;
 @@ -307,6 +328,7 @@ static int find_route(ib_portid_t * from, ib_portid_t *
 to, int dump)
  (node-type == IB_NODE_ROUTER)) {
   int ca_src = 0;
 
 + outport = portnum;
   DEBUG(ca or router node);
   if (!sameport(port, fromport)) {
   IBWARN
 @@ -335,8 +357,19 @@ static int find_route(ib_portid_t * from, ib_portid_t *
 to, int dump)
   nextport.portnum =
   from-drpath.p[from-drpath.cnt + 1];
   }
 + /* only if the next node is a switch, get switch info */
 + if (nextnode.type == IB_NODE_SWITCH

RE: [PATCH infiniband-diags] ibtracert.c: Remove checking the port 0 state for BasePort0 switches

2015-01-28 Thread Weiny, Ira
 
  +static int is_route_inactive_port0(Node * node, Port * port, Switch
  +*
  +sw) {
 
  I would prefer this function be named something like:
 
  port_inactive_not_bsp0
 
  The current name is confusing as to what the logic is.
 
 How about is_port_inactive ?

That is fine.

 
  +  int res = 0;
  +  /* not active for HCA and for enhanced port0 Switches */
  +  if (port-state != 4 
 
  Please use #defines here even though the original code did not.
 
 This is an issue throughout infiniband-diags. I think it should be addressed 
 by
 separate patch(es) for this.

I would rather see us do the right thing going forward.  This saves work trying 
to find all these majic numbers later.

Ira


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 16/20] IB/mad: Add Intel Omni-Path Architecture defines

2015-01-20 Thread Weiny, Ira
 
 On Mon, Jan 19, 2015 at 10:50:38PM +, Weiny, Ira wrote:
  
   On Mon, Jan 19, 2015 at 07:57:22PM +, Weiny, Ira wrote:
  
Perhaps you can give us an example of where the current code would
work without modifications if the IBTA were to define any new base
or smp class version?
  
   I think the point here is that every check for OPA_MIN_CLASS_VERSION
   must also be combined with a check if the device in question is OPA or 
   not.
 
  This contradicts what you say below
 
 How so?
 
 if (mad_reg_req-mgmt_class_version = OPA_MIN_CLASS_VERSION
  !port_priv-qp_info[qpn].supports_jumbo_mads) {
 

Sorry, I got a bit confused because this is not the latest version of the patch.

 Should be:
 
 if (port_is_opa  mad_reg_req-mgmt_class_version =
 OPA_MIN_CLASS_VERSION
  !port_priv-qp_info[qpn].supports_jumbo_mads) {
 
 And since port_is_opa = true implies
 port_priv-qp_info[qpn].supports_jumbo_mads = true the above if can never
 evaluate to true and can just be removed.

Right there is no need to check for supports_jumbo_mads.  The latest version 
was:

+   if (mad_reg_req-mgmt_class_version = OPA_MIN_CLASS_VERSION 
+   !(port_priv-device-attributes.device_cap_flags2  
IB_DEVICE_OPA_MAD_SUPPORT)) {



The bug in the patch is that any class version should be allowed on IB 
devices.  This check prevented class versions = 0x80 for IB devices.

NOTE however that the current code already limits users to a class version  8. 
 (due to the implementation.)

-#define MAX_MGMT_VERSION   8
+#define MAX_MGMT_VERSION   0x83
...
if (mad_reg_req-mgmt_class_version = MAX_MGMT_VERSION) { 
dev_notice(device-dev, 
   ib_register_mad_agent: invalid Class 
Version %u\n,
   mad_reg_req-mgmt_class_version);
goto error1;
}

Therefore I did not see this as limiting the IB implementation we already have.

I will remove the patch as I agree with Jason and Hal it is technically more 
correct.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 16/20] IB/mad: Add Intel Omni-Path Architecture defines

2015-01-19 Thread Weiny, Ira
 
 On Mon, Jan 19, 2015 at 07:57:22PM +, Weiny, Ira wrote:
 
  Perhaps you can give us an example of where the current code would
  work without modifications if the IBTA were to define any new base or
  smp class version?
 
 I think the point here is that every check for OPA_MIN_CLASS_VERSION must
 also be combined with a check if the device in question is OPA or not.

This contradicts what you say below

 
 I don't see a problem with sharing the kernel structs or MAD processing, as
 long as all OPA specific behavior is conditionalized on the device in question
 being OPA - it should never be conditionalized on the class version of the 
 MAD.
 
  commit a70a5af286b2985a0a95e9733d1e3166845a8be8
  Author: Ira Weiny ira.we...@intel.com
  Date:   Tue Sep 23 20:04:49 2014 -0400
 
  IB/mad: Add registration check for Intel Omni-Path Architecture
  MADs
 
 For instance:
 
 + if (mad_reg_req-mgmt_class_version =
 OPA_MIN_CLASS_VERSION
 +  !port_priv-qp_info[qpn].supports_jumbo_mads) {
 
 Is wrong because it means that no IB software can register for those MADs
 anymore, which is cross polluting the two architectures in the kernel.
 
 Presumably since supports_jumbo_mads is always true for OPA the above test
 is not required at all and this patch should be dropped.
 
 But I expect if we keep looking at uses of OPA_MIN_CLASS_VERSION we will
 see other cases where they need to be conditionalized?

That is the only use of OPA_MIN_CLASS_VERSION.  I agree with your second 
assessment and I'll remove the patch from the next series.

Hal was this your assessment as well?

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCHv2 infiniband-diags] vendstat.c: Add additional vendor IDs to ext_fw_info_device

2015-01-19 Thread Weiny, Ira


 -Original Message-
 From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il]
 Sent: Wednesday, January 14, 2015 6:07 AM
 To: Weiny, Ira
 Cc: linux-rdma (linux-rdma@vger.kernel.org)
 Subject: [PATCHv2 infiniband-diags] vendstat.c: Add additional vendor IDs to
 ext_fw_info_device
 
 Add additional supported device IDs to ext_fw_info_device.
 These IDs are ConnectX-4, ConnectX-IB, and Switch-IB.
 This affects is_ext_fw_info_supported which determine which Mellanox
 specific vendor MADs are issued to a device.
 
 Signed-off-by: Hal Rosenstock h...@mellanox.com

Thanks applied,
Ira


 ---
 Change since v1:
 Changed ConnectX-3 to Connect-IB in patch description
 
 diff --git a/src/vendstat.c b/src/vendstat.c index f28ff02..7fc4a11 100644
 --- a/src/vendstat.c
 +++ b/src/vendstat.c
 @@ -145,8 +145,9 @@ typedef struct {
  static uint16_t ext_fw_info_device[][2] = {
   {0x0245, 0x0245},   /* Switch-X */
   {0xc738, 0xc738},   /* Switch-X */
 + {0xcb84, 0xcb84},   /* Switch-IB */
   {0x01b3, 0x01b3},   /* IS-4 */
 - {0x1003, 0x1011},   /* Connect-X */
 + {0x1003, 0x1016},   /* Connect-X */
   {0x, 0x}};
 
  static int is_ext_fw_info_supported(uint16_t device_id) {
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 infiniband-diags] ibdiag_common.c: Add more supported device IDs in is_mlnx_ext_port_info_supported

2015-01-19 Thread Weiny, Ira


 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Hal Rosenstock
 Sent: Wednesday, January 14, 2015 6:07 AM
 To: Weiny, Ira
 Cc: linux-rdma (linux-rdma@vger.kernel.org); Or Gerlitz
 Subject: [PATCH v3 infiniband-diags] ibdiag_common.c: Add more supported
 device IDs in is_mlnx_ext_port_info_supported
 
 Add support for some ConnectX-IB, ConnectX-4, and Switch-IB in
 is_mlnx_ext_port_info_supported.
 
 This only affects tool(s) which invoke this routine
 (is_mlnx_ext_port_info_supported) which is currently just ibportstate.
 
 is_mlnx_ext_port_info_supported determines whether or not the MLNX vendor
 extended port info SMP is sent which is used to determine FDR10 speed.
 
 Not having the device in the list would only result in FDR10 being
 misinterpreted as QDR.
 
 Signed-off-by: Hal Rosenstock h...@mellanox.com

Thanks, applied,
Ira 


 ---
 Change since v2:
 Changed ConnectX-3 to ConnectX-IB in patch description
 
 Change since v1:
 Improved patch description
 
 diff --git a/src/ibdiag_common.c b/src/ibdiag_common.c index
 8c749c7..384d342 100644
 --- a/src/ibdiag_common.c
 +++ b/src/ibdiag_common.c
 @@ -499,9 +499,9 @@ conv_cnt_human_readable(uint64_t val64, float *val,
 int data)  int is_mlnx_ext_port_info_supported(uint32_t devid)  {
   if (ibd_ibnetdisc_flags  IBND_CONFIG_MLX_EPI) {
 - if (devid == 0xc738)
 + if (devid == 0xc738 || devid == 0xcb84)
   return 1;
 - if (devid = 0x1003  devid = 0x1011)
 + if (devid = 0x1003  devid = 0x1016)
   return 1;
   }
   return 0;
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in the 
 body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCHv2 infiniband-diags] ibnetdisc.c: Add additional supported device IDs to is_mlnx_ext_port_info_supported

2015-01-19 Thread Weiny, Ira


 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Hal Rosenstock
 Sent: Wednesday, January 14, 2015 6:07 AM
 To: Weiny, Ira
 Cc: linux-rdma (linux-rdma@vger.kernel.org)
 Subject: [PATCHv2 infiniband-diags] ibnetdisc.c: Add additional supported
 device IDs to is_mlnx_ext_port_info_supported
 
 Similar to previous ibdiag_common.c patch, this adds additional device IDs to
 be supported to the ibnetdiscover internal is_mlnx_ext_port_info routine.
 These IDs are ConnectX-4, ConnectX-IB, and Switch-IB.
 
 Signed-off-by: Hal Rosenstock h...@mellanox.com

Thanks applied.
Ira



 ---
 Change from v1:
 Changed ConnectX-3 to ConnectX-IB in patch description
 
 diff --git a/libibnetdisc/src/ibnetdisc.c b/libibnetdisc/src/ibnetdisc.c index
 121fe35..f3c6000 100644
 --- a/libibnetdisc/src/ibnetdisc.c
 +++ b/libibnetdisc/src/ibnetdisc.c
 @@ -203,9 +203,9 @@ static int is_mlnx_ext_port_info_supported(ibnd_port_t
 * port)  {
   uint16_t devid = (uint16_t) mad_get_field(port-node-info, 0,
 IB_NODE_DEVID_F);
 
 - if (devid == 0xc738)
 + if (devid == 0xc738 || devid == 0xcb84)
   return 1;
 - if (devid = 0x1003  devid = 0x1011)
 + if (devid = 0x1003  devid = 0x1016)
   return 1;
   return 0;
  }
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in the 
 body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH TRIVIAL libibmad] fields.c: Fix commentary typo

2015-01-19 Thread Weiny, Ira


 -Original Message-
 From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il]
 Sent: Monday, January 12, 2015 11:08 AM
 To: Weiny, Ira
 Cc: linux-rdma (linux-rdma@vger.kernel.org)
 Subject: [PATCH TRIVIAL libibmad] fields.c: Fix commentary typo
 
 
 Signed-off-by: Hal Rosenstock h...@mellanox.com

Thanks applied,
Ira
 

 ---
 diff --git a/src/fields.c b/src/fields.c index 33a6364..9965811 100644
 --- a/src/fields.c
 +++ b/src/fields.c
 @@ -835,7 +835,7 @@ static const ib_field_t ib_mad_f[] = {
   {BITSOFFS(16, 16), ThresholdEventCounter, mad_dump_uint},
   {BITSOFFS(32, 16), ThresholdCongestionEventMap, mad_dump_hex},
   /* XXX: Q3/2010 errata lists offset 48, but that means field is not
 -  * world aligned.  Assume will be aligned to offset 64 later.
 +  * word aligned.  Assume will be aligned to offset 64 later.
*/
   {BITSOFFS(64, 32), CurrentTimeStamp, mad_dump_uint},
   {0, 0}, /* IB_CC_CONGESTION_LOG_CA_LAST_F */
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 16/20] IB/mad: Add Intel Omni-Path Architecture defines

2015-01-19 Thread Weiny, Ira
 
  Currently, these patches are claiming values that are controlled by an
  established standards organization. It is the equivalent of just
  grabbing a currently unused Ethertype value for a new protocol and
  then claiming that whenever the IEEE allocates it for something else
  you will make changes to resolve the conflict.
 
 It would be equivalent if OPA was Infiniband, but it is not.  It is a separate
 architecture, with an entirely separate name space.  Conceptually, it is 
 similar
 to the IB GRH and the IPv6 header.  They have a similar structure, but they 
 are
 not the same.  The IBTA can define a new GRH field values, such as NxtHdr,
 without consulting or getting approval from the IETF.
 
 Ira could introduce patches to define OPA management packets, but the
 general structure of those packet headers would be exactly the same as IB
 management packets.  There's no need to introduce duplicate definitions into
 the code base.  The existing structures can be reused.  However, OPA makes
 use of different values for the fields.
 

Perhaps you can give us an example of where the current code would work without 
modifications if the IBTA were to define any new base or smp class version?

Would removing this patch satisfy your concerns?

commit a70a5af286b2985a0a95e9733d1e3166845a8be8
Author: Ira Weiny ira.we...@intel.com
Date:   Tue Sep 23 20:04:49 2014 -0400

IB/mad: Add registration check for Intel Omni-Path Architecture MADs

If the registration specifies an OPA MAD class version and the device does 
not
support OPA MADs, fail the MAD registration.

Signed-off-by: Ira Weiny ira.we...@intel.com


Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 16/20] IB/mad: Add Intel Omni-Path Architecture defines

2015-01-16 Thread Weiny, Ira
 On 1/15/2015 6:30 PM, Weiny, Ira wrote:
 
  On 1/12/2015 12:11 PM, ira.we...@intel.com wrote:
  From: Ira Weiny ira.we...@intel.com
 
  OPA_MIN_CLASS_VERSION -- OPA Class versions are  0x80
  OPA_SMP_CLASS_VERSION -- Defined at 0x80
 OPA_MGMT_BASE_VERSION --
  Defined at 0x80
 
  Increase max management version to accommodate OPA
 
  Allocation of MAD base and class version numbers is owned by the IBTA.
  It doesn't seem appropriate to arbitrarily claim code points without
  proper approval.
 
  OPA is its own architecture space.  While this space uses some of the
  same values as IB we are not claiming any IBTA values.
 
 You *are* claiming IBTA values. When the IBTA chooses to use those values,
 then there will be a conflict.

There is no conflict.
 
It is true that when the IBTA assigns meaning to those values the code may have 
to be changed to interpret this new meaning.  That has to happen regardless of 
these patches or their meaning on OPA devices.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 16/20] IB/mad: Add Intel Omni-Path Architecture defines

2015-01-15 Thread Weiny, Ira
 
 On 1/12/2015 12:11 PM, ira.we...@intel.com wrote:
  From: Ira Weiny ira.we...@intel.com
 
  OPA_MIN_CLASS_VERSION -- OPA Class versions are  0x80
  OPA_SMP_CLASS_VERSION -- Defined at 0x80 OPA_MGMT_BASE_VERSION --
  Defined at 0x80
 
  Increase max management version to accommodate OPA
 
 Allocation of MAD base and class version numbers is owned by the IBTA.
 It doesn't seem appropriate to arbitrarily claim code points without proper
 approval.

OPA is its own architecture space.  While this space uses some of the same 
values as IB we are not claiming any IBTA values.  This is similar to how IB 
GIDs and IPv6 addresses look the same but are in separate addressing domains.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 14/20] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag

2015-01-14 Thread Weiny, Ira
 
 On 1/12/2015 7:11 PM, ira.we...@intel.com wrote:
 
 --- a/include/rdma/ib_verbs.h
 +++ b/include/rdma/ib_verbs.h
   enum ib_signature_prot_cap {
   IB_PROT_T10DIF_TYPE_1 = 1,
   IB_PROT_T10DIF_TYPE_2 = 1  1,
 @@ -210,6 +214,7 @@ struct ib_device_attr {
   int sig_prot_cap;
   int sig_guard_cap;
   struct ib_odp_caps  odp_caps;
 + u64 device_cap_flags2;
 
 
 Just make the existing kernel size device_cap_flags field a u64, note it's not
 blankly copied to user space in uverbs as part of a chunk, so just go there 
 and
 copy the lower 32 bits.
 

It is an easy change but before changing this I'd like to hear what Roland or 
others think.

I prefer to define an additional field.  Eventually we are likely to have flags 
which need to be communicated to user space.  I think it will be cleaner at 
that time to report the entire new field rather than go through some sort of 
upper 32 bit device capability logic.

Thoughts?
Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 14/20] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag

2015-01-14 Thread Weiny, Ira
 
 On 1/12/2015 7:11 PM, ira.we...@intel.com wrote:
  Add a device capability flag to flag OPA MAD support on devices.
 
 You should put few words here telling what is OPA/OPA MADs and why
 supporting them fits the IB core.
 
 See for example the IB core patch [1] that added signature verbs API is 
 states in
 a manner which is both generic across vendors and across different Mellanox
 HCAs. Also see [2] the IB core patch that added support for BMME API  as
 another example.

I'm not quite sure what should be covered in this message which is different 
from the cover letter or the subsequent patches.

Would this be more acceptable?

quote

IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag

OPA MADs share a common header with IBTA MADs but with a different base version
and an extended length.  These jumbo MADs increase the performance of
management traffic.

Sharing a common header with IBTA MADs allows us to share most of the MAD
processing code when dealing with OPA MADs in addition to supporting some IBTA 
MADs on OPA devices.

Add a device capability flag to indicate OPA MAD support on the device.

/quote

Ira

 
 
 [1] 1b01d33 IB/core: Introduce signature verbs API [2] 00f7ec3 RDMA/core:
 Add memory management extensions support
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 02/20] IB/core: Cache device attributes for use by upper level drivers

2015-01-14 Thread Weiny, Ira
 
 On 1/12/2015 7:10 PM, ira.we...@intel.com wrote:
  From: Ira Weiny ira.we...@intel.com
 
 Please avoid empty change-logs, e.g one liner will perfectly do here.

Done.

  b/include/rdma/ib_verbs.h index 0d74f1d..86fc90f 100644
  --- a/include/rdma/ib_verbs.h
  +++ b/include/rdma/ib_verbs.h
  @@ -1675,6 +1675,7 @@ struct ib_device {
  u32  local_dma_lkey;
  u8   node_type;
  u8   phys_port_cnt;
  +   struct ib_device_attrattributes;
 I think cached_dev_attrs will be better name.

Done for the entire series.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 05/20] IB/mad: Change cast in rcv_has_same_class

2015-01-14 Thread Weiny, Ira
 
 On 1/12/2015 7:10 PM, ira.we...@intel.com wrote:
  From: Ira Weiny ira.we...@intel.com
 
  Save dereference and clarifies that rcv_has_same_class can process
  both IB and OPA MADs.
 
 I don't see any clarification below... something missing here?

Nothing missing.  I was trying to indicate that this change simply indicates 
that this function can operate on any buffer which has an ib_mad_hdr. 

Ira

 
 
  Signed-off-by: Ira Weiny ira.we...@intel.com
  ---
drivers/infiniband/core/mad.c |2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
 
  diff --git a/drivers/infiniband/core/mad.c
  b/drivers/infiniband/core/mad.c index 66b3940..819b794 100644
  --- a/drivers/infiniband/core/mad.c
  +++ b/drivers/infiniband/core/mad.c
  @@ -1750,7 +1750,7 @@ static int is_rmpp_data_mad(struct
 ib_mad_agent_private *mad_agent_priv,
static inline int rcv_has_same_class(struct ib_mad_send_wr_private *wr,
   struct ib_mad_recv_wc *rwc)
{
  -   return ((struct ib_mad *)(wr-send_buf.mad))-mad_hdr.mgmt_class
 ==
  +   return ((struct ib_mad_hdr *)(wr-send_buf.mad))-mgmt_class ==
  rwc-recv_buf.mad-mad_hdr.mgmt_class;
}
 

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 06/20] IB/core: Add mad_size to ib_device_attr

2015-01-14 Thread Weiny, Ira
 
 On 1/12/2015 7:10 PM, ira.we...@intel.com wrote:
  Change all IB drivers to report the IB management size.
 
 Do you mean the maximal size of IB MADs they support? if this is the case,
 please reflect it in the field name.

Done.  Changed to max_mad_size.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 15/20] IB/mad: Create jumbo_mad data structures

2015-01-14 Thread Weiny, Ira
 
 On 1/12/2015 7:11 PM, ira.we...@intel.com wrote:
  Define jumbo_mad and jumbo_rmpp_mad
 
 For the sake of review and maintenance, please add few more words here on
 what are these creatures...

Proposed new text:

Define jumbo_mad and jumbo_rmpp_mad.

Jumbo MAD structures are 2K versions of ib_mad and ib_rmpp_mad structures.
Currently only OPA base version MADs are of this type.

Sound reasonable?

Ira 

 
  Create an RMPP Base header to share between ib_rmpp_mad and
  jumbo_rmpp_mad

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH infiniband-diags] ibccconfig.c: Account for enhanced switch port 0 in SwitchPortCongestionSetting handling

2015-01-09 Thread Weiny, Ira
 
 Enhanced switch port 0 can support CC (indicated in CC
 ClassPortInfo:CapabilityMask.EnhancedPort0CC).
 
 portnum 0 is a valid port number and offsets should be calculated based on
 port 0 rather than port 1.
 
 Also, fix some error messages to indicate switch port congestion setting 
 rather
 then switch congestion setting failure.
 
 Signed-off-by: Hal Rosenstock h...@mellanox.com

Thanks applied.
Ira

 ---
 diff --git a/src/ibccconfig.c b/src/ibccconfig.c index 11dfc07..22b16db 100644
 --- a/src/ibccconfig.c
 +++ b/src/ibccconfig.c
 @@ -371,9 +371,6 @@ static char
 *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int
   if ((errstr = parseint(argv[5], cong_parm_marking_rate, 0)))
   return errstr;
 
 - if (!portnum)
 - return invalid port number specified;
 -
   /* Figure out number of ports first */
   if (!smp_query_via(data, dest, IB_ATTR_NODE_INFO, 0, 0, srcport))
   return node info config failed;
 @@ -389,10 +386,10 @@ static char
 *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int
 
   /* We are modifying only 1 port, so get the current config */
   if (!cc_query_status_via(payload, dest,
 IB_CC_ATTR_SWITCH_PORT_CONGESTION_SETTING,
 -  (portnum - 1) / 32, 0, NULL, srcport, cckey))
 - return switch congestion setting query failed;
 +  portnum / 32, 0, NULL, srcport, cckey))
 + return switch port congestion setting query failed;
 
 - ptr = payload + (((portnum % 32 - 1) * 4));
 + ptr = payload + (((portnum % 32) * 4));
 
   mad_encode_field(ptr,
 
 IB_CC_SWITCH_PORT_CONGESTION_SETTING_ELEMENT_VALID_F,
 @@ -415,8 +412,8 @@ static char
 *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int
cong_parm_marking_rate);
 
   if (!cc_config_status_via(payload, rcv, dest,
 IB_CC_ATTR_SWITCH_PORT_CONGESTION_SETTING,
 -   (portnum - 1) / 32, 0, NULL, srcport, cckey))
 - return switch congestion setting config failed;
 +   portnum / 32, 0, NULL, srcport, cckey))
 + return switch port congestion setting config failed;
 
   return NULL;
  }
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH infiniband-diags] ibccquery.c: Account for enhanced switch port 0 in SwitchPortCongestionSetting handling

2015-01-09 Thread Weiny, Ira
 
 Enhanced switch port 0 can support CC (indicated in CC
 ClassPortInfo:CapabilityMask.EnhancedPort0CC).
 
 portnum 0 is a valid port number and offsets should be calculated based on
 port 0 rather than port 1.
 
 Also, fix some error messages to indicate switch port congestion setting 
 rather
 then switch congestion setting failure.
 
 Signed-off-by: Hal Rosenstock h...@mellanox.com

Thanks applied.
Ira

 ---
 diff --git a/src/ibccquery.c b/src/ibccquery.c index ac8bca2..39e45b7 100644
 --- a/src/ibccquery.c
 +++ b/src/ibccquery.c
 @@ -224,11 +224,11 @@ static char
 *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int
 
   memset(data, '\0', sizeof data);
   if (!cc_query_status_via(data, dest,
 IB_CC_ATTR_SWITCH_PORT_CONGESTION_SETTING,
 -  (portnum - 1) / 32, 0, NULL, srcport,
 cckey))
 - return switch congestion setting query failed;
 +  portnum / 32, 0, NULL, srcport,
 cckey))
 + return switch port congestion setting query failed;
 
   mad_dump_cc_switchportcongestionsettingelement(buf, sizeof
 buf,
 -data + (((portnum
 % 32) - 1) * 4),
 +data + ((portnum
 % 32) * 4),
  4);
   printf(%s, buf);
   return NULL;
 @@ -236,16 +236,16 @@ static char
 *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int
 
   /* else get all port info */
 
 - maxblocks = ((numports - 1) / 32) + 1;
 + maxblocks = numports / 32 + 1;
 
   for (i = 0; i  maxblocks; i++) {
   memset(data, '\0', sizeof data);
   if (!cc_query_status_via(data, dest,
 IB_CC_ATTR_SWITCH_PORT_CONGESTION_SETTING,
i, 0, NULL, srcport, cckey))
 - return switch congestion setting query failed;
 + return switch port congestion setting query failed;
 
 - for (j = 0; j  32  outputcount  numports; j++) {
 - printf(Port:%u\n, i * 32 
 + j + 1);
 + for (j = 0; j  32  outputcount = numports; j++) {
 + printf(Port:%u\n, i * 32 
 + j);
 
   mad_dump_cc_switchportcongestionsettingelement(buf, sizeof buf,
  data + j 
 *
 4,
  4);
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] libibmad/src/rpc.c: Remove superfluous packet dump

2015-01-09 Thread Weiny, Ira
Any reason this should not be removed from mad_rpc_rmpp as well?

Ira


 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Albert Chu
 Sent: Friday, December 19, 2014 11:08 AM
 To: linux-rdma@vger.kernel.org
 Subject: [PATCH] libibmad/src/rpc.c: Remove superfluous packet dump
 
 When idebug == 1, mad_rpc outputs only the mad data response in debug
 output.  It does not output the request mad data.  This is confusing since 
 it's not
 clear if it's request or response output.
 
 When idebug  1, full mad send and receive buffers are output, which makes
 the mad data output with idebug == 1 superfluous.  Remove it to remove
 confusion.
 
 Signed-off-by: Albert Chu ch...@llnl.gov
 ---
  src/rpc.c |5 -
  1 files changed, 0 insertions(+), 5 deletions(-)
 
 diff --git a/src/rpc.c b/src/rpc.c
 index 8d961f2..202fac5 100644
 --- a/src/rpc.c
 +++ b/src/rpc.c
 @@ -272,11 +272,6 @@ void *mad_rpc(const struct ibmad_port *port,
 ib_rpc_t * rpc,
   return NULL;
   }
 
 - if (ibdebug) {
 - IBWARN(data offs %d sz %d, rpc-dataoffs, rpc-datasz);
 - xdump(stderr, mad data\n, mad + rpc-dataoffs, rpc-
 datasz);
 - }
 -
   if (rcvdata)
   memcpy(rcvdata, mad + rpc-dataoffs, rpc-datasz);
 
 --
 1.7.1
 
 
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCHv2 libibmad] fields.c: Print SwitchCongestionSetting threshold fields in hex rather than decimal

2015-01-09 Thread Weiny, Ira
 for consistency with CC Annex which shows meaning of threshold fields in hex
 not decimal
 
 Signed-off-by: Hal Rosenstock h...@mellanox.com

Thanks applied.

Ira


 ---
 Change since v1:
 Change fields to hex rather than SwitchPortCongestionSetting:Threshold to be
 consistent with CC Annex
 
 diff --git a/src/fields.c b/src/fields.c index 33a6364..9965811 100644
 --- a/src/fields.c
 +++ b/src/fields.c
 @@ -858,9 +858,9 @@ static const ib_field_t ib_mad_f[] = {
   {0, 32, Control_Map, mad_dump_hex},
   {32, 256, Victim_Mask, mad_dump_array},
   {288, 256, Credit_Mask, mad_dump_array},
 - {BITSOFFS(544, 4), Threshold, mad_dump_uint},
 + {BITSOFFS(544, 4), Threshold, mad_dump_hex},
   {BITSOFFS(552, 8), Packet_Size, mad_dump_uint},
 - {BITSOFFS(560, 4), CS_Threshold, mad_dump_uint},
 + {BITSOFFS(560, 4), CS_Threshold, mad_dump_hex},
   {BITSOFFS(576, 16), CS_ReturnDelay, mad_dump_hex}, /* TODO: CCT
 dump */
   {BITSOFFS(592, 16), Marking_Rate, mad_dump_uint},
   {0, 0}, /*
 IB_CC_SWITCH_CONGESTION_SETTING_LAST_F */
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH infiniband-diags] ibccconfig.c: Account for enhanced switch port 0 in SwitchPortCongestionSetting handling

2015-01-09 Thread Weiny, Ira
[off-list]

This replaces the first one?

 -Original Message-
 From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il]
 Sent: Friday, January 09, 2015 7:18 AM
 To: Weiny, Ira
 Cc: linux-rdma (linux-rdma@vger.kernel.org); Chu, Al
 Subject: [PATCH infiniband-diags] ibccconfig.c: Account for enhanced switch
 port 0 in SwitchPortCongestionSetting handling
 
 Enhanced switch port 0 can support CC (indicated in CC
 ClassPortInfo:CapabilityMask.EnhancedPort0CC).
 
 portnum 0 is a valid port number and offsets should be calculated based on
 port 0 rather than port 1.
 
 Also, fix some error messages to indicate switch port congestion setting 
 rather
 then switch congestion setting failure.
 
 Signed-off-by: Hal Rosenstock h...@mellanox.com
 ---
 diff --git a/src/ibccconfig.c b/src/ibccconfig.c index 11dfc07..22b16db 100644
 --- a/src/ibccconfig.c
 +++ b/src/ibccconfig.c
 @@ -371,9 +371,6 @@ static char
 *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int
   if ((errstr = parseint(argv[5], cong_parm_marking_rate, 0)))
   return errstr;
 
 - if (!portnum)
 - return invalid port number specified;
 -
   /* Figure out number of ports first */
   if (!smp_query_via(data, dest, IB_ATTR_NODE_INFO, 0, 0, srcport))
   return node info config failed;
 @@ -389,10 +386,10 @@ static char
 *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int
 
   /* We are modifying only 1 port, so get the current config */
   if (!cc_query_status_via(payload, dest,
 IB_CC_ATTR_SWITCH_PORT_CONGESTION_SETTING,
 -  (portnum - 1) / 32, 0, NULL, srcport, cckey))
 - return switch congestion setting query failed;
 +  portnum / 32, 0, NULL, srcport, cckey))
 + return switch port congestion setting query failed;
 
 - ptr = payload + (((portnum % 32 - 1) * 4));
 + ptr = payload + (((portnum % 32) * 4));
 
   mad_encode_field(ptr,
 
 IB_CC_SWITCH_PORT_CONGESTION_SETTING_ELEMENT_VALID_F,
 @@ -415,8 +412,8 @@ static char
 *switch_port_congestion_setting(ib_portid_t * dest, char **argv, int
cong_parm_marking_rate);
 
   if (!cc_config_status_via(payload, rcv, dest,
 IB_CC_ATTR_SWITCH_PORT_CONGESTION_SETTING,
 -   (portnum - 1) / 32, 0, NULL, srcport, cckey))
 - return switch congestion setting config failed;
 +   portnum / 32, 0, NULL, srcport, cckey))
 + return switch port congestion setting config failed;
 
   return NULL;
  }
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 03/16] ib/mad: Add check for jumbo MADs support on a device

2015-01-07 Thread Weiny, Ira
 
 On 11/27/2014 3:51 PM, Sagi Grimberg wrote:
  We're already short on bits in device_cap_flags
 
 no shortage @ the kernel... we can add more 32 bits if/when we need it

Why is there a gap in the bits?

enum ib_device_cap_flags {
...
IB_DEVICE_MEM_WINDOW_TYPE_2B= (124), 
IB_DEVICE_MANAGED_FLOW_STEERING = (129),
...
};

25-28 are not used?  Is there some legacy software out there which uses these?

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 03/16] ib/mad: Add check for jumbo MADs support on a device

2014-12-09 Thread Weiny, Ira
 
 On Mon, Dec 8, 2014 at 2:23 AM, Weiny, Ira ira.we...@intel.com wrote:
 
  I find it very annoying that upper level drivers replicate in
  different ways elements from the IB device attributes returned by
  ib_query_device. I met that in multiple drivers and upcoming designs
  for which I do code review. Are you up to come up with a patch that
  caches the device attributes on the device structure?
 
  I don't follow what you are asking for.  Could you give more details?
 
 1. add a struct ib_device_attr field to struct ib_device
 
 2. when the device registers itself with the IB core, go and run the
 query_device verb with the param being pointer to that field

I see where you are going.  Then the MAD stack does not have to cache a 
max_mad_size value but rather looks in the ib_device structure on the fly...

So, something like the diff below?  What are the chances we end up with 
attributes which are not constant?

If Roland would like to go this way I can rework my series based on the 
attributes being cached.

-- Ira


17:15:59  git di
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..db18795 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -322,6 +322,8 @@ int ib_register_device(struct ib_device *device,
client-add(device);
}
 
+   device-query_device(device, device-attributes);
+
  out:
mutex_unlock(device_mutex);
return ret;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 470a011..241a882 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1630,6 +1630,7 @@ struct ib_device {
u32  local_dma_lkey;
u8   node_type;
u8   phys_port_cnt;
+   struct ib_device_attrattributes;
 };
 
 struct ib_client {


RE: [RFC PATCH 03/16] ib/mad: Add check for jumbo MADs support on a device

2014-12-09 Thread Weiny, Ira
 
 On Sun, Dec 7, 2014 at 4:23 PM, Weiny, Ira ira.we...@intel.com wrote:
  I don't think this is a bad idea, however, the complication is determining 
  the
 kmem_cache object size in that case.  How about we limit this value to  2K
 until such time as there is a need for larger support?
 
 Can we just get rid of all the kmem_caches used in the MAD code?  I don't
 think any of this is so performance-critical that we couldn't just use 
 kmalloc...
 

On an SM node the MAD code is _very_ performance-critical.  I don’t think we 
should give that up.

Another alternative is to have a cache per device based on the size reported.

-- Ira

N�r��yb�X��ǧv�^�)޺{.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj��!�i

RE: [RFC PATCH 03/16] ib/mad: Add check for jumbo MADs support on a device

2014-12-07 Thread Weiny, Ira
 
 On 11/13/2014 9:54 PM, ira.we...@intel.com wrote:
 
  Add a device capability flag for OPA devices to signal their support of 
  jumbo
  MADs.
 
  Check for IB_DEVICE_JUMBO_MAD_SUPPORT in the device capabilities and
  if supported mark the special QPs created.
 
 Ira,
 
 Few comments here, the device capability is OK, specifically if it helps for 
 the
 mad layer logic to be simpler and/or more robust.
 
 You should add device attribute telling what is the size of MADs supported by
 the device, I think the IBTA mandated 256Band in the OPA case, the driver will
 fill there 2k.

I don't think this is a bad idea, however, the complication is determining the 
kmem_cache object size in that case.  How about we limit this value to  2K 
until such time as there is a need for larger support?

 
 You can't just state that (jumbo == 2k) and go to complete design/changes
 which use this hard-coded assumption. You need to see how to re-spin this
 series w.o this assumption.

I'm looking into it.  I believe I will still need to have a capability bit, but 
it may end up having a different meaning.

 
 I find it very annoying that upper level drivers replicate in different ways
 elements from the IB device attributes returned by ib_query_device. I met that
 in multiple drivers and upcoming designs for which I do code review. Are you
 up to come up with a patch that caches the device attributes on the device
 structure?

I don't follow what you are asking for.  Could you give more details?

 if not,
 I can do that.. and have your code to see it.
 
 Also, I would go and unify (squash) this patch and the preceding one.
 

I'll keep this in mind as I rework the patches.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 07/16] ib/mad: create a jumbo MAD kmem_cache

2014-12-05 Thread Weiny, Ira
 
 On 11/13/2014 9:54 PM, ira.we...@intel.com wrote:
  @@ -773,7 +782,12 @@ static int handle_outgoing_dr_smp(struct
 ib_mad_agent_private *mad_agent_priv,
  }
  local-mad_priv = NULL;
  local-recv_mad_agent = NULL;
  -   mad_priv = kmem_cache_alloc(ib_mad_cache, GFP_ATOMIC);
  +
  +   if (mad_agent_priv-qp_info-supports_jumbo_mads)
  +   mad_priv = kmem_cache_alloc(jumbo_mad_cache,
 GFP_ATOMIC);
  +   else
  +   mad_priv = kmem_cache_alloc(ib_mad_cache, GFP_ATOMIC);
  +
 @ minimum (if you really think that one kmem cache for both jumbo and non-
 jumbo mads isn't the way to get) lets have one pointer that is directed to the
 cache you want to use and this way all branch as the above one and it's such
 can be avoided, right?

That is a good idea, however, I'm going to address your other comments before 
changing anything here.

-- Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 00/16] ib_mad: Add support for Intel Omni-Path Architecture (OPA) MAD processing.

2014-12-01 Thread Weiny, Ira
 
 [RFC PATCH 07/16] ib/mad: create a jumbo MAD kmem_cache
  why not use a single kmem-cache instance with a non hard coded
  element size, 256B (or whatever we use today) or 2KB?
  I wanted to be able to adjust the element count of the caches separately to
 better tune overall memory usage.  However, I stopped short of adding
 additional module parameters to adjust the 2K cache at this time.
 
 
 I tend to think that the resulted code is too much of a special purpose one
 under a  (jumbo == 2K) assumption. See some more comments in the individual
 patches and we'll take it from there.
 

Ok, I'll address those comments in the other email threads.

 
 
 
  Also (nit), please change the prefix for all patches to be IB/mad:
  and not
  ib/mad: to comply with the existing habit of patch titles for the IB
  subsystem
  I will thanks.
 
 Good. See below another easy-to-fix nitpicking comment, but before that, for
 the sake of easier review and post-robustness of the code to future 
 bisections,
 please do a re-ordering of the series such that all general refactoring and 
 pre-
 patches come before the OPApatches.
 
 This goes to re-order the current series such tat patches 8/9/10 are located
 after patch 14, as listed here:
 
[RFC PATCH 01/16] ib/mad: rename is_data_mad to is_rmpp_data_mad
[RFC PATCH 02/16] ib/core: add IB_DEVICE_JUMBO_MAD_SUPPORT device
 cap
[RFC PATCH 03/16] ib/mad: Add check for jumbo MADs support on a device
[RFC PATCH 04/16] ib/mad: add base version parameter to
[RFC PATCH 05/16] ib/mad: Add MAD size parameters to process_mad
[RFC PATCH 06/16] ib/mad: Create jumbo_mad data structures
[RFC PATCH 07/16] ib/mad: create a jumbo MAD kmem_cache
[RFC PATCH 11/16] ib/mad: create helper function for
[RFC PATCH 12/16] ib/mad: create helper function for
[RFC PATCH 13/16] ib/mad: create helper function for
[RFC PATCH 14/16] ib/mad: Create helper function for SMI processing
[RFC PATCH 08/16] ib/mad: Add Intel Omni-Path Architecture defines
[RFC PATCH 09/16] ib/mad: Implement support for Intel Omni-Path
[RFC PATCH 10/16] ib/mad: Add registration check for Intel Omni-Path
[RFC PATCH 15/16] ib/mad: Implement Intel Omni-Path Architecture SMP
[RFC PATCH 16/16] ib/mad: Implement Intel Omni-Path Architecture General
 

Done.

 Another easy-to-fix nitpicking comment would be to have all the patches be
 consistent w.r.t to the capitalization of the 1st letter in the 1st word 
 after the
 IB/core: or IB/mad:  prefix, e.g
 
 ib/mad: create helper function for smi_handle_dr_smp_send
 
 becomes
 
 IB/mad: Create helper function for smi_handle_dr_smp_send

Done.

 
 BTW, here my personal preference is Add helper and not Create helper
 
 IB/mad: Add helper function for smi_handle_dr_smp_send

Done.

Ira

N�r��yb�X��ǧv�^�)޺{.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj��!�i

RE: [RFC PATCH 00/16] ib_mad: Add support for Intel Omni-Path Architecture (OPA) MAD processing.

2014-11-25 Thread Weiny, Ira
 
 On Thu, Nov 13, 2014 at 9:54 PM,  ira.we...@intel.com wrote:
  The following patch series modifies the kernel MAD processing
  (ib_mad/ib_umad) and related interfaces to process Intel Omni-Path
  Architecture MADs on devices which support them.
 
 So this series allows to process such mad when it arrives or goes beyond that
 and allows to send such mad too?

Both send and receive is supported.  My apologies for not being clear.

 
  In addition to supporting some IBTA management classes, OPA devices
  use MADs with lengths up to 2K.  These jumbo MADs increase the
  performance of management traffic.
 
 Can you provide 1-2 use cases where such mads will be sent and by what
 entity? I recall 2KB mads were mentioned over our LWG talks 8 years ago on IB
 routers...

The Intel Omni-Path driver and SM will be the entities using these MADs.  The 
patch series is written such that other devices could use jumbo MAD's but there 
is no attempt to predict how other technologies would do so.

[snip]

[RFC PATCH 07/16] ib/mad: create a jumbo MAD kmem_cache
 
 why not use a single kmem-cache instance with a non hard coded element size,
 256B (or whatever we use today) or 2KB?

I wanted to be able to adjust the element count of the caches separately to 
better tune overall memory usage.  However, I stopped short of adding 
additional module parameters to adjust the 2K cache at this time.

 
 Also (nit), please change the prefix for all patches to be IB/mad: and not
 ib/mad: to comply with the existing habit of patch titles for the IB subsystem

I will thanks.

 
 And (another nit), generate patch 0/N using
 
 $ git format-patch --cover-letter
 
 so we have the exact subject line for each patch and the over-all diffstat in 
 the
 cover-letter

I will.  My reason for not doing this previously was that I run the git 
send-email command on generated patches from a different machine than the one 
on which my repo is located due to Intel's firewall setup.  I will clone the 
repo there and do this in the future.

-- Ira



RE: [PATCHv3] libibmad: Add new fields for SM:PortInfoExtended and for PM:PortExtendedSpeedsCounters

2014-11-18 Thread Weiny, Ira


 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of da...@dev.mellanox.co.il
 Sent: Wednesday, November 12, 2014 5:11 AM
 To: Weiny, Ira
 Cc: da...@mellanox.com; linux-rdma@vger.kernel.org
 Subject: [PATCHv3] libibmad: Add new fields for SM:PortInfoExtended and for
 PM:PortExtendedSpeedsCounters
 
 This change includes:
 1. Adding new AttributeID and fields for MAD SM:PortInfoExtended.
 2. Adding new fields to support PM:PortExtendedSpeedsCounters when RSFEC
 mode is active.
 
 Signed-off-by: Dan Ben Yosef da...@mellanox.com

Thanks applied,
Ira




--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH infiniband-diags] New MAD SM:PortInfoExtended and changing PM:PortExtendedSpeedsCounters

2014-11-18 Thread Weiny, Ira
 -Original Message-
 From: Dan Ben Yosef [mailto:da...@dev.mellanox.co.il]
 Sent: Monday, November 17, 2014 6:57 AM
 To: Weiny, Ira
 Cc: da...@mellanox.com; linux-rdma@vger.kernel.org
 Subject: [PATCH infiniband-diags] New MAD SM:PortInfoExtended and changing
 PM:PortExtendedSpeedsCounters
 
 This change needed for EDR support.
 This change includes:
 1. Adding new MAD option to smpquery: PortInfoExtended (PIE).
 2. Changing perfquery to support PortExtendedSpeedsCounters with RSFEC
 mode active.
 
 Signed-off-by: Dan Ben Yosef da...@mellanox.com

Applied with the following change:

 -static void extended_speeds_query(ib_portid_t * portid, int port, uint64_t
 ext_mask)
 +static uint8_t is_fec_mode_active(ib_portid_t * portid, int port,
 +   uint16_t cap_mask)


Changed to is_rsfec_mode_active...

Thanks,
Ira



--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] infiniband-diags: add rdma-ndd daemon

2014-11-10 Thread Weiny, Ira
 
 On Mon, Nov 10, 2014 at 12:28:27AM +, Weiny, Ira wrote:
  
   Sadly, I think the proper way to address this is the same way netdev
   addresses it - do not activate the interface on module load, wait
   for an explicit enablement so userspace can configure before it tries to 
   link
 up.
  
   (Not to say that is even possible considering RDMA's history, but
   still..)
 
  I don't think this is really possible.  When would you consider the
  interface to be up?  Even before the port goes active the SM and
  other diags can query this value.  This is why I propose that the
  default of the drivers, without user space involvement, should change.
  The user space daemon is only trying to react to other user space
  activity.
 
 Same as net dev, after the module load the physical layer is disabled. Nothing
 can see the NodeDescription because the link is kept down.
 
 The everything gets configured, then the physical layer is allowed to come up.
 
 I don't know if this is practical, but it is the only race free way to 
 properly
 address all of this.

I don't think it is practical at all.

Assuming that we can hold the link from transitioning to Init how does any 
userspace software know if the current hostname is valid?  If, at the time of 
module load, the hostname is localhost you end up with the same problem as 
the proposed kernel patches.

In fact the proposed kernel patches actually aid in the setting of alternate 
policies in that a %h provides some dynamic lookup.  (See below)

 
   TBH, I'd rather see just this daemon and drop the kernel side... If
   the daemon is started before the modules are loaded then it might be
   able to set the description while the port is still down.
 
  The problem is this sequence.
 
  1) rdma-ndd daemon started (hostname == localhost, no rdma devices
  set)
  2) dhcp started (or other hostname change, rdma-ndd triggered but no
  rdma devices set; not loaded yet)
  3) rdma devices loaded, default node_desc
 
 4) daemon detects new rdma device instantly and programs the admin
desired node name in the 100ms before the physical link reaches
up. Since the link is down when the change occurs no traps are scheduled.
Or in the 10% case where the race is lost you get a trap. Oh well.

I'm not against adding this to rdma-ndd but I would like it to be optional.  
With the kernel patches rdma-ndd itself is optional.

 
  What I have proposed will work no matter what order things are
  started/loaded.
 
 Well, not really - it works in the sense that the kernel default node name 
 works
 properly, but it doesn't allow the admin to set a custom name without hitting
 all these same issues (well, except perhaps by module parameter..).

I think changing the default is a worthwhile change.  In addition, alternate 
admin policies are aided by the general use of the %h specifier.

1) SM's which periodically scan the Node Description always get up to 
date hostname info.
2) Up to date hostname info is provided even if a user space daemons 
fails.
Note: OpenSM has an update node description feature for this 
condition.
3) Low level diag tools always get up to date hostname info

Do you believe that hostname device is an unreasonable default?

Roland, Hal, do you have any input?

 
  Finally, I contemplated the alternative of having the rdma-ndd daemon
  detect new rdma devices.
 
 Well, without that all it does is link two parts of the kernel together 
 through
 user space with no way toset policy in userspace (policy in this case is the 
 node
 description string). Seems very strange.

Yes the daemon is somewhat strange and I envision it being updated for other 
policies in the future which may include some of the functionality you suggest. 
 But it seemed appropriate to add it now as it enhances the SM knowing about 
hostname changes.

-- Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCHv2] libibmad: Add new fields for SM:PortInfoExtended and for PM:PortExtendedSpeedsCounters

2014-11-09 Thread Weiny, Ira
 On 11/8/2014 9:29 AM, Weiny, Ira wrote:
  --- a/src/fields.c
   +++ b/src/fields.c
   @@ -949,8 +949,42 @@ static const ib_field_t ib_mad_f[] = {
{480, 32, Counter14, mad_dump_uint},
{0, 0}, /* IB_PSR_LAST_F */
  
   -{0, 0}  /* IB_FIELD_LAST_ */
   +/*
   + * PortInfoExtended fields
   + */
   +{0, 32, CapMask, mad_dump_hex},
  I'm not quite sure why the spec was written this way but technically CapMask
 is only 1 bit???
 
 PortInfoExtended:CapMask is 32 bit field. Current 1.3 draft is misformatted.

Thanks...  That makes more sense.  Sorry I only just got access to the new 1.3 
draft.

-- Ira

 The two rows are supposed to be combined into 1 and things are misaligned.
 IsFECModeSupported is 1 bit (first bit out of those 32 bits).
 
  Therefore I'm not sure we can define this field as above.  Can we get
  some assurances from the management WG that the 31 bit reserved field
 following the CapMask are only going to extend the CapMask?
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in the 
 body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2] infiniband-diags: add rdma-ndd daemon

2014-11-09 Thread Weiny, Ira
 
  rdma-ndd is a system daemon which watches the procfs hostname file for
 updates.
 
  Upon detecting an update it will update the Node Descriptions of the
  RDMA devices in the system.
 
 What is the effect of this on RoCE and IWARP adapters ? While they populate
 node_desc in sysfs, I don't think they support IB_DEVICE_MODIFY_NODE_DESC
 for ib_modify_device.

No they don't support it.

 
 Shouldn't such devices be skipped in terms of setting NodeDescription ?

AFAIK all of the current user space software which write to the node_desc sysfs 
simply ignore any errors.  Those devices which do not support ib_modify_device 
return EIO from sysfs.  This is the same mechanism I used and should be safe.

If we want to go down this path[*] I think the core sysfs code should be 
changed such that the node_desc is either A) not available on those devices or 
B) not writable if ib_modify_device is not supported.

-- Ira

[*] Changing this sysfs behavior was not the intent of this patch series and I 
think should be dealt with separately.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 1/6] ib/mad: Add function to support format specifiers for node description

2014-11-09 Thread Weiny, Ira
 
 On 11/6/2014 5:06 PM, ira.we...@intel.com wrote:
  From: Ira Weiny ira.we...@intel.com
 
  ib_build_node_desc - prints src node description into dest while
  mapping format specifiers
 
  Specifiers supported:
  %h system hostname
  %d device name
 
  Define a default Node Description format to be %h %d
 
 There are a lot of Node Description schemes in use in the field currently. New
 format needs to be made as unlikely as possible to interfere with those. So
 perhaps H:%h D:%d ?

What I am proposing was generally agreed upon to be a reasonable default in 
this thread:

http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg07295.html

Furthermore RHEL 6.4 has started to use this naming scheme in their rdma start 
up script as was proposed and agreed upon by Doug Ledford in this thread:

https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg11282.html

 
 Also, in order to support such installations, whether or not to use this 
 format
 should be configurable.

It is configurable.  You can overwrite the node_desc field anyway you want just 
as before and no current user space software needs to change if a user has 
customizations there.

What is nice about this addition is that _if_ the user does _not_ have 
customizations the defaults are, IMHO, much more usable by default.

-- Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] infiniband-diags: add rdma-ndd daemon

2014-11-09 Thread Weiny, Ira
 
 Sadly, I think the proper way to address this is the same way netdev addresses
 it - do not activate the interface on module load, wait for an explicit
 enablement so userspace can configure before it tries to link up.
 
 (Not to say that is even possible considering RDMA's history, but still..)

I don't think this is really possible.  When would you consider the interface 
to be up?  Even before the port goes active the SM and other diags can query 
this value.  This is why I propose that the default of the drivers, without 
user space involvement, should change.  The user space daemon is only trying to 
react to other user space activity.

 
 TBH, I'd rather see just this daemon and drop the kernel side... If the 
 daemon is
 started before the modules are loaded then it might be able to set the
 description while the port is still down.

The problem is this sequence.

1) rdma-ndd daemon started (hostname == localhost, no rdma devices set)
2) dhcp started (or other hostname change, rdma-ndd triggered but no rdma 
devices set; not loaded yet)
3) rdma devices loaded, default node_desc

What I have proposed will work no matter what order things are started/loaded.  
Furthermore, it avoids races between the start up scripts and the SM because 
the current hostname is always mapped at the time the query is being answered.  
With parallel start up systems like systemd you could have the hostname set by 
DHCP while some of your rdma devices are loaded and others are not.  Those 
which were loaded before the hostname change will get set by rdma-ndd and 
trigger a trap.  The others will simply pick up the correct hostname in the 
initial SM query.

Finally, I contemplated the alternative of having the rdma-ndd daemon detect 
new rdma devices.  I don't like this solution because it results in additional 
unnecessary MAD traffic for those devices.  (ie. initial ND query, followed by 
a trap/trap repress, and the final correct ND query.)  With my proposed 
solution any devices loaded after the hostname change (or in the case of a 
static hostname) will be properly queried with one ND query.

-- Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCHv2] libibmad: Add new fields for SM:PortInfoExtended and for PM:PortExtendedSpeedsCounters

2014-11-08 Thread Weiny, Ira
 +
 + /*
 +  * PortExtendedSpeedsCounters RSFEC active fields
 +  */

Use  IB_PESC_RSFEC_FIRST_F and then set IB_PESC_RSFEC_PORT_SELECT_F to that.

 + IB_PESC_RSFEC_PORT_SELECT_F,
 + IB_PESC_RSFEC_COUNTER_SELECT_F,
 + IB_PESC_RSFEC_SYNC_HDR_ERR_CTR_F,
 + IB_PESC_RSFEC_UNK_BLOCK_CTR_F,

What about ErrorDetectionCounterLaneX?

 + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE0_F,


I'm not quite sure of the name here because this can be either Symbol or Block 
counter...

What if we just drop the SYMBOL and leave it as:

IB_PESC_RSFEC_FEC_CORR_CTR_LANE(X)_F

 + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE1_F,
 + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE2_F,
 + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE3_F,
 + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE4_F,
 + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE5_F,
 + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE6_F,
 + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE7_F,
 + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE8_F,
 + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE9_F,
 + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE10_F,
 + IB_PESC_RSFEC_FEC_CORR_SYMBOL_CTR_LANE11_F,

What about FECUncorrectable*CounterLaneX?

Again it seems this counter is likely to have multiple meanings so use

IB_PESC_RSFEC_FEC_UNCORR_CTR_LANE(X)_F

 + IB_PESC_PORT_FEC_CORR_BLOCK_CTR_F,
 + IB_PESC_PORT_FEC_UNCORR_BLOCK_CTR_F,
 + IB_PESC_PORT_FEC_CORR_SYMBOL_CTR_F,
 + IB_PESC_RSFEC_LAST_F,
 +
   IB_FIELD_LAST_  /* must be last */
  };
 
 diff --git a/src/dump.c b/src/dump.c
 index efe4bc4..64577c0 100644
 --- a/src/dump.c
 +++ b/src/dump.c
 @@ -857,6 +857,13 @@ void mad_dump_portsamples_result(char *buf, int
 bufsz, void *val, int valsz)
   _dump_fields(buf, bufsz, val, IB_PSR_TAG_F, IB_PSR_LAST_F);  }
 
 +void mad_dump_port_ext_speeds_counters_rsfec_active(char *buf, int bufsz,
 + void *val, int valsz)
 +{
 + _dump_fields(buf, bufsz, val, IB_PESC_RSFEC_PORT_SELECT_F,

Based on the above change use the FIRST field here.

 +  IB_PESC_RSFEC_LAST_F);
 +}
 +
  void mad_dump_port_ext_speeds_counters(char *buf, int bufsz, void *val, int
 valsz)  {
   _dump_fields(buf, bufsz, val, IB_PESC_PORT_SELECT_F,
 IB_PESC_LAST_F); @@ -1115,6 +1122,12 @@ void
 mad_dump_classportinfo(char *buf, int bufsz, void *val, int valsz)
   _dump_fields(buf, bufsz, val, IB_CPI_BASEVER_F, IB_CPI_TRAP_QKEY_F
 + 1);  }
 
 +void mad_dump_portinfo_ext(char *buf, int bufsz, void *val, int valsz)
 +{
 + _dump_fields(buf, bufsz, val, IB_PORT_EXT_CAPMASK_F,

Use IB_PORT_EXT_FIRST_F here.

 +  IB_PORT_EXT_LAST_F);
 +}
 +
  void xdump(FILE * file, char *msg, void *p, int size)  {  #define HEX(x)  
 ((x)  10
 ? '0' + (x) : 'a' + ((x) -10)) diff --git a/src/fields.c b/src/fields.c index
 33a6364..a848ebf 100644
 --- a/src/fields.c
 +++ b/src/fields.c
 @@ -949,8 +949,42 @@ static const ib_field_t ib_mad_f[] = {
   {480, 32, Counter14, mad_dump_uint},
   {0, 0}, /* IB_PSR_LAST_F */
 
 - {0, 0}  /* IB_FIELD_LAST_ */
 + /*
 +  * PortInfoExtended fields
 +  */
 + {0, 32, CapMask, mad_dump_hex},

I'm not quite sure why the spec was written this way but technically CapMask is 
only 1 bit???

Therefore I'm not sure we can define this field as above.  Can we get some 
assurances from the management WG that the 31 bit reserved field following the 
CapMask are only going to extend the CapMask?


 + {BITSOFFS(32, 16), FECModeActive, mad_dump_uint},
 + {BITSOFFS(48, 16), FDRFECModeSupported, mad_dump_uint},

As this is a bit field it is probably better to be printed as hex.

 + {BITSOFFS(64, 16), FDRFECModeEnabled, mad_dump_uint},

Print as hex.

 + {BITSOFFS(80, 16), EDRFECModeSupported, mad_dump_uint},

Print as hex.

 + {BITSOFFS(96, 16), EDRFECModeEnabled, mad_dump_uint},

Print as hex.

 + {0, 0}, /* IB_PORT_EXT_LAST_F */
 
 + /*
 +  * PortExtendedSpeedsCounters RSFEC Active fields
 +  */
 + {BITSOFFS(8, 8), PortSelect, mad_dump_uint},
 + {64, 64, CounterSelect, mad_dump_hex},
 + {BITSOFFS(128, 16), SyncHeaderErrorCounter, mad_dump_uint},
 + {BITSOFFS(144, 16), UnknownBlockCounter, mad_dump_uint},

Add missing fields.

 + {352, 32, FECCorrectableSymbolCtrLane0, mad_dump_uint},

Same comment as above regarding the name here.  It is unfortunate that the lib 
could not change the name based on RS-FEC.  That will be up to other software 
to decode in a more user friendly way.  For these raw dumps we should use the 
more generic name:

FECCorrectableCtrLaneX

 + {384, 32, FECCorrectableSymbolCtrLane1, mad_dump_uint},
 + {416, 32, FECCorrectableSymbolCtrLane2, mad_dump_uint},
 + {448, 32, FECCorrectableSymbolCtrLane3, mad_dump_uint},
 + {480, 32, FECCorrectableSymbolCtrLane4, mad_dump_uint},
 + {512, 32, FECCorrectableSymbolCtrLane5, mad_dump_uint},
 + {544, 

RE: [PATCH infiniband-diags] saquery.c: Fix handling of cpi and ClassPortInfo options

2014-11-08 Thread Weiny, Ira
 -Original Message-
 From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il]
 Sent: Wednesday, November 05, 2014 12:22 PM
 To: Weiny, Ira
 Cc: linux-rdma (linux-rdma@vger.kernel.org); Doug Ledford; Dan Ben-Yosef
 Subject: [PATCH infiniband-diags] saquery.c: Fix handling of cpi and
 ClassPortInfo options
 
 
 Need to check if query_type is CLASS_PORT_INFO in order to handle saquery
 [cpi | ClassPortInfo].
 Otherwise, it just dumps ClassPortInfo attribute as all 0s.
 
 Note that saquery -c works fine because it set command to
 SAQUERY_CMD_CLASS_PORT_INFO.
 
 Signed-off-by: Hal Rosenstock h...@mellanox.com

Thanks applied,
Ira



 ---
 diff --git a/src/saquery.c b/src/saquery.c index 0f39064..1af3401 100644
 --- a/src/saquery.c
 +++ b/src/saquery.c
 @@ -1869,6 +1869,7 @@ int main(int argc, char **argv)
   params.dlid = get_lid(h, dst_lid);
 
   if (command == SAQUERY_CMD_CLASS_PORT_INFO ||
 + query_type == CLASS_PORT_INFO ||
   query_type == IB_SA_ATTR_SWITCHINFORECORD)
   sa_cpi_required = 1;
 
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH infiniband-diags] saquery.c: Return proper status from query_sa_cpi on bad result status

2014-11-08 Thread Weiny, Ira


 -Original Message-
 From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il]
 Sent: Wednesday, November 05, 2014 12:22 PM
 To: Weiny, Ira
 Cc: linux-rdma (linux-rdma@vger.kernel.org); Dan Ben-Yosef
 Subject: [PATCH infiniband-diags] saquery.c: Return proper status from
 query_sa_cpi on bad result status
 
 
 When result status was other than IB_SA_MAD_STATUS_SUCCESS,
 0 was being returned rather than EIO as intended.
 
 Signed-off-by: Hal Rosenstock h...@mellanox.com

Thanks applied,
Ira


 ---
 diff --git a/src/saquery.c b/src/saquery.c index 0f39064..1af3401 100644
 --- a/src/saquery.c
 +++ b/src/saquery.c
 @@ -1445,7 +1445,7 @@ static int query_sa_cpi(struct sa_handle *h, struct
 query_params *query_params)
   memcpy(query_params-cpi, cpi, sizeof(query_params-cpi));
  Exit:
   sa_free_result_mad(result);
 - return (0);
 + return ret;
  }
 
  static const struct query_cmd query_cmds[] = {
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] infiniband-diags: add rdma-ndd daemon

2014-10-31 Thread Weiny, Ira
 
 On 10/31/14 00:54, ira.we...@intel.com wrote:
  +int set_rdma_device_names(const char *hostname) {
  +   DIR *class_dir;
  +   struct dirent *dent;
  +
  +   class_dir = opendir(SYS_INFINIBAND);
  +   if (!class_dir) {
  +   syslog(LOG_INFO, Failed to open %s, SYS_INFINIBAND);
  +   return -ENOSYS;
  +   }
  +
  +   while ((dent = readdir(class_dir))) {
  +   int retry = set_retry_cnt;
  +   if (dent-d_name[0] == '.')
  +   continue;
  +
  +   while (update_node_desc(dent-d_name, hostname)  retry
  0) {
  +   syslog(LOG_ERR, retrying set Node Description on
 %s\n,
  +   dent-d_name);
  +   retry--;
  +   }
  +   }
  +
  +   return 0;
  +}
 
 Has this code been tested with Valgrind and --track-fds=yes and --leak-
 check=full ? I see an opendir() call in the above code but no closedir().

Great catch thanks!

I'll run with valgrind on v2.

 
   +int read_and_set_hostname(int fd)
   +{
   +  int rc;
   +  char buf[128];
   +  if (read(fd, buf, 65) = 0) {
   +  buf[65] = '\0';
   +  newline_to_null(buf);
   +  strip_domain(buf);
   +  syslog(LOG_INFO, new hostname detected: %s, buf);
   +  set_rdma_device_names((char *)buf);
   +  rc = 0;
   +  } else {
   +  syslog(LOG_ERR, Read %s Failed\n, SYS_HOSTNAME);
   +  rc = -EIO;
   +  }
   +  return rc;
   +}
 
 The above code will trigger reading uninitialized data if the data read from 
 fd
 is not newline-terminated and less than 65 characters are read.

I'm not sure what you mean.  Do you mean will trigger _using_ uninitialized 
data?

How about the following?

@@ -155,8 +156,8 @@ int read_and_set_hostname(int fd)
 {
int rc;
char buf[128];
-   if (read(fd, buf, 65) = 0) {
-   buf[65] = '\0';
+   memset(buf, 0, sizeof(buf));
+   if (read(fd, buf, 64) = 0) {
newline_to_null(buf);
strip_domain(buf);
syslog(LOG_INFO, new hostname detected: %s, buf);


Ira

 
 Bart.
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in the 
 body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] infiniband-diags: add rdma-ndd daemon

2014-10-31 Thread Weiny, Ira
 
  How about the following?
 
  @@ -155,8 +156,8 @@ int read_and_set_hostname(int fd)
{
   int rc;
   char buf[128];
  -   if (read(fd, buf, 65) = 0) {
  -   buf[65] = '\0';
  +   memset(buf, 0, sizeof(buf));
  +   if (read(fd, buf, 64) = 0) {
   newline_to_null(buf);
   strip_domain(buf);
   syslog(LOG_INFO, new hostname detected: %s, buf);
 
 This change looks fine to me. A possible alternative is to use dup() +
 fdopen() + fgets() + fclose(). fgets() namely automatically appends a
 terminating '\0'.

Agreed.  But I'd like to avoid creating additional fd's.

Thanks for the review,
Ira

 
 Bart.
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in the 
 body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Are the PathRecord reserved components really reserved?

2014-08-25 Thread Weiny, Ira
 
 Hello,
 
 I am looking into the OFED code for the component mask bits of the
 PathRecord requests and I need some further explanation.
 
 The IB specification in section 15.2.5.16 PathRecord, table 207 PathRecord
 defines the first two components as Reserved. Each one of them is 32 bits
 long and the description of the first one is Reserved
 while the description of the second reserved component is Offset for
 alignment.


The IBTA has published errata to the 1.2.1 specification which includes the use 
of many reserved fields.

I believe these errata are public on their web site.

Ira

 
 However, in the file include/rdma/ib_sa.h where the component mask bits for
 the Path Records are defined, the 1st and 2nd component are combined
 together under the name IB_SA_PATH_REC_SERVICE_ID.
 
 
 #define IB_SA_PATH_REC_SERVICE_ID   (IB_SA_COMP_MASK( 0) |\
  IB_SA_COMP_MASK( 1))
 
 
 The same definition exists in the OpenSM code in file include/iba/ib_types.h
 
 
 /* Path Record Component Masks */
 #define  IB_PR_COMPMASK_SERVICEID_MSB
 (CL_HTON64(((uint64_t)1)0))
 #define  IB_PR_COMPMASK_SERVICEID_LSB (CL_HTON64(((uint64_t)1)1))
 .
 .
 #define  IB_PR_COMPMASK_SERVICEID (IB_PR_COMPMASK_SERVICEID_MSB |
 \
IB_PR_COMPMASK_SERVICEID_LSB)
 
 
 Why are these fields named as SERVICE_ID? I would expect something like
 IB_SA_PATH_REC_RESERVED_1 and IB_SA_PATH_REC_RESERVED_2 according
 to the specs. Are they really reserved or used, and if they are used, when 
 does
 this happen?
 
 Vangelis



RE: Question on merge windows.

2014-07-30 Thread Weiny, Ira
 
  If that is the case would you like me to rebase the latest version of
  following patches onto your most recent for-next branch and
  resubmit?  Or would you prefer I _not_ do so?  I would like to see if
  we can get this series into 3.17 if possible.
 
 I'm guess the answer is no, but are there any changes to the patches if you
 rebase to 3.16-rc7?

No changes.  They all apply cleanly with git am to for-next.

 
 I don't see a list of patches staged for 3.17 in Roland's upstream tree yet, 
 but
 he either may not have gotten to it  yet, or just hasn't pushed them.

Thanks for the reply,
Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Question on merge windows and MAD submitted MAD patches.

2014-07-18 Thread Weiny, Ira
 
 Roland,
 
 I see the merge window closed on Sunday and Linus has released 3.16-rc1.
 Does that mean now is the time you will start accepting things destined for 
 3.17
 while 3.16 bakes?
 
 If that is the case would you like me to rebase the latest version of 
 following
 patches onto your most recent for-next branch and resubmit?  Or would you
 prefer I _not_ do so?  I would like to see if we can get this series into 
 3.17 if
 possible.

Roland,

I did not get a response to this email so I'm not sure if I should resubmit my 
patches or not.

I have rejected the older versions of the patches on patchwork and marked you 
the Delegate for the following current versions of this series.

[PATCH v2 1/5] ib/umad: update module to [pr|dev]_* style print messages
[PATCH v2 2/5] ib/mad: update module to [pr|dev]_* style print messages
[PATCH v2 3/5] ib/mad: Add dev_notice messages for various umad/mad
[PATCH v3 4/5] ib/mad: add new ioctl to user space to support new 
[PATCH 5/5] ib/mad: Add user space RMPP support

I hope this helps.

I'm also more than happy to resend the patches which I have rebased on top of 
your for-next branch but there are no differences and the old patches 
cherry-pick cleanly.  So I'm holding off on submitting to avoid clutter on the 
mailing list and patchwork.

Thanks,
Ira

 
 Thanks,
 Ira
 
  -Original Message-
  From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
  ow...@vger.kernel.org] On Behalf Of ira.we...@intel.com
  Sent: Monday, May 05, 2014 6:45 PM
  To: rol...@purestorage.com
  Cc: linux-rdma@vger.kernel.org
  Subject: [PATCH RESEND 0/5] ib/mad: Add ability for user space
  applications to control RMPP directly
 
  My apologies if anyone receives duplicates of these emails.
  Apparently the series did not make it to linux-rdma the first time.
 
  The first 3 patches update the umad and mad modules to use pr_* style
  print messages in preparation for adding pr_notice messages to the
  existing and new registration functions.
 
  The final 2 patches add a new registration mechanism which allows for
  flags to be specified at ib_mad registration time.  The first such
  flag is to allow user control of RMPP transactions.
 
  Existing registration and RMPP transactions are unaffected by this change.
 
  [PATCH 1/5] ib/umad: update module to pr_* style print messages
  [PATCH 2/5] ib/mad: update module to pr_* style print messages
  [PATCH 3/5] ib/mad: Add pr_notice messages for various umad/mad
  [PATCH 4/5] ib/mad: add new ioctl to user space to support new 
  [PATCH 5/5] ib/mad: Add user space RMPP support
 
  --
  To unsubscribe from this list: send the line unsubscribe linux-rdma
  in the body of a message to majord...@vger.kernel.org More majordomo
  info at http://vger.kernel.org/majordomo-info.html
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in the 
 body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >