[PATCH] ib/cm: Cancel pending LAP message when exiting IB_CM_ESTABLISH state

2011-03-03 Thread Hefty, Sean
This problem was reported by Moni Shoua and Amir Vadai:

When destroying a cm_id from the context of a work queue,
if the lap_state of this cm_id is IB_CM_LAP_SENT,
we need to release the reference on this id that
was taken when the LAP message was sent.
Otherwise, if the expected APR message gets lost,
the reference is released only after a long time,
and in the meantime the work handler thread is
not available to process anything else.

It turns out that we need to cancel any pending LAP messages
whenever we transition out of the IB_CM_ESTABLISHED state.  This
occurs when disconnecting - either sending or receiving a DREQ.
It can also happen in a corner case where we receive a REJ message
after sending an RTU, followed by a LAP.  Add checks and cancel
any outstanding LAP messages in these three cases.

Canceling the LAP when sending a DREQ fixes the destroy problem
reported by Moni.  When a cm_id is destroyed in the IB_CM_ESTABLISHED
state, it sends a DREQ to the remote side to notify the peer that
the connection is going away.

Signed-off-by: Sean Hefty 
---
Moni, I don't have a good way to test that this really fixes your problem,
but it looks like it should.
I considered merging the cm state and lap states together, but I wasn't
convinced that that made things any simpler.

 drivers/infiniband/core/cm.c |   19 ++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 1d9616b..f804e28 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1988,6 +1988,10 @@ int ib_send_cm_dreq(struct ib_cm_id *cm_id,
                goto out;
        }
 
+       if (cm_id->lap_state == IB_CM_LAP_SENT ||
+           cm_id->lap_state == IB_CM_MRA_LAP_RCVD)
+               ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
+
        ret = cm_alloc_msg(cm_id_priv, &msg);
        if (ret) {
                cm_enter_timewait(cm_id_priv);
@@ -2129,6 +2133,10 @@ static int cm_dreq_handler(struct cm_work *work)
                ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
                break;
        case IB_CM_ESTABLISHED:
+               if (cm_id_priv->id.lap_state == IB_CM_LAP_SENT ||
+                   cm_id_priv->id.lap_state == IB_CM_MRA_LAP_RCVD)
+                       ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
+               break;
        case IB_CM_MRA_REP_RCVD:
                break;
        case IB_CM_TIMEWAIT:
@@ -2349,9 +2357,18 @@ static int cm_rej_handler(struct cm_work *work)
                /* fall through */
        case IB_CM_REP_RCVD:
        case IB_CM_MRA_REP_SENT:
-       case IB_CM_ESTABLISHED:
                cm_enter_timewait(cm_id_priv);
                break;
+       case IB_CM_ESTABLISHED:
+               if (cm_id_priv->id.lap_state == IB_CM_LAP_UNINIT ||
+                   cm_id_priv->id.lap_state == IB_CM_LAP_SENT) {
+                       if (cm_id_priv->id.lap_state == IB_CM_LAP_SENT)
+                               ib_cancel_mad(cm_id_priv->av.port->mad_agent,
+                                             cm_id_priv->msg);
+                       cm_enter_timewait(cm_id_priv);
+                       break;
+               }
+               /* fall through */
        default:
                spin_unlock_irq(&cm_id_priv->lock);
                ret = -EINVAL;


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
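
For readers following the three cases in the patch above, here is a minimal user-space sketch of the rule it applies: when an id leaves the established state, any outstanding LAP send is cancelled. This is not kernel code. The toy_* names, the trimmed enums, and the lap_mad_pending flag (standing in for the send that ib_cancel_mad() would cancel) are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel enums; only the values used here. */
enum cm_state { CM_ESTABLISHED, CM_DREQ_SENT, CM_DREQ_RCVD, CM_TIMEWAIT };
enum lap_state { LAP_UNINIT, LAP_IDLE, LAP_SENT, LAP_RCVD,
                 MRA_LAP_SENT, MRA_LAP_RCVD };
enum cm_event { EV_SEND_DREQ, EV_RECV_DREQ, EV_RECV_REJ };

struct toy_cm_id {
    enum cm_state state;
    enum lap_state lap;
    bool lap_mad_pending;   /* models the outstanding LAP send */
};

/* Hypothetical stand-in for ib_cancel_mad(): drop the outstanding LAP. */
static void toy_cancel_lap(struct toy_cm_id *id)
{
    id->lap_mad_pending = false;
}

/* Apply one event to an established id; returns 0 or -EINVAL. */
static int toy_leave_established(struct toy_cm_id *id, enum cm_event ev)
{
    assert(id->state == CM_ESTABLISHED);

    switch (ev) {
    case EV_SEND_DREQ:
    case EV_RECV_DREQ:
        /* Both DREQ paths cancel a LAP that is still waiting for an APR. */
        if (id->lap == LAP_SENT || id->lap == MRA_LAP_RCVD)
            toy_cancel_lap(id);
        id->state = (ev == EV_SEND_DREQ) ? CM_DREQ_SENT : CM_DREQ_RCVD;
        return 0;
    case EV_RECV_REJ:
        /* A REJ is accepted only with no LAP activity or a sent LAP. */
        if (id->lap != LAP_UNINIT && id->lap != LAP_SENT)
            return -EINVAL;
        if (id->lap == LAP_SENT)
            toy_cancel_lap(id);
        id->state = CM_TIMEWAIT;
        return 0;
    }
    return -EINVAL;
}
```

Calling toy_leave_established() with EV_SEND_DREQ on an id whose lap is LAP_SENT clears lap_mad_pending and moves the id to CM_DREQ_SENT, while an EV_RECV_REJ with lap set to LAP_RCVD is rejected and leaves the id untouched, mirroring the fall-through to the default case in cm_rej_handler().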


RE: [PATCH] ib/cm: Cancel pending lap message when destroying an ID

2011-03-03 Thread Hefty, Sean
> I tried this and it doesn't fix the issue in my test. It operates only
> in IB_CM_ESTABLISHED but in my tests the order of operations in client
> side is
> 1. rdma_connect() was called
> 2. lap was sent (for which apr will never be received - according to test case)
> 3. rdma_disconnect() was called

Let me think about the best way to handle this.  We may want ib_send_cm_dreq() 
to cancel any outstanding LAP message.  This should fix the destroy case as 
well, since it calls into disconnect.


Re: ibdiagpath broken with TCL 8.5

2011-03-03 Thread Yevgeny Kliteynik
On 03-Mar-11 4:22 PM, Mike Heinz wrote:
> If I get a chance, I'll take a look and see if I find an easy fix.  One 
> simple thing that occurred to me was to modify ibdebug.tcl to  filter the 
> field names out of the output string but I'm not sure what the side-effects 
> would be.

That will probably work for ibdiagpath.
But still, if anyone has some script that uses
ibis, the fix won't resolve the problem for him.

-- YK


RE: ibdiagpath broken with TCL 8.5

2011-03-03 Thread Mike Heinz
If I get a chance, I'll take a look and see if I find an easy fix.  One simple 
thing that occurred to me was to modify ibdebug.tcl to  filter the field names 
out of the output string but I'm not sure what the side-effects would be.


Re: [PATCH] ib/cm: Cancel pending lap message when destroying an ID

2011-03-03 Thread Moni Shoua
> Good catch, although, I think we can simplify the fix to the patch below
> (completely untested).  Please let me know if this solves the issue for you.
>
>  drivers/infiniband/core/cm.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 1d9616b..79da42d 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -888,6 +888,8 @@ retest:
>                               NULL, 0, NULL, 0);
>                break;
>        case IB_CM_ESTABLISHED:
> +               if (cm_id->lap_state == IB_CM_LAP_SENT)
> +                       ib_cancel_mad(cm_id_priv->av.port->mad_agent, 
> cm_id_priv->msg);
>                spin_unlock_irq(&cm_id_priv->lock);
>                ib_send_cm_dreq(cm_id, NULL, 0);
>                goto retest;
>
>

Hi Sean,
I tried this and it doesn't fix the issue in my test. The check runs only
in the IB_CM_ESTABLISHED state, but in my test the order of operations on
the client side is:
1. rdma_connect() was called
2. LAP was sent (for which an APR will never be received, per the test case)
3. rdma_disconnect() was called
4. rdma_destroy_id() was called (state here is IB_CM_DREQ_SENT)
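
The ordering problem in the four steps above can be reproduced with a small user-space model. This is only a sketch under stated assumptions: the toy_* functions and the lap_pending flag are hypothetical simplifications, not the cm.c code paths. The cancel_in_dreq switch contrasts the quoted patch (cancel only in the ESTABLISHED branch of destroy) with a variant that also cancels when the DREQ is sent.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_state { TOY_IDLE, TOY_ESTABLISHED, TOY_DREQ_SENT, TOY_DESTROYED };

struct toy_id {
    enum toy_state state;
    bool lap_pending;      /* a LAP was sent and no APR has arrived */
};

static void toy_connect(struct toy_id *id)  { id->state = TOY_ESTABLISHED; }
static void toy_send_lap(struct toy_id *id) { id->lap_pending = true; }

/* cancel_in_dreq selects whether the disconnect path cancels the LAP. */
static void toy_disconnect(struct toy_id *id, bool cancel_in_dreq)
{
    if (id->state != TOY_ESTABLISHED)
        return;
    if (cancel_in_dreq)
        id->lap_pending = false;
    id->state = TOY_DREQ_SENT;
}

/* Destroy with the cancel placed only in the ESTABLISHED branch. */
static void toy_destroy(struct toy_id *id, bool cancel_in_dreq)
{
    if (id->state == TOY_ESTABLISHED) {
        id->lap_pending = false;   /* models the check from the quoted patch */
        toy_disconnect(id, cancel_in_dreq);
    }
    id->state = TOY_DESTROYED;
}

/* Runs the four steps; returns true if a LAP is still pending at the end. */
static bool toy_run_sequence(bool cancel_in_dreq)
{
    struct toy_id id = { TOY_IDLE, false };

    toy_connect(&id);                      /* 1. rdma_connect()    */
    toy_send_lap(&id);                     /* 2. LAP sent, no APR  */
    toy_disconnect(&id, cancel_in_dreq);   /* 3. rdma_disconnect() */
    assert(id.state == TOY_DREQ_SENT);
    toy_destroy(&id, cancel_in_dreq);      /* 4. rdma_destroy_id() */
    return id.lap_pending;
}
```

In this model toy_run_sequence(false) leaves the LAP pending, because by the time destroy runs the id is already in the DREQ-sent state and the ESTABLISHED branch is skipped; toy_run_sequence(true) does not.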


Re: ibdiagpath broken with TCL 8.5

2011-03-03 Thread Yevgeny Kliteynik
Mike,

On 01-Mar-11 11:13 PM, Mike Heinz wrote:
> YK,
> 
> I had a chance to go back and dig further into this. I just scratch-built the 
> ibis executable on an RHEL6 system, and started running it in interactive 
> mode. What I see is that results that return arrays are getting garbage 
> pre-pended to them - it looks like the root problem that John tried to patch 
> last fall, and that's causing problems for some of my systems here, is that 
> ibis isn't interfacing with TCL 8.5 correctly:
> 
> % puts [smLftBlockMad dump]
> -lft 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
> 0x00 0x00 0x00 0x00 0x00
> % puts [smVlArbTableMad dump]
> -vl_entry {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} 
> {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} 
> {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} 
> {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} 
> {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00}
>
> I do not see this behavior on systems running TCL 8.4:
> 
> % ibis_init
> 0
> % ibis_set_port 0x00066a00a000707f
> 0
> % puts [smLftBlockMad dump]
> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
> 0x00 0x00 0x00 0x00
> % puts [smVlArbTableMad dump]
> {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} 
> {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} 
> {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} 
> {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} 
> {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00}

Interesting. I tried it, and I see the same results as you.
Looks like "dump" is supposed to include field names only
if there is more than one field in the object.

With TCL 8.4, I see this:

% smVlArbTableMad dump
{0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} 
{0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} 
{0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} 
{0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} 
{0x0 0x00} {0x0 0x00} {0x0 0x00} {0x0 0x00} 
% smSwitchInfoMad dump
-lin_cap 0 -rand_cap 0 -mcast_cap 0 -lin_top 0 -def_port 0 -def_mcast_pri_port 
0 -def_mcast_not_port 0 -life_state 0 -lids_per_port 0 -enforce_cap 0 -flags 0 

So the VLArb Table has no field name, while SwitchInfo has all
its fields. I see similar behavior with other objects.
Ibis has an implementation of the dump function for "non-trivial" objects
(objects that are not just a set of standard data types). VLArbTable
would be one of them - it consists of VLArbTable Elements, which have
their own dump function:

%typemap(tcl8, out) ib_vl_arb_element_t[ANY] {
    int i;
    char buff[16];
    for (i = 0; i < $dim0; i++) {
        sprintf(buff, "{0x%x 0x%02x} ", $source[i].vl, $source[i].weight);
        Tcl_AppendResult(interp, buff, NULL);
    }
}

typedef struct _ibsm_vl_arb_table
{
    ib_vl_arb_element_t vl_entry[IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK];
} smVlArbTable;

Looks like this behavior has been changed in TCL 8.5.
IMHO, the TCL 8.5 behavior seems more consistent.
However, it is clear that in order to support both 8.5 and
older versions, that simple patch is not enough.
Also, this new behavior will probably break any TCL script
that was relying on the old ibis output...

If I'm right, then you will see this problem also with
smPkeyTableMad, smGuidInfoMad, smVlArbTableMad, 
smSlVlTableMad, smMftBlockMad, and smLftBlockMad MADs.
And that's only SM MADs. There are also SA, CC, and others.

Bottom line, I'm reverting the fix to allow ibdiagpath to work
on all the distros with TCL 8.4.

For newer TCL some work needs to be done. To make ibis
backward compatible, we need to add a dump wrapper for ALL the
MADs with a single field/array.

-- YK
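
As a rough illustration of the filtering idea raised in this thread (drop the field name when a dump holds exactly one field, keep everything else unchanged), here is a minimal C sketch. The helper names are hypothetical and nothing here is part of ibis or ibdebug.tcl; a real change would more likely live in the Tcl scripts or the SWIG wrappers. It assumes a field name is a token that starts with a dash followed by a letter and sits outside any braces.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 if p points at a "-name" token start (dash followed by a letter). */
static int is_field_name(const char *p)
{
    return p[0] == '-' && isalpha((unsigned char)p[1]);
}

/*
 * If dump consists of exactly one "-name value..." field, return a pointer
 * to the value part; otherwise return dump unchanged.
 */
static const char *strip_single_field_name(const char *dump)
{
    const char *p = dump;
    int depth = 0, names = 0, at_token_start = 1;

    /* Count field names that appear at brace depth 0. */
    for (; *p; p++) {
        if (*p == '{')
            depth++;
        else if (*p == '}' && depth > 0)
            depth--;
        if (depth == 0 && at_token_start && is_field_name(p))
            names++;
        at_token_start = (*p == ' ');
    }
    if (names != 1 || !is_field_name(dump))
        return dump;

    p = dump;
    while (*p && *p != ' ')
        p++;                /* skip the "-name" token */
    while (*p == ' ')
        p++;                /* skip the separating spaces */
    return p;
}
```

With this sketch a single-array dump such as "-lft 0x00 0x00" comes back as "0x00 0x00", matching the TCL 8.4 output shown above, while a multi-field dump such as the smSwitchInfoMad one is returned untouched. Whether every existing ibis consumer expects exactly this shape is not something the thread establishes.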




 
>> -Original Message-
>> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-
>> boun...@lists.openfabrics.org] On Behalf Of Mike Heinz
>> Sent: Monday, February 21, 2011 11:55 AM
>> To: klit...@dev.mellanox.co.il
>> Cc: Linux RDMA; e...@lists.openfabrics.org
>> Subject: Re: [ewg] Patch breaks OFED 1.5.3: [PATCH] ibdiagpath:
>> Properly index VlArbTable during QoS test
>>
>> YK,
>>
>> I just finished running an RC4 build on Redhat 6. I didn't get the same
>> error - but ibdiagpa