Re: ibnetdiscover issue with multiported CA (or router) with multiple ports on same subnet

2010-09-01 Thread Sasha Khapyorsky
Hi Hal,

On 13:27 Wed 25 Aug , Hal Rosenstock wrote:
 
 I'm seeing an issue with ibnetdiscover from a CA port where it appears
 to extend a path at a remote CA port (it's actually another port on
 the same CA) to query NodeInfo of the next hop beyond it. I get the
 following error message:
 
 src/query_smp.c:188; umad (DR path slid 0; dlid 0; 0,1,20,2 Attr
 0x11:0) bad status 110; Connection timed out
 
 where smpquery -D nodeinfo of 0,1,20 is a CA which can also be seen
 from the topology.
 
 It appears to stem from the following code snippet from
 libibnetdisc/src/ibnetdisc.c:recv_port_info
 
 if (port_num && mad_get_field(port->info, 0, IB_PORT_PHYS_STATE_F)
     == IB_PORT_PHYS_STATE_LINKUP
     && ((node->type == IB_NODE_SWITCH && port_num != local_port) ||
         (node == fabric->from_node && port_num == local_port))) {
         ib_portid_t path = smp->path;
         if (extend_dpath(engine, &path, port_num) > 0)
                 query_node_info(engine, &path, node);
 }

This makes sense to me.

 
 that was introduced by:
 commit fcb8d5e7588e38508a8e354c37009d73c0a3889f
 Author: Sasha Khapyorsky sas...@voltaire.com
 Date:   Sat Apr 10 02:43:24 2010 +0300
 
 libibnetdisc: no backward NodeInfo queries
 
 Then switch is reached via port N we don't need to query back via this
 port - source node is discovered already. Finally this saves some amount
 of unnecessary MADs.
 
 Signed-off-by: Sasha Khapyorsky sas...@voltaire.com
 
 and subsequently modified by:
 commit 49d149c63a44d99259f516a15af53d8cf3f0e7c9
 Author: Sasha Khapyorsky sas...@voltaire.com
 Date:   Tue Apr 13 19:54:45 2010 +0300
 
 libibnetdisc: don't try to cross discovery over CA
 
 When discovery is running from CA node it shouldn't try to cross over
 all ports, but only via local one (send over non-local ports will fail
 since CA doesn't route MADs).
 
 Signed-off-by: Sasha Khapyorsky sas...@voltaire.com
 
 due to the (node == fabric->from_node && port_num == local_port)
 clause being TRUE.

But I don't see how those patches are actually related to this issue. The
original (pre-patch) condition was:

if (port_num && mad_get_field(port->info, 0, IB_PORT_PHYS_STATE_F)
    == IB_PORT_PHYS_STATE_LINKUP
    && (node->type == IB_NODE_SWITCH || node == fabric->from_node))

which, as far as I understand, has the same bug.

Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ibnetdiscover issue with multiported CA (or router) with multiple ports on same subnet

2010-09-01 Thread Hal Rosenstock
Hi Sasha,

On Wed, Sep 1, 2010 at 9:43 AM, Sasha Khapyorsky sas...@voltaire.com wrote:
 But I don't see how those patches are actually related to this issue. The
 original (pre-patch) condition was:

        if (port_num && mad_get_field(port->info, 0, IB_PORT_PHYS_STATE_F)
            == IB_PORT_PHYS_STATE_LINKUP
            && (node->type == IB_NODE_SWITCH || node == fabric->from_node))

 which, as far as I understand, has the same bug.

I thought this used to work and those changes looked related to me.
Maybe the fix is right but that part of the problem description isn't.
Do you want a revised patch without that part of the description ?

-- Hal




Re: ibnetdiscover issue with multiported CA (or router) with multiple ports on same subnet

2010-09-01 Thread Sasha Khapyorsky
On 09:47 Wed 01 Sep , Hal Rosenstock wrote:
 
 I thought this used to work and those changes looked related to me.
 Maybe the fix is right but that part of the problem description isn't.
 Do you want a revised patch without that part of the description ?

No need - I have applied it already. Thanks.

Sasha


ibnetdiscover issue with multiported CA (or router) with multiple ports on same subnet

2010-08-25 Thread Hal Rosenstock
Sasha,

I'm seeing an issue with ibnetdiscover from a CA port where it appears
to extend a path at a remote CA port (it's actually another port on
the same CA) to query NodeInfo of the next hop beyond it. I get the
following error message:

src/query_smp.c:188; umad (DR path slid 0; dlid 0; 0,1,20,2 Attr
0x11:0) bad status 110; Connection timed out

where smpquery -D nodeinfo of 0,1,20 is a CA which can also be seen
from the topology.
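
For readers unfamiliar with the notation, a directed-route path such as 0,1,20,2 lists the exit port taken at each hop, so the failing query is the working path 0,1,20 extended by port 2. The following is a minimal sketch of that bookkeeping; `struct dr_path`, `dr_path_init`, `dr_path_extend`, `dr_path_str` and `DR_MAX_HOPS` are hypothetical names for illustration and are not the libibmad or libibnetdisc types.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DR_MAX_HOPS 64	/* assumed limit for this sketch */

/* Hypothetical stand-in for a directed-route path. */
struct dr_path {
	int cnt;			/* number of hops after the leading 0 */
	uint8_t port[DR_MAX_HOPS];	/* port[0] is always 0 (local node) */
};

/* Reset to the empty path "0". */
static void dr_path_init(struct dr_path *path)
{
	assert(path != NULL);
	memset(path, 0, sizeof(*path));
}

/* Append one exit port; returns the new hop count, or -1 if full. */
static int dr_path_extend(struct dr_path *path, uint8_t port_num)
{
	assert(path != NULL);
	if (path->cnt + 1 >= DR_MAX_HOPS)
		return -1;
	path->port[++path->cnt] = port_num;
	return path->cnt;
}

/* Render the path as e.g. "0,1,20" into buf; returns buf. */
static const char *dr_path_str(const struct dr_path *path, char *buf,
			       size_t len)
{
	size_t off = 0;
	int i;

	assert(path != NULL && buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i <= path->cnt; i++) {
		int n = snprintf(buf + off, len - off, i ? ",%u" : "%u",
				 (unsigned)path->port[i]);
		if (n < 0 || (size_t)n >= len - off)
			break;	/* stop on error or truncation */
		off += (size_t)n;
	}
	return buf;
}
```

Starting from an initialised path, extending by 1 and then 20 gives the path to the second CA port; one further extension by port 2 gives the path that times out, because a CA does not forward the SMP.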

It appears to stem from the following code snippet from
libibnetdisc/src/ibnetdisc.c:recv_port_info

if (port_num && mad_get_field(port->info, 0, IB_PORT_PHYS_STATE_F)
    == IB_PORT_PHYS_STATE_LINKUP
    && ((node->type == IB_NODE_SWITCH && port_num != local_port) ||
        (node == fabric->from_node && port_num == local_port))) {
        ib_portid_t path = smp->path;
        if (extend_dpath(engine, &path, port_num) > 0)
                query_node_info(engine, &path, node);
}

that was introduced by:
commit fcb8d5e7588e38508a8e354c37009d73c0a3889f
Author: Sasha Khapyorsky sas...@voltaire.com
Date:   Sat Apr 10 02:43:24 2010 +0300

libibnetdisc: no backward NodeInfo queries

Then switch is reached via port N we don't need to query back via this
port - source node is discovered already. Finally this saves some amount
of unnecessary MADs.

Signed-off-by: Sasha Khapyorsky sas...@voltaire.com

and subsequently modified by:
commit 49d149c63a44d99259f516a15af53d8cf3f0e7c9
Author: Sasha Khapyorsky sas...@voltaire.com
Date:   Tue Apr 13 19:54:45 2010 +0300

libibnetdisc: don't try to cross discovery over CA

When discovery is running from CA node it shouldn't try to cross over
all ports, but only via local one (send over non-local ports will fail
since CA doesn't route MADs).

Signed-off-by: Sasha Khapyorsky sas...@voltaire.com

due to the (node == fabric->from_node && port_num == local_port)
clause being TRUE.

ibnetdiscover
src/query_smp.c:188; umad (DR path slid 0; dlid 0; 0,1,20,2 Attr
0x11:0) bad status 110; Connection timed out
#
# Topology file: generated on Wed Aug 25 18:52:16 2010
#
# Initiated from node 0002c9020020ee0c port 0002c9020020ee0d

vendid=0x2c9
devid=0xb924
sysimgguid=0xb8c00438b
switchguid=0xb8c00438b(b8c00438b)
Switch  24 S-000b8c00438b # MT47396 Infiniscale-III Mellanox Technologies base port 0 lid 4 lmc 0
[5] H-0002c90310e0[1](2c90310e1)  # sw124 HCA-1 lid 5 4xDDR
[6] H-0002c903d1c8[1](2c903d1c9)  # sw123 HCA-1 lid 0 4xDDR
[7] H-0002c9020020ee0c[1](2c9020020ee0d)  # sw075 HCA-1 lid 2 4xDDR
[20]H-0002c9020020ee0c[2](2c9020020ee0e)  # sw075 HCA-1 lid 3 4xDDR

...

vendid=0x2c9
devid=0x6278
sysimgguid=0x2c9020020ee0f
caguid=0x2c9020020ee0c
Ca  2 H-0002c9020020ee0c  # sw075 HCA-1
[1](2c9020020ee0d)  S-000b8c00438b[7] # lid 2 lmc 0 MT47396 Infiniscale-III Mellanox Technologies lid 4 4xDDR
[2](2c9020020ee0e)  S-000b8c00438b[20]# lid 3 lmc 0 MT47396 Infiniscale-III Mellanox Technologies lid 4 4xDDR


smpquery -D nodeinfo 0,1,20
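# Node info: DR path slid 65535; dlid 65535; 0,1,20
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Channel Adapter
NumPorts:........................2
SystemGuid:......................0x0002c9020020ee0f
Guid:............................0x0002c9020020ee0c
PortGuid:........................0x0002c9020020ee0e
PartCap:.........................64
DevId:...........................0x6278
Revision:........................0x00a0
LocalPort:.......................2
VendorId:........................0x0002c9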

I don't think the local port part of the test above (node ==
fabric->from_node && port_num == local_port) is correct where:

local_port = (uint8_t) mad_get_field(port_info, 0,
                                     IB_PORT_LOCAL_PORT_F);

Instead, shouldn't port_num be checked against the local port that
initiated the ibnetdiscover (which in this case is port 1) ? If so, a
from_portnum could be added/saved in the fabric structure and used
for this check. Do you concur with this approach ?

-- Hal