Re: Errors from ibchecknet

2010-09-03 Thread Chuck Hartley
I checked another working fabric here and see the same warnings there,
so it looks like the warnings are not really a problem.

Well, I assume that it is just IPoIB that isn't working. Since ibping
works, I believe that says the IB part is ok. Of course, I can't run
any of the perftools since they all need IPoIB to resolve the host IP.

Do you have any suggestions of what to check to diagnose the IPoIB
problem?  Specifically, can you think of any interaction with the
normal networking stuff in the kernel that might be misconfigured?
I mention that because I rebuilt and installed OFED (no errors or
warnings) and it is in its default configuration, which is running
well on other similar fabrics here.  Therefore I assume the problem
must be outside OFED. Previously, whenever this kind of problem
cropped up, it was always because opensm was not running. I did check
that iptables was off, so it isn't a firewall issue.

- Chuck
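
One quick thing to check on the kernel side is whether an IPoIB network
device exists at all and what link state it reports. The following is
only a minimal sketch, assuming a Linux host with sysfs mounted at
/sys: it relies on the kernel reporting link-layer type 32
(ARPHRD_INFINIBAND) for IPoIB devices in /sys/class/net/<dev>/type. The
function name ipoib_interfaces is made up for illustration and is not
part of OFED.

```python
from pathlib import Path

# Link-layer type the Linux kernel reports for IPoIB devices.
ARPHRD_INFINIBAND = 32


def ipoib_interfaces(sysfs_net="/sys/class/net"):
    """Return {interface name: operstate} for every IPoIB network device.

    Hypothetical helper: it only reads sysfs and changes nothing.
    """
    found = {}
    for dev in sorted(Path(sysfs_net).iterdir()):
        try:
            link_type = int((dev / "type").read_text().strip())
            state = (dev / "operstate").read_text().strip()
        except (OSError, ValueError):
            # Not a network device directory, or attributes unreadable.
            continue
        if link_type == ARPHRD_INFINIBAND:
            found[dev.name] = state
    return found
```

Calling ipoib_interfaces() with no arguments reads the live system. An
empty result would suggest that no IPoIB device has been created, while
a device whose state is not "up" points at interface configuration
rather than at the fabric.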


On Thu, Sep 2, 2010 at 4:16 PM, Ira Weiny wei...@llnl.gov wrote:
 On Thu, 2 Sep 2010 11:11:13 -0700
 Chuck Hartley hartlc...@gmail.com wrote:

 Sure, here is the output:
 Note this is with the switch we swapped in, so the port numbers don't
 match the ibchecknet output in the original message.

 # ibstat
 CA 'mlx4_0'
       CA type: MT26428
       Number of ports: 2
       Firmware version: 2.6.0
       Hardware version: a0
       Node GUID: 0x0002c90300032de0
       System image GUID: 0x0002c90300032de3
       Port 1:
               State: Active
               Physical state: LinkUp
               Rate: 40
               Base lid: 6
               LMC: 0
               SM lid: 6

 Well the SM lid is set here.  Is it set on the other nodes?

 I don't run ibchecknet usually but I am getting the same errors here on a
 working fabric...

 ibwarn: [13629] dump_perfcounters: PortXmitWait not indicated so ignore this 
 counter
 #warn: Lid is not configured lid 37 port 2
 #warn: SM Lid is not configured
 Port check lid 37 port 2:  FAILED

 Looking at this output I don't think this is an error.

 13:17:14  smpquery nodeinfo 37
 # Node info: Lid 37
 BaseVers:1
 ClassVers:...1
 NodeType:Switch
 NumPorts:24
 ...

 On switch external Ports the Lid and SMLid are not used.

 Hal, would you concur?

 Chuck,
 Is it just that IPoIB is not working for you?

 Ira



Errors from ibchecknet

2010-09-02 Thread Chuck Hartley
Hello,

We installed OFED 1.5.1 and are having problems getting the IB fabric
working. ibv_devinfo shows the HCA ports are OK and ibdiagnet reports
no errors. However, ibchecknet shows that the switch ports are not
being configured.  We have never seen this before and are at a loss as
to where the problem might be. Would someone please point us in the
right direction?  Could it be a problem with the switch itself? Output
from ibchecknet is below.


# ibchecknet
Error check on lid 3 (Infiniscale-IV Mellanox Technologies) port all:  FAILED
ibwarn: [26732] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 3 port 7
#warn: SM Lid is not configured
Port check lid 3 port 7:  FAILED
# Checked Switch: nodeguid 0x0002c90200405368 with failure
ibwarn: [26751] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 3 port 10
#warn: SM Lid is not configured
Port check lid 3 port 10:  FAILED
ibwarn: [26770] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 3 port 11
#warn: SM Lid is not configured
Port check lid 3 port 11:  FAILED
ibwarn: [26789] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 3 port 34
#warn: SM Lid is not configured
Port check lid 3 port 34:  FAILED
ibwarn: [26808] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 3 port 35
#warn: SM Lid is not configured
Port check lid 3 port 35:  FAILED

# Checking Ca: nodeguid 0x0030487f3076
ibwarn: [26832] dump_perfcounters: PortXmitWait not indicated so ignore this counter

# Checking Ca: nodeguid 0x0030487f32b2
ibwarn: [26856] dump_perfcounters: PortXmitWait not indicated so ignore this counter

# Checking Ca: nodeguid 0x0002c9030003360c

# Checking Ca: nodeguid 0x0002c90300084162
ibwarn: [26904] dump_perfcounters: PortXmitWait not indicated so ignore this counter

# Checking Ca: nodeguid 0x0002c90300032de0

## Summary: 6 nodes checked, 0 bad nodes found
##          10 ports checked, 5 bad ports found
##          0 ports have errors beyond threshold
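
For anyone who wants to tally output like the above rather than read
it by eye, here is a minimal sketch that groups each "#warn:" line with
the "Port check ... FAILED" line that follows it and pulls the port
totals from the summary. It assumes the text looks like the ibchecknet
output pasted in this message; the function names failed_ports and
bad_port_summary are made up for illustration and are not part of
infiniband-diags.

```python
import re

PORT_FAILED = re.compile(r"^Port check lid (\d+) port (\d+):\s+FAILED")
WARN = re.compile(r"^#warn: (.*)$")
SUMMARY_PORTS = re.compile(r"^##\s+(\d+) ports checked, (\d+) bad ports found")


def failed_ports(text):
    """Return [(lid, port, [warnings])] for each 'Port check ... FAILED' line."""
    results, pending = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# Checking"):
            pending = []  # a new node starts; drop warnings not tied to a failure
            continue
        m = WARN.match(line)
        if m:
            pending.append(m.group(1))
            continue
        m = PORT_FAILED.match(line)
        if m:
            results.append((int(m.group(1)), int(m.group(2)), pending))
            pending = []
    return results


def bad_port_summary(text):
    """Return (ports_checked, bad_ports) from the summary, or None if absent."""
    for raw in text.splitlines():
        m = SUMMARY_PORTS.match(raw.strip())
        if m:
            return int(m.group(1)), int(m.group(2))
    return None
```

Feeding it a saved copy of the output (for example, text read from a
file) gives one tuple per failed port, so it is easy to see whether
every failure carries only the two LID warnings.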
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Errors from ibchecknet

2010-09-02 Thread Hal Rosenstock
On Thu, Sep 2, 2010 at 8:34 AM, Chuck Hartley hartlc...@gmail.com wrote:
 Hello,

 We installed 1.5.1 and are having problems getting the IB fabric
 working. ibv_devinfo shows the HCAs ports are ok and ibdiagnet reports
 no errors. However, ibchecknet shows that the switch ports are not
 being configured.  We have never seen this before and are at a loss as
 to where the problem might be - would someone please point us in the
 right direction to look?  Could it be a problem with the switch
 itself? Output from ibchecknet below.


 # ibchecknet
 Error check on lid 3 (Infiniscale-IV Mellanox Technologies) port all:  FAILED
 ibwarn: [26732] dump_perfcounters: PortXmitWait not indicated so
 ignore this counter
 #warn: Lid is not configured lid 3 port 7
 #warn: SM Lid is not configured

Is there an SM running on your subnet? If not, I think that the lack
of an SM could account for all of the issues mentioned here.

-- Hal



Re: Errors from ibchecknet

2010-09-02 Thread Chuck Hartley
Sure, here is the output:
Note this is with the switch we swapped in, so the port numbers don't
match the ibchecknet output in the original message.

# ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c90300032de0
        System image GUID: 0x0002c90300032de3
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 6
                LMC: 0
                SM lid: 6
                Capability mask: 0x0251086a
                Port GUID: 0x0002c90300032de1
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c90300032de2
CA 'mthca0'
        CA type: MT25204
        Number of ports: 1
        Firmware version: 1.2.0
        Hardware version: a0
        Node GUID: 0x003048c64c0c
        System image GUID: 0x003048c64c0c0003
        Port 1:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x003048c64c0c0001

# iblinkinfo
Switch 0x0002c9020041a7a0 Infiniscale-IV Mellanox Technologies:
   1    1[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==   5
1[  ]  HCA-1 ( )
   1    2[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==   6
1[  ] linux70 HCA-1 ( )
   1    3[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==   7
1[  ] linux71 HCA-1 ( )
   1    4[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1    5[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1    6[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1    7[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1    8[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1    9[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   10[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   11[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   12[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   13[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   14[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   15[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   16[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   17[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   18[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   19[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   20[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   21[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   22[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   23[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   24[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==   9
1[  ]  HCA-1 ( )
   1   25[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==   8
1[  ]  HCA-1 ( )
   1   26[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   27[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   28[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   29[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   30[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   31[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   32[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   33[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   34[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   35[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
   1   36[  ] ==( 4X 2.5 Gbps   Down/ Polling)==
[  ]  ( )
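
To compare the SM lid across several nodes, the ibstat text can be
reduced to a small table. This is a minimal sketch that assumes the
layout of the ibstat output pasted in this message (a "CA '<name>'"
line, then "Port N:" sections holding "key: value" fields); parse_ibstat
and active_sm_lids are hypothetical names used only for illustration.

```python
import re

CA_LINE = re.compile(r"^CA '([^']+)'")
PORT_LINE = re.compile(r"^Port (\d+):$")


def parse_ibstat(text):
    """Return {(ca_name, port_number): {field: value}} from ibstat-style text."""
    ports, ca, port = {}, None, None
    for raw in text.splitlines():
        line = raw.strip()
        m = CA_LINE.match(line)
        if m:
            ca, port = m.group(1), None  # CA-level fields are skipped
            continue
        m = PORT_LINE.match(line)
        if m and ca is not None:
            port = int(m.group(1))
            ports[(ca, port)] = {}
            continue
        if port is not None and ": " in line:
            key, value = line.split(": ", 1)
            ports[(ca, port)][key] = value
    return ports


def active_sm_lids(ports):
    """Return {(ca_name, port_number): SM lid} for ports in the Active state."""
    return {
        key: int(fields["SM lid"])
        for key, fields in ports.items()
        if fields.get("State") == "Active" and "SM lid" in fields
    }
```

Running this over the ibstat text saved from each node shows at a
glance whether every active port reports the same SM lid.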

On Thu, Sep 2, 2010 at 12:03 PM, Ira Weiny wei...@llnl.gov wrote:
 On Thu, 2 Sep 2010 06:56:50 -0700
 Chuck Hartley hartlc...@gmail.com wrote:

 We swapped in a different switch and see the same errors. The opensm
 logfile does not show any errors:

 Could you run ibstat on the node with OpenSM running?

 And iblinkinfo on the same node?

 Send that output.

 Ira


 -
 OpenSM 3.3.5
 Command Line Arguments:
  Daemon mode
  Log File: /var/log/opensm.log
 -
 OpenSM 3.3.5

 Sep 02 05:56:29 933684 [B53B8700] 0x80 - OpenSM 3.3.5
 Entering DISCOVERING state

 Sep 02 05:56:29 934931 [B53B8700] 0x02 - osm_vendor_init: 1000
 pending umads specified
 Sep 02 05:56:29 935079 [B53B8700] 0x80 - Entering DISCOVERING state
 Using default GUID 0x2c90300032de1
 Entering MASTER state

 Sep 02 05:56:29 953763 [B53B8700] 0x02 - osm_vendor_bind: Binding to
 port 0x2c90300032de1
 Sep 02 05:56:29 990146 [B53B8700] 0x02 - 

Re: Errors from ibchecknet

2010-09-02 Thread Chuck Hartley
BTW, I am able to communicate between nodes via 'ibping'.  That is the
only test program I found that will work without needing a host IP.



On Thu, Sep 2, 2010 at 12:03 PM, Ira Weiny wei...@llnl.gov wrote:
 On Thu, 2 Sep 2010 06:56:50 -0700
 Chuck Hartley hartlc...@gmail.com wrote:

 We swapped in a different switch and see the same errors. The opensm
 logfile does not show any errors:

 Could you run ibstat on the node with OpenSM running?

 And iblinkinfo on the same node?

 Send that output.

 Ira



Re: Errors from ibchecknet

2010-09-02 Thread Ira Weiny
On Thu, 2 Sep 2010 11:11:13 -0700
Chuck Hartley hartlc...@gmail.com wrote:

 Sure, here is the output:
 Note this is with the switch we swapped in, so the port numbers don't
 match the ibchecknet output in the original message.
 
 # ibstat
 CA 'mlx4_0'
   CA type: MT26428
   Number of ports: 2
   Firmware version: 2.6.0
   Hardware version: a0
   Node GUID: 0x0002c90300032de0
   System image GUID: 0x0002c90300032de3
   Port 1:
   State: Active
   Physical state: LinkUp
   Rate: 40
   Base lid: 6
   LMC: 0
   SM lid: 6

Well the SM lid is set here.  Is it set on the other nodes?

I don't run ibchecknet usually but I am getting the same errors here on a
working fabric...

ibwarn: [13629] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 37 port 2
#warn: SM Lid is not configured
Port check lid 37 port 2:  FAILED 

Looking at this output I don't think this is an error.

13:17:14  smpquery nodeinfo 37
# Node info: Lid 37
BaseVers:1
ClassVers:...1
NodeType:Switch
NumPorts:24
...

On switch external Ports the Lid and SMLid are not used.

Hal, would you concur?

Chuck,
Is it just that IPoIB is not working for you?

Ira
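
The check above (look up the node type for the LID named in the
warning) can be sketched in a few lines. This assumes "Name:value"
lines like the smpquery nodeinfo output shown in this message, with
optional dot padding between the name and the value; parse_nodeinfo and
is_switch are hypothetical helper names, not part of infiniband-diags.

```python
import re

# Field name, a colon, optional dot padding, then the value.
FIELD = re.compile(r"^([A-Za-z]\w*):\.*(.+)$")


def parse_nodeinfo(text):
    """Return {field: value} from smpquery-nodeinfo-style text."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # header such as '# Node info: Lid 37'
        m = FIELD.match(line)
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return fields


def is_switch(fields):
    """True when the queried node reports itself as a switch."""
    return fields.get("NodeType") == "Switch"
```

If is_switch() is true for the LID in a "Lid is not configured"
warning, the warning refers to a switch port, which is the case
discussed here.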



Re: Errors from ibchecknet

2010-09-02 Thread Hal Rosenstock
On Thu, Sep 2, 2010 at 4:16 PM, Ira Weiny wei...@llnl.gov wrote:
 On Thu, 2 Sep 2010 11:11:13 -0700
 Chuck Hartley hartlc...@gmail.com wrote:

 Sure, here is the output:
 Note this is with the switch we swapped in, so the port numbers don't
 match the ibchecknet output in the original message.

 # ibstat
 CA 'mlx4_0'
       CA type: MT26428
       Number of ports: 2
       Firmware version: 2.6.0
       Hardware version: a0
       Node GUID: 0x0002c90300032de0
       System image GUID: 0x0002c90300032de3
       Port 1:
               State: Active
               Physical state: LinkUp
               Rate: 40
               Base lid: 6
               LMC: 0
               SM lid: 6

 Well the SM lid is set here.  Is it set on the other nodes?

 I don't run ibchecknet usually but I am getting the same errors here on a
 working fabric...

 ibwarn: [13629] dump_perfcounters: PortXmitWait not indicated so ignore this 
 counter
 #warn: Lid is not configured lid 37 port 2
 #warn: SM Lid is not configured
 Port check lid 37 port 2:  FAILED

 Looking at this output I don't think this is an error.

 13:17:14  smpquery nodeinfo 37
 # Node info: Lid 37
 BaseVers:1
 ClassVers:...1
 NodeType:Switch
 NumPorts:24
 ...

 On switch external Ports the Lid and SMLid are not used.

 Hal, would you concur?

Yes, on switch external ports, neither the LID nor the SMLID is valid.

-- Hal


 Chuck,
 Is it just that IPoIB is not working for you?

 Ira

