What server model and CPU model do you have?
 
This could be https://bugs.openfabrics.org//show_bug.cgi?id=229.  Try
setting RENICE_IB_MAD=yes in /etc/infiniband/openibd.conf, then reboot
or run /etc/init.d/openibd restart, and see if that helps.
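
A minimal sketch of that change, assuming openibd.conf consists of plain KEY=value lines. The helper name set_renice_ib_mad is hypothetical and not part of OFED; it only edits the file it is given, so the restart is left as a separate manual step.

```shell
# Hypothetical helper: make sure RENICE_IB_MAD=yes is set in a KEY=value config file.
# Usage: set_renice_ib_mad /etc/infiniband/openibd.conf
set_renice_ib_mad() {
    conf="$1"
    if grep -q '^RENICE_IB_MAD=' "$conf"; then
        # The key already exists: rewrite its value in place via a temp file.
        tmp=$(mktemp) || return 1
        sed 's/^RENICE_IB_MAD=.*/RENICE_IB_MAD=yes/' "$conf" > "$tmp" && cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        # The key is missing: append it.
        echo 'RENICE_IB_MAD=yes' >> "$conf"
    fi
}

# Afterwards, as root (commands taken from the message above):
#   set_renice_ib_mad /etc/infiniband/openibd.conf
#   /etc/init.d/openibd restart
```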
 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 


________________________________

        From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of SEGERS Koen
        Sent: Tuesday, May 22, 2007 6:44 AM
        To: Ami Perlmutter; Shirley Ma
        Cc: [EMAIL PROTECTED];
[email protected]
        Subject: RE: [ofa-general] GPFS node loses IB-connection
        
        

        I did the iperf tests on servers with OFED-1.2-RC3.

         

        It also gives the same result. Actually, it is even worse: when
the interface dies, it enters the PORT_INIT state and does not return to
PORT_ACTIVE, at least not within 10 minutes.
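
One way to measure how long the port stays out of the ACTIVE state is to poll the port state file that the Linux InfiniBand stack exposes in sysfs. This is only a sketch: the function name wait_for_active is hypothetical, and the device name and port number in the usage line (mthca0, port 1) are assumptions that need to be replaced with the values for the actual HCA.

```shell
# Hypothetical helper: poll a port state file until it reports ACTIVE.
# Usage: wait_for_active STATE_FILE TIMEOUT_SECONDS
wait_for_active() {
    state_file="$1"
    timeout="$2"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # sysfs reports the state as e.g. "4: ACTIVE" or "2: INIT"
        if grep -q 'ACTIVE' "$state_file" 2>/dev/null; then
            echo "port ACTIVE after ${elapsed}s"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "port still not ACTIVE after ${timeout}s" >&2
    return 1
}

# Example (device name and port number are assumptions):
#   wait_for_active /sys/class/infiniband/mthca0/ports/1/state 600
```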

         

        I'll give you the test script I ran:

         

        ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5001 &
        ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5002 &
        ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5003 &
        ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6001 &
        ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6002 &
        ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6003 &
        ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7001 &
        ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7002 &
        ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7003 &
        ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8001 &
        ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8002 &
        ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8003 &

        sleep 5

        for i in 14 15 16 17
        do
                ssh 10.224.158.111 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))001 -t 120 -d -P 5 &
                ssh 10.224.158.112 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))002 -t 120 -d -P 5 &
                ssh 10.224.158.113 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))003 -t 120 -d -P 5 &
        done
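
The server and client commands in the script follow a regular pattern: server 10.224.158.1N listens on ports (N-9)001 to (N-9)003, and client 10.224.158.11J connects to 192.168.2.N on port (N-9)00J. The sketch below generates the same command lines with two nested loops and only prints them, so nothing is started until the output is piped to a shell. The function name gen_iperf_commands is hypothetical.

```shell
# Hypothetical helper: print the iperf commands from the test script (dry run).
gen_iperf_commands() {
    env='LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK'
    # Three iperf servers on each of the nodes .114 to .117
    for i in 14 15 16 17; do
        for j in 1 2 3; do
            echo "ssh 10.224.158.1$i $env iperf -s -p $((i-9))00$j &"
        done
    done
    echo "sleep 5"
    # Clients .111 to .113 each run a bidirectional test against every server node
    for i in 14 15 16 17; do
        for j in 1 2 3; do
            echo "ssh 10.224.158.11$j $env iperf -c 192.168.2.$i -p $((i-9))00$j -t 120 -d -P 5 &"
        done
    done
}

# Review the output first, then run it with:
#   gen_iperf_commands | sh
```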

         

        Any ideas?

         

        Regards,

         

        Koen

        
________________________________


        From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of SEGERS Koen
        Sent: Tuesday, May 22, 2007 10:55
        To: Ami Perlmutter; Shirley Ma
        Cc: [EMAIL PROTECTED];
[email protected]
        Subject: RE: [ofa-general] GPFS node loses IB-connection

         

        GPFS keeps its connection constantly open.

         

        We did some more tests with iperf:

        If we don't run bidirectional tests, all connections keep
running smoothly. If we add bidirectional tests, it becomes unstable,
especially when this is done on multiple nodes. Is this normal?

         

        The failed iperf tests give the same error in the switch log:

        May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM OUT_OF_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:05:ad:00:00:08:a8:71
        May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM DELETE_MC_GROUP trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:08:a8:71
        May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by discovering removed ports
        May 22 08:15:00 topspin-120sc ib_sm.x[621]: %IB-6-INFO: Program switch port state to down, node=00:05:ad:00:00:0b:a2:cc, port= 6, due to non-responding CA
        May 22 08:15:00 topspin-120sc port_mgr.x[497]: %PORT-6-INFO: port down - port=1/6, type=ib4xTXP
        May 22 08:15:00 topspin-120sc diag_mgr.x[508]: %DIAG-6-INFO: in portTblFindEntry() - IfIndex=70(1/6)
        May 22 08:15:00 topspin-120sc diag_mgr.x[508]: %DIAG-6-INFO: cannot find entry - IfIndex=70(1/6)
        May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by discovering new ports
        May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by multicast membership change
        May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM IN_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:05:ad:00:00:08:a8:71
        May 22 08:15:05 topspin-120sc port_mgr.x[497]: %PORT-6-INFO: port up - port=1/6, type=ib4xTXP
        May 22 08:15:07 topspin-120sc ib_sm.x[632]: %IB-6-INFO: Generate SM CREATE_MC_GROUP trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:08:a8:71
        May 22 08:15:08 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by multicast membership change
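
To see how often and for how long the switch port flaps, the port down and port up records can be filtered out of the switch log. This is a rough sketch that assumes the log has been saved to a file with one record per line, in the format shown above. The function name port_events and the file name in the usage line are hypothetical.

```shell
# Hypothetical helper: list only the port up/down records from a saved switch log.
# Usage: port_events LOGFILE
port_events() {
    grep -E '%PORT-6-INFO: port (up|down) - ' "$1"
}

# Example (file name is an assumption):
#   port_events topspin.log
```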

         

        RC3 has just been installed. Results will follow soon.

         

        Regards,

         

        Koen

         

        
________________________________


        From: Ami Perlmutter [mailto:[EMAIL PROTECTED] 
        Sent: Tuesday, May 22, 2007 10:33
        To: Shirley Ma
        Cc: SEGERS Koen; [EMAIL PROTECTED];
[email protected]
        Subject: Re: [ofa-general] GPFS node loses IB-connection

         

        Does the application constantly open and close connections? 

        *** Disclaimer ***
        
        Vlaamse Radio- en Televisieomroep
        Auguste Reyerslaan 52, 1043 Brussel
        
        nv van publiek recht
        BTW BE 0244.142.664
        RPR Brussel
        http://www.vrt.be/disclaimer


_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
