    Helen> It doesn't seem like shrinking the TCP window has helped.
    Helen> I captured the dmesg log from the Lustre server and the
    Helen> associated client reporting the IOZONE error.

What is the state of the system after you start seeing the ib0
transmit timeout messages?  Does IPoIB work at all?  Is the HCA
responsive at all -- for example, what do you see if you do

  cat /sys/class/infiniband/mthca0/ports/1/state

or

  cat /sys/class/infiniband/mthca0/ports/1/counters/*
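
If it is easier, the two commands above can be wrapped in a small shell
function that prints the port state and then each counter next to its
file name, so the output can be compared before and after the timeouts
start.  This is only a sketch: it assumes the HCA is mthca0 and the port
is 1, as in the commands above, and the names dump_ib_port and IB_SYSFS
are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the port state and every counter with its name.
# Defaults (mthca0, port 1) are taken from the commands above.
# IB_SYSFS is a hypothetical override for the sysfs base directory.
dump_ib_port() {
    port_dir="${IB_SYSFS:-/sys/class/infiniband}/${1:-mthca0}/ports/${2:-1}"

    # If the state file is missing or unreadable, report it and stop.
    if [ ! -r "$port_dir/state" ]; then
        echo "cannot read $port_dir/state" >&2
        return 1
    fi

    echo "state: $(cat "$port_dir/state")"

    # One "name: value" line per counter file.
    for f in "$port_dir"/counters/*; do
        [ -f "$f" ] || continue
        echo "$(basename "$f"): $(cat "$f")"
    done
}
```

After sourcing the function, running `dump_ib_port mthca0 1` once before
the test and once after the timeouts appear should make any change in
the counters easy to spot.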

    Helen> BTW, this problem is a moving target so it is hard to
    Helen> believe that it is hardware related(?)  BTW, I am using the
    Helen> mellanox DDR switch and HCA.

Not sure what you mean by a moving target... the symptoms really look
like a crash of the HCA firmware to me.

Thanks,
  Roland
_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general