    Helen> It doesn't seem like shrinking the TCP window had helped.
    Helen> I captured the Dmesg log from Lustre server and associated
    Helen> client reporting IOZONE error.

What is the state of the system after you start seeing the ib0
transmit time out messages?  Does IPoIB work at all?  Is the HCA
responsive at all -- for example what do you see if you do

    cat /sys/class/infiniband/mthca0/ports/1/state

or

    cat /sys/class/infiniband/mthca0/ports/1/counters/*

    Helen> BTW, this problem is a moving target so it is hard to
    Helen> believe that it is hardware related(?) BTW, I am using the
    Helen> mellanox DDR switch and HCA.

Not sure what you mean by a moving target... the symptoms really look
like a crash of the HCA firmware to me.

Thanks,
  Roland
_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general