> I spent the last couple of days retracing my steps. In my haste, I
> listed the wrong HCA firmware revision. It was firmware 1.2.940 that
> caused the system to crash while booting to Linux. I have the mthca
> driver built into the kernel; it is not a loadable driver. The system
> boots fine with the 1.2.0 firmware.

Oh, so it depends on the mthca firmware version?  That's a big clue: you're
using mem-free firmware, which means the HCA uses system memory to store
big chunks of internal state.  If something is going wrong with how the
memory is mapped to the HCA (or how the HCA writes to it) then that
could cause memory corruption -- possibly tied to posting receives to
the hardware as part of the MAD initialization.

So it could be a driver bug exposed by the new firmware, or a firmware bug.

Is Mellanox following this bug?  Maybe they have some idea of how to
figure out what the HCA is doing that could crash a system.

 - R.
