On Wed, Dec 05, 2007 at 02:45:17PM -0500, Tim Mattox wrote:
> Hello,
> It appears that sometime after r16777, and by r16799, something
> was broken in the trunk's openib support for 32-bit builds.
> The 64-bit tests all seem normal, as well as the 32-bit & 64-bit tests on
> the 1.2 branch on the same machine (odin).
> 
> See this MTT results page permalink showing the 32-bit odin runs:
> http://www.open-mpi.org/mtt/index.php?do_redir=468
> 
> Pasha & Gleb, you both did a variety of checkins in that svn r# range.
> Do either of you have time to investigate this?
> 
> Here is a snippet from one randomly picked failed test (out of thousands):
> [1,1][btl_openib_component.c:1665:btl_openib_module_progress] from
> odin001 to: odin001 error
> polling LP CQ with status LOCAL PROTOCOL ERROR status number 4 for
> wr_id 141733120 opcode 128
> qp_idx 3
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 29761 on
> node odin001 calling "abort". This will have caused other processes
> in the application to be terminated by signals sent by mpirun
> (as reported here).
> --------------------------------------------------------------------------
> 
> Thanks, and happy bug hunting!
I know where the problem is. Will fix this week.
--
                        Gleb.
