Thanks for the advice on how to trace the problem, Jed, but unfortunately I have no time to dive into such complicated debugging in the near future. Garth or Johannes may have reported the same issue one and a half years ago: http://osdir.com/ml/dolfin-differential-equations/2011-10/msg00199.html
Now I think the problem happens with OpenMPI 1.4.3 && ((DOLFIN 1.0.0 && PETSc 3.2-p7 && Hypre 2.8.0b) || (DOLFIN 1.2.0+ && PETSc 3.4.0 && Hypre 2.8.0b)). With a newer OpenMPI the problem does not occur, as evidenced by the FEniCS buildbot.

Anyone can try to reproduce the problem with

    $ export DOLFIN_NOPLOT=1
    $ mpirun -n 3 demo_navier-stokes

on DOLFIN compiled with the PETSc backend and OpenMPI 1.4.3, which I think is the version preinstalled on Ubuntu Precise.

It could also be considered whether DOLFIN should check PETSc error codes to avoid similar deadlocks, but I guess the current approach is intentional for performance reasons.

Jan

On Thu, 30 May 2013 13:10:32 -0500 Jed Brown <[email protected]> wrote:
> Jan Blechta <[email protected]> writes:
>
> > Regarding effort to reproduce it with PETSc directly, Jed, I was
> > able to dump this specific matrix to binary format but not vector,
> > so I need to obtain somehow binary vector - is somewhere
> > documentation of that binary format?
>
> You can use bin/pythonscripts/PetscBinaryIO.py or the Matlab/Octave
> bin/matlab/PetscBinaryRead.m.
>
> For a vector, you can use -ksp_view_rhs binary:::append (append to
> default 'binaryoutput') or -ksp_view_rhs binary:filename.
>
> > I guess I would need to recompile PETSc in some debug mode to break
> > into Hypre, is it so?
>
> You'll need a hypre built with debugging symbols to get line numbers.
> An easy way to do that is to reconfigure PETSc using
> --with-debugging=1 (the default) and --download-hypre. You should be
> able to LD_PRELOAD the debugging PETSc library with your dolfin
> application, so that you don't have to rebuild dolfin.
>
> > This is backtrace from process printing PETSc ERROR:
> > __________________________________________________________________________
> > #0  0x00007ffff5caa2d8 in __GI___poll (fds=0x6d02c0, nfds=6,
> >     timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:83
> > #1  0x00007fffed0c5ab0 in ?? () from /usr/lib/libopen-pal.so.0
> > #2  0x00007fffed0c48ff in ?? () from /usr/lib/libopen-pal.so.0
> > #3  0x00007fffed0b9221 in opal_progress () from /usr/lib/libopen-pal.so.0
> > #4  0x00007ffff1b593d5 in ?? () from /usr/lib/libmpi.so.0
> > #5  0x00007ffff1b8a1c5 in PMPI_Waitany () from /usr/lib/libmpi.so.0
> > #6  0x00007ffff2f5c43e in VecScatterEnd_1 () from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
> > #7  0x00007ffff2f57811 in VecScatterEnd () from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
> > #8  0x00007ffff2f3cb9a in VecGhostUpdateEnd () from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
> > #9  0x00007ffff74ecdea in dolfin::Assembler::assemble (this=0x7fffffff9da0, A=..., a=...)
> >     at /usr/users/blechta/fenics/fenics/src/dolfin/dolfin/fem/Assembler.cpp:96
> > #10 0x00007ffff74e8095 in dolfin::assemble (A=..., a=...)
> >     at /usr/users/blechta/fenics/fenics/src/dolfin/dolfin/fem/assemble.cpp:38
> > #11 0x0000000000425d41 in main ()
> >     at /usr/users/blechta/fenics/fenics/src/dolfin/demo/pde/navier-stokes/cpp/main.cpp:180
>
> PETSc should have returned an error, but it looks like dolfin kept
> going.
>
> Anyway, both of these traces are too late. We have to find out why
> Hypre is returning an error. They don't have an "error handler"
> system, so we can't automatically get a trace at the source location
> where the error was first raised.
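[Editor's note: for reference, a minimal sketch of the standalone reproduction Jed suggests, i.e. replaying the dumped system with PETSc directly. Assumptions not in the thread: petsc4py is available against the same PETSc/hypre build, 'matrix.dat' and 'rhs.dat' are placeholder file names for the dumped Mat and Vec, and the GMRES/hypre solver settings are illustrative rather than the demo's exact configuration.]

    # replay_solve.py -- load a dumped PETSc Mat/Vec and solve with a
    # hypre-preconditioned KSP, outside of DOLFIN.
    import sys
    import petsc4py
    petsc4py.init(sys.argv)          # forward -ksp_*/-pc_* command-line options
    from petsc4py import PETSc

    # Load the matrix and right-hand side from PETSc binary files
    # ('matrix.dat' and 'rhs.dat' are placeholder names).
    A = PETSc.Mat().load(PETSc.Viewer().createBinary('matrix.dat', 'r'))
    b = PETSc.Vec().load(PETSc.Viewer().createBinary('rhs.dat', 'r'))

    # Krylov solver with a hypre preconditioner; these choices are
    # illustrative and can be overridden from the command line.
    ksp = PETSc.KSP().create()
    ksp.setOperators(A)
    ksp.setType('gmres')
    ksp.getPC().setType('hypre')
    ksp.setFromOptions()

    x = b.duplicate()
    ksp.solve(b, x)
    PETSc.Sys.Print('converged reason: %d' % ksp.getConvergedReason())

Running this under mpirun -n 3 with the same OpenMPI 1.4.3 stack should exercise the same parallel hypre path without involving DOLFIN, which may help isolate where the error is first raised.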
