Thanks for the advice on how to trace the problem, Jed, but
unfortunately I have no time to dive into such complicated debugging in
the near future. Garth or Johannes may have reported the same issue a
year and a half ago:
http://osdir.com/ml/dolfin-differential-equations/2011-10/msg00199.html

Now I think the problem happens with OpenMPI 1.4.3 && ((DOLFIN 1.0.0
&& PETSc 3.2-p7 && Hypre 2.8.0b) || (DOLFIN 1.2.0+ && PETSc 3.4.0 &&
Hypre 2.8.0b)). With a newer OpenMPI the problem does not occur, as
evidenced by the FEniCS buildbot.

Anyone can try to reproduce the problem by running

  $ export DOLFIN_NOPLOT=1
  $ mpirun -n 3 demo_navier-stokes

with DOLFIN compiled against the PETSc backend and OpenMPI 1.4.3,
which I think is the version packaged on Ubuntu Precise.

It could also be considered whether DOLFIN should check PETSc error
codes to avoid similar deadlocks. But I guess the current approach is
intentional, for performance reasons.

Jan


On Thu, 30 May 2013 13:10:32 -0500
Jed Brown <[email protected]> wrote:
> Jan Blechta <[email protected]> writes:
> 
> > Regarding the effort to reproduce it with PETSc directly, Jed, I was
> > able to dump this specific matrix to binary format but not the
> > vector, so I need to somehow obtain the binary vector - is the
> > binary format documented somewhere?
> 
> You can use bin/pythonscripts/PetscBinaryIO.py or the Matlab/Octave
> bin/matlab/PetscBinaryRead.m.
> 
> For a vector, you can use -ksp_view_rhs binary:::append (append to
> default 'binaryoutput') or -ksp_view_rhs binary:filename.
> 
> > I guess I would need to recompile PETSc in some debug mode to break
> > into Hypre - is that so?
> 
> You'll need a hypre built with debugging symbols to get line numbers.
> An easy way to do that is to reconfigure PETSc using
> --with-debugging=1 (the default) and --download-hypre.  You should be
> able to LD_PRELOAD the debugging PETSc library with your dolfin
> application, so that you don't have to rebuild dolfin.
> 
> > This is the backtrace from the process printing the PETSc ERROR:
> > __________________________________________________________________________
> > #0  0x00007ffff5caa2d8 in __GI___poll (fds=0x6d02c0, nfds=6,
> >     timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:83
> > #1  0x00007fffed0c5ab0 in ?? () from /usr/lib/libopen-pal.so.0
> > #2  0x00007fffed0c48ff in ?? () from /usr/lib/libopen-pal.so.0
> > #3  0x00007fffed0b9221 in opal_progress () from /usr/lib/libopen-pal.so.0
> > #4  0x00007ffff1b593d5 in ?? () from /usr/lib/libmpi.so.0
> > #5  0x00007ffff1b8a1c5 in PMPI_Waitany () from /usr/lib/libmpi.so.0
> > #6  0x00007ffff2f5c43e in VecScatterEnd_1 ()
> >     from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
> > #7  0x00007ffff2f57811 in VecScatterEnd ()
> >     from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
> > #8  0x00007ffff2f3cb9a in VecGhostUpdateEnd ()
> >     from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
> > #9  0x00007ffff74ecdea in dolfin::Assembler::assemble (this=0x7fffffff9da0, A=..., a=...)
> >     at /usr/users/blechta/fenics/fenics/src/dolfin/dolfin/fem/Assembler.cpp:96
> > #10 0x00007ffff74e8095 in dolfin::assemble (A=..., a=...)
> >     at /usr/users/blechta/fenics/fenics/src/dolfin/dolfin/fem/assemble.cpp:38
> > #11 0x0000000000425d41 in main ()
> >     at /usr/users/blechta/fenics/fenics/src/dolfin/demo/pde/navier-stokes/cpp/main.cpp:180
> 
> PETSc should have returned an error, but it looks like dolfin kept
> going.
> 
> Anyway, both of these traces are too late.  We have to find out why
> Hypre is returning an error.  They don't have an "error handler"
> system, so we can't automatically get a trace at the source location
> where the error was first raised.
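
P.S. For the record: once the matrix and RHS are dumped as Jed
describes above, something along the following lines should reproduce
the solve with plain PETSc. This is only an untested sketch against
the PETSc 3.4 C API; the file names "A.dat" and "binaryoutput" are
placeholders, and the KSP/PC settings would have to match whatever
DOLFIN actually used (e.g. set via -ksp_type and -pc_hypre_type).

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    Mat A; Vec b, x; KSP ksp; PC pc; PetscViewer viewer;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); CHKERRQ(ierr);

    /* Load the matrix dumped from DOLFIN ("A.dat" is a placeholder name) */
    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.dat", FILE_MODE_READ, &viewer); CHKERRQ(ierr);
    ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
    ierr = MatLoad(A, viewer); CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr);

    /* Load the RHS written by -ksp_view_rhs (default file 'binaryoutput') */
    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "binaryoutput", FILE_MODE_READ, &viewer); CHKERRQ(ierr);
    ierr = VecCreate(PETSC_COMM_WORLD, &b); CHKERRQ(ierr);
    ierr = VecLoad(b, viewer); CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr);

    /* Solve with the Hypre preconditioner to try to trigger the same failure */
    ierr = VecDuplicate(b, &x); CHKERRQ(ierr);
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN); CHKERRQ(ierr);
    ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
    ierr = PCSetType(pc, PCHYPRE); CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);

    ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
    ierr = MatDestroy(&A); CHKERRQ(ierr);
    ierr = VecDestroy(&b); CHKERRQ(ierr);
    ierr = VecDestroy(&x); CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }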

