On 25 October 2011 11:18, Johannes Ring <[email protected]> wrote:
> On Mon, Oct 24, 2011 at 10:14 PM, Garth N. Wells <[email protected]> wrote:
>> On 24 October 2011 12:57, Johannes Ring <[email protected]> wrote:
>>> This failure was expected and not Martin's fault. The problem is the
>>> stokes-iterative C++ demo, which is problematic when run in parallel
>>> (parallel testing has been turned off on this buildbot slave until now).
>>>
>>> I have done some manual testing with two and up to five processes, and
>>> the demo fails only (but not always) when run with three or five
>>> processes. Sometimes I get a segmentation violation:
>>>
>>> [1]PETSC ERROR: ------------------------------------------------------------------------
>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>>> probably memory access out of range
>>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
>>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
>>> to find memory corruption errors
>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>> [1]PETSC ERROR: to get more information on the crash.
>>> [1]PETSC ERROR: --------------------- Error Message ------------------------------------
>>> [1]PETSC ERROR: Signal received!
>>> [1]PETSC ERROR: ------------------------------------------------------------------------
>>>
>>> Other times I get this error:
>>>
>>> Warning -- row partitioning does not line up! Partitioning incomplete!
>>> [2]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c
>>> [2]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c
>>> [2]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c
>>> [2]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c
>>> [2]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c
>>> Process 2: Soln norm: 0
>>>
>>> Any ideas? Is it a bug in DOLFIN? The demo works fine in parallel when
>>> using Trilinos instead of PETSc.
>>>
>>
>> There is a very nasty bug in the Oneiric OpenMPI. I had a frustrating
>> week tracking this down.
>
> Is this a bug in OpenMPI 1.4.3 and is it reported somewhere?
Not that I'm aware of. I tracked down an example of the bug: a SCOTCH call
to MPI_Allgather randomly returned an obviously wrong result.

> It would be good to fix this in Ubuntu (see below).
>

I'm not going to bother tracking it down in Ubuntu; I've identified an MPI
bug in Ubuntu in the past, and although it was confirmed, Ubuntu never
released a fix. MPI is too specialised for them to care.

Garth

>> Installing OpenMPI 1.4.4 manually does the trick.
>
> I would like to avoid that on the buildbot if possible, because it
> loses the value of having an Oneiric buildbot if most of the
> dependencies are built from source.
>
> Also, this does not solve the problem for the DOLFIN packages in the
> PPA, as there is no chance I can build and maintain packages for
> OpenMPI 1.4.4 and all its dependencies.
>
> Johannes
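For reference, below is a minimal, self-contained sketch of the kind of check
that can expose a broken MPI_Allgather. It is not the SCOTCH code path
mentioned above, and the file name and structure are hypothetical: each rank
contributes its own rank number, and after the collective every rank must
hold exactly 0, 1, ..., size-1. A mismatch here would indicate the same class
of failure as the one described above.

// allgather_check.cpp -- hypothetical test program, not part of DOLFIN or
// the SCOTCH code path discussed in this thread.
// Each rank contributes its own rank number; after MPI_Allgather every
// rank must hold exactly [0, 1, ..., size-1].
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char* argv[])
{
  MPI_Init(&argc, &argv);

  int rank = 0, size = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Each process sends its own rank
  int sendval = rank;
  std::vector<int> recv(size, -1);
  MPI_Allgather(&sendval, 1, MPI_INT, &recv[0], 1, MPI_INT, MPI_COMM_WORLD);

  // Check the gathered vector against the expected values
  int errors = 0;
  for (int i = 0; i < size; ++i)
  {
    if (recv[i] != i)
    {
      std::printf("Rank %d: recv[%d] = %d, expected %d\n", rank, i, recv[i], i);
      ++errors;
    }
  }

  MPI_Finalize();
  return errors == 0 ? 0 : 1;
}

Compile with mpic++ and run it repeatedly under mpirun with varying process
counts (e.g. -np 3 or -np 5, as in the failing demo runs); a correct MPI
installation should never report a mismatch.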

