Yes, after many trials using

$ cd test/unit/io/python
$ while true; do git clean -fdx && mpirun -n 3 xterm -e gdb -ex r -ex q --args python -m pytest -sv; done
# When it hangs and you interrupt it, gdb asks for confirmation before
# quitting, so type 'n' and enjoy gdb...
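
If the run hangs, each rank can also dump its own Python traceback
after a timeout, instead of (or in addition to) gdb. A sketch, assuming
the faulthandler module is available (standard on Python 3.3+, a
third-party package on 2.x), e.g. placed in conftest.py:

    import faulthandler
    # Print every thread's traceback to stderr every 300 s;
    # exit=False means the process is left running.
    faulthandler.dump_traceback_later(300, repeat=True, exit=False)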

I've seen a situation where 2 processes deadlocked in
HDF5Interface::close_file() in DOLFIN, with a backtrace like

# MPI barrier
...
# MPI close
# HDF5 lib calls
H5Fclose()
dolfin::HDF5Interface::close_file()
dolfin::HDF5File::close()
dolfin::HDF5File::~HDF5File()
dolfin::HDF5File::~HDF5File()
# smart ptr management
# garbage collection

while the 3rd process is waiting far away. Isn't it strange that the
destructor appears twice in the stack trace? (The upper one points at
the '}' line, which I don't get.) What does it mean?
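
(Two destructor frames could simply be the compiler's deleting and
complete-object destructor variants for the same source-level
~HDF5File(), which the Itanium C++ ABI emits separately; that would
also explain the '}' line.) Since H5Fclose() is collective in parallel,
ranks can deadlock if garbage collection finalizes the HDF5File object
at different times on different ranks, which is what the bottom of the
backtrace suggests. A minimal sketch of a test-side workaround,
assuming DOLFIN's Python HDF5File exposes close() as the C++ backtrace
suggests:

    from dolfin import HDF5File, UnitSquareMesh, mpi_comm_world

    mesh = UnitSquareMesh(4, 4)
    f = HDF5File(mpi_comm_world(), "mesh.h5", "w")
    f.write(mesh, "mesh")
    f.close()  # the collective close happens here, on every rank
    del f      # the destructor has no collective work left to do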

Jan


On Thu, 18 Sep 2014 16:20:51 +0200
Martin Sandve Alnæs <[email protected]> wrote:

> I've added the MPI fixes for the temppath fixture and fixed
> some other related issues while at it: when parameterizing
> a test that uses a temppath fixture, each parameter combination
> needs its own directory.
> A further improvement would be automatic cleaning of old tempdirs,
> but I leave that for now.
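> 
> A minimal sketch of one way to do that (hypothetical fixture code;
> pytest's built-in request object is assumed, whose node name encodes
> the parameter combination):
> 
>     import os
>     import pytest
> 
>     @pytest.fixture
>     def temppath(request):
>         # request.node.name is e.g. "test_save_mesh[3-True]", so
>         # each parameter combination gets its own directory
>         path = os.path.join(os.path.dirname(os.path.abspath(__file__)),
>                             "tmp", request.node.name)
>         if not os.path.exists(path):
>             os.makedirs(path)
>         return path
> 
> (In parallel, the create-on-rank-0-then-barrier care discussed below
> still applies.)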
> 
> I've pushed these changes to the branch
> aslakbergersen/topic-change-unittest-to-pytest
> 
> The tests still hang though, in the closing of HDF5File.
> 
> Here's how to debug if someone wants to give it a shot:
> Just run:
>     mpirun -np 3 python -m pytest -s -v
> With gdb:
>     mpirun -np 3 xterm -e gdb --args python -m pytest
> then enter 'r' in each of the three xterms.
> 
> You may have to try a couple of times to get the hanging behaviour.
> 
> Martin
> 
> On 18 September 2014 13:23, Martin Sandve Alnæs <[email protected]>
> wrote:
> 
> > Good spotting both of you, thanks.
> >
> > Martin
> >
> > On 18 September 2014 13:01, Lawrence Mitchell <
> > [email protected]> wrote:
> >
> >> On 18/09/14 11:42, Jan Blechta wrote:
> >> > Some problems (when running in a clean dir) are avoided by
> >> > this (although still incorrect) patch. There are race conditions
> >> > in the creation of the temp dir; it should be done with an
> >> > atomic operation.
> >> >
> >> > Jan
> >> >
> >> >
> >> > ==================================================================
> >> > diff --git a/test/unit/io/python/test_XDMF.py b/test/unit/io/python/test_XDMF.py
> >> > index 9ad65a4..31471f1 100755
> >> > --- a/test/unit/io/python/test_XDMF.py
> >> > +++ b/test/unit/io/python/test_XDMF.py
> >> > @@ -28,8 +28,9 @@ def temppath():
> >> >      filedir = os.path.dirname(os.path.abspath(__file__))
> >> >      basename = os.path.basename(__file__).replace(".py", "_data")
> >> >      temppath = os.path.join(filedir, basename, "")
> >> > -    if not os.path.exists(temppath):
> >> > -        os.mkdir(temppath)
> >> > +    if MPI.rank(mpi_comm_world()) == 0:
> >> > +        if not os.path.exists(temppath):
> >> > +            os.mkdir(temppath)
> >> >      return temppath
> >>
> >> There's still a race condition here because ranks other than zero
> >> might try to use temppath before it's created.  I think you want
> >> something like the below:
> >>
> >> if MPI.rank(mpi_comm_world()) == 0:
> >>     if not os.path.exists(temppath):
> >>         os.mkdir(temppath)
> >> MPI.barrier(mpi_comm_world())
> >> return temppath
> >>
> >> If you're worried about the OS not creating files atomically, you
> >> can always mkdir into a tmp directory and then os.rename(tmp,
> >> temppath), since POSIX guarantees that renames are atomic.
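> >>
> >> A sketch of that approach (assuming all ranks see the same
> >> filesystem; the tmp staging name here is hypothetical):
> >>
> >>     import errno, os
> >>
> >>     target = temppath.rstrip(os.sep)
> >>     tmp = target + ".tmp.%d" % os.getpid()
> >>     os.mkdir(tmp)
> >>     try:
> >>         os.rename(tmp, target)  # atomic on POSIX
> >>     except OSError as e:
> >>         os.rmdir(tmp)  # another process won the race
> >>         if e.errno not in (errno.EEXIST, errno.ENOTEMPTY):
> >>             raise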
> >>
> >> Lawrence
> >>
> >
> >

_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
