Yes, after many trials using

  $ cd test/unit/io/python
  $ while true; do git clean -fdx && mpirun -n 3 xterm -e gdb -ex r -ex q -args python -m pytest -sv; done
  # when it hangs and you interrupt it, it asks for confirmation for
  # quitting, so you type n and enjoy gdb...
I've seen a situation where 2 processes deadlocked on
HDF5Interface::close_file() in DOLFIN with a backtrace like

  # MPI barrier
  ...
  # MPI close
  # HDF5 lib calls
  H5Fclose()
  dolfin::HDF5Interface::close_file()
  dolfin::HDF5File::close()
  dolfin::HDF5File::~HDF5File()
  dolfin::HDF5File::~HDF5File()
  # smart ptr management
  # garbage collection

while the 3rd process is waiting far away.

Isn't it strange that the destructor appears twice in the stack trace?
(The upper one is on the '}' line, which I don't get.) What does it
mean?

Jan

On Thu, 18 Sep 2014 16:20:51 +0200
Martin Sandve Alnæs <[email protected]> wrote:
> I've added the MPI fixes for the temppath fixture and fixed
> some other related issues while at it: when parameterizing
> a test that uses a temppath fixture, there is a need for
> separate directories for each parameter combo.
> A further improvement would be automatic cleaning of old tempdirs,
> but I leave that for now.
>
> I've pushed these changes to the branch
> aslakbergersen/topic-change-unittest-to-pytest
>
> The tests still hang, though, in the closing of HDF5File.
>
> Here's how to debug if someone wants to give it a shot:
> Just run:
>   mpirun -np 3 python -m pytest -s -v
> With gdb:
>   mpirun -np 3 xterm -e gdb --args python -m pytest
> then enter 'r' in each of the three xterms.
>
> You may have to try a couple of times to get the hanging behaviour.
>
> Martin
>
> On 18 September 2014 13:23, Martin Sandve Alnæs <[email protected]>
> wrote:
>
> > Good spotting, both of you, thanks.
> >
> > Martin
> >
> > On 18 September 2014 13:01, Lawrence Mitchell <
> > [email protected]> wrote:
> >
> >> On 18/09/14 11:42, Jan Blechta wrote:
> >> > Some problems (when running in a clean dir) are avoided using
> >> > this (albeit incorrect) patch. There are race conditions in
> >> > the creation of the temp dir. It should be done using an
> >> > atomic operation.
> >> >
> >> > Jan
> >> >
> >> > ==================================================================
> >> > diff --git a/test/unit/io/python/test_XDMF.py b/test/unit/io/python/test_XDMF.py
> >> > index 9ad65a4..31471f1 100755
> >> > --- a/test/unit/io/python/test_XDMF.py
> >> > +++ b/test/unit/io/python/test_XDMF.py
> >> > @@ -28,8 +28,9 @@ def temppath():
> >> >      filedir = os.path.dirname(os.path.abspath(__file__))
> >> >      basename = os.path.basename(__file__).replace(".py", "_data")
> >> >      temppath = os.path.join(filedir, basename, "")
> >> > -    if not os.path.exists(temppath):
> >> > -        os.mkdir(temppath)
> >> > +    if MPI.rank(mpi_comm_world()) == 0:
> >> > +        if not os.path.exists(temppath):
> >> > +            os.mkdir(temppath)
> >> >      return temppath
> >>
> >> There's still a race condition here because ranks other than zero
> >> might try and use temppath before it's created. I think you want
> >> something like the below:
> >>
> >>     if MPI.rank(mpi_comm_world()) == 0:
> >>         if not os.path.exists(temppath):
> >>             os.mkdir(temppath)
> >>     MPI.barrier(mpi_comm_world())
> >>     return temppath
> >>
> >> If you're worried about the OS not creating files atomically, you
> >> can always mkdir into a tmp directory and then os.rename(tmp,
> >> temppath), since POSIX guarantees that renames are atomic.
> >>
> >> Lawrence
> >> _______________________________________________
> >> fenics mailing list
> >> [email protected]
> >> http://fenicsproject.org/mailman/listinfo/fenics
> >>
> >
> >
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
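
For reference, here is a sketch of the fixture with both of Lawrence's
suggestions folded in: rank 0 builds the directory under a unique
temporary name and atomically renames it into place, and all ranks
synchronize on a barrier before using it. This is only a sketch: the
MPI.rank/MPI.barrier/mpi_comm_world calls follow the DOLFIN API quoted
in the patch above, while the pytest decorator, the tempfile usage and
the error handling are assumptions, not code from the thread.

  import os
  import tempfile

  import pytest
  from dolfin import MPI, mpi_comm_world

  @pytest.fixture
  def temppath():
      filedir = os.path.dirname(os.path.abspath(__file__))
      basename = os.path.basename(__file__).replace(".py", "_data")
      temppath = os.path.join(filedir, basename, "")
      if MPI.rank(mpi_comm_world()) == 0 and not os.path.exists(temppath):
          # Create the directory under a unique temporary name in the
          # same filesystem, then rename it into place; POSIX guarantees
          # the rename is atomic, so no process can observe a
          # half-created directory.
          tmp = tempfile.mkdtemp(dir=filedir)
          try:
              os.rename(tmp, temppath.rstrip(os.sep))
          except OSError:
              # A concurrent run won the race; discard ours, use theirs.
              os.rmdir(tmp)
      # Ranks other than 0 must not touch temppath before it exists.
      MPI.barrier(mpi_comm_world())
      return temppath

The barrier alone already fixes the within-run race Lawrence pointed
out; the mkdtemp-plus-rename only matters if two independent test runs
can race on the same directory.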
