On Mon, 6 Oct 2014 09:48:29 +0200 Martin Sandve Alnæs <[email protected]> wrote:
> The 'fix' that's in the branch now was to trigger python garbage
> collection (suggested by Øyvind Evju) before each test.
>
> This probably means we have a general problem in dolfin with
> non-deterministic destruction order of objects in parallel. Any
> destructor that uses MPI represents a potential deadlock.

To understand the issue: is the problem that garbage collection does
not guarantee *when* an object is destroyed? Here

  http://stackoverflow.com/a/5071376/1796717

the distinction between variable scoping and object cleanup is
discussed. Quoting it:

  "Deterministic cleanup happens through the with statement."

which might be a proper solution to the problem.

Jan

> On 19 September 2014 12:52, Jan Blechta <[email protected]> wrote:
> > On Fri, 19 Sep 2014 00:27:50 +0200
> > Jan Blechta <[email protected]> wrote:
> > > Yes, after many trials using
> > >
> > >   $ cd test/unit/io/python
> > >   $ while true; do git clean -fdx && \
> > >       mpirun -n 3 xterm -e gdb -ex r -ex q -args python -m pytest -sv; done
> > >   # when it hangs and you interrupt it, it asks for confirmation
> > >   # for quitting, so you type n and enjoy gdb...
> > >
> > > I've seen a situation where 2 processes deadlocked in
> > > HDF5Interface::close_file() in DOLFIN with a backtrace like
> > >
> > >   # MPI barrier
> > >   ...
> > >   # MPI close
> > >   # HDF5 lib calls
> > >   H5Fclose()
> > >   dolfin::HDF5Interface::close_file()
> > >   dolfin::HDF5File::close()
> > >   dolfin::HDF5File::~HDF5File()
> > >   dolfin::HDF5File::~HDF5File()
> > >   # smart ptr management
> > >   # garbage collection
> > >
> > > while the 3rd process is waiting far away. Isn't it strange that
> > > the destructor appears twice in the stacktrace? (The upper one is
> > > on the '}' line, which I don't get.) What does it mean?
> >
> > Probably just a code-generation artifact - nothing harmful, see
> > http://stackoverflow.com/a/15244091/1796717
> >
> > Jan
> >
> > > Jan
> > >
> > > On Thu, 18 Sep 2014 16:20:51 +0200
> > > Martin Sandve Alnæs <[email protected]> wrote:
> > > > I've added the mpi fixes for the temppath fixture and fixed
> > > > some other related issues while at it: when parameterizing a
> > > > test that uses a temppath fixture, there is a need for separate
> > > > directories for each parameter combination.
> > > > A further improvement would be automatic cleaning of old
> > > > tempdirs, but I leave that for now.
> > > >
> > > > I've pushed these changes to the branch
> > > > aslakbergersen/topic-change-unittest-to-pytest
> > > >
> > > > The tests still hang, though, in the closing of HDF5File.
> > > >
> > > > Here's how to debug if someone wants to give it a shot:
> > > > Just run:
> > > >   mpirun -np 3 python -m pytest -s -v
> > > > With gdb:
> > > >   mpirun -np 3 xterm -e gdb --args python -m pytest
> > > > then enter 'r' in each of the three xterms.
> > > >
> > > > You may have to try a couple of times to get the hanging
> > > > behaviour.
> > > >
> > > > Martin
> > > >
> > > > On 18 September 2014 13:23, Martin Sandve Alnæs
> > > > <[email protected]> wrote:
> > > > > Good spotting, both of you, thanks.
> > > > >
> > > > > Martin
> > > > >
> > > > > On 18 September 2014 13:01, Lawrence Mitchell
> > > > > <[email protected]> wrote:
> > > > >> On 18/09/14 11:42, Jan Blechta wrote:
> > > > >> > Some problems (when running in a clean dir) are avoided
> > > > >> > using this (although incorrect) patch. There are race
> > > > >> > conditions in the creation of the temp dir. It should be
> > > > >> > done using an atomic operation.
> > > > >> >
> > > > >> > Jan
> > > > >> >
> > > > >> > ==================================================================
> > > > >> > diff --git a/test/unit/io/python/test_XDMF.py b/test/unit/io/python/test_XDMF.py
> > > > >> > index 9ad65a4..31471f1 100755
> > > > >> > --- a/test/unit/io/python/test_XDMF.py
> > > > >> > +++ b/test/unit/io/python/test_XDMF.py
> > > > >> > @@ -28,8 +28,9 @@ def temppath():
> > > > >> >      filedir = os.path.dirname(os.path.abspath(__file__))
> > > > >> >      basename = os.path.basename(__file__).replace(".py", "_data")
> > > > >> >      temppath = os.path.join(filedir, basename, "")
> > > > >> > -    if not os.path.exists(temppath):
> > > > >> > -        os.mkdir(temppath)
> > > > >> > +    if MPI.rank(mpi_comm_world()) == 0:
> > > > >> > +        if not os.path.exists(temppath):
> > > > >> > +            os.mkdir(temppath)
> > > > >> >      return temppath
> > > > >>
> > > > >> There's still a race condition here because ranks other than
> > > > >> zero might try and use temppath before it's created. I think
> > > > >> you want something like the below:
> > > > >>
> > > > >>     if MPI.rank(mpi_comm_world()) == 0:
> > > > >>         if not os.path.exists(temppath):
> > > > >>             os.mkdir(temppath)
> > > > >>     MPI.barrier(mpi_comm_world())
> > > > >>     return temppath
> > > > >>
> > > > >> If you're worried about the OS not creating files atomically,
> > > > >> you can always mkdir into a tmp directory and then
> > > > >> os.rename(tmp, temppath), since POSIX guarantees that renames
> > > > >> are atomic.
> > > > >>
> > > > >> Lawrence
> > > > >> _______________________________________________
> > > > >> fenics mailing list
> > > > >> [email protected]
> > > > >> http://fenicsproject.org/mailman/listinfo/fenics

_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
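PS: the with-statement idea from the top of the thread could look like
the sketch below. contextlib.closing wraps any object that has a
close() method. The commented-out dolfin usage is hypothetical
(assuming HDF5File keeps its close() method in the Python API); the
runnable part uses a stand-in class only to demonstrate the
deterministic cleanup order:

```python
from contextlib import closing

# Hypothetical dolfin usage (names from the thread):
#
#   with closing(HDF5File(mpi_comm_world(), "out.h5", "w")) as f:
#       f.write(mesh, "mesh")
#   # close() has now run on every rank at the same point in the
#   # program, instead of whenever the garbage collector fires.

# Runnable stand-in demonstrating the ordering:
class FakeFile(object):
    """Records open/close events so the ordering can be checked."""

    def __init__(self, log):
        self.log = log
        self.log.append("open")

    def close(self):
        self.log.append("close")


log = []
with closing(FakeFile(log)) as f:
    f.log.append("write")
log.append("after-with")
```

closing() also guarantees that close() runs if the body raises, which
cleanup driven by the garbage collector does not.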
