MPI initialization has nothing to do with the test problems. The problem is the destructors of objects. For now it is worked around by calling gc.collect() in the pytest fixtures.
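A minimal sketch of that workaround, assuming an autouse fixture in conftest.py (the file name and fixture name here are illustrative, not necessarily what the branch actually uses):

    # conftest.py (illustrative location)
    import gc

    import pytest


    @pytest.fixture(autouse=True)
    def collect_garbage():
        # Collect before each test so that leftover objects with collective
        # destructors (e.g. HDF5File) are destroyed at the same point on every
        # MPI rank, rather than whenever the GC happens to run on each process.
        gc.collect()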
I think we should implement the with-statement pattern for all file types in DOLFIN to allow scope management. If vectors _do_ call MPI in their destructors, that's a problem for nontrivial DOLFIN Python programs.

On 6 Oct 2014 13:53, "Garth N. Wells" <[email protected]> wrote:

> On Mon, 6 Oct, 2014 at 12:32 PM, Jan Blechta <[email protected]> wrote:
>
>> On Mon, 06 Oct 2014 11:53:58 +0100 "Garth N. Wells" <[email protected]> wrote:
>>
>>> On Mon, 6 Oct, 2014 at 11:23 AM, Martin Sandve Alnæs <[email protected]> wrote:
>>>
>>> > All collective destructors must be managed explicitly in Python, preferably via the with statement. Are there any apart from file objects? Vectors? Matrices? Meshes?
>>>
>>> Off the top of my head I can't think of any cases, apart from IO, in which a (collective) MPI call needs to be made inside a destructor. For IO, we could insist on a user closing or flushing a file explicitly. We cannot guarantee that 3rd-party linear algebra backends do not call MPI when objects are destroyed.
>>
>> VecDestroy and MatDestroy (called by the PETScVector and PETScBaseMatrix destructors) are claimed to be collective by the PETSc docs:
>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecDestroy.html
>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatDestroy.html
>
> Yes, they are collective but don't necessarily make MPI calls. My understanding is that 'collective' is not the issue, but whether or not MPI calls are made from a destructor. Some functions will only make sense if called collectively (e.g., VecDestroy), but might not make collective MPI calls.
>
> For the tests, assuming pytest permits a 'setup' function like unittest, would it help if MPI is initialised explicitly in the setup function and closed down at the end of a test suite (if possible with pytest)?
>
> Garth
>
>> Jan
>>
>>> We have had this problem in the past with the 'automatic' finalisation of MPI, which is a problem if MPI is shut down before PETSc.
>>>
>>> Garth
>>>
>>> > On 6 Oct 2014 12:18, "Jan Blechta" <[email protected]> wrote:
>>> >
>>> >> On Mon, 6 Oct 2014 12:07:02 +0200 Martin Sandve Alnæs <[email protected]> wrote:
>>> >>
>>> >> > The problem is that gc is nondeterministic and in particular not running with equal timing and ordering on each MPI process.
>>> >> >
>>> >> > We can't use the with statement to handle the scope of every single DOLFIN object in a program.
>>> >>
>>> >> Most of the DOLFIN destructors are not collective. So the moral is that we should avoid collective destructors as much as possible and document them the way the PETSc docs do.
>>> >>
>>> >> Jan
>>> >>
>>> >> > We can change all file handling to use with, and require the user to use that in parallel.
>>> >> >
>>> >> > On 6 Oct 2014 11:41, "Jan Blechta" <[email protected]> wrote:
>>> >> >
>>> >> > > On Mon, 6 Oct 2014 09:48:29 +0200 Martin Sandve Alnæs <[email protected]> wrote:
>>> >> > >
>>> >> > > > The 'fix' that's in the branch now was to trigger Python garbage collection (suggested by Øyvind Evju) before each test.
>>> >> > > >
>>> >> > > > This probably means we have a general problem in DOLFIN with non-deterministic destruction order of objects in parallel. Any destructor that uses MPI represents a potential deadlock.
>>> >> > >
>>> >> > > To understand the issue: is the problem that garbage collection does not guarantee when an object is destroyed?
>>> >> > >
>>> >> > > Here http://stackoverflow.com/a/5071376/1796717 the distinction between variable scoping and object cleanup is discussed. Quoting it:
>>> >> > >
>>> >> > >     Deterministic cleanup happens through the with statement.
>>> >> > >
>>> >> > > which might be a proper solution to the problem.
>>> >> > >
>>> >> > > Jan
>>> >> > >
>>> >> > > > On 19 September 2014 12:52, Jan Blechta <[email protected]> wrote:
>>> >> > > >
>>> >> > > > > On Fri, 19 Sep 2014 00:27:50 +0200 Jan Blechta <[email protected]> wrote:
>>> >> > > > >
>>> >> > > > > > Yes, after many trials using
>>> >> > > > > >
>>> >> > > > > >     $ cd test/unit/io/python
>>> >> > > > > >     $ while true; do git clean -fdx && mpirun -n 3 xterm -e gdb -ex r -ex q -args python -m pytest -sv; done
>>> >> > > > > >     # when it hangs and you interrupt it, it asks for confirmation
>>> >> > > > > >     # for quitting, so you type n and enjoy gdb...
>>> >> > > > > >
>>> >> > > > > > I've seen a situation where 2 processes deadlocked in HDF5Interface::close_file() in DOLFIN, with a backtrace like
>>> >> > > > > >
>>> >> > > > > >     # MPI barrier
>>> >> > > > > >     ...
>>> >> > > > > >     # MPI close
>>> >> > > > > >     # HDF5 lib calls
>>> >> > > > > >     H5Fclose()
>>> >> > > > > >     dolfin::HDF5Interface::close_file()
>>> >> > > > > >     dolfin::HDF5File::close()
>>> >> > > > > >     dolfin::HDF5File::~HDF5File()
>>> >> > > > > >     dolfin::HDF5File::~HDF5File()
>>> >> > > > > >     # smart ptr management
>>> >> > > > > >     # garbage collection
>>> >> > > > > >
>>> >> > > > > > while the 3rd process is waiting far away. Isn't it strange that the destructor is there twice in the stacktrace? (The upper one is on a '}' line, which I don't get.) What does it mean?
>>> >> > > > >
>>> >> > > > > Probably just a code generation artifact - nothing harmful, see http://stackoverflow.com/a/15244091/1796717
>>> >> > > > >
>>> >> > > > > Jan
>>> >> > > > >
>>> >> > > > > > Jan
>>> >> > > > > >
>>> >> > > > > > On Thu, 18 Sep 2014 16:20:51 +0200 Martin Sandve Alnæs <[email protected]> wrote:
>>> >> > > > > >
>>> >> > > > > > > I've added the MPI fixes for the temppath fixture and fixed some other related issues while at it: when parameterizing a test that uses a temppath fixture, there is a need for separate directories for each parameter combo. A further improvement would be automatic cleaning of old tempdirs, but I leave that for now.
>>> >> > > > > > >
>>> >> > > > > > > I've pushed these changes to the branch aslakbergersen/topic-change-unittest-to-pytest
>>> >> > > > > > >
>>> >> > > > > > > The tests still hang though, in the closing of HDF5File.
>>> >> > > > > > >
>>> >> > > > > > > Here's how to debug if someone wants to give it a shot:
>>> >> > > > > > > Just run:
>>> >> > > > > > >     mpirun -np 3 python -m pytest -s -v
>>> >> > > > > > > With gdb:
>>> >> > > > > > >     mpirun -np 3 xterm -e gdb --args python -m pytest
>>> >> > > > > > > then enter 'r' in each of the three xterms.
>>> >> > > > > > >
>>> >> > > > > > > You may have to try a couple of times to get the hanging behaviour.
>>> >> > > > > > >
>>> >> > > > > > > Martin
>>> >> > > > > > >
>>> >> > > > > > > On 18 September 2014 13:23, Martin Sandve Alnæs <[email protected]> wrote:
>>> >> > > > > > >
>>> >> > > > > > > > Good spotting both of you, thanks.
>>> >> > > > > > > >
>>> >> > > > > > > > Martin
>>> >> > > > > > > >
>>> >> > > > > > > > On 18 September 2014 13:01, Lawrence Mitchell <[email protected]> wrote:
>>> >> > > > > > > >
>>> >> > > > > > > >> On 18/09/14 11:42, Jan Blechta wrote:
>>> >> > > > > > > >>
>>> >> > > > > > > >> > Some problems (when running in a clean dir) are avoided using this (although incorrect) patch. There are race conditions in the creation of the temp dir. It should be done using an atomic operation.
>>> >> > > > > > > >> >
>>> >> > > > > > > >> > Jan
>>> >> > > > > > > >> >
>>> >> > > > > > > >> > ==================================================================
>>> >> > > > > > > >> > diff --git a/test/unit/io/python/test_XDMF.py b/test/unit/io/python/test_XDMF.py
>>> >> > > > > > > >> > index 9ad65a4..31471f1 100755
>>> >> > > > > > > >> > --- a/test/unit/io/python/test_XDMF.py
>>> >> > > > > > > >> > +++ b/test/unit/io/python/test_XDMF.py
>>> >> > > > > > > >> > @@ -28,8 +28,9 @@ def temppath():
>>> >> > > > > > > >> >      filedir = os.path.dirname(os.path.abspath(__file__))
>>> >> > > > > > > >> >      basename = os.path.basename(__file__).replace(".py", "_data")
>>> >> > > > > > > >> >      temppath = os.path.join(filedir, basename, "")
>>> >> > > > > > > >> > -    if not os.path.exists(temppath):
>>> >> > > > > > > >> > -        os.mkdir(temppath)
>>> >> > > > > > > >> > +    if MPI.rank(mpi_comm_world()) == 0:
>>> >> > > > > > > >> > +        if not os.path.exists(temppath):
>>> >> > > > > > > >> > +            os.mkdir(temppath)
>>> >> > > > > > > >> >      return temppath
>>> >> > > > > > > >>
>>> >> > > > > > > >> There's still a race condition here because ranks other than zero might try and use temppath before it's created. I think you want something like the below:
>>> >> > > > > > > >>
>>> >> > > > > > > >>     if MPI.rank(mpi_comm_world()) == 0:
>>> >> > > > > > > >>         if not os.path.exists(temppath):
>>> >> > > > > > > >>             os.mkdir(temppath)
>>> >> > > > > > > >>     MPI.barrier(mpi_comm_world())
>>> >> > > > > > > >>     return temppath
>>> >> > > > > > > >>
>>> >> > > > > > > >> If you're worried about the OS not creating files atomically, you can always mkdir into a tmp directory and then os.rename(tmp, temppath), since POSIX guarantees that renames are atomic.
>>> >> > > > > > > >>
>>> >> > > > > > > >> Lawrence
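Putting the quoted rank-0 mkdir and Lawrence's barrier together, the fixture could look roughly like the sketch below. It is illustrative only, reusing the MPI.rank, MPI.barrier and mpi_comm_world calls from the quoted snippets rather than showing what was actually committed to the branch:

    import os

    import pytest
    from dolfin import MPI, mpi_comm_world


    @pytest.fixture
    def temppath():
        # Build a per-test-module data directory next to the test file.
        filedir = os.path.dirname(os.path.abspath(__file__))
        basename = os.path.basename(__file__).replace(".py", "_data")
        temppath = os.path.join(filedir, basename, "")
        # Only rank 0 creates the directory ...
        if MPI.rank(mpi_comm_world()) == 0:
            if not os.path.exists(temppath):
                os.mkdir(temppath)
        # ... and every rank waits here until it exists before using it.
        MPI.barrier(mpi_comm_world())
        return temppath

If the directory creation itself must be atomic, rank 0 could instead mkdir a scratch directory and os.rename() it onto temppath, as Lawrence notes, since POSIX renames are atomic.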
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
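For reference, a rough sketch of the with-statement pattern proposed at the top of the thread, assuming HDF5File does not itself support the with statement and relying on contextlib.closing(), which only needs the close() method seen in the quoted backtrace; the mesh, function and file names are purely illustrative:

    from contextlib import closing

    from dolfin import (HDF5File, Function, FunctionSpace,
                        UnitSquareMesh, mpi_comm_world)

    mesh = UnitSquareMesh(4, 4)
    u = Function(FunctionSpace(mesh, "CG", 1))

    # closing() calls f.close() when the block exits, so the collective HDF5
    # close happens at the same point on every rank instead of inside a
    # destructor whose timing depends on garbage collection.
    with closing(HDF5File(mpi_comm_world(), "u.h5", "w")) as f:
        f.write(u, "/u")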
