On Mon, 6 Oct, 2014 at 1:10 PM, Jan Blechta <[email protected]> wrote:
On Mon, 06 Oct 2014 12:53:00 +0100
"Garth N. Wells" <[email protected]> wrote:



 On Mon, 6 Oct, 2014 at 12:32 PM, Jan Blechta
 <[email protected]> wrote:
 > On Mon, 06 Oct 2014 11:53:58 +0100
 > "Garth N. Wells" <[email protected]> wrote:
 >
 >>  On Mon, 6 Oct, 2014 at 11:23 AM, Martin Sandve Alnæs
 >>  <[email protected]> wrote:
>> > All collective destructors must be managed explicitly in python,
 >>  > preferably via with statement. Are there any apart from file
 >>  > objects? Vectors? Matrices? Meshes?
 >>  >
 >>
>> Off the top of my head I can't think of any cases, apart from IO,
 >> in which a (collective) MPI call needs to be made inside a
 >> destructor. For IO, we could insist on a user closing or flushing
 >> a file explicitly. We cannot guarantee that 3rd party linear
 >> algebra backends do not call MPI when objects are destroyed.
 >
 > VecDestroy and MatDestroy (called by the PETScVector and PETScBaseMatrix
 > destructors) are claimed to be collective by the PETSc docs:
 > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecDestroy.html
 > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatDestroy.html

 Yes, they are collective but don't necessarily make MPI calls. My
 understanding is that 'collective' is not the issue but whether or
 not MPI calls are made from a destructor. Some functions will only
 make sense  if called collectively (e.g., VecDestroy), but might not
 make collective MPI calls.

In the usual sense, collectivity implies communication, see
http://en.wikipedia.org/wiki/Collective_operation.

I think that a collective PETSc routine employs an MPI collective operation
(in the case that the MPI implementation of Vec/Mat is used). What you are
speaking about is a "logically collective" routine, e.g.
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetOption.html

Ok, I see that PETSc makes the distinction between "collective" and "logically collective", and in the PETSc docs Mat/VecDestroy are marked as "collective".
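
In practical terms that means the Python layer has to drop the last reference
to a PETSc-backed object at the same point on every rank, rather than whenever
the garbage collector happens to run. A minimal sketch of that pattern
(illustrative only, not something settled in this thread):

    import gc
    from dolfin import UnitSquareMesh, FunctionSpace, Function, MPI, mpi_comm_world

    mesh = UnitSquareMesh(8, 8)
    V = FunctionSpace(mesh, "CG", 1)
    u = Function(V)
    x = u.vector()                 # backed by a PETSc Vec with the PETSc backend
    # ... use x ...
    del x, u                       # drop the Python references explicitly
    gc.collect()                   # so the collective VecDestroy runs *now*, on every rank
    MPI.barrier(mpi_comm_world())  # and every rank passes this point together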

Garth


Can you confirm, Jed, whether we are correct?

Jan


 For the tests, assuming PyTest permits a 'setup' function like unittest,
 would it help if MPI is initialised explicitly in the setup function and
 closed down at the end of the test suite (if possible with PyTest)?
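
 PyTest does allow this via fixtures in a conftest.py. A minimal sketch of a
 session-level setup/teardown hook, assuming mpi4py is available (DOLFIN may
 already initialise MPI on import, so this only shows where such hooks would
 live):

    # conftest.py -- a sketch, not the project's actual test configuration
    import gc
    import pytest
    from mpi4py import MPI  # importing mpi4py initialises MPI

    @pytest.fixture(scope="session", autouse=True)
    def mpi_session():
        yield  # the whole test session runs here
        # Before the interpreter (and MPI) shuts down, collect garbage
        # collectively so that any destructor that communicates runs at
        # the same point on every rank.
        gc.collect()
        MPI.COMM_WORLD.Barrier()
        # mpi4py registers MPI_Finalize at exit; no explicit call is needed.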

 Garth


 >
 > Jan
 >
 >>
 >>  We have had this problem in the past with the 'automatic'
>> finalisation of MPI, which is a problem if MPI is shutdown before
 >>  PETSc.
 >>
 >>  Garth
 >>
 >>
 >>  > On 6 Oct 2014 12:18, "Jan Blechta" <[email protected]>
 >>  > wrote the following:
 >>  >> On Mon, 6 Oct 2014 12:07:02 +0200
 >>  >> Martin Sandve Alnæs <[email protected]> wrote:
 >>  >>
 >>  >> > The problem is that gc is nondeterministic and in particular not
 >>  >> > running with equal timing and ordering on each MPI process.
 >>  >> >
 >>  >> > We can't use the with statement to handle the scope of every
 >>  >> > single dolfin object in a program.
 >>  >>
 >>  >> Most of the DOLFIN destructors are not collective. So the moral is
 >>  >> that we should avoid collective destructors as much as possible
 >>  >> and document it, as is done in the PETSc docs.
 >>  >>
 >>  >> Jan
 >>  >>
 >>  >> >
 >>  >> > We can change all file handling to use with, and require the
 >>  >> > user to use that in parallel.
 >>  >> > On 6 Oct 2014 11:41, "Jan Blechta" <[email protected]>
 >>  >> > wrote the following:
 >>  >> >
 >>  >> > > On Mon, 6 Oct 2014 09:48:29 +0200
 >>  >> > > Martin Sandve Alnæs <[email protected]> wrote:
 >>  >> > >
 >>  >> > > > The 'fix' that's in the branch now was to trigger python
 >>  >> > > > garbage collection (suggested by Øyvind Evju) before each
 >>  >> > > > test.
 >>  >> > > >
 >>  >> > > > This probably means we have a general problem in dolfin with
 >>  >> > > > non-deterministic destruction order of objects in parallel.
 >>  >> > > > Any destructor that uses MPI represents a potential deadlock.
 >>  >> > >
 >>  >> > > To understand the issue: is the problem that garbage
 >>  >> > > collection does not guarantee when an object is destroyed?
 >>  >> > >
 >>  >> > > Here http://stackoverflow.com/a/5071376/1796717 the
 >>  >> > > distinction between variable scoping and object cleanup is
 >>  >> > > discussed. Quoting it
 >>  >> > >
 >>  >> > >     Deterministic cleanup happens through the with statement.
 >>  >> > >
 >>  >> > > which might be a proper solution to the problem.
 >>  >> > >
 >>  >> > > Jan
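
A minimal sketch of that with-statement approach for the HDF5 case, assuming
only that the Python HDF5File wrapper exposes close() (as the backtrace
further down suggests); contextlib.closing avoids relying on HDF5File
defining __enter__/__exit__:

    from contextlib import closing
    from dolfin import HDF5File, UnitSquareMesh, mpi_comm_world

    mesh = UnitSquareMesh(4, 4)
    with closing(HDF5File(mpi_comm_world(), "mesh.h5", "w")) as f:
        f.write(mesh, "mesh")
    # close() has now run collectively, at the same point on every rank,
    # instead of whenever garbage collection happens to trigger it.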
 >>  >> > >
 >>  >> > > >
 >>  >> > > > On 19 September 2014 12:52, Jan Blechta
 >>  >> > > > <[email protected]> wrote:
 >>  >> > > >
 >>  >> > > > > On Fri, 19 Sep 2014 00:27:50 +0200
 >>  >> > > > > Jan Blechta <[email protected]> wrote:
 >>  >> > > > >
 >>  >> > > > > > Yes, after many trials using
 >>  >> > > > > >
 >>  >> > > > > > $ cd test/unit/io/python
 >>  >> > > > > > $ while true; do git clean -fdx && mpirun -n 3 xterm -e gdb -ex r -ex q -args python -m pytest -sv; done
 >>  >> > > > > > # when it hangs and you interrupt it, it asks for confirmation for
 >>  >> > > > > > # quitting, so you type n and enjoy gdb...
 >>  >> > > > > >
 >>  >> > > > > > I've seen a situation where 2 processes deadlocked in
 >>  >> > > > > > HDF5Interface::close_file() in DOLFIN, with a backtrace like
 >>  >> > > > > >
 >>  >> > > > > > # MPI barrier
 >>  >> > > > > > ...
 >>  >> > > > > > # MPI close
 >>  >> > > > > > # HDF5 lib calls
 >>  >> > > > > > H5FClose()
 >>  >> > > > > > dolfin::HDF5Interface::close_file()
 >>  >> > > > > > dolfin::HDF5File::close()
 >>  >> > > > > > dolfin::HDF5File::~HDF5File()
 >>  >> > > > > > dolfin::HDF5File::~HDF5File()
 >>  >> > > > > > # smart ptr management
 >>  >> > > > > > # garbage collection
 >>  >> > > > > >
 >>  >> > > > > > while the 3rd process is waiting far away. Isn't it
 >>  >> > > > > > strange that the destructor is there twice in the stack
 >>  >> > > > > > trace? (The upper one is on a '}' line, which I don't
 >>  >> > > > > > get.) What does it mean?
 >>  >> > > > >
 >>  >> > > > > Probably just a code generation artifact - nothing
 >>  >> > > > > harmful, see
 >>  >> > > > > http://stackoverflow.com/a/15244091/1796717
 >>  >> > > > >
 >>  >> > > > > Jan
 >>  >> > > > >
 >>  >> > > > > >
 >>  >> > > > > > Jan
 >>  >> > > > > >
 >>  >> > > > > >
 >>  >> > > > > > On Thu, 18 Sep 2014 16:20:51 +0200
 >>  >> > > > > > Martin Sandve Alnæs <[email protected]> wrote:
 >>  >> > > > > >
 >>  >> > > > > > > I've added the MPI fixes for the temppath fixture and
 >>  >> > > > > > > fixed some other related issues while at it: when
 >>  >> > > > > > > parameterizing a test that uses a temppath fixture,
 >>  >> > > > > > > there is a need for separate directories for each
 >>  >> > > > > > > parameter combo. A further improvement would be
 >>  >> > > > > > > automatic cleaning of old tempdirs, but I leave that
 >>  >> > > > > > > for now.
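
One way to get a separate directory per parameter combination is to fold the
test's node name (which includes the parameter id) into the path; a rough
sketch, with illustrative names:

    import os
    import pytest
    from dolfin import MPI, mpi_comm_world

    @pytest.fixture
    def temppath(request):
        filedir = os.path.dirname(os.path.abspath(request.module.__file__))
        basename = os.path.basename(request.module.__file__).replace(".py", "_data")
        # request.node.name includes the parameter id, e.g. "test_save_mesh[True]"
        path = os.path.join(filedir, basename, request.node.name, "")
        if MPI.rank(mpi_comm_world()) == 0 and not os.path.isdir(path):
            os.makedirs(path)
        MPI.barrier(mpi_comm_world())  # nobody uses the directory before it exists
        return path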
 >>  >> > > > > > >
 >>  >> > > > > > > I've pushed these changes to the branch
 >>  >> > > > > > > aslakbergersen/topic-change-unittest-to-pytest
 >>  >> > > > > > >
 >>  >> > > > > > > The tests still hang though, in the closing of
 >>  >> > > > > > > HDF5File.
 >>  >> > > > > > >
 >>  >> > > > > > > Here's how to debug if someone wants to give it a shot:
 >>  >> > > > > > > Just run:
 >>  >> > > > > > >     mpirun -np 3 python -m pytest -s -v
 >>  >> > > > > > > With gdb:
 >>  >> > > > > > >     mpirun -np 3 xterm -e gdb --args python -m pytest
 >>  >> > > > > > > then enter 'r' in each of the three xterms.
 >>  >> > > > > > >
 >>  >> > > > > > > You may have to try a couple of times to get the
 >>  >> > > > > > > hanging behaviour.
 >>  >> > > > > > >
 >>  >> > > > > > > Martin
 >>  >> > > > > > >
 >>  >> > > > > > > On 18 September 2014 13:23, Martin Sandve Alnæs
 >>  >> > > > > > > <[email protected]> wrote:
 >>  >> > > > > > >
 >>  >> > > > > > > > Good spotting both of you, thanks.
 >>  >> > > > > > > >
 >>  >> > > > > > > > Martin
 >>  >> > > > > > > >
 >>  >> > > > > > > > On 18 September 2014 13:01, Lawrence Mitchell <
 >>  >> > > > > > > > [email protected]> wrote:
 >>  >> > > > > > > >
 >>  >> > > > > > > >> On 18/09/14 11:42, Jan Blechta wrote:
 >>  >> > > > > > > >> > Some problems (when running in a clean dir) are
 >>  >> > > > > > > >> > avoided using this (although incorrect) patch.
 >>  >> > > > > > > >> > There are race conditions in the creation of the
 >>  >> > > > > > > >> > temp dir. It should be done using an atomic
 >>  >> > > > > > > >> > operation.
 >>  >> > > > > > > >> >
 >>  >> > > > > > > >> > Jan
 >>  >> > > > > > > >> >
 >>  >> > > > > > > >> >
 >>  >> > > > > > > >> >
 >>  >> > >
 >>  >>
 >>  >> > > > > > > >> > ==================================================================
 >>  >> > > > > > > >> > diff --git a/test/unit/io/python/test_XDMF.py b/test/unit/io/python/test_XDMF.py
 >>  >> > > > > > > >> > index 9ad65a4..31471f1 100755
 >>  >> > > > > > > >> > --- a/test/unit/io/python/test_XDMF.py
 >>  >> > > > > > > >> > +++ b/test/unit/io/python/test_XDMF.py
 >>  >> > > > > > > >> > @@ -28,8 +28,9 @@ def temppath():
 >>  >> > > > > > > >> >      filedir = os.path.dirname(os.path.abspath(__file__))
 >>  >> > > > > > > >> >      basename = os.path.basename(__file__).replace(".py", "_data")
 >>  >> > > > > > > >> >      temppath = os.path.join(filedir, basename, "")
 >>  >> > > > > > > >> > -    if not os.path.exists(temppath):
 >>  >> > > > > > > >> > -        os.mkdir(temppath)
 >>  >> > > > > > > >> > +    if MPI.rank(mpi_comm_world()) == 0:
 >>  >> > > > > > > >> > +        if not os.path.exists(temppath):
 >>  >> > > > > > > >> > +            os.mkdir(temppath)
 >>  >> > > > > > > >> >      return temppath
 >>  >> > > > > > > >>
 >>  >> > > > > > > >> There's still a race condition here because ranks
 >>  >> > > > > > > >> other than zero might try and use temppath before
 >>  >> > > > > > > >> it's created.  I think you want something like the
 >>  >> > > > > > > >> below:
 >>  >> > > > > > > >>
 >>  >> > > > > > > >> if MPI.rank(mpi_comm_world()) == 0:
 >>  >> > > > > > > >>     if not os.path.exists(temppath):
 >>  >> > > > > > > >>         os.mkdir(temppath)
 >>  >> > > > > > > >> MPI.barrier(mpi_comm_world())
 >>  >> > > > > > > >> return temppath
 >>  >> > > > > > > >>
 >>  >> > > > > > > >> If you're worried about the OS not creating files
 >>  >> > > > > > > >> atomically, you can always mkdir into a tmp
 >>  >> > > > > > > >> directory and then os.rename(tmp, temppath), since
 >>  >> > > > > > > >> POSIX guarantees that renames are atomic.
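
A rough sketch of that rename-based variant (the helper name is illustrative):
rank 0 builds the directory under a temporary name and renames it into place,
while everyone else waits at a barrier:

    import os
    import tempfile
    from dolfin import MPI, mpi_comm_world

    def make_shared_dir(temppath):
        temppath = temppath.rstrip(os.sep)
        if MPI.rank(mpi_comm_world()) == 0 and not os.path.isdir(temppath):
            tmp = tempfile.mkdtemp(dir=os.path.dirname(temppath))
            try:
                os.rename(tmp, temppath)   # atomic on POSIX
            except OSError:
                os.rmdir(tmp)              # lost a race with another run
                if not os.path.isdir(temppath):
                    raise
        MPI.barrier(mpi_comm_world())      # nobody uses the directory before it exists
        return temppath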
 >>  >> > > > > > > >>
 >>  >> > > > > > > >> Lawrence
 >>  >> > > > > > > >>
 >>  >> > > > > > > >
 >>  >> > > > > > > >
 >>  >> > > > > >
 >>  >> > > > >
 >>  >> > > > >
 >>  >> > >
 >>  >> > >
 >>  >>
 >



_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
