Johannes made some hacks to instant to get it running on the Abel cluster
at UiO; the patch is attached. The swig paths and version are hardcoded,
so no check is actually performed. Also, the getstatusoutput
implementation uses the commands module instead of Popen.

Try this patch, then edit the changes manually so that the paths and
versions match your system.
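
As a purely hypothetical illustration of the kind of values involved
(the attached patch is the actual reference; "which swig" and
"swig -version" show what your system provides):

    # Hypothetical example only -- names and paths are illustrative,
    # not copied from the attached patch. Edit to match your system.
    swig_binary = "/path/to/your/swig"   # e.g. what "which swig" reports
    swig_version = "2.0.9"               # e.g. what "swig -version" reports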

Martin


On 17 June 2013 23:44, Jan Blechta <[email protected]> wrote:

> On Mon, 17 Jun 2013 21:48:09 +0200
> Jan Blechta <[email protected]> wrote:
> > On Mon, 17 Jun 2013 20:06:06 +0100
> > "Garth N. Wells" <[email protected]> wrote:
> > > On 17 June 2013 19:16, Jan Blechta <[email protected]>
> > > wrote:
> > > > On Mon, 17 Jun 2013 15:03:18 +0200
> > > > Johan Hake <[email protected]> wrote:
> > > >> On 06/17/2013 03:01 PM, Jan Blechta wrote:
> > > >> > On Mon, 17 Jun 2013 09:39:16 +0200
> > > >> > Johan Hake <[email protected]> wrote:
> > > >> >> I have now made changes to both instant and dolfin, so the
> > > >> >> swig path and version are checked at import time instead of
> > > >> >> each time a module is JIT compiled.
> > > >> >>
> > > >> >>   dolfin:
> > > >> >>   johanhake/swig-version-check
> > > >> >>
> > > >> >>   instant:
> > > >> >>   johanhake/add_version_cache
> > > >> >>
> > > >> >> Could Martin and/or Jan check if these fixes get rid of the
> > > >> >> fork warning?
> > > >> >
> > > >> > I'll try it and see how it behaves. But note that currently
> > > >> > I'm getting not only the warning but also a seg fault.
> > > >>
> > > >> Ok.
> > > >
> > > > The whole thing is pretty twisted and I'm not sure whether these
> > > > commits made any progress. The fork warning remains.
> > > >
> > > > As nobody else (from the HPC community) reports related problems,
> > > > one could deduce that the python/DOLFIN popen calls are probably
> > > > safe.
> > > >
> > > >>
> > > >> > But it may also happen because of my
> > > >> > subprocess.call(mesh_generator).
> > > >>
> > > >> It would be nice to figure out what process triggers the
> > > >> segfault.
> > > >
> > > > I tried the C++ Poisson demo and the problems on a 12-core OpenIB
> > > > node remain. It can segfault, throw various PETSc or MPI errors,
> > > > hang, or pass. This resembles some memory corruption, but it
> > > > probably has nothing to do with the OpenIB/fork issue, as the
> > > > warnings are not present. The problem seems to happen more
> > > > frequently with a higher number of processes. I suspect an old,
> > > > buggy OpenMPI, but I can do nothing but beg the cluster admin for
> > > > a new version.
> > > >
> > >
> > > Make sure that you're using the MPI wrappers - by default CMake uses
> > > the C++ compiler plus the MPI lib flags. On my local HPC system,
> > > failing to use the wrappers leads to hangs when computing across
> > > nodes.
> >
> > Running cmake -DCMAKE_CXX_COMPILER=mpicxx when configuring
> > demo_poisson does not help. Does it also apply to the compilation of
> > DOLFIN?
>
> Recompiling UFC, DOLFIN and demo_poisson with mpicxx does not help. I
> think there is some problem with the machine - possibly an outdated
> OpenMPI.
>
> Jan
>
> >
> > Jan
> >
> > >
> > > Garth
> > >
> > > > Jan
> > > >
> > > >>
> > > >> Johan
> > > >>
> > > >> >
> > > >> > Jan
> > > >> >
> > > >> >>
> > > >> >> I would be surprised if it does, as we eventually call popen
> > > >> >> to compile the JIT generated module. That call would be
> > > >> >> difficult to get rid of.
> > > >> >>
> > > >> >> Johan
> > > >> >>
> > > >> >> On 06/17/2013 08:47 AM, Martin Sandve Alnæs wrote:
> > > >> >>> Registers are touched on basically every operation the CPU
> > > >> >>> does :) But it didn't say "registers", but "registered
> > > >> >>> memory".
> > > >> >>>
> http://blogs.cisco.com/performance/registered-memory-rma-rdma-and-mpi-implementations/
> > > >> >>>
> > > >> >>> Martin
> > > >> >>>
> > > >> >>>
> > > >> >>> On 17 June 2013 08:36, Johan Hake <[email protected]> wrote:
> > > >> >>>
> > > >> >>>     On 06/16/2013 11:40 PM, Jan Blechta wrote:
> > > >> >>>     > On Sun, 16 Jun 2013 22:40:43 +0200
> > > >> >>>     > Johan Hake <[email protected]> wrote:
> > > >> >>>     >> There are still fork calls when swig version is
> > > >> >>>     >> checked. Would it be ok to check it only when dolfin
> > > >> >>>     >> is imported? That would be an easy fix.
> > > >> >>>     >
> > > >> >>>     > I've no idea. There are two aspects of the issue:
> > > >> >>>     >
> > > >> >>>     > 1. forks may not be supported.
> > > >> >>>
> > > >> >>>     Following [1] below, it looks like they should be
> > > >> >>> supported by more recent openmpi versions, and it also says
> > > >> >>> that:
> > > >> >>>
> > > >> >>>       In general, if your application calls system() or
> > > >> >>> popen(), it will likely be safe.
> > > >> >>>
> > > >> >>>     > 2. even if forks are supported by a given installation,
> > > >> >>>     > they may not be safe. Citing from [1]:
> > > >> >>>     >
> > > >> >>>     >    "If you use fork() in your application, you must not
> > > >> >>>     > touch any registered memory before calling some form of
> > > >> >>>     > exec() to launch another process. Doing so will cause
> > > >> >>>     > an immediate seg fault / program crash."
> > > >> >>>     >
> > > >> >>>     >    Is this condition met in the present state, and
> > > >> >>>     > would it be met after the suggested change?
> > > >> >>>
> > > >> >>>     I have no clue whether we touch any register before we
> > > >> >>> call fork, and no clue whether the suggested fix would change
> > > >> >>> that. Aren't registers touched at a low level quite often?
> > > >> >>>
> > > >> >>>     Do you experience occasional segfaults?
> > > >> >>>
> > > >> >>>     Also [2] suggests that the warning might be the problem.
> > > >> >>> Have you tried running with:
> > > >> >>>
> > > >> >>>       mpirun --mca mpi_warn_on_fork 0 ...
> > > >> >>>
> > > >> >>>     Johan
> > > >> >>>
> > > >> >>>     >
> > > >> >>>     > Jan
> > > >> >>>     >
> > > >> >>>     >>
> > > >> >>>     >> Johan
> > > >> >>>     >> On Jun 16, 2013 12:47 AM, "Jan Blechta"
> > > >> >>>     >> <[email protected]> wrote:
> > > >> >>>     >>
> > > >> >>>     >>> What is the current status of the presence of fork()
> > > >> >>>     >>> calls in the FEniCS codebase? These calls are not
> > > >> >>>     >>> friendly to openib infiniband clusters [1, 2].
> > > >> >>>     >>>
> > > >> >>>     >>> The issue with popen() calls used for locating the
> > > >> >>>     >>> swig library was discussed at the end of [3]. I'm
> > > >> >>>     >>> still experiencing this sort of trouble when running
> > > >> >>>     >>> on infiniband nodes (even when using only one node),
> > > >> >>>     >>> so was the cleanup of popen() finished, or are there
> > > >> >>>     >>> other harmful fork() calls in the FEniCS codebase?
> > > >> >>>     >>>
> > > >> >>>     >>> [1]
> > > >> >>>     >>>
> http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork
> > > >> >>>     >>> [2]
> > > >> >>>     >>>
> http://www.open-mpi.org/faq/?category=tuning#fork-warning
> > > >> >>>     >>> [3]
> > > >> >>>     >>> https://answers.launchpad.net/dolfin/+question/219270
> > > >> >>>     >>>
> > > >> >>>     >>> Jan
> > > >> >>>     >>
> > > >> >>>
> > > >> >
> > > >>

Attachment: instant_abel.patch
Description: Binary data

_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
