Hi Werner and Kapil,
    Rohan and I found the DMTCP bug affecting matlab version 2012b and later.

Werner,
    If you want a quick workaround for now, just disable the places where we
define 'realpath()' in DMTCP.  This workaround will be good enough for matlab.
There are a small number of other applications that will still need a more
general solution.  We'll add the general solution to the github repo and
to the next release of DMTCP.  Thanks greatly for reporting this bug to us.

You'll need to go to these two files in the DMTCP source:
  DMTCP/src/plugin/ipc/file/filewrappers.cpp
  DMTCP/src/plugin/pid/pid_filewrappers.cpp
In each of these two files, you'll need to comment out the wrapper for the
function realpath(), together with any wrapper that calls it.  For example,
in filewrappers.cpp, change it to be:

#if 0
extern "C" char *realpath(const char *path, char *resolved_path)

...
extern "C" char *canonicalize_file_name(const char *path)
{
  return realpath(path, NULL);
}
#endif

and do something similar in pid_filewrappers.cpp.
After that, do the usual:
  ./configure
  make
and DMTCP/bin will have the patched executables.


Kapil, Werner, and everyone else:
    The issue is that the definition of realpath() changed between
POSIX.1-2001 and POSIX.1-2008.  Apparently, starting with version 2012b,
matlab began using the newer definition found in POSIX.1-2008.  (The newer
definition allows the second argument to be NULL, in which case realpath()
allocates the result buffer with malloc() and the caller must free() it.)
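To make the difference between the two definitions concrete, here is a minimal C++ sketch of both calling conventions, assuming Linux/glibc headers.  The helper names resolve_into_buffer() and resolve_allocated() are hypothetical, made up for illustration; only realpath(), PATH_MAX and free() come from the C library.

```cpp
#include <cassert>   // assert()
#include <climits>   // PATH_MAX (Linux/glibc)
#include <cstdlib>   // realpath(), free()

// POSIX.1-2001 style (hypothetical helper): the caller supplies a buffer
// of at least PATH_MAX bytes, and realpath() writes the canonical path
// into it.  Returns true on success.
bool resolve_into_buffer(const char *path, char *out) {
  assert(out != nullptr);  // the 2001 definition requires a real buffer
  return realpath(path, out) != nullptr;
}

// POSIX.1-2008 style (hypothetical helper): pass NULL as the second
// argument and realpath() returns a malloc()'d string, or NULL on error.
// The caller owns the result and must free() it.
char *resolve_allocated(const char *path) {
  return realpath(path, nullptr);
}
```

Code written against the newer definition, as matlab apparently is, uses the second form, and that call only works if it actually reaches the POSIX.1-2008 version of realpath().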
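    The issue arises because newer versions of glibc maintain two versions
of realpath(): the older POSIX.1-2001 one, which does not accept a NULL
second argument, and the default POSIX.1-2008 one.  It's the symbol
versioning issue that we've seen before for DMTCP: our wrapper ends up
calling the older version of realpath(), when it needs to invoke the
default version instead.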
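One possible shape for the general fix, sketched under assumptions and not DMTCP's actual code: instead of looking up libc's realpath() with plain dlsym(RTLD_NEXT, ...), which per the analysis above ended up reaching the older version, a wrapper can request a specific symbol version with the GNU extension dlvsym().  The version string "GLIBC_2.3" is an assumption that, to our knowledge, holds for glibc on x86/x86_64; other architectures may export only one realpath(), so the sketch falls back to dlsym().  The names lookup_libc_realpath(), libc_realpath() and forward_realpath() are hypothetical.  With glibc older than 2.34, link with -ldl.

```cpp
#include <cassert>   // assert()
#include <cstdlib>   // free()
#include <dlfcn.h>   // dlsym(), dlvsym(), RTLD_NEXT (GNU extensions; g++ defines _GNU_SOURCE)

using realpath_fn = char *(*)(const char *, char *);

// ASSUMPTION: the POSIX.1-2008 realpath() is exported as
// realpath@GLIBC_2.3 (x86/x86_64 glibc).  If that version does not
// exist on this platform, fall back to whatever dlsym() returns.
static realpath_fn lookup_libc_realpath() {
  void *sym = dlvsym(RTLD_NEXT, "realpath", "GLIBC_2.3");
  if (sym == nullptr) {
    sym = dlsym(RTLD_NEXT, "realpath");
  }
  return reinterpret_cast<realpath_fn>(sym);
}

static realpath_fn libc_realpath() {
  // C++11 guarantees thread-safe initialization of function-local statics,
  // so the lookup runs exactly once.
  static realpath_fn fn = lookup_libc_realpath();
  assert(fn != nullptr && "could not find realpath() in libc");
  return fn;
}

// Hypothetical wrapper body: any plugin-specific path handling would go
// here; the call itself is forwarded to the version that accepts a NULL
// resolved_path and then malloc()s the result.
extern "C" char *forward_realpath(const char *path, char *resolved_path) {
  return libc_realpath()(path, resolved_path);
}
```

The design choice is to pin the version explicitly rather than depend on which version an unversioned lookup happens to return; the fallback keeps the sketch working on platforms that never had the older realpath().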

Best,
- Gene


On Tue, Dec 02, 2014 at 09:52:37AM -0500, Gene Cooperman wrote:
> Hi Werner,
>     I'm making some progress.  Apparently, the bug is caused by something
> in the combination of the DMTCP pid plugin and its file plugin.
> 
> Kapil,
>     If you're following this: if I comment out the use of
>   filewrappers.o
> in the definition of OBJS in src/plugin/ipc/Makefile
> (while still using the rest of the file plugin),
> and I also disable the pid plugin, then
> DMTCP successfully launches matlab.
>     I guess the next thing to do is to find which is the bad wrapper
> inside filewrappers.o.  I'm still not sure why I also need to disable
> the pid plugin to make this work.  Any thoughts on that?
> 
> Best,
> - Gene
> 
> On Tue, Dec 02, 2014 at 12:08:31PM +0100, Werner Hack wrote:
> > Hi Gene,
> > 
> > thanks for looking after this bug.
> > Yes, I observed the same behaviour as you in gdb.
> > So I'll wait until you have found a solution.
> > 
> > Thanks
> > Werner
> > 
> > On 12/02/2014 12:39 AM, Gene Cooperman wrote:
> > > Hi Werner,
> > >     Thanks for this bug report that pins things down for us.  We had
> > > one earlier bug report on matlab, but our local matlab was an older
> > > version, and so we couldn't reproduce the bug.
> > >     I've now moved to a local cluster here that has a more recent matlab.
> > > (Kapil and others, if you want to reproduce the bug, try it on the
> > >  'discovery' cluster at the university.)
> > >
> > > I can now reproduce the bug.  When I run:
> > >     gdb --args bin/dmtcp_launch matlab -nodisplay -nojvm
> > > it appears to hang.  But then I type ^C to interrupt, and 'where' to
> > > see the stack.  From there, I'm seeing a call to std::terminate()
> > > from within eh_terminate.cc (C++ standard library: exception
> > > handler: terminate).  This calls 'boost::call_once()', which then hangs.
> > > So, something in 'matlab' created a C++ exception that couldn't be
> > > handled.
> > >
> > > Is this consistent with what you're seeing?  Either way, I'll look more
> > > deeply into this.
> > >
> > > Thanks for the bug report,
> > > - Gene
> > >
> > >
> > > On Mon, Dec 01, 2014 at 11:54:37AM +0100, Werner Hack wrote:
> > >> Hi all,
> > >>
> > >> I want to use dmtcp with matlab.
> > >> I have dmtcp 2.3.1 installed, but I have problems using it with newer
> > >> releases of matlab.
> > >> For testing, I used a simple script with a counter in a loop that prints
> > >> its result to the display (like your test program dmtcp1).
> > >>
> > >> With matlab R2012a, it seems to work.
> > >> I can create checkpoints and the counter is printed to the display.
> > >> Restarting from a checkpoint continues counting as expected.
> > >>
> > >> But with matlab R2012b and later, no output is printed anymore.
> > >> I also tried printing to a file, with the same result.
> > >> A checkpoint can be done but is useless in this context.
> > >>
> > >> Is this a known issue?
> > >> Or do you have an idea what is going wrong here?
> > >>
> > >> Any hint will be appreciated.
> > >> Regards
> > >> Werner
> > >>
> > >> #-------------------------------------------------------------------------------
> > >> Output from dmtcp_coordinator while doing a checkpoint:
> > >>
> > >> [20384] NOTE at dmtcp_coordinator.cpp:1271 in startCheckpoint; REASON='starting checkpoint, suspending all nodes'
> > >>      s.numPeers = 1
> > >> [20384] NOTE at dmtcp_coordinator.cpp:1273 in startCheckpoint; REASON='Incremented Generation'
> > >>      compId.generation() = 3
> > >> [20384] NOTE at dmtcp_coordinator.cpp:615 in updateMinimumState; REASON='locking all nodes'
> > >> [20384] NOTE at dmtcp_coordinator.cpp:621 in updateMinimumState; REASON='draining all nodes'
> > >> [20384] NOTE at dmtcp_coordinator.cpp:627 in updateMinimumState; REASON='checkpointing all nodes'
> > >> [20384] NOTE at dmtcp_coordinator.cpp:641 in updateMinimumState; REASON='building name service database'
> > >> [20384] NOTE at dmtcp_coordinator.cpp:657 in updateMinimumState; REASON='entertaining queries now'
> > >> [20384] NOTE at dmtcp_coordinator.cpp:662 in updateMinimumState; REASON='refilling all nodes'
> > >> [20384] NOTE at dmtcp_coordinator.cpp:693 in updateMinimumState; REASON='restarting all nodes'
> > >>
> > >> #-------------------------------------------------------------------------------
> > >> Output from dmtcp_launch  while doing a checkpoint:
> > >>
> > >> [95000] NOTE at writeckpt.cpp:513 in preprocess_special_segments; REASON='bottom-most page of stack
> > >> (page with highest address) was
> > >>   invisible in /proc/self/maps. It is made visible again now.'
> > >>
> > >>
> > >> -- 
> > >>
> > >> -----------------------------------------------------------------------
> > >> Werner Hack  
> > >> Universität Ulm   
> > >> Institut für Nachrichtentechnik 
> > >> -----------------------------------------------------------------------
> > >>
> > >>
> > >
> > >
> > >> _______________________________________________
> > >> Dmtcp-forum mailing list
> > >> [email protected]
> > >> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
> > >
> > >
> > 
> > 
> > 
> 
> 
