Hi Werner,
As usual, the next release of DMTCP will depend on people's schedules.
So, maybe it won't be ready by Christmas. :-( However, we will check in the
full fix for DMTCP by sometime next week. The github repo is currently
fairly stable as we approach a new release. So, it will be safe to use
the github repo as an early alpha version.
Best,
- Gene
On Thu, Dec 04, 2014 at 01:33:16PM +0100, Werner Hack wrote:
> Hi Gene,
>
> this is great news.
> Thanks for the workaround. Works fine for the moment :)
> I'm looking forward to the next release of dmtcp.
> Can we expect it as a Christmas present? ;)
>
> Thanks
> Werner
>
>
> On 12/04/2014 01:24 AM, Gene Cooperman wrote:
> > Hi Werner and Kapil,
> > Rohan and I found the DMTCP bug affecting matlab version 2012b and
> > later.
> >
> > Werner,
> > If you want a quick workaround for now, just look for where we define
> > 'realpath()' in DMTCP. This workaround will be good enough for matlab.
> > A small number of other applications will still need a more
> > general solution. We'll add the general solution to the github repo and
> > to the next release of DMTCP. Thanks greatly for reporting this bug to us.
> >
> > You'll need to go to these two files in the DMTCP source:
> > DMTCP/src/plugin/ipc/file/filewrappers.cpp
> > DMTCP/src/plugin/pid/pid_filewrappers.cpp
> > In each of these two files, you'll need to comment out DMTCP's definition
> > of the function realpath() (and of canonicalize_file_name(), which calls
> > it). For example, in filewrappers.cpp, change it to be:
> >
> > #if 0
> > extern "C" char *realpath(const char *path, char *resolved_path)
> >
> > ...
> > extern "C" char *canonicalize_file_name(const char *path)
> > {
> >   return realpath(path, NULL);
> > }
> > #endif
> >
> > and do something similar in: pid_filewrappers.cpp
> > After that, do the usual:
> > ./configure
> > make
> > and DMTCP/bin will have the patched executables.
> >
> >
> > Kapil, Werner, and everyone else:
> > The issue is that the definition of realpath() changed between:
> > POSIX.1-2001 and POSIX.1-2008
> > Apparently in matlab version 2012b and later, Matlab began using the newer
> > definition found in POSIX.1-2008. (The newer definition allows
> > the second argument to be NULL.)
> > The issue arises because newer versions of glibc maintain two versions
> > of realpath(). It's the symbol versioning issue that we've seen before
> > for DMTCP. DMTCP needs to invoke the default version instead of the
> > older version of realpath.
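
To illustrate the newer calling convention described above, here is a minimal sketch of the POSIX.1-2008 form of realpath(), where a NULL second argument asks the C library to allocate the result. The helper name resolve_path() is hypothetical and is not part of DMTCP. To the best of my knowledge, the older glibc compatibility version of realpath() fails with EINVAL when given NULL, which would be consistent with an application breaking if a wrapper forwards to that older symbol version.

```cpp
#include <stdlib.h>   // realpath(), free()
#include <cassert>
#include <string>

// Hypothetical helper (not part of DMTCP): resolve `path` using the
// POSIX.1-2008 form of realpath(). Passing NULL as the second argument
// asks the C library to malloc() a buffer large enough for the result.
// Returns an empty string if realpath() fails.
std::string resolve_path(const char *path)
{
  assert(path != NULL);

  char *resolved = realpath(path, NULL);
  if (resolved == NULL) {
    return std::string();
  }

  std::string result(resolved);
  free(resolved);  // the caller owns the buffer that realpath() allocated
  return result;
}
```

For example, resolve_path(".") should return the absolute path of the current directory, and resolve_path() on a nonexistent path should return an empty string. A wrapper that forwards this call to an implementation expecting a caller-supplied buffer would instead see a failure here.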
> >
> > Best,
> > - Gene
> >
> >
> > On Tue, Dec 02, 2014 at 09:52:37AM -0500, Gene Cooperman wrote:
> >> Hi Werner,
> >> I'm making some progress. Apparently, the bug is caused by something
> >> in the combination of the DMTCP pid plugin and its file plugin.
> >>
> >> Kapil,
> >> If you're following this, I can comment out the use of
> >> filewrappers.o
> >> in the definition of OBJS in src/plugin/ipc/Makefile
> >> (but I use the rest of the file plugin),
> >> and I also need to disable the pid plugin. Under these conditions,
> >> DMTCP successfully launches matlab.
> >> I guess the next thing to do is to find which is the bad wrapper
> >> inside filewrappers.o. I'm still not sure why I also need to disable
> >> the pid plugin to make this work. Any thoughts on that?
> >>
> >> Best,
> >> - Gene
> >>
> >> On Tue, Dec 02, 2014 at 12:08:31PM +0100, Werner Hack wrote:
> >>> Hi Gene,
> >>>
> >>> thanks for looking into this bug.
> >>> Yes, I observed the same behaviour as you in gdb.
> >>> So I'll wait until you have found a solution.
> >>>
> >>> Thanks
> >>> Werner
> >>>
> >>> On 12/02/2014 12:39 AM, Gene Cooperman wrote:
> >>>> Hi Werner,
> >>>> Thanks for this bug report that pins things down for us. We had
> >>>> one earlier bug report on matlab, but our local matlab was an earlier
> >>>> matlab, and so we couldn't reproduce the bug.
> >>>> I've now moved to a local cluster here that has a more recent matlab.
> >>>> (Kapil and others, if you want to reproduce the bug, try it on the
> >>>> 'discovery' cluster at the university.)
> >>>>
> >>>> I can now reproduce the bug. When I run:
> >>>> gdb --args bin/dmtcp_launch matlab -nodisplay -nojvm
> >>>> it appears to hang. But then I type ^C to interrupt, and 'where' to
> >>>> see the stack. From there, I'm seeing a call to std::terminate()
> >>>> from within eh_terminate.cc (C++ standard library: exception
> >>>> handler: terminate). This calls 'boost::call_once()' which then hangs.
> >>>> So, something in 'matlab' created a C++ exception that couldn't be
> >>>> handled.
> >>>>
> >>>> Is this consistent with what you're seeing? Either way, I'll look more
> >>>> deeply into this.
> >>>>
> >>>> Thanks for the bug report,
> >>>> - Gene
> >>>>
> >>>>
> >>>> On Mon, Dec 01, 2014 at 11:54:37AM +0100, Werner Hack wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> I want to use dmtcp with matlab.
> >>>>> I have dmtcp 2.3.1 installed but have problems using it with newer
> >>>>> releases of matlab.
> >>>>> For testing I used a simple script with a counter in a loop which
> >>>>> prints its result
> >>>>> to the display (like your testprogram dmtcp1).
> >>>>>
> >>>>> Using matlab R2012a it seems to work.
> >>>>> I can create checkpoints and the counter is printed to the display.
> >>>>> Restarting from checkpoint continues counting as expected.
> >>>>>
> >>>>> But using matlab R2012b and above no output is printed anymore.
> >>>>> I also tried printing to a file, with the same result.
> >>>>> A checkpoint can be done but is useless in this context.
> >>>>>
> >>>>> Is this a known issue?
> >>>>> Or do you have any idea what goes wrong here?
> >>>>>
> >>>>> Any hint will be appreciated.
> >>>>> Regards
> >>>>> Werner
> >>>>>
> >>>>> #-------------------------------------------------------------------------------
> >>>>> Output from dmtcp_coordinator while doing a checkpoint:
> >>>>>
> >>>>> [20384] NOTE at dmtcp_coordinator.cpp:1271 in startCheckpoint;
> >>>>> REASON='starting checkpoint, suspending all nodes'
> >>>>> s.numPeers = 1
> >>>>> [20384] NOTE at dmtcp_coordinator.cpp:1273 in startCheckpoint;
> >>>>> REASON='Incremented Generation'
> >>>>> compId.generation() = 3
> >>>>> [20384] NOTE at dmtcp_coordinator.cpp:615 in updateMinimumState;
> >>>>> REASON='locking all nodes'
> >>>>> [20384] NOTE at dmtcp_coordinator.cpp:621 in updateMinimumState;
> >>>>> REASON='draining all nodes'
> >>>>> [20384] NOTE at dmtcp_coordinator.cpp:627 in updateMinimumState;
> >>>>> REASON='checkpointing all nodes'
> >>>>> [20384] NOTE at dmtcp_coordinator.cpp:641 in updateMinimumState;
> >>>>> REASON='building name service database'
> >>>>> [20384] NOTE at dmtcp_coordinator.cpp:657 in updateMinimumState;
> >>>>> REASON='entertaining queries now'
> >>>>> [20384] NOTE at dmtcp_coordinator.cpp:662 in updateMinimumState;
> >>>>> REASON='refilling all nodes'
> >>>>> [20384] NOTE at dmtcp_coordinator.cpp:693 in updateMinimumState;
> >>>>> REASON='restarting all nodes'
> >>>>>
> >>>>> #-------------------------------------------------------------------------------
> >>>>> Output from dmtcp_launch while doing a checkpoint:
> >>>>>
> >>>>> [95000] NOTE at writeckpt.cpp:513 in preprocess_special_segments;
> >>>>> REASON='bottom-most page of stack (page with highest address) was invisible in /proc/self/maps. It is made visible again now.'
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> -----------------------------------------------------------------------
> >>>>> Werner Hack
> >>>>> Universität Ulm
> >>>>> Institut für Nachrichtentechnik
> >>>>> -----------------------------------------------------------------------
> >>>>>
> >>>>>
> >>>>
> >>>>> _______________________________________________
> >>>>> Dmtcp-forum mailing list
> >>>>> [email protected]
> >>>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
> >>>>
> >>> --
> >>>
> >>> -----------------------------------------------------------------------
> >>> Werner Hack
> >>> Universität Ulm
> >>> Institut für Nachrichtentechnik
> >>> -----------------------------------------------------------------------
> >>>
> >>>
> >>
> >
>
> --
>
>