Hi Werner,
I'm making some progress. Apparently, the bug is caused by something
in the combination of the DMTCP pid plugin and its file plugin.
Kapil,
If you're following this: I can comment out the use of filewrappers.o
in the definition of OBJS in src/plugin/ipc/Makefile (while still using
the rest of the file plugin), and I also need to disable the pid plugin.
Under these conditions, DMTCP successfully launches matlab.
I guess the next thing to do is to find which is the bad wrapper
inside filewrappers.o. I'm still not sure why I also need to disable
the pid plugin to make this work. Any thoughts on that?
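
One way to narrow this down is a bisection over the wrappers. The
following is only a sketch, not something that has been run against
DMTCP. It assumes that exactly one wrapper is at fault and that the
launch succeeds when no wrapper is enabled. The function name
find_bad_wrapper, the predicate launch_ok, and the wrapper names in
the usage example are hypothetical placeholders.

```python
from typing import Callable, List, Sequence


def find_bad_wrapper(wrappers: Sequence[str],
                     launch_ok: Callable[[List[str]], bool]) -> str:
    """Bisect to the one wrapper whose presence breaks the launch.

    launch_ok(enabled) is a hypothetical predicate: it must return True
    when the application starts correctly with only the wrappers in
    `enabled` compiled in. Assumes exactly one wrapper is at fault.
    """
    if not wrappers:
        raise ValueError("need at least one wrapper to test")
    candidates = list(wrappers)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if launch_ok(first_half):
            # The first half is clean, so the culprit is in the rest.
            candidates = candidates[mid:]
        else:
            # The launch failed, so the culprit is in the first half.
            candidates = first_half
    return candidates[0]


if __name__ == "__main__":
    # Placeholder names only; these are not the real wrapper list.
    names = ["wrapper_a", "wrapper_b", "wrapper_c", "wrapper_d", "wrapper_e"]
    culprit = find_bad_wrapper(names, lambda enabled: "wrapper_d" not in enabled)
    print(culprit)
```

In practice launch_ok would have to rebuild the file plugin with only
the given wrappers enabled and then try the matlab launch. That step
is not shown here, and it would need the pid plugin to stay disabled.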
Best,
- Gene
On Tue, Dec 02, 2014 at 12:08:31PM +0100, Werner Hack wrote:
> Hi Gene,
>
> Thanks for looking into this bug.
> Yes, I observed the same behaviour as you in gdb.
> So I'll wait until you have found a solution.
>
> Thanks
> Werner
>
> On 12/02/2014 12:39 AM, Gene Cooperman wrote:
> > Hi Werner,
> > Thanks for this bug report that pins things down for us. We had
> > one earlier bug report on matlab, but our local matlab was an earlier
> > version, and so we couldn't reproduce the bug.
> > I've now moved to a local cluster here that has a more recent matlab.
> > (Kapil and others, if you want to reproduce the bug, try it on the
> > 'discovery' cluster at the university.)
> >
> > I can now reproduce the bug. When I run:
> > gdb --args bin/dmtcp_launch matlab -nodisplay -nojvm
> > it appears to hang. But then I type ^C to interrupt, and 'where' to
> > see the stack. From there, I'm seeing a call to std::terminate()
> > from within eh_terminate.cc (C++ standard library: exception
> > handler: terminate). This calls 'boost::call_once()' which then hangs.
> > So, something in 'matlab' created a C++ exception that couldn't be
> > handled.
> >
> > Is this consistent with what you're seeing? Either way, I'll look more
> > deeply into this.
> >
> > Thanks for the bug report,
> > - Gene
> >
> >
> > On Mon, Dec 01, 2014 at 11:54:37AM +0100, Werner Hack wrote:
> >> Hi all,
> >>
> >> I want to use dmtcp with matlab.
> >> I have dmtcp 2.3.1 installed but have problems using it with newer
> >> releases of matlab.
> >> For testing I used a simple script with a counter in a loop that
> >> prints its result to the display (like your test program dmtcp1).
> >>
> >> Using matlab R2012a it seems to work.
> >> I can create checkpoints and the counter is printed to the display.
> >> Restarting from checkpoint continues counting as expected.
> >>
> >> But with matlab R2012b and later, no output is printed anymore.
> >> I also tried printing to a file, with the same result.
> >> A checkpoint can be created but is useless in this context.
> >>
> >> Is this a known issue?
> >> Or do you have any idea what goes wrong here?
> >>
> >> Any hint will be appreciated.
> >> Regards
> >> Werner
> >>
> >> #-------------------------------------------------------------------------------
> >> Output from dmtcp_coordinator while doing a checkpoint:
> >>
> >> [20384] NOTE at dmtcp_coordinator.cpp:1271 in startCheckpoint;
> >>     REASON='starting checkpoint, suspending all nodes'
> >>     s.numPeers = 1
> >> [20384] NOTE at dmtcp_coordinator.cpp:1273 in startCheckpoint;
> >>     REASON='Incremented Generation'
> >>     compId.generation() = 3
> >> [20384] NOTE at dmtcp_coordinator.cpp:615 in updateMinimumState;
> >>     REASON='locking all nodes'
> >> [20384] NOTE at dmtcp_coordinator.cpp:621 in updateMinimumState;
> >>     REASON='draining all nodes'
> >> [20384] NOTE at dmtcp_coordinator.cpp:627 in updateMinimumState;
> >>     REASON='checkpointing all nodes'
> >> [20384] NOTE at dmtcp_coordinator.cpp:641 in updateMinimumState;
> >>     REASON='building name service database'
> >> [20384] NOTE at dmtcp_coordinator.cpp:657 in updateMinimumState;
> >>     REASON='entertaining queries now'
> >> [20384] NOTE at dmtcp_coordinator.cpp:662 in updateMinimumState;
> >>     REASON='refilling all nodes'
> >> [20384] NOTE at dmtcp_coordinator.cpp:693 in updateMinimumState;
> >>     REASON='restarting all nodes'
> >>
> >> #-------------------------------------------------------------------------------
> >> Output from dmtcp_launch while doing a checkpoint:
> >>
> >> [95000] NOTE at writeckpt.cpp:513 in preprocess_special_segments;
> >>     REASON='bottom-most page of stack (page with highest address) was
> >>     invisible in /proc/self/maps. It is made visible again now.'
> >>
> >>
> >> --
> >>
> >> -----------------------------------------------------------------------
> >> Werner Hack
> >> Universität Ulm
> >> Institut für Nachrichtentechnik
> >> -----------------------------------------------------------------------
> >>
> >>
> >
> >
> >> ------------------------------------------------------------------------------
> >> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
> >> from Actuate! Instantly Supercharge Your Business Reports and Dashboards
> >> with Interactivity, Sharing, Native Excel Exports, App Integration & more
> >> Get technology previously reserved for billion-dollar corporations, FREE
> >> http://pubads.g.doubleclick.net/gampad/clk?id=157005751&iu=/4140/ostg.clktrk
> >> _______________________________________________
> >> Dmtcp-forum mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
> >
> >
>
> --
>
> -----------------------------------------------------------------------
> Werner Hack
> Universität Ulm
> Institut für Nachrichtentechnik
> -----------------------------------------------------------------------
>
>