Hi Gene, this is great news. Thanks for the workaround; it works fine for the moment :) I'm looking forward to the next release of DMTCP. Can we expect it as a Christmas present? ;)
Thanks
Werner
On 12/04/2014 01:24 AM, Gene Cooperman wrote:
> Hi Werner and Kapil,
> Rohan and I found the DMTCP bug affecting matlab version 2012b and later.
>
> Werner,
> If you want a quick workaround for now, just look for where we define
> 'realpath()' in DMTCP. This workaround will be good enough for matlab.
> There are a small number of other applications that will still need a more
> general solution. We'll add the general solution to the GitHub repo and
> to the next release of DMTCP. Thanks greatly for reporting this bug to us.
>
> You'll need to go to these two files in the DMTCP source:
> DMTCP/src/plugin/ipc/file/filewrappers.cpp
> DMTCP/src/plugin/pid/pid_filewrappers.cpp
> In each of these two files, you'll need to comment out all uses of
> the function realpath(). For example, in filewrappers.cpp, change it to be:
>
> #if 0
> extern "C" char *realpath(const char *path, char *resolved_path)
>
> ...
> extern "C" char *canonicalize_file_name(const char *path)
> {
> return realpath(path, NULL);
> }
> #endif
>
> and do something similar in pid_filewrappers.cpp.
> After that, do the usual:
> ./configure
> make
> and DMTCP/bin will have the patched executables.
>
>
> Kapil, Werner, and everyone else:
> The issue is that the definition of realpath() changed between:
> POSIX.1-2001 and POSIX.1-2008
> Apparently in matlab version 2012b and later, Matlab began using the newer
> definition found in POSIX.1-2008. (The newer definition allows
> the second argument to be NULL.)
> The issue arises because newer versions of glibc maintain two versions
> of realpath(). It's the symbol versioning issue that we've seen before
> for DMTCP. DMTCP needs to invoke the default version instead of the
> older version of realpath.
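For illustration, a minimal C++ sketch of the two points above. It assumes x86_64 Linux with glibc, where the current realpath() carries the version tag GLIBC_2.3 (other architectures use different tags, so the sketch falls back to a plain dlsym() lookup when that tag is not found). It shows the POSIX.1-2008 call with a NULL second argument, whose result is malloc'ed and must be released with free(), and uses dlvsym() to name a symbol version explicitly rather than leaving the choice to an unversioned lookup. The helper names are hypothetical and this is not DMTCP's actual fix; on glibc older than 2.34, link with -ldl.

```cpp
// Sketch only: helper names are hypothetical, and the "GLIBC_2.3" tag is an
// assumption that holds for x86_64 glibc.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for dlvsym() and RTLD_NEXT
#endif
#include <dlfcn.h>
#include <cassert>
#include <cstdlib>   // free(), which the caller uses on the result

using realpath_fn = char *(*)(const char *, char *);

// Find the next definition of realpath() after the calling object,
// asking for the newer (POSIX.1-2008 capable) version by name.
static realpath_fn lookup_next_realpath() {
  void *sym = dlvsym(RTLD_NEXT, "realpath", "GLIBC_2.3");
  if (sym == nullptr) {
    // Version tag unknown on this platform: fall back to an unversioned lookup.
    sym = dlsym(RTLD_NEXT, "realpath");
  }
  return reinterpret_cast<realpath_fn>(sym);
}

// POSIX.1-2008 style call: resolved_path is NULL, so realpath() allocates
// the result buffer itself. Returns NULL on error (e.g. path does not exist).
char *resolve_path_posix2008(const char *path) {
  assert(path != nullptr);
  static realpath_fn next_realpath = lookup_next_realpath();
  if (next_realpath == nullptr) return nullptr;
  return next_realpath(path, nullptr);
}
```

A caller would use it as `char *p = resolve_path_posix2008("."); ... free(p);`. With the older POSIX.1-2001 behavior, the caller instead had to pass its own buffer of at least PATH_MAX bytes as the second argument.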
>
> Best,
> - Gene
>
>
> On Tue, Dec 02, 2014 at 09:52:37AM -0500, Gene Cooperman wrote:
>> Hi Werner,
>> I'm making some progress. Apparently, the bug is caused by something
>> in the combination of the DMTCP pid plugin and its file plugin.
>>
>> Kapil,
>> If you're following this: if I comment out
>> filewrappers.o
>> in the definition of OBJS in src/plugin/ipc/Makefile
>> (while keeping the rest of the file plugin),
>> and I also disable the pid plugin, then
>> DMTCP successfully launches matlab.
>> I guess the next thing to do is to find which wrapper
>> inside filewrappers.o is the bad one. I'm still not sure why I also need to disable
>> the pid plugin to make this work. Any thoughts on that?
>>
>> Best,
>> - Gene
>>
>> On Tue, Dec 02, 2014 at 12:08:31PM +0100, Werner Hack wrote:
>>> Hi Gene,
>>>
>>> thanks for looking into this bug.
>>> Yes, I observed the same behaviour as you in gdb.
>>> So I'll wait until you find a solution.
>>>
>>> Thanks
>>> Werner
>>>
>>> On 12/02/2014 12:39 AM, Gene Cooperman wrote:
>>>> Hi Werner,
>>>> Thanks for this bug report that pins things down for us. We had
>>>> one earlier bug report on matlab, but our local matlab was an older
>>>> version, and so we couldn't reproduce the bug.
>>>> I've now moved to a local cluster here that has a more recent matlab.
>>>> (Kapil and others, if you want to reproduce the bug, try it on the
>>>> 'discovery' cluster at the university.)
>>>>
>>>> I can now reproduce the bug. When I run:
>>>> gdb --args bin/dmtcp_launch matlab -nodisplay -nojvm
>>>> it appears to hang. But then I type ^C to interrupt, and 'where' to
>>>> see the stack. From there, I'm seeing a call to std::terminate()
>>>> from within eh_terminate.cc (C++ standard library: exception
>>>> handler: terminate). This calls 'boost::call_once()', which then hangs.
>>>> So, something in 'matlab' created a C++ exception that couldn't be
>>>> handled.
>>>>
>>>> Is this consistent with what you're seeing? Either way, I'll look more
>>>> deeply into this.
>>>>
>>>> Thanks for the bug report,
>>>> - Gene
>>>>
>>>>
>>>> On Mon, Dec 01, 2014 at 11:54:37AM +0100, Werner Hack wrote:
>>>>> Hi all,
>>>>>
>>>>> I want to use dmtcp with matlab.
>>>>> I have dmtcp 2.3.1 installed but have problems using it with newer
>>>>> releases of matlab.
>>>>> For testing I used a simple script with a counter in a loop that prints
>>>>> its result to the display (like your test program dmtcp1).
>>>>>
>>>>> Using matlab R2012a it seems to work.
>>>>> I can create checkpoints and the counter is printed to the display.
>>>>> Restarting from checkpoint continues counting as expected.
>>>>>
>>>>> But with matlab R2012b and later, no output is printed anymore.
>>>>> I also tried printing to a file, with the same result.
>>>>> A checkpoint can be taken, but it is useless in this context.
>>>>>
>>>>> Is this a known issue?
>>>>> Or do you have an idea what is going wrong here?
>>>>>
>>>>> Any hint will be appreciated.
>>>>> Regards
>>>>> Werner
>>>>>
>>>>> #-------------------------------------------------------------------------------
>>>>> Output from dmtcp_coordinator while doing a checkpoint:
>>>>>
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:1271 in startCheckpoint;
>>>>> REASON='starting checkpoint,
>>>>> suspending all nodes'
>>>>> s.numPeers = 1
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:1273 in startCheckpoint;
>>>>> REASON='Incremented Generation'
>>>>> compId.generation() = 3
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:615 in updateMinimumState;
>>>>> REASON='locking all nodes'
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:621 in updateMinimumState;
>>>>> REASON='draining all nodes'
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:627 in updateMinimumState;
>>>>> REASON='checkpointing all nodes'
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:641 in updateMinimumState;
>>>>> REASON='building name service database'
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:657 in updateMinimumState;
>>>>> REASON='entertaining queries now'
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:662 in updateMinimumState;
>>>>> REASON='refilling all nodes'
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:693 in updateMinimumState;
>>>>> REASON='restarting all nodes'
>>>>>
>>>>> #-------------------------------------------------------------------------------
>>>>> Output from dmtcp_launch while doing a checkpoint:
>>>>>
>>>>> [95000] NOTE at writeckpt.cpp:513 in preprocess_special_segments;
>>>>> REASON='bottom-most page of stack
>>>>> (page with highest address) was
>>>>> invisible in /proc/self/maps. It is made visible again now.'
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> -----------------------------------------------------------------------
>>>>> Werner Hack
>>>>> Universität Ulm
>>>>> Institut für Nachrichtentechnik
>>>>> -----------------------------------------------------------------------
>>>>>
>>>>>
>>>>
>>>>> _______________________________________________
>>>>> Dmtcp-forum mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>>>
>>>
>>>
>>
>
