Hi Gene,

This is great news.
Thanks for the workaround; it works fine for the moment :)
I'm looking forward to the next release of DMTCP.
Can we expect it as a Christmas present? ;)

Thanks
Werner


On 12/04/2014 01:24 AM, Gene Cooperman wrote:
> Hi Werner and Kapil,
>     Rohan and I found the DMTCP bug affecting matlab version 2012b and later.
>
> Werner,
>     If you want a quick workaround for now, just look for where we define
> 'realpath()' in DMTCP.  This workaround will be good enough for matlab.
> There are a small number of other applications that will still need a more
> general solution.  We'll add the general solution to the github repo and
> to the next release of DMTCP.  Thanks greatly for reporting this bug to us.
>
> You'll need to go to these two files in the DMTCP source:
>   DMTCP/src/plugin/ipc/file/filewrappers.cpp
>   DMTCP/src/plugin/pid/pid_filewrappers.cpp
> In each of these two files, you'll need to comment out all uses of
> the function realpath().  For example, in filewrappers.cpp, change it to be:
>
> #if 0
> extern "C" char *realpath(const char *path, char *resolved_path)
>
> ...
> extern "C" char *canonicalize_file_name(const char *path)
> {
>   return realpath(path, NULL);
> }
> #endif
>
> and do something similar in:  pid_filewrappers.cpp
> After that, do the usual:
>   ./configure
>   make
> and DMTCP/bin will have the patched executables.
>
>
> Kapil, Werner, and everyone else:
>     The issue is that the definition of realpath() changed between:
>   POSIX.1-2001 and POSIX.1-2008
> Apparently in matlab version 2012b and later, Matlab began using the newer
> definition found in POSIX.1-2008.  (The newer definition allows
> the second argument to be NULL.)
>     The issue arises because newer versions of glibc maintain two versions
> of realpath().  It's the symbol versioning issue that we've seen before
> for DMTCP.  DMTCP needs to invoke the default version instead of the
> older version of realpath.
>
> Best,
> - Gene
>
>
> On Tue, Dec 02, 2014 at 09:52:37AM -0500, Gene Cooperman wrote:
>> Hi Werner,
>>     I'm making some progress.  Apparently, the bug is caused by something
>> in the combination of the DMTCP pid plugin and its file plugin.
>>
>> Kapil,
>>     If you're following this, I can comment out the use of
>>   filewrappers.o
>> in the definition of OBJS in src/plugin/ipc/Makefile
>> (but I use the rest of the file plugin),
>> and I also need to disable the pid plugin.  Under these conditions,
>> DMTCP successfully launches matlab.
>>     I guess the next thing to do is to find which is the bad wrapper
>> inside filewrappers.o.  I'm still not sure why I also need to disable
>> the pid plugin to make this work.  Any thoughts on that?
>>
>> Best,
>> - Gene
>>
>> On Tue, Dec 02, 2014 at 12:08:31PM +0100, Werner Hack wrote:
>>> Hi Gene,
>>>
>>> thanks for looking into this bug.
>>> Yes, I observed the same behaviour as you in gdb.
>>> So I'll wait until you have found a solution.
>>>
>>> Thanks
>>> Werner
>>>
>>> On 12/02/2014 12:39 AM, Gene Cooperman wrote:
>>>> Hi Werner,
>>>>     Thanks for this bug report that pins things down for us.  We had
>>>> one earlier bug report on matlab, but our local matlab was an earlier
>>>> matlab, and so we couldn't reproduce the bug.
>>>>     I've now moved to a local cluster here that has a more recent matlab.
>>>> (Kapil and others, if you want to reproduce the bug, try it on the
>>>>  'discovery' cluster at the university.)
>>>>
>>>> I can now reproduce the bug.  When I run:
>>>>     gdb --args bin/dmtcp_launch matlab -nodisplay -nojvm
>>>> it appears to hang.  But then I type ^C to interrupt, and 'where' to
>>>> see the stack.  From there, I'm seeing a call to std::terminate()
>>>> from within eh_terminate.cc (C++ standard library: exception
>>>> handler: terminate).  This calls 'boost::call_once()' which then hangs.
>>>> So, something in 'matlab' created a C++ exception that couldn't be
>>>> handled.
>>>>
>>>> Is this consistent with what you're seeing?  Either way, I'll look more
>>>> deeply into this.
>>>>
>>>> Thanks for the bug report,
>>>> - Gene
>>>>
>>>>
>>>> On Mon, Dec 01, 2014 at 11:54:37AM +0100, Werner Hack wrote:
>>>>> Hi all,
>>>>>
>>>>> I want to use dmtcp with matlab.
>>>>> I have dmtcp 2.3.1 installed but have problems using it with newer
>>>>> releases of matlab.
>>>>> For testing I used a simple script with a counter in a loop which prints
>>>>> its result to the display (like your test program dmtcp1).
>>>>>
>>>>> Using matlab R2012a it seems to work.
>>>>> I can create checkpoints and the counter is printed to the display.
>>>>> Restarting from checkpoint continues counting as expected.
>>>>>
>>>>> But with matlab R2012b and above, no output is printed anymore.
>>>>> I also tried printing to a file, with the same result.
>>>>> A checkpoint can be taken, but it is useless in this context.
>>>>>
>>>>> Is this a known issue?
>>>>> Or do you have an idea what is going wrong here?
>>>>>
>>>>> Any hint will be appreciated.
>>>>> Regards
>>>>> Werner
>>>>>
>>>>> #-------------------------------------------------------------------------------
>>>>> Output from dmtcp_coordinator while doing a checkpoint:
>>>>>
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:1271 in startCheckpoint; REASON='starting checkpoint, suspending all nodes'
>>>>>      s.numPeers = 1
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:1273 in startCheckpoint; REASON='Incremented Generation'
>>>>>      compId.generation() = 3
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:615 in updateMinimumState; REASON='locking all nodes'
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:621 in updateMinimumState; REASON='draining all nodes'
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:627 in updateMinimumState; REASON='checkpointing all nodes'
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:641 in updateMinimumState; REASON='building name service database'
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:657 in updateMinimumState; REASON='entertaining queries now'
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:662 in updateMinimumState; REASON='refilling all nodes'
>>>>> [20384] NOTE at dmtcp_coordinator.cpp:693 in updateMinimumState; REASON='restarting all nodes'
>>>>>
>>>>> #-------------------------------------------------------------------------------
>>>>> Output from dmtcp_launch  while doing a checkpoint:
>>>>>
>>>>> [95000] NOTE at writeckpt.cpp:513 in preprocess_special_segments; REASON='bottom-most page of stack (page with highest address) was
>>>>>   invisible in /proc/self/maps. It is made visible again now.'
>>>>>
>>>>>
>>>>> -- 
>>>>>
>>>>> -----------------------------------------------------------------------
>>>>> Werner Hack  
>>>>> Universität Ulm   
>>>>> Institut für Nachrichtentechnik 
>>>>> -----------------------------------------------------------------------
>>>>>
>>>>>
>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
>>>>> from Actuate! Instantly Supercharge Your Business Reports and Dashboards
>>>>> with Interactivity, Sharing, Native Excel Exports, App Integration & more
>>>>> Get technology previously reserved for billion-dollar corporations, FREE
>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=157005751&iu=/4140/ostg.clktrk
>>>>> _______________________________________________
>>>>> Dmtcp-forum mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>>>
>>> -- 
>>>
>>> -----------------------------------------------------------------------
>>> Werner Hack              
>>> Universität Ulm  
>>> Institut für Nachrichtentechnik 
>>> -----------------------------------------------------------------------
>>>
>>>
>>
>
