Hi Kapil,

Below are some stack traces and the code used to generate them.  These
include traces of write() calls with fd > 100, which our code
normally ignores right away. For fd=821, we can now see that
dmtcp_is_running_state() sometimes returns 0 and sometimes 1.  However, if
I understand correctly, it should always return 1 because fd=821 is
not used by our application.  This technique hit too many breakpoints
and disrupted execution of our app, probably because of timeouts.

So I tried another technique to get a stack trace where our application
is writing data and dmtcp_is_running_state() should return 0:

(gdb) where
#0  0x40020416 in __kernel_vsyscall ()
#1  0x402c76f9 in lseek () from /lib/i386-linux-gnu/libc.so.6
#2  0x40057a28 in outside_help (checkfd=5, buf=0xbfb7b80c, count=41,
str=0x400cfaaf "write")
    at socketwrappers.cpp:529
#3  0x40058c77 in write (fd=5, buf=0xbfb7b80c, count=41) at
socketwrappers.cpp:594
#4  0x0804b26b in ?? ()
#5  0x0804a37c in ?? ()
#6  0x08049719 in ?? ()
#7  0x4021e113 in __libc_start_main () from /lib/i386-linux-gnu/libc.so.6
#8  0x08049d31 in ?? ()
Backtrace stopped: Not enough registers or memory available to unwind further
(gdb) up 2
#2  0x40057a28 in outside_help (checkfd=5, buf=0xbfb7b80c, count=41,
str=0x400cfaaf "write")
    at socketwrappers.cpp:529
529          lseek(fd,0,SEEK_SET);
(gdb) p isrunning    ## this is the variable used by the code, and
$1 = <optimized out>    ## from output to the terminal I can see that it was "1"
(gdb) p dmtcp_is_running_state()
$2 = 1
(gdb)

Our app writes 41 characters, so this is the correct write() call.
All of this output is from DMTCP v1.2.6.

Hope this is helpful,
--Richard

===============
STACK TRACES:
===============

Breakpoint 1, breakOnMe (isrunning=0) at socketwrappers.cpp:587
587    {
(gdb) where
#0  breakOnMe (isrunning=0) at socketwrappers.cpp:587
#1  0x40057bcb in write (fd=821, buf=0x400f984c, count=14) at
socketwrappers.cpp:599
#2  0x400c41ad in jalib::JSocket::write (this=0x400f2370,
buf=0x400f984c "white-fujitsu", len=14)
    at ../jalib/jsocket.cpp:246
#3  0x400c4cc2 in jalib::JSocket::writeAll (this=0x400f2370,
buf=0x400f984c "white-fujitsu", len=14)
    at ../jalib/jsocket.cpp:347
#4  0x400422c1 in dmtcp::DmtcpWorker::sendCkptFilenameToCoordinator
(this=0x400f2370) at dmtcpworker.cpp:940
#5  0x400547bd in callbackPostCheckpoint (isRestart=0,
mtcpRestoreArgvStartAddr=0x0) at mtcpinterface.cpp:298
#6  0x4052903d in checkpointhread (dummy=0x0) at mtcp.c:2135
#7  0x40062cbd in pthread_start (arg=0x400f9404) at threadwrappers.cpp:70
#8  0x4038ed31 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#9  0x40062994 in clone_start (arg=0x400f9384) at threadwrappers.cpp:147
#10 0x40527bd0 in threadcloned (threadv=0x400fd004) at mtcp.c:1226
#11 0x402d846e in clone () from /lib/i386-linux-gnu/libc.so.6
Backtrace stopped: Not enough registers or memory available to unwind further
(gdb) c
Continuing.

Breakpoint 1, breakOnMe (isrunning=0) at socketwrappers.cpp:587
587    {
(gdb) where
#0  breakOnMe (isrunning=0) at socketwrappers.cpp:587
#1  0x40057bcb in write (fd=821, buf=0x40746b78, count=388) at
socketwrappers.cpp:599
#2  0x400c41ad in jalib::JSocket::write (this=0x400f2370,
buf=0x40746b78 "DMTCP_CKPT_V0\n", len=388)
    at ../jalib/jsocket.cpp:246
#3  0x400c4cc2 in jalib::JSocket::writeAll (this=0x400f2370,
buf=0x40746b78 "DMTCP_CKPT_V0\n", len=388)
    at ../jalib/jsocket.cpp:347
#4  0x4004095e in operator<< <dmtcp::DmtcpMessage> (t=...,
this=0x400f2370) at ../jalib/jsocket.h:104
#5  dmtcp::DmtcpWorker::waitForCoordinatorMsg (this=0x400f2340, msgStr=...,
    type=dmtcp::DMT_DO_REGISTER_NAME_SERVICE_DATA) at dmtcpworker.cpp:623
#6  0x4004335a in dmtcp::DmtcpWorker::waitForStage3Refill
(this=0x400f2340, isRestart=false) at dmtcpworker.cpp:1026
#7  0x400547d2 in callbackPostCheckpoint (isRestart=0,
mtcpRestoreArgvStartAddr=0x0) at mtcpinterface.cpp:299
#8  0x4052903d in checkpointhread (dummy=0x0) at mtcp.c:2135
#9  0x40062cbd in pthread_start (arg=0x400f9404) at threadwrappers.cpp:70
#10 0x4038ed31 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#11 0x40062994 in clone_start (arg=0x400f9384) at threadwrappers.cpp:147
#12 0x40527bd0 in threadcloned (threadv=0x400fd004) at mtcp.c:1226
#13 0x402d846e in clone () from /lib/i386-linux-gnu/libc.so.6
Backtrace stopped: Not enough registers or memory available to unwind further
(gdb) c
Continuing.

<<<stuff cut>>>>
Breakpoint 1, breakOnMe (isrunning=1) at socketwrappers.cpp:587
587    {
(gdb) where
#0  breakOnMe (isrunning=1) at socketwrappers.cpp:587
#1  0x40057bcb in write (fd=821, buf=0x40746be8, count=388) at
socketwrappers.cpp:599
#2  0x400c41ad in jalib::JSocket::write (this=0x400f2370,
buf=0x40746be8 "DMTCP_CKPT_V0\n", len=388)
    at ../jalib/jsocket.cpp:246
#3  0x400c4cc2 in jalib::JSocket::writeAll (this=0x400f2370,
buf=0x40746be8 "DMTCP_CKPT_V0\n", len=388)
    at ../jalib/jsocket.cpp:347
#4  0x4004095e in operator<< <dmtcp::DmtcpMessage> (t=...,
this=0x400f2370) at ../jalib/jsocket.h:104
#5  dmtcp::DmtcpWorker::waitForCoordinatorMsg (this=0x400f2340,
msgStr=..., type=dmtcp::DMT_DO_SUSPEND)
    at dmtcpworker.cpp:623
#6  0x40042f8d in dmtcp::DmtcpWorker::waitForStage1Suspend
(this=0x400f2340) at dmtcpworker.cpp:739
#7  0x40054e1b in callbackSleepBetweenCheckpoint (sec=195948557) at
mtcpinterface.cpp:218
#8  0x40528a11 in checkpointhread (dummy=0x0) at mtcp.c:1903
#9  0x40062cbd in pthread_start (arg=0x400f9404) at threadwrappers.cpp:70
#10 0x4038ed31 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#11 0x40062994 in clone_start (arg=0x400f9384) at threadwrappers.cpp:147
#12 0x40527bd0 in threadcloned (threadv=0x400fd004) at mtcp.c:1226
#13 0x402d846e in clone () from /lib/i386-linux-gnu/libc.so.6
Backtrace stopped: Not enough registers or memory available to unwind further
(gdb) c

===============
CODE:
===============

int dosleep = 1;
void breakOnMe(int isrunning)
{
  if (dosleep)
    sleep(3);
}

ssize_t write(int fd, const void *buf, size_t count) {
  ssize_t rr;
  int isrunning;

  isrunning = dmtcp_is_running_state();
  breakOnMe(isrunning);

  WRAPPER_EXECUTION_DISABLE_CKPT(); // The lock is released inside the macro.
  rr = _real_write(fd, buf, count);
  int saved_errno;
  saved_errno = errno;
  WRAPPER_EXECUTION_ENABLE_CKPT();
  errno = saved_errno;
  return rr;
}
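
Since stopping at a breakpoint on every write() seems to have caused the
timeouts, one alternative might be to record each call in memory and look
at the records afterwards. Below is a minimal sketch of that idea. All of
the names in it (trace_record, trace_count, trace_set_watch_fd, trace_log)
are hypothetical and are not part of DMTCP. The __sync_fetch_and_add
builtin assumes GCC or Clang; it is used because the traces show write()
being called from the checkpoint thread as well as the application thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; none of this is DMTCP API. */
#define TRACE_CAPACITY 256

struct trace_entry {
  int fd;          /* file descriptor passed to write() */
  size_t count;    /* byte count passed to write() */
  int isrunning;   /* value returned by dmtcp_is_running_state() */
};

static struct trace_entry trace_log[TRACE_CAPACITY];
static size_t trace_total = 0;  /* number of records stored so far */
static int watch_fd = -1;       /* -1 means: record every fd */

/* Restrict recording to a single fd, or pass -1 to record all fds. */
void trace_set_watch_fd(int fd)
{
  assert(fd >= -1);
  watch_fd = fd;
}

/* Record one call without blocking. Returns 1 if stored, 0 if filtered. */
int trace_record(int fd, size_t count, int isrunning)
{
  if (watch_fd != -1 && fd != watch_fd)
    return 0;

  /* Claim a slot atomically; old entries are overwritten once the
   * buffer wraps around. */
  size_t slot = __sync_fetch_and_add(&trace_total, 1) % TRACE_CAPACITY;
  trace_log[slot].fd = fd;
  trace_log[slot].count = count;
  trace_log[slot].isrunning = isrunning;
  return 1;
}

/* Count the retained records for fd that have the given isrunning value. */
size_t trace_count(int fd, int isrunning)
{
  size_t kept = trace_total < TRACE_CAPACITY ? trace_total : TRACE_CAPACITY;
  size_t n = 0;
  for (size_t i = 0; i < kept; i++)
    if (trace_log[i].fd == fd && trace_log[i].isrunning == isrunning)
      n++;
  return n;
}
```

In the wrapper above, breakOnMe(isrunning) would become
trace_record(fd, count, isrunning), and trace_log could be inspected from
gdb once the run has finished. This loses the stack trace for each call,
so it would complement the breakpoint approach rather than replace it.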



On Tue, Jul 2, 2013 at 10:12 PM, Kapil Arya <[email protected]> wrote:
> Hi Richard,
>
> I almost forgot that we had _real_read and _real_write definitions :-).
> Anyways, the wrapper code looks correct, although you shouldn't need to do
> the saved_errno stuff. It is taken care of inside the DMTCP_XXX macros.
>
> I am a little confused about the dmtcp_is_running_state() always returning
> 1. Is it possible for you to send me a stacktrace of the write call that
> originates from DMTCP? That might help us narrow down why it is returning 1.
>
> Thanks,
> Kapil
>
>
> On Tue, Jul 2, 2013 at 5:50 AM, Richard Potter <[email protected]>
> wrote:
>>
>> Hi Kapil,
>>
>> Thank you for your reply and the new info.  I tried a quick test of
>> dmtcp_is_running_state() (using #include "dmtcpplugin.h") and the
>> result was that it always returned 1, even when write() was sending
>> out real application data, so it does not seem to help us.
>>
>> In regards to _real_write and _real_read, connecting gdb to a process
>> shows this:
>>
>> Loaded symbols for
>> /home/knoppix/dmtcp-1.2.7/dmtcp/src/../../lib/libmtcp.so.1
>>    0x401a803c in nanosleep () from /lib/i386-linux-gnu/libc.so.6
>>    (gdb) br _real_write
>>    Breakpoint 1 at 0x400bd690: file syscallsreal.c, line 459.
>>    (gdb) br _real_read
>>    Breakpoint 2 at 0x400bd600: file syscallsreal.c, line 454.
>>    (gdb)
>>
>> And syscallsreal.c:439 contains:
>>    LIB_PRIVATE
>>    ssize_t _real_read(int fd, void *buf, size_t count) {
>>      REAL_FUNC_PASSTHROUGH ( read ) ( fd,buf,count );
>>    }
>>
>>    LIB_PRIVATE
>>    ssize_t _real_write(int fd, const void *buf, size_t count) {
>>      REAL_FUNC_PASSTHROUGH_TYPED ( ssize_t,write ) ( fd,buf,count );
>>    }
>>
>> For reference, our attempt at wrapping write() basically looks
>> like this:
>>
>> ssize_t write(int fd, const void *buf, size_t count) {
>>   ssize_t rr;
>>   if ( our filter code to identify which writes are really
>>                        done by the application  ) {
>>      our code to change the behavior of write
>>      }
>>      WRAPPER_EXECUTION_DISABLE_CKPT(); // The lock is released inside the macro.
>>      rr = _real_write(fd, buf, count);
>>      int saved_errno;
>>      saved_errno = errno;
>>      WRAPPER_EXECUTION_ENABLE_CKPT();
>>      errno =saved_errno;
>>      return rr;
>> }
>>
>> The good news is it almost works.  The bad news is we are really just
>> guessing here what is possible, based on a quick study of other wrappers
>> in socketwrappers.cpp.  In fact, when you first said that _real_write
>> was not defined, it sounded plausible, because we had not checked.
>> It just compiled and linked by magic. :-)
>>
>> The dmtcp_is_running_state() would have been nice because our
>> filtering code is matching for patterns, which works, but could break
>> anytime. Therefore, any further clues on how to use
>> dmtcp_is_running_state()
>> or something else would be helpful.  Also, how lucky are we that the
>> wrapping of write() is working?  Hints on how to do it more properly would
>> be appreciated.
>>
>> DMTCP is amazing.  Wish I had time to dive in and really understand
>> how it works.
>>
>> --Richard
>>
>> (It turned out that a problem we thought was related to the write()
>> wrapper occurs on unmodified dmtcp, so I'll send that in a separate email.)
>>
>> On Tue, Jul 2, 2013 at 1:43 AM, Kapil Arya <[email protected]> wrote:
>> > Hi,
>> >
>> > DMTCP does not put wrappers around read and write and thus has no _real_
>> > versions of these functions. In your application, are the _real functions
>> > defined in the binary or in some shared library?
>> >
>> > A simple trick would be to use the function:
>> >   int  dmtcp_is_running_state();
>> > to check if the computation is in _RUNNING_ state or not. DMTCP uses
>> > read/write only when it's _not_ in RUNNING state, i.e. during checkpoint
>> > and restart.
>> >
>> > Does this help?
>> >
>> > Kapil
>> >
>> >
>> >
>> > On Thu, Jun 27, 2013 at 3:44 AM, Cyrille Artho <[email protected]>
>> > wrote:
>> >>
>> >> Hi all,
>> >> We are trying to use DMTCP for fault injection on sockets that are
>> >> accessed via a file descriptor.
>> >>
>> >> The problem we ran into is that DMTCP sends its own data over existing
>> >> application sockets. That data is used for internal bookkeeping in
>> >> DMTCP, and not seen by the application.
>> >>
>> >> However, our wrapper for read sees this extra data, and we are thinking
>> >> about how to ignore it or filter it out.
>> >>
>> >> Is it intentional that DMTCP uses "read" instead of "_real_read" for its
>> >> own communication? If so, is there any chance for us to have internal
>> >> use of "read" flagged somehow? It seems DMTCP now uses a lock for
>> >> internal operations, so if DMTCP's internal use of "read" cannot be
>> >> changed to "_real_read", the following may do the trick for us as well:
>> >>
>> >> lock
>> >> set flag
>> >> read
>> >> clear flag
>> >> unlock
>> >>
>> >> This way, we always know when the application uses read (the flag is
>> >> not set).
>> >>
>> >> If there is no easy way for us to tell if DMTCP or the application calls
>> >> "read", then we can try to filter the messages from DMTCP. They seem to
>> >> use particular messages and port numbers. Is that documented somewhere?
>> >> Is that expected to be stable?
>> >> --
>> >> Regards,
>> >> Cyrille Artho - http://artho.com/
>> >> Those who will not reason, are bigots, those who cannot,
>> >> are fools, and those who dare not, are slaves.
>> >>                 -- George Gordon Noel Byron
>> >>
>> >>
>> >>
>> >> ------------------------------------------------------------------------------
>> >> This SF.net email is sponsored by Windows:
>> >>
>> >> Build for Windows Store.
>> >>
>> >> http://p.sf.net/sfu/windows-dev2dev
>> >> _______________________________________________
>> >> Dmtcp-forum mailing list
>> >> [email protected]
>> >> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>> >
>> >
>
>
