I'd need to look at the beginning of the stack traces to understand
what's going on, maybe 20-30 frames from the beginning. You can run
'backtrace -20' in gdb to see the bottom (outermost) 20 frames.
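
The repeating part of a trace like the 3.0 one quoted below can also be
picked out mechanically. A minimal Python sketch, assuming gdb's usual
"#N  0xADDR in func (...)" frame layout; the helper names are
hypothetical and the sample frames are copied from the quoted 3.0 trace:

```python
import re

# Matches "#3  0x... in free (" as well as "#0  free (" (no address).
FRAME_RE = re.compile(r"#\d+\s+(?:0x[0-9a-f]+ in )?(\w+) \(")

# Sample frames copied from the quoted 3.0 trace.
TRACE = """\
#0  free (ptr=0x63d010) at alloc/mallocwrappers.cpp:72
#1  0x00007ffff699f715 in _dlerror_run () from /lib64/libdl.so.2
#2  0x00007ffff699f148 in dlsym () from /lib64/libdl.so.2
#3  0x00007ffff7bd9da4 in free (ptr=0x63d010) at alloc/mallocwrappers.cpp:74
#4  0x00007ffff699f715 in _dlerror_run () from /lib64/libdl.so.2
#5  0x00007ffff699f148 in dlsym () from /lib64/libdl.so.2
#6  0x00007ffff7bd9da4 in free (ptr=0x63d010) at alloc/mallocwrappers.cpp:74
"""


def frame_functions(trace):
    """Return the function name of every frame line, innermost first."""
    return FRAME_RE.findall(trace)


def shortest_cycle(names):
    """Return the shortest run of names that repeats back-to-back
    starting at the innermost frame, or [] if there is none."""
    for period in range(1, len(names) // 2 + 1):
        if names[:period] == names[period:2 * period]:
            return names[:period]
    return []


print(shortest_cycle(frame_functions(TRACE)))
# prints ['free', '_dlerror_run', 'dlsym']
```

This only compares function names, so it is a rough aid for spotting
the loop, not a substitute for the full trace.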

It's hard to say what the exact cause is, because I use glibc-2.23
(recently upgraded from 2.22) and haven't seen any issues so far.
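
What the quoted 3.0 trace does show is free() in
alloc/mallocwrappers.cpp calling dlsym(), whose _dlerror_run() calls
free() again. Below is a toy Python model of that re-entry and of one
common way to break such a loop (a re-entry flag). It is not DMTCP code
and not a claim about what the pull request above does. Every name in
it is hypothetical, and it assumes _dlerror_run() is releasing a
leftover error string:

```python
class ToyFreeWrapper:
    """Toy stand-in for a free() wrapper that resolves the real
    free() lazily, on first use."""

    def __init__(self, lookup, guarded):
        self.lookup = lookup      # stand-in for the dlsym() lookup
        self.guarded = guarded    # apply the re-entry guard or not
        self.real_free = None     # resolved on first call
        self.in_lookup = False    # True while lookup() is running
        self.leaked = []          # pointers dropped during lookup

    def free(self, ptr):
        if self.real_free is None:
            if self.guarded and self.in_lookup:
                # Re-entered from inside lookup(): drop the pointer
                # (a small leak) instead of recursing.
                self.leaked.append(ptr)
                return
            self.in_lookup = True
            try:
                self.real_free = self.lookup(self)
            finally:
                self.in_lookup = False
        self.real_free(ptr)


def make_lookup(freed):
    """Build a fake lookup that calls back into the wrapper, the way
    _dlerror_run() calls free() in the trace."""
    def lookup(wrapper):
        wrapper.free("old-dlerror-string")  # assumed leftover string
        return freed.append                 # the "real" free()
    return lookup


# Without the guard, the wrapper and the lookup call each other forever.
try:
    ToyFreeWrapper(make_lookup([]), guarded=False).free("p")
except RecursionError:
    print("unguarded: infinite recursion")

# With the guard, the nested call returns early and the lookup finishes.
freed = []
wrapper = ToyFreeWrapper(make_lookup(freed), guarded=True)
wrapper.free("p")
print(freed, wrapper.leaked)
```

The guard trades a small leak during start-up for termination; whether
that trade is acceptable in the real wrappers is a separate question.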

Could you please confirm that the 3.0 zip you downloaded has the commits
from the following pull request: https://github.com/dmtcp/dmtcp/pull/372 ?

On Sat, May 14, 2016 at 01:24:37PM -0400, Calvin Ostrum wrote:
> 
> On 13/05/16 08:08 AM, Rohan Garg wrote:
> 
> > Could you try with `dmtcp_launch --disable-alloc-plugin`? Does that
> > help? It's not the final solution, of course, but I just want to
> > isolate and do sanity tests.
> 
> Sorry, I did not know about this.  Judging by the stack
> trace, and how it "worked" when I commented out the
> "free" body, the memory alloc routines seem to be
> where the problem lies for sure.  And it turns out they
> appear to work on all my systems if I use that flag.
> 
> Does that just mean that the memory alloc routines
> used are the raw ones?  And it can still successfully
> checkpoint?  What does one lose then by using this flag?
> (I have not yet run a big program for example to see if
> the saved image is much larger like it was when I just
> didn't free anything).
> 
> It looks like the problem lies, on my systems, with
> the newest version of glibc2.
> 
> So, here are the three systems I tested this time:
> 
> A: m3-6Y30, Fedora 23, gcc 5.3.1, glibc 2.22, kernel 4.5.3, 64 bit
> B: Atom N550, Fedora 23, gcc 5.3.1, glibc 2.22, kernel 4.4.9, 32 bit
> C: i3-540, Fedora 21, gcc 4.8.3, glibc 2.18, kernel 4.3.3, 64 bit
> 
> I tried
>       2.4.4
>       2.5rc1 and
>       3.0 git zip of May 12
> 
> on all systems.
> 
> I tried starting a session with ocaml
> (O) and a session with python (P).  Just
> defined a variable, checkpointed, and
> attempted to restart, and attempted
> to interrupt with ctrl-c.
> 
> ------------------------------------------
> 
> Results System C, with the older glibc2:
> 
> All three dmtcp versions WORK with both ocaml and
> python, and handle interrupt correctly.  So no
> regression there.
> 
> ------------------------------------------
> 
> Results System B: old netbook, 32 bit,
> I don't need to use it probably, but gives
> another test result, with newest glibc2:
> 
> NONE of them work.  Not even the ones
> that used to work before I recently
> updated it to new glibc2.
> 
> They all seem to show an infinite loop
> involving some alloc routines, and they
> all work if I start with --disable-alloc-plugin.
> Including handling ctrl-C
> 
> ------------------------------------------
> 
> Results System A: new ultrabook, 64 bit with
> newest glibc2:
> 
> NONE of them work, all show similar kind of
> infinite loop, and, ALL WORK with
> --disable-alloc-plugin.  Including
> handling ctrl-C
> 
> (It was apparently only with my lobotomized
> version with the "free" code removed that
> it didn't handle ctrl-C).
> 
> ------------------------------------------
> 
> > Interesting! So, it seems like we missed out on some corner case.
> > Could you please share the stack trace? Also, is it easy to isolate
> > it to a simple test case that you could share with us? It'll be
> > easy to debug if we can reproduce it locally.
> 
> It doesn't seem like a "corner case" in the code as
> it fails on absolutely anything I call it with.
> Maybe a corner case for environments if the phrase makes
> sense there. For a simple test case "dmtcp_launch python" fails,
> as I mentioned, or "dmtcp_launch who" even.
> 
> I hope there is *something* you can do to
> reproduce it locally.  What could it be on
> my systems (both of them with the new glibc2)
> that makes it fail that you cannot easily
> reproduce??
> 
> Here are the tops of the stack crashes
> on B without disable alloc (enough to
> show the loop):
> 
> 2.4.4:
> > #0  0x00007ffff650915a in do_sym () from /lib64/libc.so.6
> > #1  0x00007ffff6509543 in _dl_vsym () from /lib64/libc.so.6
> > #2  0x00007ffff69aa198 in dlvsym_doit () from /lib64/libdl.so.2
> > #3  0x00007ffff7deb5f4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
> > #4  0x00007ffff69aa631 in _dlerror_run () from /lib64/libdl.so.2
> > #5  0x00007ffff69aa1ed in dlvsym () from /lib64/libdl.so.2
> > #6  0x00007ffff7121522 in initialize_libpthread_wrappers () at syscallsreal.c:315
> > #7  0x00007ffff70dfcc9 in dmtcp_prepare_wrappers () at dmtcpworker.cpp:152
> > #8  0x00007ffff7bd9b8f in malloc (size=118) at alloc/mallocwrappers.cpp:40
> > #9  0x00007ffff7deb3c1 in _dl_signal_error () from /lib64/ld-linux-x86-64.so.2
> > #10 0x00007ffff7deb573 in _dl_signal_cerror () from /lib64/ld-linux-x86-64.so.2
> > #11 0x00007ffff7de6303 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
> > #12 0x00007ffff6509161 in do_sym () from /lib64/libc.so.6
> > #13 0x00007ffff6509543 in _dl_vsym () from /lib64/libc.so.6
> > #14 0x00007ffff69aa198 in dlvsym_doit () from /lib64/libdl.so.2
> > #15 0x00007ffff7deb5f4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
> > #16 0x00007ffff69aa631 in _dlerror_run () from /lib64/libdl.so.2
> > #17 0x00007ffff69aa1ed in dlvsym () from /lib64/libdl.so.2
> > #18 0x00007ffff7121522 in initialize_libpthread_wrappers () at syscallsreal.c:315
> > #19 0x00007ffff70dfcc9 in dmtcp_prepare_wrappers () at dmtcpworker.cpp:152
> > #20 0x00007ffff7bd9b8f in malloc (size=118) at alloc/mallocwrappers.cpp:40
> 
> 2.5:
> > #0  0x00007ffff7deb58d in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
> > #1  0x00007ffff69aa631 in _dlerror_run () from /lib64/libdl.so.2
> > #2  0x00007ffff69aa148 in dlsym () from /lib64/libdl.so.2
> > #3  0x00007ffff7121624 in initialize_libc_wrappers () at syscallsreal.c:256
> > #4  dmtcp_prepare_wrappers () at syscallsreal.c:302
> > #5  0x00007ffff7bd9b8f in malloc (size=109) at alloc/mallocwrappers.cpp:40
> > #6  0x00007ffff7deb3c1 in _dl_signal_error () from /lib64/ld-linux-x86-64.so.2
> > #7  0x00007ffff7deb573 in _dl_signal_cerror () from /lib64/ld-linux-x86-64.so.2
> > #8  0x00007ffff7de6303 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
> > #9  0x00007ffff6509161 in do_sym () from /lib64/libc.so.6
> > #10 0x00007ffff6509543 in _dl_vsym () from /lib64/libc.so.6
> > #11 0x00007ffff69aa198 in dlvsym_doit () from /lib64/libdl.so.2
> > #12 0x00007ffff7deb5f4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
> > #13 0x00007ffff69aa631 in _dlerror_run () from /lib64/libdl.so.2
> > #14 0x00007ffff69aa1ed in dlvsym () from /lib64/libdl.so.2
> > #15 0x00007ffff71223ad in initialize_libc_wrappers () at syscallsreal.c:267
> > #16 dmtcp_prepare_wrappers () at syscallsreal.c:302
> 
> 3.0:
> > #0  free (ptr=0x63d010) at alloc/mallocwrappers.cpp:72
> > #1  0x00007ffff699f715 in _dlerror_run () from /lib64/libdl.so.2
> > #2  0x00007ffff699f148 in dlsym () from /lib64/libdl.so.2
> > #3  0x00007ffff7bd9da4 in free (ptr=0x63d010) at alloc/mallocwrappers.cpp:74
> > #4  0x00007ffff699f715 in _dlerror_run () from /lib64/libdl.so.2
> > #5  0x00007ffff699f148 in dlsym () from /lib64/libdl.so.2
> > #6  0x00007ffff7bd9da4 in free (ptr=0x63d010) at alloc/mallocwrappers.cpp:74
> > #7  0x00007ffff699f715 in _dlerror_run () from /lib64/libdl.so.2
> > #8  0x00007ffff699f148 in dlsym () from /lib64/libdl.so.2
> > #9  0x00007ffff7bd9da4 in free (ptr=0x63d010) at alloc/mallocwrappers.cpp:74
> 
> _______________________________________________
> Dmtcp-forum mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
