A reply from stackoverflow suggests I might have hit this bug: https://sourceware.org/bugzilla/show_bug.cgi?id=14023
I can confirm that this glibc bug affects my system (latest CentOS 7). However, as far as I know, R is not multithreaded in its core. Is it possible that some library triggered this? Regards, Steve Tomas Kalibera <tomas.kalib...@gmail.com> 于2019年8月2日周五 下午4:59写道: > > In an optimized build, debug info is just an approximation. It might > help to debug in a build of R and packages without compiler > optimizations (-O0), where the debug information is accurate. However, > first I would try to modify the example to trigger more often, or try to > find external ways to make it trigger more often (e.g. via gctorture). > Then I would try to make the example smaller (not call gc() explicitly, > not call any external code - e.g. the jobs, etc) - any time the example > is reduced but still triggers the errors, the reasoning is made easier. > Once you have a repeatable situation in a build with reliable debug > symbols, debugging is easier too, e.g. sometimes a watchpoint helps to > find memory corruption. Please feel free to ask more when you have more > information/updates. If this ends up being a bug in R, please report > (and with a reproducible example, if it is not obvious from the source > code). > > Best > Tomas > > > On 8/2/19 10:23 AM, Sun Yijiang wrote: > > The R script I run daily for hours looks like this: > > > > while (!finish) { > > Sys.sleep(0.1) > > time = as.integer(format(Sys.time(), "%H%M")) # always crash here > > if (new.data.timestamp() <= time) > > next > > # ... do some jobs for about 2 minutes ... > > gc() > > } > > > > Basically it waits for new data, which comes in every 10 minutes, and > > do some jobs, then gc(), then loop again. It works great most of the > > time, but crashes strangely once a month or so. Although infrequent, > > it always crashes at the same place and gives the same error info, > > like this: > > > > *** caught segfault *** > > address (nil), cause 'memory not mapped' > > > > Traceback: > > 1: as.POSIXlt.POSIXct(x, tz) > > 2: as.POSIXlt(x, tz) > > 3: format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) > > 4: structure(format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...), > > names = names(x)) > > 5: format.POSIXct(Sys.time(), format = "%H%M") > > 6: format(Sys.time(), format = "%H%M") > > 7: format(Sys.time(), format = "%H%M") > > … … > > > > I looked into the dumped core with gdb, and found something very strange: > > > > gdb /usr/lib64/R/bin/exec/R ~/core.30387 > > (gdb) bt 5 > > #0 0x00007f1dca844ff1 in __strlen_sse2_pminub () from /lib64/libc.so.6 > > #1 0x00007f1dcb20e8f9 in Rf_mkChar (name=0x0) at envir.c:3725 > > #2 0x00007f1dcb1dc225 in do_asPOSIXlt (call=<optimized out>, > > op=<optimized out>, args=<optimized out>, > > env=<optimized out>) at datetime.c:705 > > #3 0x00007f1dcb22197f in bcEval (body=body@entry=0x4064b28, > > rho=rho@entry=0xc449d38, useCache=useCache@entry=TRUE) > > at eval.c:6473 > > #4 0x00007f1dcb230370 in Rf_eval (e=0x4064b28, > > rho=rho@entry=0xc449d38) at eval.c:624 > > (More stack frames follow…) > > > > Tracing into src/main/datetime.c:705, it’s a simple string-making code: > > SET_STRING_ELT(tzone, 1, mkChar(R_tzname[0])); > > > > mkChar function is defined in envir.c:3725: > > 3723 SEXP mkChar(const char *name) > > 3724 { > > 3725 size_t len = strlen(name); > > … … > > > > gdb shows that the string pointer (name=0x0) mkChar received is NULL, > > and subsequently strlen(NULL) caused the segfault. But quite > > contradictorily, gdb shows the value passed to mkChar in the caller is > > valid: > > > > (gdb) frame 2 > > #2 0x00007f1dcb1dc225 in do_asPOSIXlt (call=<optimized out>, > > op=<optimized out>, args=<optimized out>, > > env=<optimized out>) at datetime.c:705 > > 705 datetime.c: No such file or directory. > > (gdb) p tzname[0] > > $1 = 0x4cf39c0 “CST” > > > > R_tzname is an alias of tzname. (#define R_tzname tzname in the same file.) > > > > At first, I suspect that some library may have messed up the memory > > and accidentally zeroed tzname (a global variable). But with this gdb > > trace, it shows that tzname is good, only that the pointer passed to > > mkChar magically changed to zero. Like this: > > > > mkChar(tzname[0]) // tzname[0] is “CST”, address 0x4cf39c > > … … > > SEXP mkChar(const char *name) // name should be 0x4cf39c, but gdb shows 0x0 > > { > > size_t len = strlen(name); // segfault, as name is NULL > > … … > > > > The only theory I can think of so far is that, on calling mkChar, the > > parameter passed on stack somehow got wiped out to zero by some buggy > > code in R or library. At a higher level, what I see is this: If you > > run format(Sys.time(), "%H%M”) a million times a day (together with > > other codes of course), once in a month or so this simple line can > > segfault. > > > > I’m lost in this confusion, could someone please help me find the > > right direction to further look into this problem? > > > > Regards, > > Steve > > > > ______________________________________________ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel