Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-11 Thread Vladimir Dergachev




On Tue, 12 Mar 2024, Ivan Krylov wrote:

> Vladimir,
>
> Thank you for the example and for sharing the ideas regarding
> symbol-relative offsets!
>
> On Thu, 7 Mar 2024 09:38:18 -0500 (EST)
> Vladimir Dergachev  wrote:
>
>>  unw_get_reg(&cursor, UNW_REG_IP, &pc);
>
> Is it ever possible for unw_get_reg() to fail (return non-zero) for
> UNW_REG_IP? The documentation isn't being obvious about this. Then
> again, if the process is so damaged it cannot even read the instruction
> pointer from its own stack frame, any attempts at self-debugging must
> be doomed.


Not sure. I think it just returns whatever value is stored there, so you
will get a false reading if the stack is corrupted. The way I see it,
some printout is better than none, and signs that the stack is badly
corrupted are a useful debugging clue.

>>   * this should work as a package, but I am not sure whether the
>> offsets between package symbols and R symbols would be static or not.
>
> Since package shared objects are mmap()ed into the address space and
> (at least on Linux with ASLR enabled) mmap()s are supposed to be made
> unpredictable, this offset ends up not being static. On Linux, R seems
> to be normally built as a position-independent executable, so no matter
> whether there is a libR.so, both the R base address and the package
> shared object base address are randomised:
>
> $ cat ex.c
> #include <stddef.h>
> #include <R_ext/Print.h>
> void addr_diff(void) {
>  ptrdiff_t diff = (char*)&addr_diff - (char*)&Rprintf;
>  Rprintf("self - Rprintf = %td\n", diff);
> }
> $ R CMD SHLIB ex.c
> $ R-dynamic -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
> self - Rprintf = -9900928
> $ R-dynamic -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
> self - Rprintf = -15561600
> $ R-static -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
> self - Rprintf = 45537907472976
> $ R-static -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
> self - Rprintf = 46527711447632
>
>>   * R ought to know where packages are loaded, we might want to be
>> clever and print out information on which package contains which
>> function, or there might be identical R_init_RMVL() printouts.
>
> That's true. Information on all registered symbols is available from
> getLoadedDLLs().


OK, so this is reasonably straightforward.

best

Vladimir Dergachev



> --
> Best regards,
> Ivan



__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-11 Thread Ivan Krylov via R-package-devel
Vladimir,

Thank you for the example and for sharing the ideas regarding
symbol-relative offsets!

On Thu, 7 Mar 2024 09:38:18 -0500 (EST)
Vladimir Dergachev  wrote:

>  unw_get_reg(&cursor, UNW_REG_IP, &pc);

Is it ever possible for unw_get_reg() to fail (return non-zero) for
UNW_REG_IP? The documentation isn't being obvious about this. Then
again, if the process is so damaged it cannot even read the instruction
pointer from its own stack frame, any attempts at self-debugging must
be doomed.

>* this should work as a package, but I am not sure whether the
> offsets between package symbols and R symbols would be static or not.

Since package shared objects are mmap()ed into the address space and
(at least on Linux with ASLR enabled) mmap()s are supposed to be made
unpredictable, this offset ends up not being static. On Linux, R seems
to be normally built as a position-independent executable, so no matter
whether there is a libR.so, both the R base address and the package
shared object base address are randomised:

$ cat ex.c
#include <stddef.h>
#include <R_ext/Print.h>
void addr_diff(void) {
 ptrdiff_t diff = (char*)&addr_diff - (char*)&Rprintf;
 Rprintf("self - Rprintf = %td\n", diff);
}
$ R CMD SHLIB ex.c
$ R-dynamic -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = -9900928
$ R-dynamic -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = -15561600
$ R-static -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = 45537907472976
$ R-static -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = 46527711447632

>* R ought to know where packages are loaded, we might want to be
> clever and print out information on which package contains which
> function, or there might be identical R_init_RMVL() printouts.

That's true. Information on all registered symbols is available from
getLoadedDLLs().

-- 
Best regards,
Ivan



Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-07 Thread Vladimir Dergachev



Hi Ivan,

Here is the piece of code I currently use:

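#define UNW_LOCAL_ONLY	/* we only unwind our own process */
#include <libunwind.h>
#include <stdio.h>

void backtrace_dump(void)
{
	unw_cursor_t	cursor;
	unw_context_t	context;

	/* Capture the current register state and start unwinding from it */
	unw_getcontext(&context);
	unw_init_local(&cursor, &context);

	/* Each successful step moves one frame up the call chain */
	while (unw_step(&cursor) > 0)
	{
		unw_word_t	offset = 0, pc = 0;
		char		fname[64];

		unw_get_reg(&cursor, UNW_REG_IP, &pc);

		/* Name of the enclosing (or nearest preceding) symbol and pc's offset from it */
		fname[0] = '\0';
		(void) unw_get_proc_name(&cursor, fname, sizeof(fname), &offset);

		/* pc relative to backtrace_dump() does not depend on address randomization */
		fprintf(stderr, "0x%016lx : (%s+0x%lx)\n",
			(unsigned long)(pc - (unw_word_t)backtrace_dump),
			fname, (unsigned long)offset);
	}
}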

To make it safe, one can simply replace fprintf() with a function that 
stores information into a buffer.
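Along those lines, here is a minimal sketch of such a buffer, assuming a POSIX system. The names trace_buf, buf_puts(), buf_puthex() and buf_flush() are hypothetical. Formatting uses only plain loops over a static array (no snprintf(), no malloc()), and output goes through write(2), which POSIX lists as async-signal-safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical fixed-size output buffer: static, so nothing is allocated
 * while a crash is being handled. */
static char   trace_buf[4096];
static size_t trace_len = 0;

/* Append a NUL-terminated string, silently truncating when the buffer is full. */
static void buf_puts(const char *s)
{
	while (*s != '\0' && trace_len < sizeof(trace_buf) - 1)
		trace_buf[trace_len++] = *s++;
	trace_buf[trace_len] = '\0';
}

/* Append "0x" followed by exactly 16 lowercase hex digits, like "0x%016lx". */
static void buf_puthex(uint64_t v)
{
	static const char digits[] = "0123456789abcdef";
	char tmp[19];

	tmp[0] = '0';
	tmp[1] = 'x';
	for (int i = 17; i >= 2; i--) {	/* fill from the least significant digit */
		tmp[i] = digits[v & 0xf];
		v >>= 4;
	}
	tmp[18] = '\0';
	buf_puts(tmp);
}

/* Write the accumulated text with write(2) and empty the buffer. */
static void buf_flush(int fd)
{
	size_t off = 0;

	while (off < trace_len) {
		ssize_t n = write(fd, trace_buf + off, trace_len - off);
		if (n <= 0)
			break;	/* give up on error: the process is probably dying anyway */
		off += (size_t)n;
	}
	trace_len = 0;
	trace_buf[0] = '\0';
}
```

Inside the unwinding loop, the fprintf() call would then turn into a sequence such as buf_puthex(pc - (unw_word_t)backtrace_dump); buf_puts(" : ("); buf_puts(fname); buf_puts("+"); buf_puthex(offset); buf_puts(")\n");, with a single buf_flush(STDERR_FILENO) after the loop. Unlike %lx, this sketch always prints the offset as 16 digits.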


Several things to point out:

  * printing pc-(long)backtrace_dump works around address randomization:
if you attach the debugger, you can find the location again by adding
the printed offset to backtrace_dump (it does not have to be
backtrace_dump; any symbol will do).


  * this works even if the symbols are stripped, in which case it finds an
offset relative to the nearest available symbol; there are always some
from the loader. Of course, in this case you should use the offsets and
the debugger to find out what's wrong.


  * you can call backtrace_dump() from anywhere; it does not have to be a
signal handler. I've taken to calling it when my programs detect some
abnormal situation, so I can see the call chain.


  * this should work as a package, but I am not sure whether the offsets
between package symbols and R symbols would be static or not. For R, it
might be a good idea to also print a table of offsets between some R
symbol and the R_init_*() function (such as R_init_RMVL()) of every
loaded package with C code, at least initially.


  * R ought to know where packages are loaded, we might want to be clever 
and print out information on which package contains which function, or 
there might be identical R_init_RMVL() printouts.
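For the "which package contains which function" part, one possible C-level approach (not something R currently does, and not proposed in the thread) is dladdr(), a GNU/BSD extension also available on macOS. It maps an address to the shared object containing it and that object's load base. The offset from that base, unlike the offset from some R symbol, stays the same between runs of the same build. A sketch, assuming glibc or macOS; describe_address() and where_am_i() are hypothetical names, older glibc versions need -ldl, and dladdr() is not async-signal-safe, so this suits non-signal contexts (such as calling backtrace_dump() on an abnormal condition) or post-processing better than a crash handler.

```c
#define _GNU_SOURCE	/* glibc needs this to declare dladdr() and Dl_info */
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: express an address as (shared object, offset from its
 * load base). The offset does not depend on where ASLR placed the object.
 * Returns 0 on success, -1 if no loaded object contains the address. */
static int describe_address(const void *addr, const char **object, uintptr_t *offset)
{
	Dl_info info;

	if (dladdr(addr, &info) == 0 || info.dli_fbase == NULL)
		return -1;
	*object = info.dli_fname;	/* path of the .so (or the executable) */
	*offset = (uintptr_t)addr - (uintptr_t)info.dli_fbase;
	return 0;
}

/* Hypothetical usage: report where this very function lives. */
void where_am_i(void)
{
	const char *object;
	uintptr_t offset;

	if (describe_address((const void *)&where_am_i, &object, &offset) == 0)
		printf("%s+0x%lx\n", object, (unsigned long)offset);
}
```

For typical position-independent executables and shared objects, the printed object+offset pair can then be given to addr2line -e <object> <offset> to recover the source location.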


best

Vladimir Dergachev



Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-07 Thread Ivan Krylov via R-package-devel
On Tue, 5 Mar 2024 18:26:28 -0500 (EST)
Vladimir Dergachev  wrote:

> I use libunwind in my programs, works quite well, and simple to use.
> 
> Happy to share the code if there is interest..

Do you mean that you use libunwind in signal handlers? An example on
how to produce a backtrace without calling any async-signal-unsafe
functions would indeed be greatly useful.

Speaking of shared objects injected using LD_PRELOAD, I've experimented
some more, and I think that none of them would work with R without
additional adjustments. They install their signal handler very soon
after the process starts up, and later, when R initialises, it
installs its own signal handler, overwriting the previous one. For this
scheme to work, either R would have to cooperate, remembering a pointer
to the previous signal handler and calling it at some point (which
sounds unsafe), or the injected shared object would have to override
sigaction() and call R's signal handler from its own (which sounds
extremely unsafe).

Without that, if we want C-level backtraces, we either need to patch R
to produce them (using backtrace() and limiting this to glibc systems
or using libunwind and paying the dependency cost) or to use a debugger.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-05 Thread Vladimir Dergachev



I use libunwind in my programs; it works quite well and is simple to use.

Happy to share the code if there is interest.

best

Vladimir Dergachev



Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-03 Thread Ivan Krylov via R-package-devel
On Sun, 3 Mar 2024 19:19:43 -0800
Kevin Ushey  wrote:

> Would libSegFault be useful here?

Glad to know it has been moved to
 and not
just removed altogether after the upstream commit
.

libSegFault is safer than, say, libsegfault [*] because it both
supports SA_ONSTACK (for when a SIGSEGV is caused by stack overflow)
and avoids functions like snprintf() (which depend on the locale code,
which may have been the source of the crash). The only correctness
problem that may still be unaddressed is potential memory allocations
in backtrace() when it loads libgcc on first use. That should be easy
to fix by calling backtrace() once in segfault_init(). Unfortunately,
libSegFault is limited to glibc systems, so a different solution will
be needed on Windows, macOS and Linux systems with the musl libc.

Google-owned "backward" [**] tries to do most of this right, but (1) is
designed to be compiled together with C++ programs, not injected into
unrelated processes and (2) will exit the process if it survives
raise(signum), which will interfere with both rJava (judging by the
number of Java-related SIGSEGVs I saw while running R CMD check) and R's
own stack overflow survival attempts.

-- 
Best regards,
Ivan

[*] https://github.com/stass/libsegfault
(Which doesn't compile out of the box on GNU/Linux due to missing
pthread_np.h, although that should be easy to patch.)

[**] https://github.com/bombela/backward-cpp



Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-03 Thread Kevin Ushey
Would libSegFault be useful here?
https://lemire.me/blog/2023/05/01/under-linux-libsegfault-and-addr2line-are-underrated/



Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-03 Thread Rolf Turner
On Sun, 3 Mar 2024 11:14:44 +0300
Ivan Krylov via R-package-devel  wrote:

> Hello,
> 
> This may be of interest to people who run lots of R CMD checks and
> have to deal with resulting crashes in compiled code.



> Is adding C-level backtraces to R CMD checks worth the effort? Could
> it be a good idea to add this on CRAN? If yes, how can I help?
> 

Sounds like an excellent idea to me, but I am not really qualified to
judge.  Most of this stuff is way over my head.

cheers,

Rolf Turner

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. (secretaries) phone:
 +64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619
