On 8/27/20 8:38 PM, Jeroen Ooms wrote:
On Wed, Aug 26, 2020 at 7:54 PM Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
On 8/25/20 6:14 PM, Tomas Kalibera wrote:
On 8/22/20 9:33 PM, Jeroen Ooms wrote:
On Sat, Aug 22, 2020 at 9:10 PM Tomas Kalibera
<tomas.kalib...@gmail.com> wrote:
On 8/22/20 8:26 PM, Tomas Kalibera wrote:
On 8/22/20 7:58 PM, Jeroen Ooms wrote:
On Sat, Aug 22, 2020 at 8:39 AM Tomas Kalibera
<tomas.kalib...@gmail.com> wrote:
On 8/21/20 11:45 PM, m19tdn+9alxwj7d2bmk--- via R-devel wrote:
Ah yes, this is related. I reported v2010 below, but it looks like
I was updated to this Insider Build overnight without my knowledge,
and conflated it with the new installation of R v4 this morning.

I will continue to look into the issue with the methods Tomas
mentioned.
It is interesting that a rare 5-year-old problem would reappear on
current Insider builds. Which build of Windows are you running
exactly?
I've seen another report about a crash on 20190.1000. It would be
nice to know whether it is also present in newer builds, i.e. in 20197.
I installed the latest 20197 build in a VM, and I can indeed
reproduce this problem.

What seems to be happening is that R triggers an infinite recursion
in the Windows unwinding mechanism, and eventually dies with a stack
overflow. Attached is a backtrace of the initial 100 frames of the main
thread (the pattern in the top ~30 frames continues forever).

The Microsoft blog doesn't mention that anything related to exception
handling has changed in recent versions:
https://docs.microsoft.com/en-us/windows-insider/at-home/active-dev-branch


Thanks, unfortunately that does not ring any bells (except below), I
can't guess from this what is the underlying cause of the problem.
There may be something wrong in how we use setjmp/longjmp or how
setjmp/longjmp works on Windows.

It reminds me of a problem I was debugging a few days ago, where the
longjmp implementation segfaults on Windows 10 (a recent but not
Insider build), probably soon after unwinding the stack, but only with
GCC 10 / MinGW 7, only in one of the no-segfault tests, and only
with -O3 (not with -O2, and not with -O3 -fno-split-loops).
Interestingly, the problem was sensitive to these optimization options
at the call site of the long jump (do_abs), even when it was not an
immediate caller of longjmp. I've not tracked this down yet; it will
require looking at the assembly level, and I suspected a compiler error
causing the compiler to generate code that messes with the stack or
registers in a way that impacts the upcoming jump. But now that we have
this other problem with setjmp/longjmp, the compiler may not be the top
suspect anymore.

I may not be able to work on this in the next few days or a week, so
if anyone gets there first, please let me know what you find out.
Btw, could you please check whether the UCRT build of R also crashes in
the Insider Windows build?
Yes, it hangs in exactly the same way, except that the backtrace shows

   ucrtbase!.intrinsic_setjmpex () from C:\WINDOWS\System32\ucrtbase.dll

instead of msvcrt!_setjmpex (as expected, of course).
Thanks. I found what is causing the problem I observed with
GCC10/stock Windows 10, I expect this is the same one as in the
Insider build.
I will investigate further,

Tomas

It seems the problem is between MinGW-W64 and Windows, and it really
does cause both the reported crashes in an Insider build (I tested in
20197) and the crash in my GCC 10 builds in a single "no-segfault" test.
setjmp is implemented using the Windows call _setjmpex, which has a
second argument that MinGW sets differently depending on the GCC
version. When I set this argument the way MinGW-W64 did for early
versions of GCC, to mingw_getsp(), it fixes/hides the problems on my
systems. Perl5 uses a similar workaround, but otherwise there is no
solid basis (documentation, specification, etc.) I am aware of for this
change, so it may take some more time to fix this properly. Still, if
anyone experiments with this workaround and finds a problem, please let
me know. In particular, I am curious whether it works on earlier
versions of Windows (at least with check-all, including recommended
packages).
FYI, the problem has disappeared on Windows dev build 20201 (released
yesterday), so it may have been a Windows bug. That is not to say
there is no bug on the R/mingw side, but at least the current and past
releases of R are working again on the latest versions of Windows,
which is a big relief.

I've added a workaround, for now only to R-devel, which fixes both issues:

- infinite recursion on startup in 20197 (and some other pre-releases, as reported by others)
- segfault during longjmp with GCC 10 in multiple versions of Windows 10, including 20211

The workaround uses NULL as the second argument to _setjmpex, which effectively disables SEH in internal R code for jump targets created using R's setjmp. This gives the same behavior as we have on Linux, potentially improves performance, and most importantly makes the problem go away. I've tested on CRAN/BIOC packages and did not find any issues, but this could potentially uncover bugs related to improper use of C++ with R (code relying on C++ destructors being run on R errors/long jumps). Such bugs should, however, have already been found on Linux, where destructors were never run on long jumps.
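
To illustrate the caveat about cleanup code, here is a minimal sketch in portable C, not the actual R change. It uses plain setjmp/longjmp and shows that code placed after a call that long-jumps is simply skipped, which is the behavior that C++ destructors would also see without SEH-based unwinding. All names (toplevel, raise_error, do_work, run_guarded, cleanups_run) are hypothetical and do not come from R's sources.

```c
#include <setjmp.h>

/* Hypothetical names for illustration only; these are not R internals. */
static jmp_buf toplevel;      /* jump target established by run_guarded() */
static int last_error = 0;    /* error code recorded before the long jump */
static int cleanups_run = 0;  /* counts cleanup steps that actually ran */

/* Stand-in for a routine that signals an error by long-jumping out. */
void raise_error(int code)
{
    last_error = code;
    longjmp(toplevel, 1);     /* never returns to the caller */
}

/* Some work followed by a cleanup step. */
void do_work(int fail_code)
{
    if (fail_code != 0)
        raise_error(fail_code);  /* jumps straight past the cleanup below */
    cleanups_run++;              /* reached only on a normal return */
}

/* Returns 0 on normal completion, or the recorded error code after a
   long jump. */
int run_guarded(int fail_code)
{
    /* Portable setjmp. Per the thread, on 64-bit Windows with MinGW-W64
       setjmp is implemented via _setjmpex, which takes a second argument;
       the workaround described above passes NULL there. That call is not
       reproduced here because it is platform-specific. */
    if (setjmp(toplevel) != 0)
        return last_error;
    do_work(fail_code);
    return 0;
}
```

With this sketch, run_guarded(0) returns 0 and increments cleanups_run, while run_guarded(7) returns 7 and leaves cleanups_run unchanged, because the long jump bypasses the cleanup step in do_work.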

Tomas

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
