Re: Segfault while building on 64-bit Cygwin
Aah, you all are amazing -- thank you!! Applied and merged.

Cheers,

Andy

On Mon 17 Feb 2020 20:27, Charles Stanhope writes:

> On 2/16/20, Charles Stanhope wrote:
>> On 2/16/20, Mike Gran wrote:
>>>
>>> I can confirm that Charles's patch, plus another one line patch
>>> to define CPU_SETSIZE, is enough to get Guile 3.0.x to build and run
>>> on my box. All tests pass except strptime in French, and the absence
>>> of crypt. This is a 64-bit build.
>>
>> Mike, thanks for going further with the Guile build. The CPU_SETSIZE
>> issue was what was hanging me up from compiling before Andy's comment
>> got me to look at lightening. I assumed I had some configuration,
>> package, or compiler issue. Good to know there's a simple fix.
>>
>> Just a further warning to anyone watching, that patch I posted is a
>> real hack job just to test my theory of the cause of the segfault. I
>> would expect it to fail when you have fewer than four arguments in a
>> JITed function call. I wouldn't try doing much else with that Guile
>> build besides run the tests. :)
>
> I had a little bit more time to look into the lightening
> implementation last night. I've attached a patch that is less horrible
> and more correct than my previous one. It reserves the stack space
> regardless of the number of parameters and appears to work. But I'm
> new to the lightening code base, so I'm not convinced it is the
> correct solution. It's just the solution I was left with after my time
> ran out. I wanted to post this patch as a replacement to the prior one
> in case people did want to do more testing with Guile 3.0 on Cygwin
> x64.
>
> With that, I will let more experienced people come up with the
> appropriate solution. Happy hacking, everybody!
>
> --
> Charles
>
> diff --git a/lightening/x86.c b/lightening/x86.c
> index 965191a..bdd26e1 100644
> --- a/lightening/x86.c
> +++ b/lightening/x86.c
> @@ -328,6 +328,10 @@ reset_abi_arg_iterator(struct abi_arg_iterator *iter, size_t argc,
>    memset(iter, 0, sizeof *iter);
>    iter->argc = argc;
>    iter->args = args;
> +#if __CYGWIN__ && __X64
> +  // Reserve slots on the stack for 4 register parameters (8 bytes each).
> +  iter->stack_size = 32;
> +#endif
>  }
>
>  static void
Re: Segfault while building on 64-bit Cygwin
On 2/16/20, Charles Stanhope wrote:
> On 2/16/20, Mike Gran wrote:
>>
>> I can confirm that Charles's patch, plus another one line patch
>> to define CPU_SETSIZE, is enough to get Guile 3.0.x to build and run
>> on my box. All tests pass except strptime in French, and the absence
>> of crypt. This is a 64-bit build.
>
> Mike, thanks for going further with the Guile build. The CPU_SETSIZE
> issue was what was hanging me up from compiling before Andy's comment
> got me to look at lightening. I assumed I had some configuration,
> package, or compiler issue. Good to know there's a simple fix.
>
> Just a further warning to anyone watching, that patch I posted is a
> real hack job just to test my theory of the cause of the segfault. I
> would expect it to fail when you have fewer than four arguments in a
> JITed function call. I wouldn't try doing much else with that Guile
> build besides run the tests. :)

I had a little bit more time to look into the lightening
implementation last night. I've attached a patch that is less horrible
and more correct than my previous one. It reserves the stack space
regardless of the number of parameters and appears to work. But I'm
new to the lightening code base, so I'm not convinced it is the
correct solution. It's just the solution I was left with after my time
ran out. I wanted to post this patch as a replacement to the prior one
in case people did want to do more testing with Guile 3.0 on Cygwin
x64.

With that, I will let more experienced people come up with the
appropriate solution. Happy hacking, everybody!

--
Charles

diff --git a/lightening/x86.c b/lightening/x86.c
index 965191a..bdd26e1 100644
--- a/lightening/x86.c
+++ b/lightening/x86.c
@@ -328,6 +328,10 @@ reset_abi_arg_iterator(struct abi_arg_iterator *iter, size_t argc,
   memset(iter, 0, sizeof *iter);
   iter->argc = argc;
   iter->args = args;
+#if __CYGWIN__ && __X64
+  // Reserve slots on the stack for 4 register parameters (8 bytes each).
+  iter->stack_size = 32;
+#endif
 }

 static void
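
Editorial sketch (not part of the original message): the difference between the first test patch and this replacement can be seen by simulating how each one accumulates the caller's stack size. Everything below is hypothetical and simplified: `arg_iter_sketch` and both functions are illustrative names rather than lightening's API, and the sketch assumes that each argument passed on the stack takes one 8-byte slot, which the posted diffs do not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for lightening's abi_arg_iterator;
   only the fields needed for the illustration are kept. */
struct arg_iter_sketch {
  size_t reg_idx;     /* next register slot to hand out (0..3) */
  size_t stack_size;  /* bytes the caller reserves on the stack */
};

enum {
  WIN64_REG_ARGS = 4,      /* arguments passed in registers */
  WIN64_SLOT_BYTES = 8,    /* size of one stack slot */
  WIN64_SHADOW_BYTES = 32  /* spill area the caller must always reserve */
};

/* First test patch: add 8 bytes for every register argument as it is
   visited (stack arguments are assumed to add 8 bytes as well). */
static size_t stack_size_per_arg(size_t argc)
{
  struct arg_iter_sketch it = { 0, 0 };
  for (size_t i = 0; i < argc; i++) {
    if (it.reg_idx < WIN64_REG_ARGS)
      it.reg_idx++;
    it.stack_size += WIN64_SLOT_BYTES;
  }
  assert(it.stack_size % WIN64_SLOT_BYTES == 0);
  return it.stack_size;
}

/* Replacement patch: start from 32 bytes regardless of argc, then add
   8 bytes only for arguments that do not fit in registers. */
static size_t stack_size_reserved_up_front(size_t argc)
{
  struct arg_iter_sketch it = { 0, WIN64_SHADOW_BYTES };
  for (size_t i = 0; i < argc; i++) {
    if (it.reg_idx < WIN64_REG_ARGS)
      it.reg_idx++;
    else
      it.stack_size += WIN64_SLOT_BYTES;
  }
  assert(it.stack_size >= WIN64_SHADOW_BYTES);
  return it.stack_size;
}
```

Under these assumptions the two approaches agree for four or more arguments, but the per-argument version reserves only 16 bytes for a two-argument call, which is short of the 32 bytes the callee is entitled to use. That matches the warning that the first patch should fail for calls with fewer than four arguments.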
Re: Segfault while building on 64-bit Cygwin
On 2/16/20, Mike Gran wrote:
> On Fri, Feb 14, 2020 at 09:46:04AM -0800, Charles Stanhope wrote:
>> Andy, I don't know if you'd want to continue this here or on
>> lightening's gitlab page, but I looked into this a little bit a few
>> minutes here and there this past week. The x86 "fast-call" calling
>> convention used on Windows x64[0] and shared by Cygwin[1] requires
>> that the caller reserve 32 bytes of memory on the stack for the callee
>> to spill the register parameters (even if the callee takes fewer than
>> four parameters). I think lightening is currently missing that for the
>> x64 case for Cygwin.
>>
>> To test the idea, I made a small modification (patch attached) that is
>> *not* intended as a solution as it doesn't work for the general case,
>> but it does allow the tests to pass on Cygwin 64.
>
> I can confirm that Charles's patch, plus another one line patch
> to define CPU_SETSIZE, is enough to get Guile 3.0.x to build and run
> on my box. All tests pass except strptime in French, and the absence
> of crypt. This is a 64-bit build.

Mike, thanks for going further with the Guile build. The CPU_SETSIZE
issue was what was hanging me up from compiling before Andy's comment
got me to look at lightening. I assumed I had some configuration,
package, or compiler issue. Good to know there's a simple fix.

Just a further warning to anyone watching, that patch I posted is a
real hack job just to test my theory of the cause of the segfault. I
would expect it to fail when you have fewer than four arguments in a
JITed function call. I wouldn't try doing much else with that Guile
build besides run the tests. :)

--
Charles
Re: Segfault while building on 64-bit Cygwin
Excellent, and thank you all! I've been Windowsless for a few weeks, but
that should change again soon.

On Sun, Feb 16, 2020 at 6:23 PM Mike Gran wrote:

> On Fri, Feb 14, 2020 at 09:46:04AM -0800, Charles Stanhope wrote:
> > Andy, I don't know if you'd want to continue this here or on
> > lightening's gitlab page, but I looked into this a little bit a few
> > minutes here and there this past week. The x86 "fast-call" calling
> > convention used on Windows x64[0] and shared by Cygwin[1] requires
> > that the caller reserve 32 bytes of memory on the stack for the callee
> > to spill the register parameters (even if the callee takes fewer than
> > four parameters). I think lightening is currently missing that for the
> > x64 case for Cygwin.
> >
> > To test the idea, I made a small modification (patch attached) that is
> > *not* intended as a solution as it doesn't work for the general case,
> > but it does allow the tests to pass on Cygwin 64.
>
> I can confirm that Charles's patch, plus another one line patch
> to define CPU_SETSIZE, is enough to get Guile 3.0.x to build and run
> on my box. All tests pass except strptime in French, and the absence
> of crypt. This is a 64-bit build.
>
> -Mike Gran
Re: Segfault while building on 64-bit Cygwin
On 14.02.2020 at 18:46, Charles Stanhope wrote:
> On 2/6/20, Charles Stanhope wrote:
>> On 2/6/20, Andy Wingo wrote:
>>> Given that John said that compilation went fine with
>>> GUILE_JIT_THRESHOLD=-1, I think perhaps this problem may have been fixed
>>> in the past. My suspicions are that this issue is an ABI issue with
>>> lightening that could perhaps be reproduced by:
>>>
>>>   git clone https://gitlab.com/wingo/lightening
>>>   cd lightening
>>>   make -C tests test-native
>>>
>>> Of course any additional confirmation is useful and welcome!
>>
>> I haven't been able to get guile to compile under Cygwin (just a
>> compilation error I haven't had time to track down), but I was able to
>> quickly try the above. I get:
>>
>> Testing: test-native-call_10
>> call_10.c:9: assertion failed: e == 4
>> /bin/sh: line 1: 7063 Aborted (core dumped) ./$test
>> make: *** [Makefile:31: test-native] Error 134
>
> Andy, I don't know if you'd want to continue this here or on
> lightening's gitlab page, but I looked into this a little bit a few
> minutes here and there this past week. The x86 "fast-call" calling
> convention used on Windows x64[0] and shared by Cygwin[1] requires
> that the caller reserve 32 bytes of memory on the stack for the callee
> to spill the register parameters (even if the callee takes fewer than
> four parameters). I think lightening is currently missing that for the
> x64 case for Cygwin.
>
> To test the idea, I made a small modification (patch attached) that is
> *not* intended as a solution as it doesn't work for the general case,
> but it does allow the tests to pass on Cygwin 64.
>
> [0] https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
> [1] https://cygwin.com/cygwin-ug-net/programming.html#gcc-64
>
> --
> Charles

As Guile 3.0 builds fine on Cygwin i686 but segfaults immediately on
bootstrap for x86_64, I bet you are right about the root cause.

Marco
Re: Segfault while building on 64-bit Cygwin
On 2/6/20, Charles Stanhope wrote:
> On 2/6/20, Andy Wingo wrote:
>
>> Given that John said that compilation went fine with
>> GUILE_JIT_THRESHOLD=-1, I think perhaps this problem may have been fixed
>> in the past. My suspicions are that this issue is an ABI issue with
>> lightening that could perhaps be reproduced by:
>>
>>   git clone https://gitlab.com/wingo/lightening
>>   cd lightening
>>   make -C tests test-native
>>
>> Of course any additional confirmation is useful and welcome!
>
> I haven't been able to get guile to compile under Cygwin (just a
> compilation error I haven't had time to track down), but I was able to
> quickly try the above. I get:
>
> Testing: test-native-call_10
> call_10.c:9: assertion failed: e == 4
> /bin/sh: line 1: 7063 Aborted (core dumped) ./$test
> make: *** [Makefile:31: test-native] Error 134
>

Andy, I don't know if you'd want to continue this here or on
lightening's gitlab page, but I looked into this a little bit a few
minutes here and there this past week. The x86 "fast-call" calling
convention used on Windows x64[0] and shared by Cygwin[1] requires
that the caller reserve 32 bytes of memory on the stack for the callee
to spill the register parameters (even if the callee takes fewer than
four parameters). I think lightening is currently missing that for the
x64 case for Cygwin.

To test the idea, I made a small modification (patch attached) that is
*not* intended as a solution as it doesn't work for the general case,
but it does allow the tests to pass on Cygwin 64.

[0] https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
[1] https://cygwin.com/cygwin-ug-net/programming.html#gcc-64

--
Charles

diff --git a/lightening/x86.c b/lightening/x86.c
index 965191a..91b3a94 100644
--- a/lightening/x86.c
+++ b/lightening/x86.c
@@ -338,11 +338,13 @@ next_abi_arg(struct abi_arg_iterator *iter, jit_operand_t *arg)
   if (is_gpr_arg(abi) && iter->gpr_idx < abi_gpr_arg_count) {
     *arg = jit_operand_gpr (abi, abi_gpr_args[iter->gpr_idx++]);
 #ifdef __CYGWIN__
+    iter->stack_size += 8;
     iter->fpr_idx++;
 #endif
   } else if (is_fpr_arg(abi) && iter->fpr_idx < abi_fpr_arg_count) {
     *arg = jit_operand_fpr (abi, abi_fpr_args[iter->fpr_idx++]);
 #ifdef __CYGWIN__
+    iter->stack_size += 8;
     iter->gpr_idx++;
 #endif
   } else {
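
Editorial sketch (not part of the original message): the existing `#ifdef __CYGWIN__` lines in the context of this diff bump the other register index as well, because in the Windows x64 convention an argument's position selects its register regardless of type. The function below illustrates that rule together with the 32-byte spill area described above. All names (`win64_arg_location`, `struct arg_loc`, the enums) are hypothetical and not lightening's API, and the stack offsets are relative to the stack pointer in the caller just before the call instruction.

```c
#include <assert.h>
#include <stddef.h>

enum arg_kind { ARG_INT, ARG_FLOAT };
enum arg_loc_kind { LOC_GPR, LOC_FPR, LOC_STACK };

struct arg_loc {
  enum arg_loc_kind kind;
  size_t index;  /* register number within its class, or stack byte offset */
};

/* Where does the argument at 0-based `position` go?
   Positions 0..3 use RCX, RDX, R8, R9 for integers and XMM0..XMM3 for
   floating-point values; the register number is the position itself, so
   an integer in position 2 uses R8 even if position 1 held a double.
   Later arguments live on the stack, above the 32-byte spill area. */
static struct arg_loc win64_arg_location(size_t position, enum arg_kind kind)
{
  struct arg_loc loc;
  assert(kind == ARG_INT || kind == ARG_FLOAT);
  if (position < 4) {
    loc.kind = (kind == ARG_FLOAT) ? LOC_FPR : LOC_GPR;
    loc.index = position;
  } else {
    loc.kind = LOC_STACK;
    loc.index = 32 + 8 * (position - 4);
  }
  return loc;
}
```

For a call such as `f(int a, double b, int c)`, this places `a` in the first integer register, `b` in the second floating-point register, and `c` in the third integer register, while a fifth argument would sit at offset 32.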
Re: Segfault while building on 64-bit Cygwin
On 2/6/20, Andy Wingo wrote:

> Given that John said that compilation went fine with
> GUILE_JIT_THRESHOLD=-1, I think perhaps this problem may have been fixed
> in the past. My suspicions are that this issue is an ABI issue with
> lightening that could perhaps be reproduced by:
>
>   git clone https://gitlab.com/wingo/lightening
>   cd lightening
>   make -C tests test-native
>
> Of course any additional confirmation is useful and welcome!

I haven't been able to get guile to compile under Cygwin (just a
compilation error I haven't had time to track down), but I was able to
quickly try the above. I get:

Testing: test-native-call_10
call_10.c:9: assertion failed: e == 4
/bin/sh: line 1: 7063 Aborted (core dumped) ./$test
make: *** [Makefile:31: test-native] Error 134

Despite what it says about a core dump, I find no such thing. Just a
file with the same name as the executable suffixed with ".stackdump".
(I did attempt to configure the Cygwin dumper before running the
tests.) Unless somebody suggests otherwise, I think the error message
is more useful.

--
Charles
Re: Segfault while building on 64-bit Cygwin
On Mon 20 Jan 2020 18:22, Mike Gran writes:

> On Mon, Jan 20, 2020 at 11:38:35AM -0500, John Cowan wrote:
>> Yes, gladly, but I don't know how to get one in this context. Do I need to
>> add some flags to the Makefile, and if so, where? (It's a twisty maze of
>> passages, all different.) Note that this *is* a build with JIT enabled;
>> when I disable it using the env variable, there are no errors and 3.0.0
>> works fine.
>>
>> Also, it may take some time, as I have to rebuild my Windows system.
>
> I also tried building Guile 3.0.0 on Cygwin 3.1.x. The failure comes from
> trying to parse compiled .go files.
>
> The last time that I had this sort of problem, it was because the
> O_BINARY flag was dropped or missing when writing .go files, leading
> to CR+LF characters in the compiled files. And I diagnosed it by
> byte-comparing Linux-compiled .go files with Cygwin-compiled .go
> files, and by looking for CR+LF combinations in the compiled .go
> files.
>
> I don't know if that is what is happening here, but, I'll check that
> next time I have a chance.

Given that John said that compilation went fine with
GUILE_JIT_THRESHOLD=-1, I think perhaps this problem may have been fixed
in the past. My suspicions are that this issue is an ABI issue with
lightening that could perhaps be reproduced by:

  git clone https://gitlab.com/wingo/lightening
  cd lightening
  make -C tests test-native

Of course any additional confirmation is useful and welcome!

Cheers,

Andy
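
Editorial sketch (not part of the original message): the CR+LF check that Mike describes in the quoted text can be approximated with a small scanner. This is only one assumed way to do it; `count_crlf` and `count_crlf_in_file` are hypothetical helper names. A nonzero count is a hint rather than proof, since the byte pair 0x0D 0x0A can occur legitimately in binary data, so comparing against a Linux-compiled file, as Mike did, remains the stronger test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Count CR+LF byte pairs in an in-memory buffer. */
static size_t count_crlf(const unsigned char *buf, size_t len)
{
  size_t n = 0;
  assert(buf != NULL || len == 0);
  for (size_t i = 0; i + 1 < len; i++)
    if (buf[i] == '\r' && buf[i + 1] == '\n')
      n++;
  return n;
}

/* Count CR+LF byte pairs in a file. The file is opened with "rb" so
   that the C library performs no newline translation while reading.
   Returns -1 if the file cannot be opened or read. */
static long count_crlf_in_file(const char *path)
{
  FILE *f = fopen(path, "rb");
  if (!f)
    return -1;

  long n = 0;
  int prev = EOF;
  int c;
  while ((c = getc(f)) != EOF) {
    if (prev == '\r' && c == '\n')
      n++;
    prev = c;
  }

  int failed = ferror(f);
  fclose(f);
  return failed ? -1 : n;
}
```

Running `count_crlf_in_file` over the same module compiled on Linux and on Cygwin and comparing the two counts would show whether text-mode translation crept into the output.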
Re: Segfault while building on 64-bit Cygwin
Aaaand... Cygwin doesn't do core dumps. Under the skin it's Windows,
after all. This is what I get when I specify ulimit -c unlimited and
rebuild:

Exception: STATUS_ACCESS_VIOLATION at rip=0055A8B1B25
rax= rbx=FF90 rcx=FF90
rdx=0034964A rsi=0784ECC0 rdi=FF90
r8 =0784ECC0 r9 =0002 r10=0001
r11=00055A86B190 r12=0002 r13=00055A931EA0
r14=06FEF840 r15= rbp=0034964A rsp=BDA0
program=C:\Users\rr828893\Downloads\guile-3.0.0\libguile\.libs\guile.exe, pid 62833, thread main
cs=0033 ds=002B es=002B fs=0053 gs=002B ss=002B

I can't imagine what you can make of that.

On Sat, Jan 25, 2020 at 10:54 AM John Cowan wrote:

> On Sat, Jan 25, 2020 at 8:51 AM Ludovic Courtès wrote:
>
>> That I understand. However, I was asking for the backtrace of the crash
>> on Cygwin when JIT is enabled. Could you grab it?
>
> 1. The wisdom of the Internet has not been able to figure out how to
> generate a core dump on MacOS 10.15.2 (Catalina). The usual set of
> enabling steps can be performed without error, but still no core dump.
>
> 2. Until today I believed that there was no way to generate a Cygwin core
> dump. I know now that there is, but I may not be able to test it until
> Monday. I'll let you know, and hopefully that will provide insight into
> the MacOS problem as well.
>
> 3. I will try to work further on the MacOS libffi problem (which surfaces
> when you do --disable-jit to bypass the above problem) to convince MacOS to
> use GNU libffi rather than the native one. It probably has to do with
> pkg-config, which I barely understand.
>
> "All problems are config problems."
>
> John Cowan http://vrici.lojban.org/~cowan co...@ccil.org
> We are lost, lost. No name, no business, no Precious, nothing. Only
> empty.
> Only hungry: yes, we are hungry. A few little fishes, nassty bony little
> fishes, for a poor creature, and they say death. So wise they are; so
> just,
> so very just. --Gollum

guile.exe.stackdump Description: Binary data
Re: Segfault while building on 64-bit Cygwin
On Sat, Jan 25, 2020 at 8:51 AM Ludovic Courtès wrote:

> That I understand. However, I was asking for the backtrace of the crash
> on Cygwin when JIT is enabled. Could you grab it?

1. The wisdom of the Internet has not been able to figure out how to
generate a core dump on MacOS 10.15.2 (Catalina). The usual set of
enabling steps can be performed without error, but still no core dump.

2. Until today I believed that there was no way to generate a Cygwin core
dump. I know now that there is, but I may not be able to test it until
Monday. I'll let you know, and hopefully that will provide insight into
the MacOS problem as well.

3. I will try to work further on the MacOS libffi problem (which surfaces
when you do --disable-jit to bypass the above problem) to convince MacOS to
use GNU libffi rather than the native one. It probably has to do with
pkg-config, which I barely understand.

"All problems are config problems."

John Cowan http://vrici.lojban.org/~cowan co...@ccil.org
We are lost, lost. No name, no business, no Precious, nothing. Only empty.
Only hungry: yes, we are hungry. A few little fishes, nassty bony little
fishes, for a poor creature, and they say death. So wise they are; so just,
so very just. --Gollum
Re: Segfault while building on 64-bit Cygwin
John Cowan writes:

> Both Cygwin and MacOS crash in pretty much the same way. By disabling the
> JIT, I was able to get the Cygwin build to run to completion.

That I understand. However, I was asking for the backtrace of the crash
on Cygwin when JIT is enabled. Could you grab it?

Thanks in advance,
Ludo’.
Re: Segfault while building on 64-bit Cygwin
Both Cygwin and MacOS crash in pretty much the same way. By disabling the
JIT, I was able to get the Cygwin build to run to completion. On MacOS with
--disable-jit, however, I am now getting an entirely new failure:

  CC       readline.lo
readline.c:432:7: warning: implicitly declaring library function 'strncmp' with type 'int (const char *, const char *, unsigned long)' [-Wimplicit-function-declaration]
  if (strncmp (rl_get_keymap_name (rl_get_keymap ()), "vi", 2))
      ^
readline.c:432:7: note: include the header <string.h> or explicitly provide a declaration for 'strncmp'
readline.c:432:16: warning: implicit declaration of function 'rl_get_keymap_name' is invalid in C99 [-Wimplicit-function-declaration]
  if (strncmp (rl_get_keymap_name (rl_get_keymap ()), "vi", 2))
               ^
readline.c:432:16: warning: incompatible integer to pointer conversion passing 'int' to parameter of type 'const char *' [-Wint-conversion]
  if (strncmp (rl_get_keymap_name (rl_get_keymap ()), "vi", 2))
               ^
3 warnings generated.
  CCLD     guile-readline.la
Undefined symbols for architecture x86_64:
  "_rl_get_keymap_name", referenced from:
      _scm_init_readline in readline.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

On Thu, Jan 23, 2020 at 3:35 PM Ludovic Courtès wrote:

> Hi,
>
> John Cowan writes:
>
> > Thanks. Unfortunately, the standard recipe for making core dumps on Mac
>
> This bug report is about Cygwin, not macOS, right? :-)
>
> Ludo’.
Re: Segfault while building on 64-bit Cygwin
Hi,

John Cowan writes:

> Thanks. Unfortunately, the standard recipe for making core dumps on Mac

This bug report is about Cygwin, not macOS, right? :-)

Ludo’.
Re: Segfault while building on 64-bit Cygwin
Thanks. Unfortunately, the standard recipe for making core dumps on Mac
(put "limit core unlimited" into /etc/launchd.conf and reboot, make sure
/cores is writable, set ulimit -c unlimited) does not seem to actually
enable them on MacOS Catalina (10.15.2). I have tested with SIGQUIT and
SIGSEGV on running processes and no dumps appear in /cores.

On Tue, Jan 21, 2020 at 4:02 AM Ludovic Courtès wrote:

> Hello,
>
> John Cowan writes:
>
> > Yes, gladly, but I don't know how to get one in this context.
>
> You would unpack, configure, and build like you did before (with JIT
> enabled, so as to reproduce the crash), but before that you’d run
> “ulimit -c unlimited” in that shell to make sure there’s a core dumped
> when it crashes.
>
> Once it has crashed, locate the ‘core’ file (or ‘core.*’), and run, say:
>
>   gdb libguile/.libs/guile bootstrap/core
>
> Then from the GDB prompt:
>
>   thread apply all bt
>
> TIA,
> Ludo’.
Re: Segfault while building on 64-bit Cygwin
Hello,

John Cowan writes:

> Yes, gladly, but I don't know how to get one in this context.

You would unpack, configure, and build like you did before (with JIT
enabled, so as to reproduce the crash), but before that you’d run
“ulimit -c unlimited” in that shell to make sure there’s a core dumped
when it crashes.

Once it has crashed, locate the ‘core’ file (or ‘core.*’), and run, say:

  gdb libguile/.libs/guile bootstrap/core

Then from the GDB prompt:

  thread apply all bt

TIA,
Ludo’.
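
Editorial sketch (not part of the original message): for a process that cannot easily be started from a prepared shell, roughly the same effect as `ulimit -c unlimited` can be requested from C through the POSIX `getrlimit`/`setrlimit` calls. `enable_core_dumps` is a hypothetical helper name. The sketch only raises the soft limit to whatever the hard limit allows, so the result is "unlimited" only if the hard limit already is, and it cannot make a platform write core files if it does not do so at all.

```c
#include <assert.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit to the current hard limit.
   Returns 0 on success and -1 on failure, like setrlimit itself. */
static int enable_core_dumps(void)
{
  struct rlimit lim;

  if (getrlimit(RLIMIT_CORE, &lim) != 0)
    return -1;

  /* An unprivileged process may raise its soft limit up to, but not
     beyond, the hard limit. */
  lim.rlim_cur = lim.rlim_max;
  assert(lim.rlim_cur <= lim.rlim_max);
  return setrlimit(RLIMIT_CORE, &lim);
}
```

The limit is inherited by child processes, so it has to be set before the crashing program is started or from inside that program early in `main`.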
Re: Segfault while building on 64-bit Cygwin
Yes, gladly, but I don't know how to get one in this context. Do I need to
add some flags to the Makefile, and if so, where? (It's a twisty maze of
passages, all different.) Note that this *is* a build with JIT enabled;
when I disable it using the env variable, there are no errors and 3.0.0
works fine.

Also, it may take some time, as I have to rebuild my Windows system.