Re: [Chicken-hackers] How to interpret chicken post mortem?
On 26.11.2015 at 20:08, Peter Bex wrote:
> On Thu, Nov 26, 2015 at 06:01:13PM +0100, Jörg F. Wittenberger wrote:
>> On 26.11.2015 at 11:34, Peter Bex wrote:
>>>> Error: (assq) bad argument type: #
>>> Do you also get this when compiling said code with the 4.10.1 snapshot?
>>
>> I get the same strange segfaults from 4.10.1 snapshot (plus both the
>> mutex-related fixes I posted these days as they are essential to work
>> long enough).
>
> Hi Joerg,
>
> Unfortunately, it is impossible for us to debug this without some kind
> of reproducible test case.

I know, I know. I just hoped this might trigger an idea in someone's head.

> If we can't have that, a full unoptimised build's stack trace would be
> useful instead of the truncated snippet full of "optimised out" values
> you posted.

The optimized stuff was from 4.10.1 as downloaded.

I collected several snippets from the beginnings of gdb backtraces from the master debug build, though they may be of limited value now that I have recompiled the whole thing using the other compiler version. I just started yet another recompile and expect results tomorrow.

So what exactly would be helpful? (Short of a 10-liner reliably reproducing the problem.)

I can't correlate the breakage with any activity of mine so far. What the program does by itself is two things:

A) (The background job) It walks down a directory structure filled with XML files. It reads and parses each file and mirrors some of the information in a sqlite database. This is limited to 5 files per second.

B) It maintains persistent TCP connections to about two handfuls of external IP addresses and talks to them sometimes.

In other words: nothing too fancy.

So how many frames of the gdb backtrace would be interesting? How many of them would you like? And in which form would you like the traces, with the .c and source files? (Certainly not attached to a posting here.)

CU

/Jörg

> But only if you have the C code that goes along with it,
> because f_1234 doesn't mean anything without being able to look at the
> C code: different compiler flags and different versions of CHICKEN will
> cause it to generate completely different C output.
>
> Cheers,
> Peter

signature.asc
Description: OpenPGP digital signature
___
Chicken-hackers mailing list
Chicken-hackers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-hackers
Re: [Chicken-hackers] How to interpret chicken post mortem?
On Thu, Nov 26, 2015 at 06:01:13PM +0100, Jörg F. Wittenberger wrote:
> On 26.11.2015 at 11:34, Peter Bex wrote:
> >> Error: (assq) bad argument type: #
> > Do you also get this when compiling said code with the 4.10.1 snapshot?
>
> I get the same strange segfaults from 4.10.1 snapshot (plus both the
> mutex-related fixes I posted these days as they are essential to work
> long enough).

Hi Joerg,

Unfortunately, it is impossible for us to debug this without some kind of reproducible test case.

If we can't have that, a full unoptimised build's stack trace would be useful instead of the truncated snippet full of "optimised out" values you posted. But only if you have the C code that goes along with it, because f_1234 doesn't mean anything without being able to look at the C code: different compiler flags and different versions of CHICKEN will cause it to generate completely different C output.

Cheers,
Peter
Re: [Chicken-hackers] How to interpret chicken post mortem?
On 26.11.2015 at 11:34, Peter Bex wrote:
>> Error: (assq) bad argument type: #
> Do you also get this when compiling said code with the 4.10.1 snapshot?

I get the same strange segfaults from the 4.10.1 snapshot (plus both the mutex-related fixes I posted these days, as they are essential to work long enough).

/Jörg
Re: [Chicken-hackers] How to interpret chicken post mortem?
On 26.11.2015 at 13:56, Jörg F. Wittenberger wrote:
> On 26.11.2015 at 11:34, Peter Bex wrote:
>> On Thu, Nov 26, 2015 at 11:29:20AM +0100, Jörg F. Wittenberger wrote:
>>> Something is definitely wrong at master and it is non-deterministic.
...
>>> (if (let ((g10816 key)) (trstcntl#x509-subject-hash cert))
>>>     (let ((t10269 ...)) (let (...) (util#remove-file ...)))
>>>     (##core#undefined))
>>>
>>> Error: (assq) bad argument type: #
>>
>> Do you also get this when compiling said code with the 4.10.1 snapshot?
>
> So far I did not try.
...
> I'll report when it breaks.

Sure, still on master.

This time the gdb stack trace may be enlightening. (Let me promise one thing: this wt-tree code is innocent. It has not been touched for many years and is heavily used.)

So obviously t4...t7 are loaded from rather strange addresses here:

#0  f_4526 (c=7, av=0xbe186050) at wttree.c:4637
4637    C_word ab[13],*a=ab;
[Current thread is 1 (Thread 0x4009a210 (LWP 7600))]
(gdb) bt
#0  f_4526 (c=7, av=0xbe186050) at wttree.c:4637
#1  0x00146b54 in f_3409 (c=, av=) at wttree.c:6087
#2  0x001470a4 in f_4320 (c=, av=0xbe187a14) at wttree.c:1696
#3  0x00145990 in f_7008 (t0=, t1=-1105692140, t2=, t3=, t4=, t5=, t6=, t7=) at wttree.c:7604
#4  0x0014eff0 in f_4511 (c=, av=) at wttree.c:4793
#5  0x0014f320 in f_4658 (c=, av=0xbe1862c0) at wttree.c:2947

Does this help in any way?

Best

/Jörg
Re: [Chicken-hackers] How to interpret chicken post mortem?
On 26.11.2015 at 11:34, Peter Bex wrote:
> On Thu, Nov 26, 2015 at 11:29:20AM +0100, Jörg F. Wittenberger wrote:
>> Something is definitely wrong at master and it is non-deterministic.
>>
>> I may be able to help narrowing this down, since I get these segfaults
>> within minutes. But I lack any idea what to look for by now.
>>
>> Most of the time I get a segfault, that is. (NB: No matter whether or
>> not I run with -:S the segfault is never caught.)
>>
>> Sometimes I just get weird results. This also happened in csc, though
>> only once while compiling 102 modules totaling ~10 lines of code:
>>
>>
>> Note: in toplevel procedure `cntrl#ball-control-default':
>>   expected value of type boolean in conditional but were given a value
>> of type
>>   `string' which is always true:
>>
>> (if (let ((g10816 key)) (trstcntl#x509-subject-hash cert))
>>     (let ((t10269 ...)) (let (...) (util#remove-file ...)))
>>     (##core#undefined))
>>
>> Error: (assq) bad argument type: #
>
> Do you also get this when compiling said code with the 4.10.1 snapshot?

So far I did not try.

Actually I've been slightly incorrect: I compiled this code several times. Only once did I observe the error from csc. (And maybe once more; the first time, I only vaguely recall having ignored "something weird", and I had to recompile anyway.)

So chances to reproduce it are basically nil. Especially when taking into account that it may be any kind of error, I'd expect.

I'll try to recompile it a couple of times in a loop. (Which will take days to complete.)

I'll report when it breaks.

/Jörg
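
A recompile loop of the kind mentioned above could be sketched roughly as follows. This is only an illustration in Python, not something posted in the thread: `run_until_failure`, `report` and the example command are hypothetical names, and the sketch assumes that a broken compile shows up either as a non-zero exit status or as the text "Error:" somewhere in the compiler output.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs, marker="Error:"):
    """Run cmd up to max_runs times and stop at the first suspicious run.

    A run counts as suspicious if it exits non-zero or if its combined
    stdout/stderr contains `marker` (an assumed failure signature).
    Returns (run_number, output) for that run, or None if all runs passed.
    """
    for run in range(1, max_runs + 1):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep compiler notes and errors together
            text=True,
        )
        if proc.returncode != 0 or marker in proc.stdout:
            return run, proc.stdout
    return None


def report(result, stream=sys.stderr):
    """Write a short summary of what run_until_failure returned."""
    if result is None:
        stream.write("all runs completed without a suspicious result\n")
    else:
        run, output = result
        stream.write("run %d looked broken:\n%s" % (run, output))


# Hypothetical usage; the csc invocation is a placeholder for the real build:
#   report(run_until_failure(["csc", "-c", "module.scm"], max_runs=100))
```

Capturing the complete output of the first bad run matters here because the failure is non-deterministic: the next attempt may well succeed and leave nothing to inspect.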
Re: [Chicken-hackers] How to interpret chicken post mortem?
On Thu, Nov 26, 2015 at 11:29:20AM +0100, Jörg F. Wittenberger wrote:
> Something is definitely wrong at master and it is non-deterministic.
>
> I may be able to help narrowing this down, since I get these segfaults
> within minutes. But I lack any idea what to look for by now.
>
> Most of the time I get a segfault, that is. (NB: No matter whether or
> not I run with -:S the segfault is never caught.)
>
> Sometimes I just get weird results. This also happened in csc, though
> only once while compiling 102 modules totaling ~10 lines of code:
>
>
> Note: in toplevel procedure `cntrl#ball-control-default':
>   expected value of type boolean in conditional but were given a value
> of type
>   `string' which is always true:
>
> (if (let ((g10816 key)) (trstcntl#x509-subject-hash cert))
>     (let ((t10269 ...)) (let (...) (util#remove-file ...)))
>     (##core#undefined))
>
> Error: (assq) bad argument type: #

Do you also get this when compiling said code with the 4.10.1 snapshot?

Cheers,
Peter
Re: [Chicken-hackers] How to interpret chicken post mortem?
Something is definitely wrong at master and it is non-deterministic.

I may be able to help narrowing this down, since I get these segfaults within minutes. But I lack any idea what to look for by now.

Most of the time I get a segfault, that is. (NB: No matter whether or not I run with -:S the segfault is never caught.)

Sometimes I just get weird results. This also happened in csc, though only once while compiling 102 modules totaling ~10 lines of code:

Note: in toplevel procedure `cntrl#ball-control-default':
  expected value of type boolean in conditional but were given a value of type
  `string' which is always true:

(if (let ((g10816 key)) (trstcntl#x509-subject-hash cert))
    (let ((t10269 ...)) (let (...) (util#remove-file ...)))
    (##core#undefined))

Error: (assq) bad argument type: #

The cure: compile again. The second time it worked.

Looks like I cannot switch to master now. :-/

(I'm running this from master with debugbuild; no optimizations.)

Any suggestions?

Thanks

/Jörg

On 24.11.2015 at 21:48, Jörg F. Wittenberger wrote:
> Hi all,
>
> just managed to switch to the master branch, eventually. (Congrats to
> myself, sigh ;-)
>
>
> Now there is a segfault. Looks almost random. The program runs for a
> fairly long time until crash. (Megabytes of chicken -:r backtrace and
> no hint either). Some gdb backtraces from a debug build below.
>
> Anybody having an idea how to narrow this down?
>
> Thanks so much
>
> /Jörg
>
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00a004b4 in f_10074 (c=4, av=0xbe2b503c) at library.c:26635
> 26635     if(!C_demand(c*C_SIZEOF_PAIR+5)){
> [Current thread is 1 (Thread 0x400e9210 (LWP 19666))]
> (gdb) bt
> #0  0x00a004b4 in f_10074 (c=4, av=0xbe2b503c) at library.c:26635
> #1  0x00c54858 in f_20122 (t0=-1104456948, t1=-1104456232, t2=3) at irregex.c:32507
> #2  0x00c550f8 in f_20143 (c=2, av=0xbe2b510c) at irregex.c:32659
> #3  0x00d632b8 in allocate_vector_2 (c=0, av=0x104c3a4) at runtime.c:7268
> #4  0x00d62f50 in C_allocate_vector (c=6, av=0xbe2b51ac) at runtime.c:7217
> #5  0x00a02fb4 in f_10081 (c=2, av=0xbe2b5238) at library.c:27315
> #6  0x00a0062c in f_10074 (c=4, av=0xbe2b5294) at library.c:26659
> #7  0x00c54858 in f_20122 (t0=-1104456948, t1=-1104456232, t2=1) at irregex.c:32507
> #8  0x00c5434c in f_20117 (c=2, av=0xbe2b537c) at irregex.c:32409
> #9  0x00d632b8 in allocate_vector_2 (c=0, av=0x104c3a4) at runtime.c:7268
> #10 0x00d62f50 in C_allocate_vector (c=6, av=0xbe2b5434) at runtime.c:7217
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00c8f16c in C_i_pairp (x=) at ./chicken.h:2236
> 2236    {
> [Current thread is 1 (Thread 0x4004a210 (LWP 19876))]
> (gdb) bt
> #0  0x00c8f16c in C_i_pairp (x=) at ./chicken.h:2236
> #1  0x00c8d104 in f_1713 (t0=-1105088272, t1=-1105088364, t2=1126061980) at scheduler.c:3109
> #2  0x00c88dcc in f_1810 (t0=-1105088208, t1=-1105088248, t2=1125996968) at scheduler.c:1696
> #3  0x00c8cc88 in f_1702 (t0=-1105085432, t1=-1105085252, t2=-1096772972) at scheduler.c:2955
> #4  0x00c88474 in f_1798 (c=2, av=0xbe21b200) at scheduler.c:1523
> #5  0x00c88e00 in f_1810 (t0=-1105087552, t1=-1105087592, t2=14) at scheduler.c:1703
> #6  0x00c890fc in f_1820 (c=2, av=0xbe21b2a0) at scheduler.c:1765
> #7  0x00c8d1cc in f_1713 (t0=-1105087616, t1=-1105087708, t2=1125996908) at scheduler.c:3119
> #8  0x00c88dcc in f_1810 (t0=-1105087552, t1=-1105087592, t2=1125935220) at scheduler.c:1696
> #9  0x00c8cc88 in f_1702 (t0=-1105085432, t1=-1105085252, t2=-1096773036) at scheduler.c:2955
> #10 0x00c88474 in f_1798 (c=2, av=0xbe21b490) at scheduler.c:1523
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  f_1713 (t0=-1106919288, t1=-1106919380, t2=1137556300) at scheduler.c:3105
> 3105    C_word ab[5],*a=ab;
> [Current thread is 1 (Thread 0x400c1210 (LWP 19953))]
> (gdb) bt
> #0  f_1713 (t0=-1106919288, t1=-1106919380, t2=1137556300) at scheduler.c:3105
> #1  0x00c88dcc in f_1810 (t0=-1106919224, t1=-1106919264, t2=1137487936) at scheduler.c:1696
> #2  0x00c8cc88 in f_1702 (t0=-1106915792, t1=-1106915612, t2=-1098645684) at scheduler.c:2955
> #3  0x00c88474 in f_1798 (c=2, av=0xbe05c198) at scheduler.c:1523
> #4  0x00c88e00 in f_1810 (t0=-1106918568, t1=-1106918608, t2=14) at scheduler.c:1703
> #5  0x00c890fc in f_1820 (c=2, av=0xbe05c238) at scheduler.c:1765
> #6  0x00c8d1cc in f_1713 (t0=-1106918632, t1=-1106918724, t2=1137487876) at scheduler.c:3119
> #7  0x00c88dcc in f_1810 (t0=-1106918568, t1=-1106918608, t2=1137413272) at scheduler.c:1696
> #8  0x00c8cc88 in f_1702 (t0=-1106915792, t1=-1106915612, t2=-1098645748) at scheduler.c:2955
> #9  0x00c88474 in f_1798 (c=2, av=0xbe05c428) at scheduler.c:1523
> #10 0x00c88e00 in f_1810 (t0=-1106917912, t1=-1106917952, t2=14) at scheduler.c:1703
> #11 0x00c890fc in f