Re: [Chicken-hackers] How to interpret chicken post mortem?

2015-11-26 Thread Jörg F. Wittenberger
Am 26.11.2015 um 20:08 schrieb Peter Bex:
> On Thu, Nov 26, 2015 at 06:01:13PM +0100, Jörg F. Wittenberger wrote:
>> Am 26.11.2015 um 11:34 schrieb Peter Bex:
>>>> Error: (assq) bad argument type: #
>>> Do you also get this when compiling said code with the 4.10.1 snapshot?
>>
>> I get the same strange segfaults from 4.10.1 snapshot (plus both the
>> mutex-related fixes I posted these days as they are essential to work
>> long enough).
> 
> Hi Joerg,
> 
> Unfortunately, it is impossible for us to debug this without some kind
> of reproducible test case.

I know, I know.  I just hoped this may trigger an idea in someone's head.

> If we can't have that, a full unoptimised build's stack trace would be
> useful instead of the truncated snippet full of "optimised out" values
> you posted.

The optimized stuff was the 4.10.1 as downloaded.  I collected several
snippets from the beginnings of gdb backtraces from the master
debug build.  Though they may be of limited use now that I have
recompiled the whole thing using the other compiler version.

I just started yet another recompile, expecting results tomorrow.

So what exactly would be helpful?  (Short of a 10-liner reliably
reproducing the problem.)

I can't correlate the breakage to any activity of mine so far.  So what
it does by itself are two things:  A) (The background job) It walks down a
directory structure filled with XML files.  It reads and parses each file
and mirrors some of the information in a sqlite database, limited to 5
files per second.  B) It maintains persistent TCP connections to two
handfuls of external IP addresses and talks to them sometimes.

In other words: nothing too fancy.

So how many frames of the gdb backtrace would be interesting?  How many
of them would you like?  And in which form would you like the traces,
with .c and source files?  (Certainly not attached to a posting here.)

CU

/Jörg

>  But only if you have the C code that goes along with it,
> because f_1234 doesn't mean anything without being able to look at the
> C code: different compiler flags and different versions of CHICKEN will
> cause it to generate completely different C output.
> 
> Cheers,
> Peter
> 




signature.asc
Description: OpenPGP digital signature
___
Chicken-hackers mailing list
Chicken-hackers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-hackers


Re: [Chicken-hackers] How to interpret chicken post mortem?

2015-11-26 Thread Peter Bex
On Thu, Nov 26, 2015 at 06:01:13PM +0100, Jörg F. Wittenberger wrote:
> Am 26.11.2015 um 11:34 schrieb Peter Bex:
> >> Error: (assq) bad argument type: #
> > Do you also get this when compiling said code with the 4.10.1 snapshot?
> 
> I get the same strange segfaults from 4.10.1 snapshot (plus both the
> mutex-related fixes I posted these days as they are essential to work
> long enough).

Hi Joerg,

Unfortunately, it is impossible for us to debug this without some kind
of reproducible test case.

If we can't have that, a full unoptimised build's stack trace would be
useful instead of the truncated snippet full of "optimised out" values
you posted.  But only if you have the C code that goes along with it,
because f_1234 doesn't mean anything without being able to look at the
C code: different compiler flags and different versions of CHICKEN will
cause it to generate completely different C output.
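
To illustrate why a frame such as f_1234 is only readable next to the
generated C, here is a minimal Python sketch that finds where such a
function is defined in a .c file and returns the comment line directly
above it.  It rests on an assumption not confirmed in this thread: that
the generated C carries a /* ... */ comment naming the originating
Scheme procedure right before each function.  The helper name
find_definition and the sample text are hypothetical.

```python
import re


def find_definition(c_source, name):
    """Return (line_number, preceding_comment) for the definition of `name`.

    Assumption: a definition is a line that mentions `name(` and ends
    with '{'; prototypes end with ';' and are skipped.  The comment is
    whatever /* ... */ line sits directly above the definition, if any.
    """
    lines = c_source.splitlines()
    pattern = re.compile(r"\b" + re.escape(name) + r"\s*\(")
    for index, line in enumerate(lines):
        stripped = line.rstrip()
        if pattern.search(stripped) and stripped.endswith("{"):
            previous = lines[index - 1].strip() if index > 0 else ""
            comment = previous if previous.startswith("/*") else None
            return index + 1, comment
    return None


# Hypothetical excerpt shaped like generated C; not taken from a real build.
SAMPLE = """\
static void C_ccall f_1234(C_word c,C_word *av) C_noret;
/* k1232 in example-procedure */
static void C_ccall f_1234(C_word c,C_word *av){
C_word tmp;
}
"""

if __name__ == "__main__":
    print(find_definition(SAMPLE, "f_1234"))
```

Since the numbering changes with compiler flags and CHICKEN versions,
such a lookup is only meaningful against the exact .c file that the
crashing binary was built from.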

Cheers,
Peter




Re: [Chicken-hackers] How to interpret chicken post mortem?

2015-11-26 Thread Jörg F. Wittenberger
Am 26.11.2015 um 11:34 schrieb Peter Bex:
>> Error: (assq) bad argument type: #
> Do you also get this when compiling said code with the 4.10.1 snapshot?

I get the same strange segfaults from the 4.10.1 snapshot (plus both of
the mutex-related fixes I posted recently, as they are essential for it
to run long enough).

/Jörg






Re: [Chicken-hackers] How to interpret chicken post mortem?

2015-11-26 Thread Jörg F. Wittenberger
Am 26.11.2015 um 13:56 schrieb Jörg F. Wittenberger:
> Am 26.11.2015 um 11:34 schrieb Peter Bex:
>> On Thu, Nov 26, 2015 at 11:29:20AM +0100, Jörg F. Wittenberger wrote:
>>> Something is definitely wrong at master and it is non-deterministic.
...
>>> (if (let ((g10816 key)) (trstcntl#x509-subject-hash cert))
>>>   (let ((t10269 ...)) (let (...) (util#remove-file ...)))
>>>   (##core#undefined))
>>>
>>> Error: (assq) bad argument type: #
>>
>> Do you also get this when compiling said code with the 4.10.1 snapshot?
> 
> So far I did not try.
...

> I'll report when it breaks.

Sure still on master.  This time the gdb stack trace may be enlightening.

(Let me promise one thing: this wt-tree code is innocent.  It was not
touched for many years and is heavily used.)

So obviously t4...t7 are loaded from rather strange addresses here:

#0  f_4526 (c=7, av=0xbe186050) at wttree.c:4637
4637    C_word ab[13],*a=ab;
[Current thread is 1 (Thread 0x4009a210 (LWP 7600))]
(gdb) bt
#0  f_4526 (c=7, av=0xbe186050) at wttree.c:4637
#1  0x00146b54 in f_3409 (c=, av=)
at wttree.c:6087
#2  0x001470a4 in f_4320 (c=, av=0xbe187a14) at wttree.c:1696
#3  0x00145990 in f_7008 (t0=, t1=-1105692140,
t2=, t3=,
t4=,
t5=,
t6=,
t7=)
at wttree.c:7604
#4  0x0014eff0 in f_4511 (c=, av=)
at wttree.c:4793
#5  0x0014f320 in f_4658 (c=, av=0xbe1862c0) at wttree.c:2947

Does this help in any way?
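
For comparing several such traces, a minimal Python sketch that turns
the output of gdb's bt command into (frame, function, file, line)
tuples may be handy.  It assumes the input holds only frame lines and
their wrapped continuations (no prompts or source lines in between);
the name parse_backtrace is made up for this illustration.

```python
import re

# One frame per line once wrapped continuation lines have been joined.
FRAME = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \((.*)\)\s+at\s+(\S+):(\d+)$"
)


def parse_backtrace(text):
    """Turn gdb `bt` output into (frame_number, function, file, line) tuples.

    Lines that do not start with '#' are treated as wrapped continuations
    of the previous frame and are joined to it with a single space.
    Frames that do not fit the expected shape are silently skipped.
    """
    joined = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            joined.append(line)
        elif joined:
            joined[-1] += " " + line
    frames = []
    for line in joined:
        match = FRAME.match(line)
        if match:
            number, function, _args, filename, lineno = match.groups()
            frames.append((int(number), function, filename, int(lineno)))
    return frames


if __name__ == "__main__":
    example = (
        "#0  f_4526 (c=7, av=0xbe186050) at wttree.c:4637\n"
        "#2  0x001470a4 in f_4320 (c=, av=0xbe187a14) at wttree.c:1696\n"
    )
    for frame in parse_backtrace(example):
        print(frame)
```

The tuples only say where in the generated C each frame sits; mapping
them back to Scheme still needs the matching .c files.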

Best

/Jörg





Re: [Chicken-hackers] How to interpret chicken post mortem?

2015-11-26 Thread Jörg F. Wittenberger
Am 26.11.2015 um 11:34 schrieb Peter Bex:
> On Thu, Nov 26, 2015 at 11:29:20AM +0100, Jörg F. Wittenberger wrote:
>> Something is definitely wrong at master and it is non-deterministic.
>>
>> I may be able to help narrowing this down, since I get these segfaults
>> within minutes.  But I lack any idea what to look for by now.
>>
>> Most of the time I get a segfault, that is.  (NB: No matter whether or
>> not I run with -:S the segfault is never caught.)
>>
>> Sometimes I just get weird results.  This also happened in csc, though
>> only once while compiling 102 modules totaling ~10 lines of code:
>>
>> 
>> Note: in toplevel procedure `cntrl#ball-control-default':
>>   expected value of type boolean in conditional but were given a value
>> of type
>>   `string' which is always true:
>>
>> (if (let ((g10816 key)) (trstcntl#x509-subject-hash cert))
>>   (let ((t10269 ...)) (let (...) (util#remove-file ...)))
>>   (##core#undefined))
>>
>> Error: (assq) bad argument type: #
> 
> Do you also get this when compiling said code with the 4.10.1 snapshot?

So far I did not try.

Actually I've been slightly incorrect: I compiled this code several
times.  Only once did I observe the error from csc.  (And maybe once
more; I only vaguely recall having ignored "something weird" the first
time, as I had to recompile anyway.)

So chances to reproduce it are basically nil.  Especially when taking
into account that the error to expect could be of any kind.

I'll try to recompile it a couple of times in a loop.  (Which will take
days to complete.)
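
Such a loop could be sketched in Python roughly as follows.  The command
shown is a placeholder standing in for the real build, and the sketch
assumes the compiler exits with a non-zero status when it hits an error
like the one above; run_until_failure is a name invented here.

```python
import subprocess
import sys


def run_until_failure(command, max_runs):
    """Run `command` up to `max_runs` times; stop at the first failure.

    Returns (run_number, combined_output) for the first run with a
    non-zero exit status, or None if every run succeeded.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep notes and errors together
            universal_newlines=True,
        )
        if result.returncode != 0:
            return run, result.stdout
    return None


if __name__ == "__main__":
    # Placeholder: replace with the real build command.
    build = [sys.executable, "-c", "print('build ok')"]
    failure = run_until_failure(build, 3)
    print("no failure seen" if failure is None else failure)
```

Keeping the output of the failing run matters more than the count,
since the error may differ from one failure to the next.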

I'll report when it breaks.

/Jörg





Re: [Chicken-hackers] How to interpret chicken post mortem?

2015-11-26 Thread Peter Bex
On Thu, Nov 26, 2015 at 11:29:20AM +0100, Jörg F. Wittenberger wrote:
> Something is definitely wrong at master and it is non-deterministic.
> 
> I may be able to help narrowing this down, since I get these segfaults
> within minutes.  But I lack any idea what to look for by now.
> 
> Most of the time I get a segfault, that is.  (NB: No matter whether or
> not I run with -:S the segfault is never caught.)
> 
> Sometimes I just get weird results.  This also happened in csc, though
> only once while compiling 102 modules totaling ~10 lines of code:
> 
> 
> Note: in toplevel procedure `cntrl#ball-control-default':
>   expected value of type boolean in conditional but were given a value
> of type
>   `string' which is always true:
> 
> (if (let ((g10816 key)) (trstcntl#x509-subject-hash cert))
>   (let ((t10269 ...)) (let (...) (util#remove-file ...)))
>   (##core#undefined))
> 
> Error: (assq) bad argument type: #

Do you also get this when compiling said code with the 4.10.1 snapshot?

Cheers,
Peter




Re: [Chicken-hackers] How to interpret chicken post mortem?

2015-11-26 Thread Jörg F. Wittenberger
Something is definitely wrong at master and it is non-deterministic.

I may be able to help narrow this down, since I get these segfaults
within minutes.  But so far I have no idea what to look for.

Most of the time I get a segfault, that is.  (NB: No matter whether or
not I run with -:S the segfault is never caught.)

Sometimes I just get weird results.  This also happened in csc, though
only once while compiling 102 modules totaling ~10 lines of code:


Note: in toplevel procedure `cntrl#ball-control-default':
  expected value of type boolean in conditional but were given a value
of type
  `string' which is always true:

(if (let ((g10816 key)) (trstcntl#x509-subject-hash cert))
  (let ((t10269 ...)) (let (...) (util#remove-file ...)))
  (##core#undefined))

Error: (assq) bad argument type: #


The cure: compile again.  Second time it worked.

Looks like I cannot switch to master now.  :-/

(I'm running this from master with a debug build; no optimizations.)

Any suggestions?

Thanks

/Jörg

Am 24.11.2015 um 21:48 schrieb Jörg F. Wittenberger:
> Hi all,
> 
> just managed to switch to the master branch, eventually.  (Congrats to
> myself, sigh ;-)
> 
> 
> Now there is a segfault.  Looks almost random.   The program runs for a
> fairly long time until crash.  (Megabytes of chicken -:r backtrace and
> no hint either). Some gdb backtraces from a debug build below.
> 
> Anybody having an idea how to narrow this down?
> 
> Thanks so much
> 
> /Jörg
> 
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00a004b4 in f_10074 (c=4, av=0xbe2b503c) at library.c:26635
> 26635 if(!C_demand(c*C_SIZEOF_PAIR+5)){
> [Current thread is 1 (Thread 0x400e9210 (LWP 19666))]
> (gdb) bt
> #0  0x00a004b4 in f_10074 (c=4, av=0xbe2b503c) at library.c:26635
> #1  0x00c54858 in f_20122 (t0=-1104456948, t1=-1104456232, t2=3)
> at irregex.c:32507
> #2  0x00c550f8 in f_20143 (c=2, av=0xbe2b510c) at irregex.c:32659
> #3  0x00d632b8 in allocate_vector_2 (c=0, av=0x104c3a4) at runtime.c:7268
> #4  0x00d62f50 in C_allocate_vector (c=6, av=0xbe2b51ac) at runtime.c:7217
> #5  0x00a02fb4 in f_10081 (c=2, av=0xbe2b5238) at library.c:27315
> #6  0x00a0062c in f_10074 (c=4, av=0xbe2b5294) at library.c:26659
> #7  0x00c54858 in f_20122 (t0=-1104456948, t1=-1104456232, t2=1)
> at irregex.c:32507
> #8  0x00c5434c in f_20117 (c=2, av=0xbe2b537c) at irregex.c:32409
> #9  0x00d632b8 in allocate_vector_2 (c=0, av=0x104c3a4) at runtime.c:7268
> #10 0x00d62f50 in C_allocate_vector (c=6, av=0xbe2b5434) at runtime.c:7217
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00c8f16c in C_i_pairp (
> x=)
> at ./chicken.h:2236
> 2236  {
> [Current thread is 1 (Thread 0x4004a210 (LWP 19876))]
> (gdb) bt
> #0  0x00c8f16c in C_i_pairp (
> x=)
> at ./chicken.h:2236
> #1  0x00c8d104 in f_1713 (t0=-1105088272, t1=-1105088364, t2=1126061980)
> at scheduler.c:3109
> #2  0x00c88dcc in f_1810 (t0=-1105088208, t1=-1105088248, t2=1125996968)
> at scheduler.c:1696
> #3  0x00c8cc88 in f_1702 (t0=-1105085432, t1=-1105085252, t2=-1096772972)
> at scheduler.c:2955
> #4  0x00c88474 in f_1798 (c=2, av=0xbe21b200) at scheduler.c:1523
> #5  0x00c88e00 in f_1810 (t0=-1105087552, t1=-1105087592, t2=14) at
> scheduler.c:1703
> #6  0x00c890fc in f_1820 (c=2, av=0xbe21b2a0) at scheduler.c:1765
> #7  0x00c8d1cc in f_1713 (t0=-1105087616, t1=-1105087708, t2=1125996908)
> at scheduler.c:3119
> #8  0x00c88dcc in f_1810 (t0=-1105087552, t1=-1105087592, t2=1125935220)
> at scheduler.c:1696
> #9  0x00c8cc88 in f_1702 (t0=-1105085432, t1=-1105085252, t2=-1096773036)
> at scheduler.c:2955
> #10 0x00c88474 in f_1798 (c=2, av=0xbe21b490) at scheduler.c:1523
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  f_1713 (t0=-1106919288, t1=-1106919380, t2=1137556300) at
> scheduler.c:3105
> 3105  C_word ab[5],*a=ab;
> [Current thread is 1 (Thread 0x400c1210 (LWP 19953))]
> (gdb) bt
> #0  f_1713 (t0=-1106919288, t1=-1106919380, t2=1137556300) at
> scheduler.c:3105
> #1  0x00c88dcc in f_1810 (t0=-1106919224, t1=-1106919264, t2=1137487936)
> at scheduler.c:1696
> #2  0x00c8cc88 in f_1702 (t0=-1106915792, t1=-1106915612, t2=-1098645684)
> at scheduler.c:2955
> #3  0x00c88474 in f_1798 (c=2, av=0xbe05c198) at scheduler.c:1523
> #4  0x00c88e00 in f_1810 (t0=-1106918568, t1=-1106918608, t2=14) at
> scheduler.c:1703
> #5  0x00c890fc in f_1820 (c=2, av=0xbe05c238) at scheduler.c:1765
> #6  0x00c8d1cc in f_1713 (t0=-1106918632, t1=-1106918724, t2=1137487876)
> at scheduler.c:3119
> #7  0x00c88dcc in f_1810 (t0=-1106918568, t1=-1106918608, t2=1137413272)
> at scheduler.c:1696
> #8  0x00c8cc88 in f_1702 (t0=-1106915792, t1=-1106915612, t2=-1098645748)
> at scheduler.c:2955
> #9  0x00c88474 in f_1798 (c=2, av=0xbe05c428) at scheduler.c:1523
> #10 0x00c88e00 in f_1810 (t0=-1106917912, t1=-1106917952, t2=14) at
> scheduler.c:1703
> #11 0x00c890fc in f