Re: crash in gc with upside-down stack

2008-11-12 Thread Linas Vepstas
Some minor updates:

2008/11/11 Linas Vepstas <[EMAIL PROTECTED]>:
>
> My stack below.
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xf5333b90 (LWP 20587)]
> 0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782) at gc-mark.c:435
> 435   SCM obj = * (SCM *) &x[m];
> Current language:  auto; currently c
> (gdb) bt
> #0  0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782)
>at gc-mark.c:435
> #1  0xf7766a12 in scm_threads_mark_stacks () at threads.c:1375
> #2  0xf7711d38 in scm_mark_all () at gc-mark.c:82
> #3  0xf7710d33 in scm_i_gc (what=0xf778602e "cells") at gc.c:598

My current code reproduces this fairly readily; I am seeing
it dozens to hundreds of times a day.

I tweaked guile to check that the stack bounds are in order,
and to print an error message when they are not, and then to
just troop on -- and so I see dozens/hundreds of prints.
When the stack bounds are reversed, the difference
is *always* 58 bytes; and in fact, the two bad stack
bounds are always the same.
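
A minimal sketch of the kind of bounds check described above. This is not
the actual patch: the helper name checked_span and its parameters are
hypothetical, and the addresses are passed as plain integers so that the
comparison is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of Guile.  Given the low and high ends
   of a stack region, store the number of whole words between them in
   *n_words and return 1.  If the bounds are reversed, print a message,
   leave *n_words untouched and return 0 so the caller can skip the
   region instead of scanning a huge wrapped-around range. */
static int checked_span(uintptr_t low, uintptr_t high,
                        size_t word_size, size_t *n_words)
{
    assert(n_words != NULL && word_size > 0);

    if (high < low) {
        fprintf(stderr, "stack bounds reversed: low=%#lx high=%#lx\n",
                (unsigned long) low, (unsigned long) high);
        return 0;
    }
    *n_words = (size_t) (high - low) / word_size;
    return 1;
}
```

A caller would mark the region only when checked_span returns 1, and
otherwise just carry on, which matches the "print and troop on"
behaviour described above.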

It appears to happen *only* when I have multiple threads
all trying to define functions at the same time; it never
happens when one thread goes off to do some heavy
computing.

--linas




Re: Does anyone actually use threads with guile?

2008-11-12 Thread Linas Vepstas
2008/11/12 Linas Vepstas <[EMAIL PROTECTED]>:

> Today, I got a new crash. I have multiple threads, which
> are doing nothing but a bunch of define's, in parallel.
> (They're loading scheme code from various files).

Having studied the code a little more, I think this is
a dopey and pointless error check within guile,
specifically at libguile/throw.c line 695.
Removing it seems to result in a runnable system.

Basically, at any given time, some thread might be
in a critical section. Some other thread may be
throwing an error for some utterly unrelated reason.
Yet, when the error is thrown, this "critical section"
check will trip, and it will do so for an utterly bogus
reason.  At least, that describes my case.
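
A toy model of the situation described above, under the assumption
(not confirmed here against the guile source) that the check consults a
single process-wide "critical section" counter. All names are
hypothetical; the model uses plain thread indices rather than real
threads, and contrasts a process-wide check with a per-thread one.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_THREADS 4   /* arbitrary size for this toy model */

/* Toy model only: none of these names come from Guile. */
struct model {
    int global_level;                   /* one counter for the process */
    int per_thread_level[MAX_THREADS];  /* one counter per thread      */
};

static void enter_critical(struct model *m, int tid)
{
    assert(tid >= 0 && tid < MAX_THREADS);
    m->global_level++;
    m->per_thread_level[tid]++;
}

static void leave_critical(struct model *m, int tid)
{
    assert(tid >= 0 && tid < MAX_THREADS);
    m->global_level--;
    m->per_thread_level[tid]--;
}

/* Assumed behaviour: a throw is rejected while ANY thread is inside a
   critical section, whichever thread is doing the throwing. */
static bool throw_allowed_global(const struct model *m)
{
    return m->global_level == 0;
}

/* Alternative: a throw is rejected only if the throwing thread itself
   is inside a critical section. */
static bool throw_allowed_per_thread(const struct model *m, int tid)
{
    assert(tid >= 0 && tid < MAX_THREADS);
    return m->per_thread_level[tid] == 0;
}
```

With thread 0 inside a critical section, throw_allowed_global rejects a
throw from thread 1, while throw_allowed_per_thread(&m, 1) still allows
it. That is the "utterly unrelated reason" case in miniature.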

Is there any reason at all not to remove this check
entirely (at libguile/throw.c line 695)?

Should I be posting this sort of stuff to guile-devel
instead of guile-user?  I've cross-posted to bug-guile too;
if this is driving you nuts, please tell me to stop.
I wanted to make sure the right folks saw this.

--linas

>
> The stack trace is below. This is on guile-1.8.5
>
> --linas
>
> throw from within critical section.
> Program received signal SIGABRT, Aborted.
> [Switching to Thread 0xf5e56b90 (LWP 8655)]
> 0xe425 in __kernel_vsyscall ()
> (gdb) bt
> #0  0xe425 in __kernel_vsyscall ()
> #1  0xf7aee085 in raise () from /lib/tls/i686/cmov/libc.so.6
> #2  0xf7aefa01 in abort () from /lib/tls/i686/cmov/libc.so.6
> #3  0xf7789208 in scm_ithrow (key=0xf460f190, args=0xf3c8da88, noreturn=1)
>at throw.c:695
> #4  0xf771eeb2 in scm_error_scm (key=0xf460f190, subr=0xf3d4e360,
>message=0xf3d4e340, args=0x404, data=0x4) at error.c:92
> #5  0xf775ef9b in scm_i_input_error (function=0xf77aaf38 "scm_i_lreadparen",
>port=0xf3c8d970, message=0xf77aae7c "end of file", arg=0x404) at read.c:110
> #6  0xf775f2e9 in flush_ws (port=0xf3c8d970,
>eoferr=0xf77aaf38 "scm_i_lreadparen") at read.c:261
> #7  0xf776204c in scm_read_sexp (chr=, port=0xf3c8d970)
>at read.c:357
> #8  0xf7760b91 in scm_read_expression (port=0xf3c8d970) at read.c:1079
> #9  0xf7761fde in scm_read_sexp (chr=, port=0xf3c8d970)
>at read.c:362
> #10 0xf7760b91 in scm_read_expression (port=0xf3c8d970) at read.c:1079
> #11 0xf7761fde in scm_read_sexp (chr=, port=0xf3c8d970)
>at read.c:362
> ---Type <return> to continue, or q <return> to quit---
> #12 0xf7760b91 in scm_read_expression (port=0xf3c8d970) at read.c:1079
> #13 0xf7782bc2 in inner_eval_string (data=0xf3c8d970) at strports.c:499
> #14 0xf772dfde in scm_c_with_fluid (fluid=0x8aa7900, value=0xf45e8e80,
>cproc=0xf7782b90 , cdata=0xf3c8d970) at fluids.c:459
> #15 0xf7747335 in scm_c_call_with_current_module (module=0xf45e8e80,
>func=0xf7782b90 , data=0xf3c8d970) at modules.c:104
> #16 0xf7782e21 in scm_eval_string_in_module (string=0xf3d4e230,
>module=0xf45e8e80) at strports.c:527
> #17 0xf7782e55 in scm_eval_string (string=0xf3d4e230) at strports.c:535
> #18 0xf7782e85 in scm_c_eval_string (
>expr=0xf3a05a44 "(define (wire-bidi a-wire b-wire
> uni-device)\n\t(let ((device (wire-null-device))\n\t\t\t(do-connect
> #t)\n\t\t\t(myname \"\")\n\t\t)\n\t\t(define (connect-me)\n\t\t\t; Two
> distinct checks of 'do-connect' are made, because "...) at
> strports.c:481
> #19 0xf7788b79 in scm_c_catch (tag=0x104, body=0xf7782e60 ,
>body_data=0xf3a05a44,
>handler=0xf78b05d8
>  scm_unused_struct*)>, handler_data=0x8ac32d0,
>pre_unwind_handler=0xf78b064a
>  to
> continue, or q  to quit---
> void*, scm_unused_struct*, scm_unused_struct*)>,
>pre_unwind_handler_data=0x8ac32d0) at throw.c:200
> #20 0xf78b0a1d in opencog::SchemeEval::do_eval (this=0x8ac32d0,
> [EMAIL PROTECTED])
>at 
> /home/linas/src/novamente/src/opencog-stage4/staging/opencog/guile/SchemeEval.cc:364
> #21 0xf78b0cd7 in opencog::SchemeEval::c_wrap_eval (p=0x8ac32d0)
>at 
> /home/linas/src/novamente/src/opencog-stage4/staging/opencog/guile/SchemeEval.cc:341
> #22 0xf7713842 in c_body (d=0xf5e55a08) at continuations.c:350
> #23 0xf7788b79 in scm_c_catch (tag=0x104, body=0xf7713830 ,
>body_data=0xf5e55a08, handler=0xf7713850 ,
>handler_data=0xf5e55a08,
>pre_unwind_handler=0xf7788440 ,
>pre_unwind_handler_data=0x0) at throw.c:200
> #24 0xf7713cf2 in scm_i_with_continuation_barrier (body=0xf7713830 ,
>body_data=0xf5e55a08, handler=0xf7713850 ,
>handler_data=0xf5e55a08,
>pre_unwind_handler=0xf7788440 ,
>pre_unwind_handler_data=0x0) at continuations.c:326
> ---Type <return> to continue, or q <return> to quit---
> #25 0xf7713dd3 in scm_c_with_continuation_barrier (
>func=0xf78b0ca2 , data=0x8ac32d0)
>at continuations.c:368
> #26 0xf7787959 in scm_i_with_guile_and_parent (
>func=0xf78b0ca2 , data=0x8ac32d0,
>parent=0xf45e1e70) at threads.c:695
> #27 0xf7787a4e in scm_with_guile (
>func=0xf78b0ca2 , data=0x8ac32d0)
>at threads.c:683
> #28 0xf78b034c in opencog::SchemeEval::eval (this=0x8ac32d0, [EMAIL 
> PROTECTED])
>at 
> /home/linas/src/novamente/s

crash in gc with upside-down stack

2008-11-12 Thread Linas Vepstas
Here's another one; I'm trying to dig into this.

It's more or less the same crash as the one reported at:

http://bugs.gentoo.org/228097
and
http://www.mail-archive.com/bug-guile@gnu.org/msg04568.html

My stack below.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xf5333b90 (LWP 20587)]
0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782) at gc-mark.c:435
435   SCM obj = * (SCM *) &x[m];
Current language:  auto; currently c
(gdb) bt
#0  0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782)
at gc-mark.c:435
#1  0xf7766a12 in scm_threads_mark_stacks () at threads.c:1375
#2  0xf7711d38 in scm_mark_all () at gc-mark.c:82
#3  0xf7710d33 in scm_i_gc (what=0xf778602e "cells") at gc.c:598
#4  0xf7710f4d in scm_gc_for_newcell (freelist=0xf779b76c,
free_cells=0x1228e9b0)
at gc.c:509
#5  0xf7768bd8 in scm_c_catch (tag=0x104, body=0xf76f3830 ,
body_data=0xf528, handler=0xf76f3850 ,
handler_data=0xf528,
pre_unwind_handler=0xf77683e0 ,
pre_unwind_handler_data=0x0) at ../libguile/inline.h:186
#6  0xf76f3cf2 in scm_i_with_continuation_barrier (body=0xf76f3830 ,
body_data=0xf528, handler=0xf76f3850 ,
handler_data=0xf528,
pre_unwind_handler=0xf77683e0 ,
pre_unwind_handler_data=0x0) at continuations.c:326
#7  0xf76f3dd3 in scm_c_with_continuation_barrier (
func=0xf7767ab0 , data=0x1228e938) at continuations.c:368
---Type <return> to continue, or q <return> to quit---
#8  0xf77678f9 in scm_i_with_guile_and_parent (func=0xf7767ab0
,
data=0x1228e938, parent=0x19f63670) at threads.c:695
#9  0xf77679ee in scm_with_guile (func=0xf7767ab0 ,
data=0x1228e938) at threads.c:683
#10 0xf7767a43 in on_thread_exit (v=0x1228e938) at threads.c:505
#11 0xf7d7abb0 in __nptl_deallocate_tsd ()
   from /lib/tls/i686/cmov/libpthread.so.0
#12 0xf7d7b509 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#13 0xf7b79e5e in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb)

I've seen this twice now in two days, but it's not readily reproducible.
By plugging the insanely large n into a hex calc, you'll see it's actually
0xfffffdfe, i.e. a small negative number (-514) read as an unsigned 32-bit
value. Looking carefully near threads.c:1375 seems to imply that stack top
and stack bottom are reversed. So I added a printf at that location, and
tried to reproduce the crash. Several gazillion print statements later,
no crash.
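
The arithmetic behind the "insanely large n" can be checked with a few
lines of C. This is only an illustration of the wrap-around: the helper
name as_signed32 is made up, and it assumes the count was a 32-bit
quantity, as the 32-bit addresses in the trace suggest.

```c
#include <stdint.h>

/* Hypothetical helper: interpret a 32-bit unsigned value as the signed
   two's-complement number with the same bit pattern.  The subtraction
   is done in 64 bits so that no out-of-range conversion is needed. */
static int64_t as_signed32(uint32_t n)
{
    int64_t wide = (int64_t) n;
    return (wide > INT32_MAX) ? wide - 4294967296LL : wide;
}
```

as_signed32(4294966782u) is -514, since 4294967296 - 4294966782 = 514.
A length computed by subtracting two reversed bounds comes out negative
in just this way, and then looks enormous once treated as unsigned.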

I suspect that this is some sort of thread-race condition; I think it
happens when I am defining some functions from several different
threads at once. It seems *not* to occur once I get into hard-core
computations-- i.e. it happens no later than the first few dozen gc's.

This is on guile-1.8.5, --with-threads, on Ubuntu, Intel (actually AMD64 cpu.)

--linas




guile-1.8.5 segfaults while building on ppc with gcc-4.3 and -O2

2008-11-12 Thread Marijn Schouten (hkBst)
Hi,

I've had some reports that guile-1.8.5 segfaults while building on ppc with
gcc-4.3 and -O2[1]. It seems to build fine with -O1 instead. Did anyone else see
this behavior?

Marijn

[1]:http://bugs.gentoo.org/show_bug.cgi?id=228097

--
Marijn Schouten (hkBst), Gentoo Lisp project, Gentoo ML,
#gentoo-{lisp,ml} on FreeNode