Re: Core dump when throwing an exception from a resumed partial continuation
On Thu 21 Mar 2013 14:53, Andrew Gaylard writes:

> (catch #t
>   (λ ()
>     (throw 'oops)) ; should not crash the vm
>   (λ ()
>     (display "Success!")(newline))) ; never reached
>
> the VM still cores; "Success" is never shown. However, you've probably
> spotted my mistake: the handler should be (λ (key . args) ... ).

The core dump is another bug, but fixing the handler is the key thing:

> (catch #t
>   (λ ()
>     (throw 'oops)) ; should not crash the vm
>   (λ (key . args)
>     (display "Success!")(newline))) ; works!
>
> ...solves the problem, and the VM doesn't core any more.

Yep

Happy hacking :)

A
--
http://wingolog.org/
Re: Core dump when throwing an exception from a resumed partial continuation
On 03/21/13 11:43, Andy Wingo wrote:

> On Fri 15 Mar 2013 22:01, Brent Pinkney writes:
>
>> When I resume the continuation in another thread, all works perfectly
>> UNLESS the continued execution throws an exception.
>> Then guile exits with a core dump.
>>
>> By contrast if I resume the continuation in the same thread and then
>> throw an exception all works as expected.
>
> I think I know what this is.
>
> So, a delimited continuation should capture that part of the dynamic
> environment made in its extent. (See Oleg Kiselyov and Chung-Chieh
> Shan's "Delimited Dynamic Binding" paper.) That is what Guile does, for
> fluids, prompts, and dynamic-wind blocks.
>
> Our implementation of exception handling uses a fluid,
> %exception-handler (boot-9.scm:86). However that fluid references a
> stack of exception handlers on the heap. There is the problem: an
> exception in a reinstated delimited continuation will walk the captured
> exception handler stack from the heap, not from its own dynamic
> environment. Therefore it could abort to a continuation that is not
> present on the new thread.
>
> The solution is to have the exception handler find the next handler
> from the dynamic environment. This will need a new primitive to walk
> the dynamic stack, I think. I can't look at this atm as I broke my arm
> (!) and so typing is tough.
>
> For now as a workaround I suggest you put a catch #t in each of your
> delimited continuations. This way all throws will be handled by catches
> established by the continuation.
>
> Regards,
>
> Andy

Andy,

Thanks for giving this some thought -- sorry to hear about your arm!

This does shed some light on things. If I change this:

    (throw 'oops) ; should not crash the vm

to this:

    (catch #t
      (λ ()
        (throw 'oops)) ; should not crash the vm
      (λ ()
        (display "Success!")(newline))) ; never reached

the VM still cores; "Success" is never shown. However, you've probably
spotted my mistake: the handler should be (λ (key . args) ... ).

But this core shows up differently in the stack-trace in gdb:

    #0 scm_error (key=0x1001854c0, subr=0x0, message=0x7e7ef518 "Wrong number of arguments to ~A", args=0x100db95b0, rest=0x4) at error.c:62
    ...

which is exactly the exception one would expect. Fixing the handler
thus:

    (catch #t
      (λ ()
        (throw 'oops)) ; should not crash the vm
      (λ (key . args)
        (display "Success!")(newline))) ; works!

...solves the problem, and the VM doesn't core any more.

So it seems that although we *did* have a catch around our resumption,
there must have been some (different) error in its handler, which
caused a second exception, which caused the VM to crash.

Unfortunately, the test-case we made handles this second exception
fine. It'd be great to be able to distill this problem down to a pithy
test-case. (Our app is 4500 lines and still growing, so it's not really
a candidate to send to the list.)

The same problem happens (VM cores) if I do this:

    (catch 'not-oops
      (λ ()
        (throw 'oops)) ; should not crash the vm
      (λ (key . args)
        (display "Success!")(newline))) ; never reached

So your answer to surround the resumption with a (catch #t ...) is a
good workaround. For our code, anyway.

(I'm now off to go read
http://www.cs.indiana.edu/~sabry/papers/delim-dyn-bind.pdf :)

--
Andrew
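The two handler points discussed above (handler arity, and a catch key that does not match the thrown key) can be illustrated in isolation. This is only a minimal sketch, not the code from the application in this thread; `safe-call` is a hypothetical helper name introduced for illustration.

```scheme
;; Hypothetical helper: run THUNK under a catch-all handler.
;; A catch handler is called with the throw key followed by the
;; throw arguments, so it must accept (key . args).
(define (safe-call thunk)
  (catch #t
    thunk
    (lambda (key . args)
      (list 'caught key args))))

;; The handler sees the key and the extra arguments of the throw.
(define r1 (safe-call (lambda () (throw 'oops 1 2))))

;; A catch for a different key does not intercept the throw; the
;; exception keeps propagating to the next enclosing catch, here the
;; catch-all established by safe-call.
(define r2
  (safe-call
   (lambda ()
     (catch 'not-oops
       (lambda () (throw 'oops))
       (lambda (key . args) 'inner)))))
```

Here `r1` is `(caught oops (1 2))` and `r2` is `(caught oops ())`: the inner `'not-oops` handler never runs.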
Re: Core dump when throwing an exception from a resumed partial continuation
On Fri 15 Mar 2013 22:01, Brent Pinkney writes:

> When I resume the continuation in another thread, all works perfectly
> UNLESS the continued execution throws an exception.
> Then guile exits with a core dump.
>
> By contrast if I resume the continuation in the same thread and then
> throw an exception all works as expected.

I think I know what this is.

So, a delimited continuation should capture that part of the dynamic
environment made in its extent. (See Oleg Kiselyov and Chung-Chieh
Shan's "Delimited Dynamic Binding" paper.) That is what Guile does, for
fluids, prompts, and dynamic-wind blocks.

Our implementation of exception handling uses a fluid,
%exception-handler (boot-9.scm:86). However that fluid references a
stack of exception handlers on the heap. There is the problem: an
exception in a reinstated delimited continuation will walk the captured
exception handler stack from the heap, not from its own dynamic
environment. Therefore it could abort to a continuation that is not
present on the new thread.

The solution is to have the exception handler find the next handler
from the dynamic environment. This will need a new primitive to walk
the dynamic stack, I think. I can't look at this atm as I broke my arm
(!) and so typing is tough.

For now as a workaround I suggest you put a catch #t in each of your
delimited continuations. This way all throws will be handled by catches
established by the continuation.

Regards,

Andy
--
http://wingolog.org/
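A rough sketch of the suggested workaround, for illustration only: the `catch #t` is established inside the prompt, so it belongs to the captured continuation rather than to whatever thread later resumes it. The names `tag`, `with-local-catch`, `k` and `result` are hypothetical, and the sketch does not check whether it avoids the crash on an affected 2.0.x build.

```scheme
(use-modules (ice-9 threads))

(define tag (make-prompt-tag "external-call"))

;; Hypothetical helper: run THUNK under its own catch-all handler.
(define (with-local-catch thunk)
  (catch #t
    thunk
    (lambda (key . args)
      (display "caught: ")
      (write (cons key args))
      (newline)
      'recovered)))

;; Capture a continuation. The catch lies inside the prompt, so it is
;; part of the dynamic environment captured with the continuation.
(define k
  (call-with-prompt tag
    (lambda ()
      (with-local-catch
       (lambda ()
         (let ((answer (abort-to-prompt tag)))
           (if (eq? answer 'bad)
               (throw 'oops answer)
               answer)))))
    (lambda (k) k)))

;; Resume the continuation on another thread; the throw is handled by
;; the catch that the continuation itself established.
(define result
  (join-thread (call-with-new-thread (lambda () (k 'bad)))))
```

On a Guile where cross-thread resumption behaves as intended, `result` should be the symbol `recovered`.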
Re: Core dump when throwing an exception from a resumed partial continuation
On 03/15/13 23:30, Andy Wingo wrote:

> On Fri 15 Mar 2013 22:01, Brent Pinkney writes:
>
>> I am using partial continuations to resume a computation when an
>> external system returns with an answer.
>> I am using (call-with-prompt ...) and (abort-to-prompt)
>>
>> When I resume the continuation in another thread, all works perfectly
>
> Neat :)
>
>> UNLESS the continued execution throws an exception.
>> Then guile exits with a core dump.
>
> That's not good! Can you work up a short test case?

We've tried to create a short test-case. Unfortunately, it doesn't seem
to trigger the core. However, the app we're creating triggers the
core-dump every time.

So, to dig into this problem, I built a debuggable VM. What we see in
the debuggable cores is the first backtrace. You'll note that aside
from the stack overflow at frame #3, the pattern of "Abort to unknown
prompt" is repeated /ad infinitum/. Well, certainly to a stack depth of
28,000 :). So the stack overflow is understandable. I guess the
question is, why does guile get stuck in a loop aborting to an unknown
prompt?

This is on Linux x86 Ubuntu 12.04, both 32- and 64-bit. The same code
crashes the same VM at the same point on Solaris SPARC 64-bit, but that
core does not appear to show this repetitive pattern.

When I say the "same VM", I mean it: all dependencies except for the
kernel and libc are built from identical sources, using as near as
possible the same configure flags:

    gcc-4.7.2
    bdw-gc-7.2d
    libtool-2.2.10
    gmp-5.0.2
    libiconv-1.14
    libunistring-0.9.3
    libffi-3.0.10
    readline-6.1
    guile-2.0.7

Ubuntu's guile also shows the same problem.

To understand how guile gets into this state, I put a breakpoint in the
VM at the point where it first calls abort. That reveals the second
backtrace below. This shows what happens immediately before the VM goes
bananas, and fills up the stack. Which is exactly what happens when gdb
allows guile to continue beyond the breakpoint.

I then tried stepping through the scm_c_abort code in frame #2, and it
indeed does not find anything in the wind list. Certainly, the list
returned by scm_i_dynwinds has 12 entries in it. It's just that none of
them match.

I'd be really grateful for any help on this -- as you can tell, I'm not
a VM hacker!

--
Andrew

#0 0x0033b416 in __kernel_vsyscall ()
#1 0x004ff1df in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x00502825 in __GI_abort () at abort.c:91
#3 0x00a106b7 in vm_error_stack_overflow (vp=0x9c9cfc0) at vm.c:516
#4 0x00a204a4 in vm_regular_engine (vm=0x9cb29e8, program=0x93b50d0, argv=0xac055e70, nargs=4) at vm-engine.c:166
#5 0x00a34d00 in scm_c_vm_run (vm=0x9cb29e8, program=0x93b50d0, argv=0xac055e70, nargs=4) at vm.c:741
#6 0x00a3585e in scm_call_with_vm (vm=0x9cb29e8, proc=0x93b50d0, args=0x304) at vm.c:1033
#7 0x009763e1 in scm_apply (proc=0x93b50d0, arg1=0x95dd4c8, args=0x95dd4c8) at eval.c:748
#8 0x00975f7c in scm_apply_1 (proc=0x93b50d0, arg1=0x937df10, args=0x95dd4d0) at eval.c:588
#9 0x00a0ba1d in scm_throw (key=0x937df10, args=0x95dd4d0) at throw.c:104
#10 0x00a102ff in vm_error (msg=0xa6631d "VM: Too many arguments", arg=0x16) at vm.c:414
#11 0x00a105e6 in vm_error_too_many_args (nargs=5) at vm.c:490
#12 0x00a11a42 in vm_regular_engine (vm=0x9cb29e8, program=0x93b50d0, argv=0xac056770, nargs=5) at vm-engine.c:104
#13 0x00a34d00 in scm_c_vm_run (vm=0x9cb29e8, program=0x93b50d0, argv=0xac056770, nargs=5) at vm.c:741
#14 0x00a3585e in scm_call_with_vm (vm=0x9cb29e8, proc=0x93b50d0, args=0x304) at vm.c:1033
#15 0x009763e1 in scm_apply (proc=0x93b50d0, arg1=0x95dd550, args=0x95dd550) at eval.c:748
#16 0x00975f7c in scm_apply_1 (proc=0x93b50d0, arg1=0x9362130, args=0x95dd558) at eval.c:588
#17 0x00a0ba1d in scm_throw (key=0x9362130, args=0x95dd558) at throw.c:104
#18 0x00a0c097 in scm_ithrow (key=0x9362130, args=0x95dd558, noreturn=1) at throw.c:441
#19 0x009735bf in scm_error_scm (key=0x9362130, subr=0x99105b0, message=0x99105c0, args=0x95dd5c8, data=0x4) at error.c:95
#20 0x00973576 in scm_error (key=0x9362130, subr=0xa4cd3b "abort", message=0xa4cd23 "Abort to unknown prompt", args=0x95dd5c8, rest=0x4) at error.c:62
#21 0x00973b6b in scm_misc_error (subr=0xa4cd3b "abort", message=0xa4cd23 "Abort to unknown prompt", args=0x95dd5c8) at error.c:316
#22 0x0096aef5 in scm_c_abort (vm=0x9cb29e8, tag=0x9c08af0, n=5, argv=0xac056960, cookie=6614) at control.c:209
#23 0x00a0fe36 in vm_abort (vm=0x9cb29e8, n=0, vm_cookie=6614) at vm.c:264
#24 0x00a18942 in vm_regular_engine (vm=0x9cb29e8, program=0x93b5260, argv=0xac0571f4, nargs=6) at vm-i-system.c:1528
#25 0x00a34d00 in scm_c_vm_run (vm=0x9cb29e8, program=0x93b50d0, argv=0xac0571e0, nargs=5) at vm.c:741
#26 0x00a3585e in scm_call_with_vm (vm=0x9cb29e8, proc=0x93b50d0, args=0x304) at vm.c:1033
#27 0x009763e1 in scm_apply (proc=0x93b50d0, arg1=0x95dd678, args=0x95dd678) at eval.c:748
#28 0x00975f7c in scm_apply_1 (proc=0x93b50d0, a
Re: Core dump when throwing an exception from a resumed partial continuation
Hi,

On Fri 15 Mar 2013 22:01, Brent Pinkney writes:

> I am using partial continuations to resume a computation when an
> external system returns with an answer.
> I am using (call-with-prompt ...) and (abort-to-prompt)
>
> When I resume the continuation in another thread, all works perfectly

Neat :)

> UNLESS the continued execution throws an exception.
> Then guile exits with a core dump.

That's not good! Can you work up a short test case?

Thanks,

Andy
--
http://wingolog.org/
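For readers unfamiliar with the pattern Brent describes, here is a minimal same-thread sketch of suspending a computation with (abort-to-prompt) and resuming it once an answer is available. It is not the code from the original application; `tag`, `start-computation` and the `'need-answer` request value are hypothetical names chosen for illustration.

```scheme
(define tag (make-prompt-tag "external-request"))

;; Run a computation until it needs an external answer. At that point
;; it aborts to the prompt, and the prompt handler receives the
;; continuation K together with the request value.
(define (start-computation)
  (call-with-prompt tag
    (lambda ()
      (let ((answer (abort-to-prompt tag 'need-answer)))
        (* 2 answer)))
    (lambda (k request)
      ;; Hand back the request and the suspended continuation.
      (cons request k))))

(define suspended (start-computation))
(define request (car suspended))   ; the symbol need-answer
(define resume (cdr suspended))    ; the partial continuation

;; Later, when the external system has replied, resume the computation
;; by calling the continuation with the answer.
(define result (resume 21))
```

Calling `(resume 21)` makes `21` the value of the `(abort-to-prompt ...)` expression, so `result` is `42`.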