[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--- Comment #14 from tromey at gcc dot gnu dot org 2006-03-08 19:27 --- I've been looking into this a bit. The current problem I see is that the heavyweight lock stuff relies on the GC. This won't interact well with the current code in natReference.cc, as those data structures are not scanned. Also, I do think that both calls to _Jv_RegisterFinalizer in Reference::create are problematic. The first call registers a finalizer for the Reference, the second for the referent. But, there is nothing preventing a subclass of Reference from having a finalizer; or from user code acquiring a heavy lock on a Reference object. So, all cases have to be handled here. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--- Additional Comments From ovidr at users dot sourceforge dot net 2005-07-19 15:06 ---
I've spent a lot of time trying to make a testcase of this, but no luck yet.

I can basically create a testapp with 2 threads. When they both access a synchronized method, and are forced to wait long enough (natObject.cc:907 spins 18 times before making a hard lock) you can drop the finalizer by inserting a WeakHashMap.put(this, null) call. (I force System.gc() regularly.) In gdb I can see that heavy_lock_obj_finalization_proc is no longer called once the Reference::create() call has been made.

I've done this a ton of times in a loop, but I just can't get the test app to crash. Is dropping the finalizer enough to cause a crash (over time)? I'm not sure what I'm missing, or what I can do to force this crash.

My real app however does crash. I've recompiled libgcj and do get all the information originally requested from gdb from an above comment in yet another but similar backtrace. I don't know if it confirms that the problem is in dropping finalizers (or maybe that is a separate problem?), but thought I'd post it.

Program received signal SIGSEGV, Segmentation fault.
0x404229f5 in GC_mark_from (mark_stack_top=0xc82b000, mark_stack=0xc82b000, mark_stack_limit=0xc83b000) at /home/gcc/gcc/boehm-gc/mark.c:724
724 descr = *(word *)(type_descr
(gdb) bt
#0 0x404229f5 in GC_mark_from (mark_stack_top=0xc82b000, mark_stack=0xc82b000, mark_stack_limit=0xc83b000) at /home/gcc/gcc/boehm-gc/mark.c:724
#1 0x4041eab8 in GC_finalize () at /home/gcc/gcc/boehm-gc/finalize.c:639
#2 0x4041ab83 in GC_finish_collection () at /home/gcc/gcc/boehm-gc/alloc.c:659
#3 0x4041a35b in GC_try_to_collect_inner (stop_func=0x40419c5c <GC_never_stop_func>) at /home/gcc/gcc/boehm-gc/alloc.c:376
#4 0x4041b3e8 in GC_collect_or_expand (needed_blocks=1, ignore_off_page=0) at /home/gcc/gcc/boehm-gc/alloc.c:996
#5 0x4041b683 in GC_allocobj (sz=4, kind=0) at /home/gcc/gcc/boehm-gc/alloc.c:1071
#6 0x40420679 in GC_generic_malloc_inner (lb=16, k=0) at /home/gcc/gcc/boehm-gc/malloc.c:136
#7 0x404217b3 in GC_generic_malloc_many (lb=16, k=0, result=0x4062b4a8) at /home/gcc/gcc/boehm-gc/mallocx.c:512
#8 0x4042b32d in GC_local_malloc_atomic (bytes=12) at /home/gcc/gcc/boehm-gc/pthread_support.c:334
#9 0x401f2ec7 in _Jv_AllocPtrFreeObj (size=12, klass=0x8816688) at java-gc.h:57
#10 0x401f1674 in _Jv_NewPrimArray (eltype=0x87a3be0, count=1) at /home/gcc/gcc/libjava/prims.cc:559
#11 0x08287db9 in org.eclipse.swt.widgets.Table.textCellDataProc(int, int, int, int, int) (this=0x8940dc0, tree_column=146453640, cell=146453856, tree_model=206469928, iter=-1073753012, data=146439960) at Table.java:2704
#12 0x082b15b4 in org.eclipse.swt.widgets.Display.textCellDataProc(int, int, int, int, int) (this=0x884ed48, tree_column=146453640, cell=146453856, tree_model=206469928, iter=-1073753012, data=146439960) at Display.java:3305
#13 0x4040aceb in ffi_call_SYSV () at /home/gcc/gcc/libffi/src/x86/sysv.S:60
#14 0x4040a8d2 in ffi_call (cif=0xbfffd0b8, fn=0x82b1544 <org.eclipse.swt.widgets.Display.textCellDataProc(int, int, int, int, int)>, rvalue=0xbfffd0b0, avalue=0xbfffcfd0) at /home/gcc/gcc/libffi/src/x86/ffi.c:221
#15 0x4023e91e in _Jv_CallAnyMethodA (obj=0x884ed48, return_type=0x87a3be0, meth=0x87007c0, is_constructor=0 '\0', is_virtual_call=1 '\001', parameter_types=0xc7a5460, args=0xbfffd160, result=0xbfffd1d4, is_jni_call=1 '\001', iface=0x0) at /home/gcc/gcc/libjava/java/lang/reflect/natMethod.cc:495
#16 0x401fa956 in _Jv_JNI_CallAnyMethodV<jint, normal> (env=0x87b28f8, obj=0x884ed48, klass=0x0, id=0x87007c0, vargs=0xbfffd250 "<unprintable bytes>") at /home/gcc/gcc/libjava/jni.cc:796
#17 0x401fa9ed in _Jv_JNI_CallMethodV<jint> (env=0x87b28f8, obj=0x884ed48, id=0x87007c0, args=0xbfffd250 "<unprintable bytes>") at /home/gcc/gcc/libjava/jni.cc:967
#18 0x40fbcfac in callback () from ./lib/libswt-gtk-3138.so
#19 0x40faeb65 in fn16_5 () from ./lib/libswt-gtk-3138.so
(gdb) p descr
$1 = 4294967279
(gdb) p current_p
$2 = (word *) 0x93b10e0
(gdb) p type_descr
$3 = 0x2d02ca8a <Address 0x2d02ca8a out of bounds>
(gdb) p GC_gc_no
$4 = 1731
(gdb) p *mark_stack_top
$5 = {mse_start = 0x93b10e0, mse_descr = 4294967279}
(gdb) up
#1 0x4041eab8 in GC_finalize () at /home/gcc/gcc/boehm-gc/finalize.c:639
639 GC_MARK_FO(real_ptr, GC_normal_finalize_mark_proc);
(gdb) p real_ptr
$6 = 0x93b10e0 "<unprintable bytes>"
(gdb) p *curr_fo
$7 = {prolog = {hidden_key = 154865888, next = 0x96d54f8}, fo_fn = 0x40408c14 <call_finalizer>, fo_client_data = 0x4023b092 "<unprintable bytes>", fo_object_size = 22, fo_mark_proc = 0x4041e03e <GC_null_finalize_mark_proc>}
(gdb)
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--- Additional Comments From ovidr at users dot sourceforge dot net 2005-06-20 16:25 ---
I've tried to create a testcase but can't seem to get a crash or infinite loop lockup.

Anyway, I think I understand conceptually what must be done, but in practice I'm still unsure of how to go about it. You don't seem to have a problem with the call to:

_Jv_RegisterFinalizer (this, finalize_reference);

only

_Jv_RegisterFinalizer (referent, finalize_referred_to_object);

but since referent is an arbitrary object, what kind of finalizers can it already have? I'm stumped on how to get access to an arbitrary object's finalizers from natReference.cc. Is it just the heavy_lock structure from natObject that needs to be considered somehow?

From comment #10:
> My impression is that natReference.cc already keeps a fairly elaborate data structure to which you should be able to add the prior finalization info

This is the part that confuses me. Not all objects are References, so how would a Reference know about some arbitrary Object's previous finalizers or even attempt to maintain a data structure?

And once I have the Object's old finalizer (if there is one), I guess I just run it, and register the new one with

GC_REGISTER_FINALIZER_NO_ORDER(x, x, cd, 0, 0);

from natReference's finalize_referred_to_object? Or am I way off track?

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--- Additional Comments From ovidr at users dot sourceforge dot net 2005-06-08 21:14 --- Since this bug seems a bit lost, I've been trying some things on my own without success. Can someone please explain: If referent is just a RawData pointer to some Object, how are its previous finalizers supposed to be found? How to append them along with the new finalizer in the correct order (order matters?) to GC_REGISTER_FINALIZER_NO_ORDER (GC_register_finalizer_inner(obj, fn, cd, ofn, ocd, mp)) which seems to be what is used in natObject.cc? The comments in this bug seem to suggest that there is some similar code somewhere that I could lift and hook into natReference.create, but all the code in natObject and String.intern finalization looks very different to my novice-gcj eyes since they don't seem to be working with some foreign object, nor with finalizers that aren't already locally stored in a struct. Secondly, is there any way to craft a testcase for this to know if it has been fixed? I don't fully understand why (based on comment #6) this would ever cause a crash, and waiting many days for the crash of my apps is a very tedious process. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--- Additional Comments From Hans dot Boehm at hp dot com 2005-06-09 05:10 ---
Unfortunately, I haven't had time to pursue this.

I think that in order to get this to fail, you want lots of weak references to objects which are also subject to lock contention or wait/notify calls. I don't think we currently have a good test case.

My impression is that natReference.cc already keeps a fairly elaborate data structure to which you should be able to add the prior finalization info, so that it can be invoked at the right point by the existing finalizer there.

In general, the GC's data structures don't queue multiple finalizers. You need to register a new finalizer that knows it has to reregister the old one when it's done. The information that there was another finalizer needs to be kept off to the side somewhere in a separate table, or as part of the client data registered with the finalizer. The locking code also has to deal with opaque objects, but it again has its own hash table off to the side.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
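A minimal sketch of the chaining scheme described in the comment above, assuming a collector that keeps a single finalizer slot per object. Everything here is a stand-in written for illustration: register_finalizer, run_finalizer, install_chained and the chained struct are hypothetical names, and the table only imitates the "hand back the old procedure and client data" behaviour of the GC's registration call. It is not the libgcj or boehm-gc code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

typedef void (*finalization_proc)(void *obj, void *client_data);

// Stand-in for the collector's table: at most one finalizer per object.
struct slot { finalization_proc fn; void *cd; };
static std::unordered_map<void *, slot> finalizer_table;

// Replace the finalizer for obj and report the previous one through the
// out-parameters. Passing fn == nullptr removes the entry.
static void register_finalizer(void *obj, finalization_proc fn, void *cd,
                               finalization_proc *ofn, void **ocd)
{
  auto it = finalizer_table.find(obj);
  slot old = (it != finalizer_table.end()) ? it->second : slot{nullptr, nullptr};
  if (ofn) *ofn = old.fn;
  if (ocd) *ocd = old.cd;
  if (fn) finalizer_table[obj] = slot{fn, cd};
  else finalizer_table.erase(obj);
}

// Pretend the collector found obj unreachable: clear the slot, then call it.
static void run_finalizer(void *obj)
{
  finalization_proc fn;
  void *cd;
  register_finalizer(obj, nullptr, nullptr, &fn, &cd);
  if (fn) fn(obj, cd);
}

// Client data for the new finalizer: it carries the finalizer it displaced.
struct chained {
  finalization_proc old_fn;
  void *old_cd;
  std::vector<std::string> *log;
};

static void reference_finalizer(void *obj, void *cd)
{
  chained *c = static_cast<chained *>(cd);
  c->log->push_back("reference");
  // Reregister the displaced finalizer so it still runs later.
  if (c->old_fn)
    register_finalizer(obj, c->old_fn, c->old_cd, nullptr, nullptr);
  delete c;
}

static void install_chained(void *obj, std::vector<std::string> *log)
{
  chained *c = new chained{nullptr, nullptr, log};
  register_finalizer(obj, reference_finalizer, c, &c->old_fn, &c->old_cd);
}

static void lock_finalizer(void *, void *cd)
{
  static_cast<std::vector<std::string> *>(cd)->push_back("heavy lock");
}

// Registers a lock finalizer, chains a reference finalizer on top of it,
// then simulates successive collections.
std::vector<std::string> demo_chain()
{
  std::vector<std::string> log;
  int object = 0;
  register_finalizer(&object, lock_finalizer, &log, nullptr, nullptr);
  install_chained(&object, &log);
  run_finalizer(&object);  // reference finalizer runs, lock finalizer restored
  run_finalizer(&object);  // lock finalizer runs
  run_finalizer(&object);  // nothing left to run
  return log;
}
```

In this sketch the prior finalizer travels in the client data; keeping it in a side table keyed by object, as the comment also suggests, would work the same way.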
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--
        What|Removed                          |Added
--------------------------------------------------------------------------
  AssignedTo|unassigned at gcc dot gnu dot org|daney at gcc dot gnu dot org
      Status|NEW                              |ASSIGNED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--
        What|Removed                     |Added
--------------------------------------------------------------------------
  AssignedTo|daney at gcc dot gnu dot org|unassigned at gcc dot gnu dot org
      Status|ASSIGNED                    |NEW

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-12-02 13:42 ---
Confirmed based on Tromey's comments.

--
                 What|Removed        |Added
------------------------------------------------------------
               Status|UNCONFIRMED    |NEW
       Ever Confirmed|               |1
Last reconfirmed date|-00-00 00:00:00|2004-12-02 13:42:43

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--- Additional Comments From Hans dot Boehm at hp dot com 2004-11-25 01:50 ---
After finally finding time to look at the code, it appears that my earlier guesses were correct.

::java::lang::ref::Reference::create in natReference.cc calls _Jv_RegisterFinalizer(referent ...), where referent is an arbitrary object, which may already have a finalizer. This is bad news, since the original finalizer will be dropped. The original finalizer may be a Java finalizer, or it may be one that was registered by the hash synchronization code to clean up a heavy lock entry for the object. In either case we lose. (The hash synchronization code is careful to not lose the original finalizer.)

In both cases I think, we are likely to mostly introduce leaks, and crash only occasionally. So this may explain some other misbehavior.

The fix may require some thought. At a minimum, we need to export more GC functionality, so that the Reference implementation can retrieve the old finalizer. (The hash synchronization code currently cheats and goes directly to the GC interface, which should also be fixed.)

I think that so long as Reference gets the ordering right, and doesn't assume that all finalizers are Java finalizers, the hash synchronization code should work. It needs to drop the heavy lock before the object is deallocated, and while the lock is not held. I don't think the timing otherwise matters. If the object is resurrected 17 times, we can drop the heavy lock at any of those points, recreating it if necessary.

This really needs to be fixed to make any use of References reliable.

--
  What|Removed|Added
----------------------------------------
    CC|       |tromey at redhat dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
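To illustrate the failure mode described in the comment above, here is a small sketch under the assumption that the collector stores one finalizer per object. The table and the functions register_naive and register_checked are hypothetical stand-ins, not the libgcj or GC interfaces. The first demo shows an earlier "heavy lock" finalizer silently disappearing when a second registration overwrites it; the second shows a registration that at least hands the displaced entry back to the caller.

```cpp
#include <cassert>
#include <unordered_map>

typedef void (*finalization_proc)(void *obj, void *client_data);

// Stand-in for the collector's table: at most one finalizer per object.
struct slot { finalization_proc fn; void *cd; };
static std::unordered_map<void *, slot> finalizers;

// Overwrites whatever was registered before without looking at it.
static void register_naive(void *obj, finalization_proc fn, void *cd)
{
  finalizers[obj] = slot{fn, cd};
}

// Same overwrite, but returns true and fills *old when something was displaced.
static bool register_checked(void *obj, finalization_proc fn, void *cd, slot *old)
{
  auto it = finalizers.find(obj);
  bool displaced = (it != finalizers.end());
  if (displaced) *old = it->second;
  finalizers[obj] = slot{fn, cd};
  return displaced;
}

// Counts how often the (pretend) heavy lock cleanup actually ran.
static void heavy_lock_cleanup(void *, void *cd) { ++*static_cast<int *>(cd); }
static void referent_cleanup(void *, void *) {}

// The heavy lock finalizer is registered first, then overwritten. When the
// object is finalized only referent_cleanup runs, so the count stays at 0.
int lock_cleanups_after_naive_registration()
{
  int cleanups = 0;
  int object = 0;
  finalizers.clear();
  register_naive(&object, heavy_lock_cleanup, &cleanups);
  register_naive(&object, referent_cleanup, nullptr);
  slot s = finalizers[&object];
  finalizers.erase(&object);
  s.fn(&object, s.cd);
  return cleanups;
}

// With the checked variant the caller learns which finalizer it displaced
// and can decide how to preserve it.
bool checked_registration_reports_displacement()
{
  int cleanups = 0;
  int object = 0;
  finalizers.clear();
  register_naive(&object, heavy_lock_cleanup, &cleanups);
  slot old = {nullptr, nullptr};
  bool displaced = register_checked(&object, referent_cleanup, nullptr, &old);
  finalizers.clear();
  return displaced && old.fn == heavy_lock_cleanup && old.cd == &cleanups;
}
```

The cleanup count of zero in the first demo corresponds to the leak case in the comment: the lock entry is never cleaned up because its finalizer is gone.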
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--- Additional Comments From tromey at gcc dot gnu dot org 2004-11-25 03:25 --- Oops, I wasn't aware that the locks code was using finalizers. We had to make special consideration in the reference code for String.intern; we can do something similar for locks. This is pretty important, I am going to add it to our 4.0 wish list. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--
                    What|Removed|Added
----------------------------------------------
OtherBugsDependingOnThis|       |17574

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--- Additional Comments From Hans dot Boehm at hp dot com 2004-11-08 19:55 ---
I think this could be explained by the same problem. This time the collector is in the Java-specific finalization pass which marks objects reachable from objects that are about to be finalized, so that the finalizer doesn't see deallocated memory. It appears that the finalizable object it's marking from has somehow been clobbered already, and no longer has a proper vtable entry. Hence the collector dies trying to find the mark descriptor in the vtable.

I can see how this might happen if we accidentally register a finalizer on something that was already collected, which can happen if the finalizer installed by hash synchronization is dropped.

It might be useful to find a little more of the context. Try

p descr
p current_p
p type_descr
p GC_gc_no
p *mark_stack_top
up 1    -- goto GC_finalize frame
p real_ptr
p *curr_fo
x/8wx real_ptr -4

Printing *curr_fo should indicate the finalization function and client data associated with this object. It would be useful to explore the client data a bit further, so that we can understand what the finalizer is really trying to do. (All Java finalizers use the same function, and use the client data field to specify what really needs to be done.)

Assuming current_p and real_ptr are the same, and you can call functions from gdb, try

p GC_find_header(real_ptr)

If that looks like a sane pointer, also try

p *GC_find_header(real_ptr)
p GC_base(real_ptr)

I need to look at the WeakHashMap code, but I won't get a chance to do that for a few days.

We could also probably track this down more systematically by having the hash synchronization code check that the vtable pointer hasn't changed when we reregister the client finalizer around line 757 in natObject.cc. We would have to remember the vtable pointer in the hl structure. This might cause this to fail far more predictably, and might at least confirm that we're not barking up the wrong tree.
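The vtable check suggested at the end of the comment above could look roughly like the following. This is only a sketch under simplifying assumptions: object_header, heavy_lock_record and the helper names are hypothetical, an object is reduced to its vtable pointer, and "memory reused for another object" is simulated by overwriting that pointer. The real hl structure in natObject.cc has other fields that are not modeled here.

```cpp
#include <cassert>

// Simplified object: only the vtable pointer in the first word is modeled.
struct object_header { const void *vtable; };

// Hypothetical subset of a heavy lock record that also remembers the
// vtable pointer seen when the lock was created.
struct heavy_lock_record {
  object_header *obj;
  const void *saved_vtable;
};

static heavy_lock_record remember(object_header *obj)
{
  heavy_lock_record hl = { obj, obj->vtable };
  return hl;
}

// To be called before reregistering the client finalizer: a changed vtable
// pointer suggests the memory no longer holds the object that was locked.
static bool same_object_as_locked(const heavy_lock_record &hl)
{
  return hl.obj->vtable == hl.saved_vtable;
}

struct vtable_demo { bool before; bool after; };

vtable_demo demo_vtable_check()
{
  static const int vtable_a = 0, vtable_b = 0;  // two distinct fake vtables
  object_header memory = { &vtable_a };
  heavy_lock_record hl = remember(&memory);

  vtable_demo r;
  r.before = same_object_as_locked(hl);
  memory.vtable = &vtable_b;  // simulate the block being reused
  r.after = same_object_as_locked(hl);
  return r;
}
```

A check like this would not fix anything by itself; as the comment says, its purpose would be to make the failure show up at a predictable point.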
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--- Additional Comments From ovidr at users dot sourceforge dot net 2004-11-08 20:27 ---
Unfortunately many variables seem not to be available:

GNU gdb 6.1
(gdb) p descr
Variable descr is not available.
(gdb) p current_p
$1 = (word *) 0x9acf618
(gdb) p type_descr
No symbol type_descr in current context.
(gdb) p GC_gc_no
$2 = 768
(gdb) p *mark_stack_top
$3 = {mse_start = 0x9acf618, mse_descr = 4294967279}
(gdb) up 1
#1 0x40523b4b in GC_finalize () at /datal/gcc/gcc/boehm-gc/finalize.c:639
639 GC_MARK_FO(real_ptr, GC_normal_finalize_mark_proc);
(gdb) p real_ptr
Variable real_ptr is not available.
(gdb) p *curr_fo
Variable curr_fo is not available.
(gdb)

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--- Additional Comments From ovidr at users dot sourceforge dot net 2004-11-08 03:00 ---
Recompiled with -g (and waited a few days..), but I'm not sure if this is the same problem or not:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1106963376 (LWP 23098)]
0x4052753c in GC_mark_from (mark_stack_top=0x8c5f000, mark_stack=0x8c5f000, mark_stack_limit=0x8c7f000) at /datal/gcc/gcc/boehm-gc/mark.c:724
724 descr = *(word *)(type_descr
(gdb) bt
#0 0x4052753c in GC_mark_from (mark_stack_top=0x8c5f000, mark_stack=0x8c5f000, mark_stack_limit=0x8c7f000) at /datal/gcc/gcc/boehm-gc/mark.c:724
#1 0x40523b4b in GC_finalize () at /datal/gcc/gcc/boehm-gc/finalize.c:639
#2 0x4051fa60 in GC_finish_collection () at /datal/gcc/gcc/boehm-gc/alloc.c:659
#3 0x405200eb in GC_try_to_collect_inner (stop_func=Variable stop_func is not available.) at /datal/gcc/gcc/boehm-gc/alloc.c:376
#4 0x4052087e in GC_collect_or_expand (needed_blocks=Variable needed_blocks is not available.) at /datal/gcc/gcc/boehm-gc/alloc.c:1020
#5 0x40520adb in GC_allocobj (sz=12, kind=0) at /datal/gcc/gcc/boehm-gc/alloc.c:1071
#6 0x405253aa in GC_generic_malloc_inner (lb=48, k=0) at /datal/gcc/gcc/boehm-gc/malloc.c:136
#7 0x4052621c in GC_generic_malloc_many (lb=48, k=0, result=0x8722cf8) at /datal/gcc/gcc/boehm-gc/mallocx.c:512
#8 0x4053014e in GC_local_malloc_atomic (bytes=48) at /datal/gcc/gcc/boehm-gc/pthread_support.c:334
#9 0x403780fc in _Jv_AllocString (len=14) at java-gc.h:57
#10 0x403b0b75 in java::lang::String::toLowerCase (this=0x859fcc0, locale=Variable locale is not available.) at cni.h:41
#11 0x403d4943 in java.lang.String.toLowerCase() (this=0xffef) at /datal/gcc/gcc/libjava/java/lang/String.java:1031
#12 0x4050de5f in gnu.gcj.convert.IOConverter.canonicalize(java.lang.String) (name=Variable name is not available.) at /datal/gcc/gcc/libjava/gnu/gcj/convert/IOConverter.java:77
#13 0x4050c36d in gnu.gcj.convert.BytesToUnicode.getDecoder(java.lang.String) (encoding=0x859fcc0) at /datal/gcc/gcc/libjava/gnu/gcj/convert/BytesToUnicode.java:78
#14 0x403af2c0 in java::lang::String::init (this=0x97c42d0, bytes=0x88ea000, offset=Variable offset is not available.) at /datal/gcc/gcc/libjava/java/lang/natString.cc:488
#15 0x403d429e in java.lang.String.String(byte[], int, int) (this=Variable this is not available.) at /datal/gcc/gcc/libjava/java/lang/String.java:345
...

I'll leave it open in a screen session. Any gdb commands I should type?

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--
  What|Removed|Added
--------------------------------------------
    CC|       |Hans dot Boehm at hp dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--- Additional Comments From Hans dot Boehm at hp dot com 2004-11-01 20:44 ---
This would be a lot easier if libgcj had been built with something like -O2 -g.

Based on approximate manual matching of the object code to finalize.s, I think this is failing around line 452 of finalize.c on the line

new_fo -> fo_object_size = hhdr -> hb_sz;

It appears that hhdr is in %edx and is 1. This can occur if the first argument to GC_register_finalizer_inner is a pointer to somewhere in the second page of a large object. It should of course be a base pointer to an object, so this should be impossible.

I think the GC_register_finalizer_no_order call must be coming from maybe_remove_all_heavy(), which called remove_all_heavy, which was presumably inlined into _Jv_MonitorExit(). I see no other path to GC_register_finalizer_no_order().

That makes it appear that an object whose heavy-weight lock we are about to remove has previously been garbage collected. That should be impossible since we previously registered our own finalizer for the object in question, and that acquires the lock bit in the lock hash table entry, as does remove_all_heavy. Thus the finalizer should have previously been run to completion, and all traces of the heavy lock should have been previously removed.

Are there places we add a finalizer to an existing object without checking for prior finalizers? That might explain the problem.

We really need some more evidence to confirm this chain of reasoning. A -g stack trace, and the values of the finalization proc and data (and the object the data pointer points to, if any) that are being passed to GC_register_finalizer_inner might help. So would GC_find_header (object_being_registered_address). Assuming that's one, as expected, then *GC_find_header(object_being_registered_address - 4096) together with GC_gc_no would also be somewhat interesting.

Does this application use some flavor of weak references? If so, which one?

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266
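The observation that hhdr is 1 can be pictured with a toy block-header table. The sketch below rests on assumptions: a 4096-byte heap block (matching the "- 4096" in the comment), a table with one entry per block, and the convention that a small integer entry means "the real header is that many blocks earlier". The names header_table, raw_entry, resolve and is_base_pointer are hypothetical, only large block-aligned objects are modeled, and none of this is the collector's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t kBlockSize = 4096;            // assumed heap block size
const std::uintptr_t kMaxJump = kBlockSize - 1; // assumption: values this small are counts

struct block_header { std::size_t object_size; };

// One entry per heap block: 0 (unused), a small back-count, or a header pointer.
typedef std::vector<std::uintptr_t> header_table;

static std::uintptr_t raw_entry(const header_table &t, std::size_t addr)
{
  return t[addr / kBlockSize];
}

static bool is_forwarding(std::uintptr_t e) { return e != 0 && e <= kMaxJump; }

// Follow back-counts until the entry holding the real header is reached.
static const block_header *resolve(const header_table &t, std::size_t addr)
{
  std::size_t index = addr / kBlockSize;
  while (is_forwarding(t[index]))
    index -= t[index];
  return reinterpret_cast<const block_header *>(t[index]);
}

// In this toy model a large object's base pointer is block aligned and its
// table entry is a real header pointer rather than a back-count.
static bool is_base_pointer(const header_table &t, std::size_t addr)
{
  std::uintptr_t e = raw_entry(t, addr);
  return addr % kBlockSize == 0 && e != 0 && !is_forwarding(e);
}

struct header_demo {
  std::uintptr_t second_page_entry;
  std::size_t resolved_size;
  bool first_page_is_base;
  bool second_page_is_base;
};

header_demo demo_headers()
{
  static block_header large = { 3 * kBlockSize };  // object spanning blocks 2..4
  header_table t(5);
  t[2] = reinterpret_cast<std::uintptr_t>(&large);
  t[3] = 1;  // header is one block back
  t[4] = 2;  // header is two blocks back

  std::size_t base = 2 * kBlockSize;
  std::size_t inside = base + kBlockSize + 16;     // somewhere in the second page

  header_demo r;
  r.second_page_entry = raw_entry(t, inside);      // the "hhdr is 1" case
  r.resolved_size = resolve(t, inside)->object_size;
  r.first_page_is_base = is_base_pointer(t, base);
  r.second_page_is_base = is_base_pointer(t, inside);
  return r;
}
```

Treating the raw entry for the second page as a header pointer would mean dereferencing address 1, which is consistent with the crash site described in the comment.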
[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()
--- Additional Comments From ovidr at users dot sourceforge dot net 2004-11-01 22:08 --- The app uses many java.util.WeakHashMaps (usually with null values, just storing objects in the keys, i.e. map.put(object, null), if that matters). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266