[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2006-03-08 Thread tromey at gcc dot gnu dot org


--- Comment #14 from tromey at gcc dot gnu dot org  2006-03-08 19:27 ---
I've been looking into this a bit.

The current problem I see is that the heavyweight lock code
relies on the GC.  This won't interact well with the current
code in natReference.cc, as those data structures are not scanned
by the collector.

Also, I do think that both calls to _Jv_RegisterFinalizer in
Reference::create are problematic.  The first call registers a finalizer
for the Reference, the second for the referent.  But, there is nothing
preventing a subclass of Reference from having a finalizer; or from
user code acquiring a heavy lock on a Reference object.  So, all
cases have to be handled here.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266



[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2005-07-19 Thread ovidr at users dot sourceforge dot net

--- Additional Comments From ovidr at users dot sourceforge dot net  
2005-07-19 15:06 ---
I've spent a lot of time trying to make a testcase of this, but no luck yet.

I can basically create a testapp with 2 threads. When they both access a 
synchronized method, and are forced to wait long enough (natObject.cc:907 spins 
18 times before making a hard lock) you can drop the finalizer by inserting a 
WeakHashMap.put(this, null) call. (I force System.gc() regularly)  In gdb I can 
see that heavy_lock_obj_finalization_proc is no longer called once the 
Reference::create() call has been made.  I've done this a ton of times in a 
loop, but I just can't get the test app to crash.   Is dropping the finalizer 
enough to cause a crash (over time)?  I'm not sure what I'm missing, or what I 
can do to force this crash.  

My real app, however, does crash.  I've recompiled libgcj and can now supply 
all of the gdb information requested in an earlier comment, this time from yet 
another, similar backtrace.  I don't know whether it confirms that the problem 
lies in dropping finalizers (or whether that is a separate problem), but I 
thought I'd post it.

Program received signal SIGSEGV, Segmentation fault.
0x404229f5 in GC_mark_from (mark_stack_top=0xc82b000, mark_stack=0xc82b000,
mark_stack_limit=0xc83b000) at /home/gcc/gcc/boehm-gc/mark.c:724
724 descr = *(word *)(type_descr
(gdb) bt
#0  0x404229f5 in GC_mark_from (mark_stack_top=0xc82b000, mark_stack=0xc82b000,
mark_stack_limit=0xc83b000) at /home/gcc/gcc/boehm-gc/mark.c:724
#1  0x4041eab8 in GC_finalize () at /home/gcc/gcc/boehm-gc/finalize.c:639
#2  0x4041ab83 in GC_finish_collection () at /home/gcc/gcc/boehm-gc/alloc.c:659
#3  0x4041a35b in GC_try_to_collect_inner (stop_func=0x40419c5c 
<GC_never_stop_func>)
at /home/gcc/gcc/boehm-gc/alloc.c:376
#4  0x4041b3e8 in GC_collect_or_expand (needed_blocks=1, ignore_off_page=0)
at /home/gcc/gcc/boehm-gc/alloc.c:996
#5  0x4041b683 in GC_allocobj (sz=4, kind=0) at /home/gcc/gcc/boehm-
gc/alloc.c:1071
#6  0x40420679 in GC_generic_malloc_inner (lb=16, k=0) at /home/gcc/gcc/boehm-
gc/malloc.c:136
#7  0x404217b3 in GC_generic_malloc_many (lb=16, k=0, result=0x4062b4a8)
at /home/gcc/gcc/boehm-gc/mallocx.c:512
#8  0x4042b32d in GC_local_malloc_atomic (bytes=12) at /home/gcc/gcc/boehm-
gc/pthread_support.c:334
#9  0x401f2ec7 in _Jv_AllocPtrFreeObj (size=12, klass=0x8816688) at java-gc.h:57
#10 0x401f1674 in _Jv_NewPrimArray (eltype=0x87a3be0, count=1)
at /home/gcc/gcc/libjava/prims.cc:559
#11 0x08287db9 in org.eclipse.swt.widgets.Table.textCellDataProc(int, int, int, 
int, int) (
this=0x8940dc0, tree_column=146453640, cell=146453856, 
tree_model=206469928, iter=-1073753012,
data=146439960) at Table.java:2704
#12 0x082b15b4 in org.eclipse.swt.widgets.Display.textCellDataProc(int, int, 
int, int, int) (
this=0x884ed48, tree_column=146453640, cell=146453856, 
tree_model=206469928, iter=-1073753012,
data=146439960) at Display.java:3305
#13 0x4040aceb in ffi_call_SYSV () at /home/gcc/gcc/libffi/src/x86/sysv.S:60
#14 0x4040a8d2 in ffi_call (cif=0xbfffd0b8,

fn=0x82b1544 <org.eclipse.swt.widgets.Display.textCellDataProc(int, int, 
int, int, int)>,
rvalue=0xbfffd0b0, avalue=0xbfffcfd0) 
at /home/gcc/gcc/libffi/src/x86/ffi.c:221
#15 0x4023e91e in _Jv_CallAnyMethodA (obj=0x884ed48, return_type=0x87a3be0, 
meth=0x87007c0,
is_constructor=0 '\0', is_virtual_call=1 '\001', parameter_types=0xc7a5460, 
args=0xbfffd160,
result=0xbfffd1d4, is_jni_call=1 '\001', iface=0x0)
at /home/gcc/gcc/libjava/java/lang/reflect/natMethod.cc:495
#16 0x401fa956 in _Jv_JNI_CallAnyMethodV<jint, normal> (env=0x87b28f8, 
obj=0x884ed48, klass=0x0,
id=0x87007c0, vargs=0xbfffd250 \210ÎéÎ÷ÎáÎõÎù\b`ÎåÎéÎÝÎáÎõÎù\b
({N\fLÎùÎ÷ÎáÎñÎåÎáÎõÎý\030\177ÎáÎõÎù\bÎùÎé\200iKÎíÎíÎåÎíÎõ\227K)
at /home/gcc/gcc/libjava/jni.cc:796
#17 0x401fa9ed in _Jv_JNI_CallMethodV<jint> (env=0x87b28f8, obj=0x884ed48, 
id=0x87007c0,
args=0xbfffd250 \210ÎéÎ÷ÎáÎõÎù\b`ÎåÎéÎÝÎáÎõÎù\b({N\fLÎùÎ÷ÎáÎñÎåÎáÎõÎý\030
\177ÎáÎõÎù\bÎùÎé\200iKÎíÎíÎåÎíÎõ\227K)
at /home/gcc/gcc/libjava/jni.cc:967
#18 0x40fbcfac in callback () from ./lib/libswt-gtk-3138.so
#19 0x40faeb65 in fn16_5 () from ./lib/libswt-gtk-3138.so

(gdb) p descr
$1 = 4294967279
(gdb) p current_p
$2 = (word *) 0x93b10e0
(gdb) p type_descr
$3 = 0x2d02ca8a <Address 0x2d02ca8a out of bounds>
(gdb) p GC_gc_no
$4 = 1731
(gdb) p *mark_stack_top
$5 = {mse_start = 0x93b10e0, mse_descr = 4294967279}
(gdb) up
#1  0x4041eab8 in GC_finalize () at /home/gcc/gcc/boehm-gc/finalize.c:639
639 GC_MARK_FO(real_ptr, GC_normal_finalize_mark_proc);
(gdb) p real_ptr
$6 = 0x93b10e0 \212ÎõÎ÷\002-
(gdb) p *curr_fo
$7 = {prolog = {hidden_key = 154865888, next = 0x96d54f8}, fo_fn = 0x40408c14 
<call_finalizer>, 
  fo_client_data = 
0x4023b092 U\211ÎáÎéÎíVS\203ÎáÎíÎá`ÎáÎéÎý\026TÎáÎñÎÝÎáÎñÎå\201ÎõÎñÎáÎáÎá\225=,
 fo_object_size = 22, 
  fo_mark_proc = 0x4041e03e <GC_null_finalize_mark_proc>}
(gdb) 

[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2005-06-20 Thread ovidr at users dot sourceforge dot net

--- Additional Comments From ovidr at users dot sourceforge dot net  
2005-06-20 16:25 ---
I've tried to create a testcase but can't seem to get a crash or infinite loop 
lockup.

Anyway, I think I understand conceptually what must be done, but in practice 
I'm still unsure of how to go about it. You don't seem to have a problem with 
the call to: 

_Jv_RegisterFinalizer (this, finalize_reference);

only

_Jv_RegisterFinalizer (referent, finalize_referred_to_object);

but since referent is an arbitrary object, what kind of finalizers can it 
already have? I'm stumped on how to get access to an arbitrary object's 
finalizers from natReference.cc.  Is it just the heavy_lock structure from 
natObject that needs to be considered somehow? 

From comment #10:
"My impression is that natReference.cc already keeps a fairly elaborate data
structure to which you should be able to add the prior finalization info"

This is the part that confuses me. Not all objects are References, so how would 
a Reference know about some arbitrary Object's previous finalizers or even 
attempt to maintain a data structure? 

And once I have the Object's old finalizer (if there is one), I guess I just 
run it, and register the new one with
GC_REGISTER_FINALIZER_NO_ORDER(x, x, cd, 0, 0);
from natReference's  finalize_referred_to_object ?  

Or am I way off track?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2005-06-08 Thread ovidr at users dot sourceforge dot net

--- Additional Comments From ovidr at users dot sourceforge dot net  
2005-06-08 21:14 ---
Since this bug seems a bit lost, I've been trying some things on my own without
success.

Can someone please explain:

If referent is just a RawData pointer to some Object, how are its previous
finalizers supposed to be found?  And how should they be passed, along with the
new finalizer and in the correct order (does order matter?), to
GC_REGISTER_FINALIZER_NO_ORDER (GC_register_finalizer_inner(obj, fn, cd, ofn,
ocd, mp)), which seems to be what natObject.cc uses?

The comments in this bug seem to suggest that there is some similar code
somewhere that I could lift and hook into natReference.create, but all the code
in natObject and String.intern finalization looks very different to my
novice-gcj eyes since they don't seem to be working with some foreign object,
nor with finalizers that aren't already locally stored in a struct.  

Secondly, is there any way to craft a testcase for this to know if it has been
fixed?  I don't fully understand why (based on comment #6) this would ever cause
a crash, and waiting many days for the crash of my apps is a very tedious 
process.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2005-06-08 Thread Hans dot Boehm at hp dot com

--- Additional Comments From Hans dot Boehm at hp dot com  2005-06-09 05:10 
---
Unfortunately, I haven't had time to pursue this.

I think that in order to get this to fail, you want lots of weak references to
objects which are also subject to lock contention or wait/notify calls.  I don't
think we currently have a good test case.

My impression is that natReference.cc already keeps a fairly elaborate data
structure to which you should be able to add the prior finalization info, so
that it can be invoked at the right point by the existing finalizer there.

In general, the GC's data structures don't queue multiple finalizers.  You need
to register a new finalizer that knows it has to reregister the old one when
it's done.  The information that there was another finalizer needs to be kept
off to the side somewhere in a separate table, or as part of the client data
registered with the finalizer.  The locking code also has to deal with opaque
objects, but it again has its own hash table off to the side.
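
The chaining scheme described above can be sketched roughly as follows. This is a self-contained model, not libgcj or collector code: `FinalizerTable` is a hypothetical stand-in for the collector's one-finalizer-per-object table, and its `register_finalizer` only mimics the convention of handing back whatever was registered before. `Chained`, `chained_finalizer`, `add_chained` and the two cleanup functions are names invented for the illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

using FinalizerFn = void (*)(void* obj, void* client_data);

struct Slot {
  FinalizerFn fn;
  void* cd;
};

// Hypothetical stand-in for the collector: at most one finalizer per object.
struct FinalizerTable {
  std::unordered_map<void*, Slot> slots;

  // Install (fn, cd) for obj and return the previous registration, if any.
  Slot register_finalizer(void* obj, FinalizerFn fn, void* cd) {
    Slot old = slots[obj];  // value-initialized (null) if there was none
    if (fn) slots[obj] = Slot{fn, cd}; else slots.erase(obj);
    return old;
  }

  // Model of one finalization cycle for obj: unregister, then invoke.
  void finalize(void* obj) {
    auto it = slots.find(obj);
    if (it == slots.end()) return;
    Slot s = it->second;
    slots.erase(it);
    s.fn(obj, s.cd);
  }
};

// Client data that remembers the finalizer that was displaced.
struct Chained {
  FinalizerTable* table;
  FinalizerFn own_fn;
  void* own_cd;
  Slot previous;
};

// Runs the new finalizer, then puts the displaced one back for a later cycle.
void chained_finalizer(void* obj, void* cd) {
  Chained* c = static_cast<Chained*>(cd);
  c->own_fn(obj, c->own_cd);
  if (c->previous.fn)
    c->table->register_finalizer(obj, c->previous.fn, c->previous.cd);
  delete c;
}

void add_chained(FinalizerTable& t, void* obj, FinalizerFn fn, void* cd) {
  Chained* c = new Chained{&t, fn, cd, Slot{nullptr, nullptr}};
  c->previous = t.register_finalizer(obj, chained_finalizer, c);
}

std::vector<std::string> event_log;
void lock_cleanup(void*, void*) { event_log.push_back("heavy-lock cleanup"); }
void reference_cleanup(void*, void*) { event_log.push_back("reference cleanup"); }

std::vector<std::string> demo_chained() {
  event_log.clear();
  FinalizerTable table;
  int object = 0;
  table.register_finalizer(&object, lock_cleanup, nullptr);  // registered first
  add_chained(table, &object, reference_cleanup, nullptr);   // displaces, but remembers
  table.finalize(&object);  // reference cleanup runs, lock cleanup is reinstated
  table.finalize(&object);  // a later cycle runs the lock cleanup
  return event_log;
}
```

Here the displaced finalizer travels in the client data; a separate side table keyed by object address would be the other option mentioned above.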

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2005-05-17 Thread daney at gcc dot gnu dot org


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |daney at gcc dot gnu dot org
                   |dot org                     |
             Status|NEW                         |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2005-05-17 Thread pinskia at gcc dot gnu dot org


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|daney at gcc dot gnu dot org|unassigned at gcc dot gnu
                   |                            |dot org
             Status|ASSIGNED                    |NEW


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2004-12-02 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-02 
13:42 ---
Confirmed based on Tromey's comments.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|-00-00 00:00:00             |2004-12-02 13:42:43
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2004-11-24 Thread Hans dot Boehm at hp dot com

--- Additional Comments From Hans dot Boehm at hp dot com  2004-11-25 01:50 
---
After finally finding time to look at the code, it appears that my earlier 
guesses were correct.

::java::lang::ref::Reference::create in natReference.cc calls 
_Jv_RegisterFinalizer(referent ...), where referent is an arbitrary object, 
which may already have a finalizer.  This is bad news, since the original 
finalizer will be dropped.

The original finalizer may be a Java finalizer, or it may be one that was 
registered by the hash synchronization code to clean up a heavy lock entry for 
the object.  In either case we lose.  (The hash synchronization code is careful 
to not lose the original finalizer.)  In both cases, I think, we are likely to 
mostly introduce leaks, and crash only occasionally.  So this may explain some 
other misbehavior.
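
A toy model of the failure mode described above, under loudly stated assumptions: `finalizers` and `heavy_locks` are hypothetical stand-ins for the collector's finalizer table and the hash synchronization side table, and `register_finalizer` here simply overwrites, which is the behavior attributed above to the unguarded call in Reference::create. None of these names are libgcj's.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

using Finalizer = void (*)(void* obj);

// Hypothetical one-slot-per-object finalizer table.
std::map<void*, Finalizer> finalizers;
// Hypothetical side table of heavy-lock entries, keyed by object address.
std::set<void*> heavy_locks;

// A second registration silently replaces the first.
void register_finalizer(void* obj, Finalizer fn) { finalizers[obj] = fn; }

// Finalizer installed by the locking code: removes the side-table entry.
void heavy_lock_finalizer(void* obj) { heavy_locks.erase(obj); }

// Finalizer installed for a referent: would clear or enqueue References.
void referent_finalizer(void*) {}

// Model of the object being collected: run whatever finalizer is registered.
void collect(void* obj) {
  auto it = finalizers.find(obj);
  if (it == finalizers.end()) return;
  Finalizer fn = it->second;
  finalizers.erase(it);
  fn(obj);
}

// Returns how many heavy-lock entries outlive the object.
std::size_t stale_entries_after_collection(bool create_weak_reference) {
  finalizers.clear();
  heavy_locks.clear();
  int object = 0;
  heavy_locks.insert(&object);
  register_finalizer(&object, heavy_lock_finalizer);
  if (create_weak_reference)
    register_finalizer(&object, referent_finalizer);  // drops the lock finalizer
  collect(&object);
  return heavy_locks.size();
}
```

Without the weak reference no entry is left behind; with it, one stale entry survives and still names the address of an object that no longer exists. That is a leak in the common case, and a hazard if the address is later reused.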

The fix may require some thought.  At a minimum, we need to export more GC 
functionality, so that the Reference implementation can retrieve the old 
finalizer.  (The hash synchronization code currently cheats and goes directly 
to the GC interface, which should also be fixed.)

I think that so long as Reference gets the ordering right, and doesn't assume 
that all finalizers are Java finalizers, the hash synchronization code should 
work.  It needs to drop the heavy lock before the object is deallocated, and 
while the lock is not held.  I don't think the timing otherwise matters.  If 
the object is resurrected 17 times, we can drop the heavy lock at any of those 
points, recreating it if necessary.

This really needs to be fixed to make any use of References reliable.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tromey at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2004-11-24 Thread tromey at gcc dot gnu dot org

--- Additional Comments From tromey at gcc dot gnu dot org  2004-11-25 
03:25 ---
Oops, I wasn't aware that the locks code was using finalizers.
We had to make special consideration in the reference code for
String.intern; we can do something similar for locks.
This is pretty important, I am going to add it to our 4.0
wish list.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2004-11-24 Thread tromey at gcc dot gnu dot org


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |17574
              nThis|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2004-11-08 Thread Hans dot Boehm at hp dot com

--- Additional Comments From Hans dot Boehm at hp dot com  2004-11-08 19:55 
---
I think this could be explained by the same problem.

This time the collector is in the Java-specific finalization pass which
marks objects reachable from objects that are about to be finalized,
so that the finalizer doesn't see deallocated memory.

It appears that the finalizable object it's marking from has somehow
been clobbered already, and no longer has a proper vtable entry.
Hence the collector dies trying to find the mark descriptor in the
vtable.
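
As a small worked example, the `mse_descr` value 4294967279 that appears in the gdb sessions in this thread can be decoded by hand. The sketch assumes a 32-bit `word` and the descriptor tag layout believed to be in boehm-gc's gc_mark.h (two low tag bits, with the value 3 marking a per-object descriptor); those constants are recalled, not quoted from this thread, and `Decoded` and `decode` are made-up names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: the two low bits of a mark descriptor are a tag.
const std::uint32_t kTagBits = 2;
const std::uint32_t kTagMask = (1u << kTagBits) - 1;
const std::uint32_t kPerObjectTag = 3;  // assumed value of GC_DS_PER_OBJECT

struct Decoded {
  std::uint32_t tag;       // low tag bits of the descriptor
  std::int64_t as_signed;  // the 32-bit word reinterpreted as a signed value
};

Decoded decode(std::uint32_t descr) {
  Decoded d;
  d.tag = descr & kTagMask;
  // Two's-complement reinterpretation, done explicitly in 64-bit arithmetic.
  d.as_signed = static_cast<std::int64_t>(descr) -
                ((descr >> 31) ? (std::int64_t(1) << 32) : 0);
  return d;
}
```

For 4294967279 (0xFFFFFFEF) this gives tag 3 and a negative signed value. If the assumed layout is right, a negative per-object descriptor means the real descriptor is fetched indirectly through the object's first word, which is consistent with the collector dying while it looks in the vtable.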

I can see how this might happen if we accidentally register a finalizer
on something that was already collected, which can happen if the
finalizer installed by hash synchronization is dropped.

It might be useful to find a little more of the context.  Try

p descr
p current_p
p type_descr
p GC_gc_no
p *mark_stack_top
up 1    -- goto GC_finalize frame
p real_ptr
p *curr_fo
x/8wx real_ptr -4

Printing *curr_fo should indicate the finalization function and client data
associated with this object.  It would be useful to explore the client data
a bit further, so that we can understand what the finalizer is really trying to 
do.
(All Java finalizers use the same function, and use the client data field to
specify what really needs to be done.)

Assuming current_p and real_ptr are the same, and you can call functions
from gdb, try

p GC_find_header(real_ptr)

If that looks like a sane pointer, also try 

p *GC_find_header(real_ptr)
p GC_base(real_ptr)

I need to look at the WeakHashMap code, but I won't get a chance to do that
for a few days.

We could also probably track this down more systematically by
having the hash synchronization code check that the vtable pointer hasn't
changed when we reregister the client finalizer around line 757 in natObject.cc.
We would have to remember the vtable pointer in the hl structure.
This might cause this to fail far more predictably, and might at
least confirm that we're not barking up the wrong tree.
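
The vtable check suggested above might look something like this sketch. It is only a model: `HeavyLockRecord` is a hypothetical, much-reduced stand-in for the hl structure in natObject.cc, `FakeObject` stands in for a Java object whose first word is its vtable pointer, and all function names are invented.

```cpp
#include <cassert>

// Hypothetical reduced heavy-lock record; the real hl structure has more fields.
struct HeavyLockRecord {
  void* object;                    // object the heavy lock belongs to
  const void* vtable_at_creation;  // first word of the object when recorded
};

// In this model the first word of an object holds its vtable pointer.
const void* current_vtable(const void* object) {
  return *static_cast<const void* const*>(object);
}

HeavyLockRecord make_record(void* object) {
  HeavyLockRecord r;
  r.object = object;
  r.vtable_at_creation = current_vtable(object);
  return r;
}

// Intended to run just before the client finalizer is re-registered.
bool object_still_intact(const HeavyLockRecord& r) {
  return current_vtable(r.object) == r.vtable_at_creation;
}

// Stand-in for a Java object: vtable pointer first, then payload.
struct FakeObject {
  const void* vtable;
  int payload;
};

bool demo_detects_clobber() {
  static const int vtable_a = 0;
  static const int vtable_b = 0;
  FakeObject obj = {&vtable_a, 42};
  HeavyLockRecord rec = make_record(&obj);
  bool ok_before = object_still_intact(rec);
  obj.vtable = &vtable_b;  // simulate the memory being reused for another object
  bool ok_after = object_still_intact(rec);
  return ok_before && !ok_after;
}
```

A failed check at re-registration time would turn an occasional crash deep inside the collector into an early, reproducible assertion at the point where the stale record is used.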


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2004-11-08 Thread ovidr at users dot sourceforge dot net

--- Additional Comments From ovidr at users dot sourceforge dot net  
2004-11-08 20:27 ---
Unfortunately many variables seem not to be available:
GNU gdb 6.1

(gdb) p descr
Variable descr is not available.
(gdb) p current_p
$1 = (word *) 0x9acf618
(gdb) p type_descr
No symbol type_descr in current context.
(gdb) p GC_gc_no
$2 = 768
(gdb) p *mark_stack_top
$3 = {mse_start = 0x9acf618, mse_descr = 4294967279}
(gdb) up 1
#1  0x40523b4b in GC_finalize () at /datal/gcc/gcc/boehm-gc/finalize.c:639
639 GC_MARK_FO(real_ptr, GC_normal_finalize_mark_proc);
(gdb) p real_ptr
Variable real_ptr is not available.
(gdb) p *curr_fo
Variable curr_fo is not available.
(gdb) 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2004-11-07 Thread ovidr at users dot sourceforge dot net

--- Additional Comments From ovidr at users dot sourceforge dot net  
2004-11-08 03:00 ---
Recompiled with -g (and waited a few days..), but I'm not sure if this is the 
same problem or not:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1106963376 (LWP 23098)]
0x4052753c in GC_mark_from (mark_stack_top=0x8c5f000, mark_stack=0x8c5f000, 
mark_stack_limit=0x8c7f000)
at /datal/gcc/gcc/boehm-gc/mark.c:724
724 descr = *(word *)(type_descr
(gdb) bt
#0  0x4052753c in GC_mark_from (mark_stack_top=0x8c5f000, 
mark_stack=0x8c5f000, 
mark_stack_limit=0x8c7f000) at /datal/gcc/gcc/boehm-gc/mark.c:724
#1  0x40523b4b in GC_finalize () at /datal/gcc/gcc/boehm-gc/finalize.c:639
#2  0x4051fa60 in GC_finish_collection () at /datal/gcc/gcc/boehm-
gc/alloc.c:659
#3  0x405200eb in GC_try_to_collect_inner (stop_func=Variable stop_func is 
not available.
) at /datal/gcc/gcc/boehm-gc/alloc.c:376
#4  0x4052087e in GC_collect_or_expand (needed_blocks=Variable needed_blocks 
is not available.
) at /datal/gcc/gcc/boehm-gc/alloc.c:1020
#5  0x40520adb in GC_allocobj (sz=12, kind=0) at /datal/gcc/gcc/boehm-
gc/alloc.c:1071
#6  0x405253aa in GC_generic_malloc_inner (lb=48, k=0) at /datal/gcc/gcc/boehm-
gc/malloc.c:136
#7  0x4052621c in GC_generic_malloc_many (lb=48, k=0, result=0x8722cf8)
at /datal/gcc/gcc/boehm-gc/mallocx.c:512
#8  0x4053014e in GC_local_malloc_atomic (bytes=48) at /datal/gcc/gcc/boehm-
gc/pthread_support.c:334
#9  0x403780fc in _Jv_AllocString (len=14) at java-gc.h:57
#10 0x403b0b75 in java::lang::String::toLowerCase (this=0x859fcc0, 
locale=Variable locale is not available.
) at cni.h:41
#11 0x403d4943 in java.lang.String.toLowerCase() (this=0xffef)
at /datal/gcc/gcc/libjava/java/lang/String.java:1031
#12 0x4050de5f in gnu.gcj.convert.IOConverter.canonicalize(java.lang.String) 
(name=Variable name is not available.
)
at /datal/gcc/gcc/libjava/gnu/gcj/convert/IOConverter.java:77
#13 0x4050c36d in gnu.gcj.convert.BytesToUnicode.getDecoder(java.lang.String) 
(encoding=0x859fcc0)
at /datal/gcc/gcc/libjava/gnu/gcj/convert/BytesToUnicode.java:78
#14 0x403af2c0 in java::lang::String::init (this=0x97c42d0, bytes=0x88ea000, 
offset=Variable offset is not available.
)
at /datal/gcc/gcc/libjava/java/lang/natString.cc:488
#15 0x403d429e in java.lang.String.String(byte[], int, int) 
(this=Variable this is not available.
)
at /datal/gcc/gcc/libjava/java/lang/String.java:345

...
I'll leave it open in a screen session.
Any gdb commands I should type?



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2004-11-01 Thread ovidr at users dot sourceforge dot net


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Hans dot Boehm at hp dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2004-11-01 Thread Hans dot Boehm at hp dot com

--- Additional Comments From Hans dot Boehm at hp dot com  2004-11-01 20:44 ---
This would be a lot easier if libgcj had been built with something like -O2 -g.

Based on approximate manual matching of the object code to finalize.s, I think 
this is failing around line 452 of finalize.c on the line

new_fo -> fo_object_size = hhdr -> hb_sz;

It appears that hhdr is in %edx and is 1.  This can occur if the first argument 
to GC_register_finalizer_inner is a pointer to somewhere in the second page of 
a large object.  It should of course be a base pointer to an object, so this 
should be impossible.
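
To illustrate why a header value of 1 points at the second page of a large object, here is a simplified model of a block-header lookup. It assumes 4096-byte blocks and a table in which each block maps either to a real header pointer or to a small count meaning "the object began this many blocks earlier". The table type, `kMaxJump`, and every function name are assumptions made for the sketch, not the collector's actual definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

const std::uintptr_t kBlockSize = 4096;           // assumed block/page size
const std::uintptr_t kMaxJump = kBlockSize - 1;   // assumed bound on counts

struct BlockHeader {
  std::uintptr_t object_size;
};

// Hypothetical table: block number -> header pointer, or a small count.
std::map<std::uintptr_t, std::uintptr_t> header_table;

// Raw lookup: may return a real header address, a small count, or 0.
std::uintptr_t find_header(std::uintptr_t addr) {
  auto it = header_table.find(addr / kBlockSize);
  return it == header_table.end() ? 0 : it->second;
}

bool is_forwarding_or_nil(std::uintptr_t h) { return h <= kMaxJump; }

// Follow the counts back to the block that carries the real header.
BlockHeader* header_for_object_containing(std::uintptr_t addr) {
  std::uintptr_t h = find_header(addr);
  while (h != 0 && is_forwarding_or_nil(h)) {
    addr -= h * kBlockSize;
    h = find_header(addr);
  }
  return reinterpret_cast<BlockHeader*>(h);
}

BlockHeader large_object_header = {3 * 4096};

// Lay out one three-block object starting at base.
void install_large_object(std::uintptr_t base) {
  header_table.clear();
  header_table[base / kBlockSize] =
      reinterpret_cast<std::uintptr_t>(&large_object_header);
  header_table[base / kBlockSize + 1] = 1;  // second block: go back one
  header_table[base / kBlockSize + 2] = 2;  // third block: go back two
}
```

In this model a raw lookup on an address in the second block returns 1. Code that treats that value as a header pointer and reads a size field through it would fault, which matches the diagnosis above.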

I think the GC_register_finalizer_no_order call must be coming from 
maybe_remove_all_heavy(), which called remove_all_heavy, which was presumably 
inlined into _Jv_MonitorExit().  I see no other path to 
GC_register_finalizer_no_order().

That makes it appear that an object whose heavy-weight lock we are about to 
remove has previously been garbage collected.  That should be impossible since 
we previously registered our own finalizer for the object in question, and that 
acquires the lock bit in the lock hash table entry, as does remove_all_heavy.  
Thus the finalizer should have previously been run to completion, and all 
traces of the heavy lock should have been previously removed.

Are there places we add a finalizer to an existing object without checking for 
prior finalizers?  That might explain the problem.

We really need some more evidence to confirm this chain of reasoning.  A -g 
stack trace, and the values of the finalization proc and data (and the object 
the data pointer points to, if any) that are being passed to 
GC_register_finalizer_inner might help.  So would GC_find_header
(object_being_registered_address).  Assuming that's one, as expected, then 
*GC_find_header(object_being_registered_address - 4096) together with GC_gc_no 
would also be somewhat interesting.

Does this application use some flavor of weak references?  If so, which one?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266


[Bug libgcj/18266] SIGSEGV in GC_register_finalizer_inner ()

2004-11-01 Thread ovidr at users dot sourceforge dot net

--- Additional Comments From ovidr at users dot sourceforge dot net  2004-11-01 
22:08 ---
The app uses many java.util.WeakHashMaps (usually with null values, storing 
objects only as keys, i.e. map.put(object, null), if that matters).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18266