[Mono-devel-list] RE: [Gc] [PATCH] Race condition when restarting threads
Sorry about the long delay.

I don't quite understand the problem here. If GC_stop_count has just been incremented, then I'm about to send another suspend signal to the thread, and it will have to stop again before we think the world is stopped. Can you be a bit more specific about the race here?

Thanks.

Hans

-----Original Message-----
From: Ben Maurer [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 12, 2005 11:29 PM
To: Boehm, Hans
Cc: [EMAIL PROTECTED]; mono-devel-list@lists.ximian.com
Subject: RE: [Gc] [PATCH] Race condition when restarting threads

On Tue, 2005-07-12 at 11:42 -0700, Boehm, Hans wrote:

Your patch had the fields set as volatile, so shouldn't the compiler ensure that the cpu does not reorder the stores?

We had a long discussion of that on the C++ memory model list. The answer is architecture dependent. Volatile will generally prevent compiler reordering. It usually introduces the necessary hardware barriers on Itanium, but not, for example, on PowerPC.

I think your version has a race if this is the case:

+sigsuspend(suspend_handler_mask);/* Wait for signal */
+while (GC_world_is_stopped && GC_stop_count == my_stop_count) {

Imagine that this thread gets a spurious signal. If the GC_stop_count++ statement has already taken effect but the GC_world_is_stopped = TRUE store has not, the thread would bypass the wait, so the world would not be stopped. In fact, how do we know that my_stop_count is correct?

When I put this in the mono tree, I'd really have something with barriers in it and use the version that I suggested. Without the barriers, I am a bit worried about the correctness (especially since the issues happen on platforms I am not testing with).

-- Ben

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
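The ordering concern in this exchange can be made concrete with a small sketch. It is not the collector's code: it uses C11 <stdatomic.h>, which did not exist when this thread was written, and every sketch_ name is a hypothetical stand-in for the corresponding GC_ variable. The assumption is that sequentially consistent atomic operations replace the volatile fields, so no thread can observe the two stores in an order different from program order. Whether that closes the race is the question debated above; the sketch only shows the publishing side and the wait predicate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GC_world_is_stopped and GC_stop_count. */
static atomic_bool sketch_world_is_stopped;
static atomic_uint sketch_stop_count;

/* Collector side: publish a new stop request before signalling threads.
   Both operations default to memory_order_seq_cst, so other threads
   cannot see them reordered. Returns the new stop count. */
unsigned sketch_begin_stop(void)
{
    unsigned n = atomic_fetch_add(&sketch_stop_count, 1) + 1;
    atomic_store(&sketch_world_is_stopped, true);
    return n;
}

/* Collector side: let suspended threads run again. */
void sketch_restart(void)
{
    assert(atomic_load(&sketch_world_is_stopped));
    atomic_store(&sketch_world_is_stopped, false);
}

/* Thread side: the condition of the loop around sigsuspend(). A thread
   woken by a spurious signal goes back to waiting while this is true. */
bool sketch_must_keep_waiting(unsigned my_stop_count)
{
    return atomic_load(&sketch_world_is_stopped)
        && atomic_load(&sketch_stop_count) == my_stop_count;
}
```

In a real suspend handler the predicate would be the condition of a do/while loop around sigsuspend(), and my_stop_count would be read at handler entry.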
[Mono-list] RE: GC issue in mono 0.31
I assume nothing is prelinked? I don't think that was even an option on RedHat 7.1.

It would be helpful to set a breakpoint in GC_add_roots_inner, and verify that it's actually being called with (DATAEND, which is defined to be) _end as its middle argument.

It looks like both mono and libmono define _end. By the normal ELF default symbol lookup rules, I believe libmono references to _end should see the definition in the main program. If this is indeed not happening, and if the libmono developers aren't aware of other relevant issues, I would ask on a binutils mailing list for ideas.

Hans

-----Original Message-----
From: Nikolai Zhubr [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 04, 2004 12:53 PM
To: Boehm, Hans
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: GC issue in mono 0.31

Hello Hans,

I think my binutils are just those of a regular Redhat 7.1 linux:

binutils-2.10.91.0.2-3
GNU ld 2.10.91 Copyright 2001 Free Software Foundation, Inc.

I've just checked, there are no occurrences of Bsymbolic anywhere within the mono build tree. The output of nm is attached.

--
Best regards, Nikolai Zhubr

Tuesday, 04 May, 2004, 1:41:32, you wrote:

The problem is pretty clear: The GC is treating the region between __data_start (0x80497b0) and the end of the libmono data segment (0x401851a0) as a single traceable data region. That probably means _end is somehow defined by the linker to be the end of the libmono data segment instead of the end of the main data segment. This is not how Linux linkers are supposed to behave, though I've seen similar behavior on other operating systems. (This assumes that libmono doesn't use some other mechanism for setting DATAEND.)

If your binutils and linker script came from a standard Linux distribution, it would be nice to track down how this happened. If not, that's almost certainly the source of the problem.

Another possible cause of the problem might be if the gc is linked into libmono, and that's linked with -Bsymbolic. (That's probably not an unreasonable thing to do. If so, there's probably a way to work around this.)

The output of nm on the main executable and libmono might be useful in tracking this down further.

Hans

___
Mono-list maillist - [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/mono-list
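A few lines of C are enough to see which address _end resolves to in the main program. This is only a sketch, assuming Linux, ELF, and a linker whose default script defines _edata and _end (GNU ld does); the sketch_ names are hypothetical and nothing here comes from the mono or gc sources. Printing the value from the main executable and comparing it with the middle argument seen at the GC_add_roots_inner breakpoint would show whether libmono sees the same definition.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Linker-defined symbols: only their addresses are meaningful.
   _edata marks the end of initialized data, _end the end of bss. */
extern char _edata[];
extern char _end[];

/* Probes placed in this program's own data and bss (volatile so the
   compiler keeps them in writable storage). */
static volatile int sketch_data_probe = 1;
static volatile int sketch_bss_probe;

/* Returns 1 if _edata and _end bracket this program's own statics,
   i.e. they describe the image this code was linked into. */
int sketch_main_data_end_is_sane(void)
{
    uintptr_t edata = (uintptr_t)_edata;
    uintptr_t end = (uintptr_t)_end;
    return (uintptr_t)&sketch_data_probe < edata
        && edata <= end
        && (uintptr_t)&sketch_bss_probe < end;
}

/* Prints the two addresses for comparison with the debugger output. */
void sketch_print_main_data_end(void)
{
    assert(sketch_main_data_end_is_sane());
    printf("_edata = %p\n_end   = %p\n", (void *)_edata, (void *)_end);
}
```

Calling sketch_print_main_data_end() from main and again from a function compiled into the shared library would show directly whether the two see the same _end.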
[Mono-list] RE: GC issue in mono 0.31
This means that the collector tried to trace, i.e. look for pointers in, a memory range that was not in fact mapped. The interesting values to look at are local variables current_p and limit, as well as the output of GC_dump() and a copy of /proc/<pid>/maps.

Possible causes are:

1) The collector is confused about the location of the cold end of the main stack. You might check that GC_stackbottom looks reasonable.
2) The collector is confused about the location of a data segment.
3) The collector was mistakenly not configured for thread support.

I suspect the collector hasn't been as heavily tested with 2.6 kernels and NPTL as it should have been. I'm also trying to do some of that as we speak.

Does gctest (make check) in the gc directory work? (I assume that should still work with the Mono version of the collector.)

Once you're sure that this is not a Mono-specific issue, it's good to copy the [EMAIL PROTECTED] mailing list.

Hans

-----Original Message-----
From: Nikolai Zhubr [mailto:[EMAIL PROTECTED]
Sent: Sunday, May 02, 2004 3:54 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: GC issue in mono 0.31

Hello,

I have a problem running mono 0.31 - it segfaults almost immediately in GC startup code, as far as I can see. Here is the call sequence:

GC_malloc -> GC_generic_malloc_inner -> GC_init_inner -> GC_try_to_collect_inner -> GC_stopped_mark -> GC_mark_some -> GC_mark_from -> mark.c:line 769: defered = *limit; <=== segfault.

Note: GC_mark_from() executes successfully a few times, then segfaults. The assembly file specified as an argument doesn't seem to be relevant. Starting with no arguments displays the help screen normally.

I'm using 686 linux 2.6.5, gcc 3.3.1. Let me know if I should provide more details. Thank you.

(I'm not on ML, please CC me so I can get reply)

--
Best regards, Nikolai Zhubr
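The check suggested above, comparing the range being traced against /proc/<pid>/maps, can be sketched as a small helper. This is an illustration only: sketch_range_is_mapped is a hypothetical name, not part of the collector, and it assumes the usual maps format in which each line begins with start-end in hexadecimal and lines are sorted by address.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if every address in [lo, hi) lies inside some mapping listed
   in maps_text (the contents of /proc/<pid>/maps), 0 if there is a hole. */
int sketch_range_is_mapped(const char *maps_text,
                           unsigned long lo, unsigned long hi)
{
    unsigned long covered = lo;  /* [lo, covered) is known to be mapped */
    const char *line = maps_text;

    assert(maps_text != NULL);
    while (line != NULL && *line != '\0' && covered < hi) {
        unsigned long start, end;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            if (start > covered)
                break;           /* gap before this mapping: a hole */
            if (end > covered)
                covered = end;   /* mapping extends the covered prefix */
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;              /* advance to the next line */
    }
    return covered >= hi;
}
```

Fed the saved maps file and the values of current_p and limit at the time of the fault, a result of 0 would confirm that the collector was asked to scan a range containing unmapped pages.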
[Mono-list] RE: GC issue in mono 0.31
The problem is pretty clear: The GC is treating the region between __data_start (0x80497b0) and the end of the libmono data segment (0x401851a0) as a single traceable data region. That probably means _end is somehow defined by the linker to be the end of the libmono data segment instead of the end of the main data segment. This is not how Linux linkers are supposed to behave, though I've seen similar behavior on other operating systems. (This assumes that libmono doesn't use some other mechanism for setting DATAEND.)

If your binutils and linker script came from a standard Linux distribution, it would be nice to track down how this happened. If not, that's almost certainly the source of the problem.

Another possible cause of the problem might be if the gc is linked into libmono, and that's linked with -Bsymbolic. (That's probably not an unreasonable thing to do. If so, there's probably a way to work around this.)

The output of nm on the main executable and libmono might be useful in tracking this down further.

Hans

-----Original Message-----
From: Nikolai Zhubr [mailto:[EMAIL PROTECTED]
Sent: Monday, May 03, 2004 2:09 PM
To: Boehm, Hans
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: GC issue in mono 0.31

Hello Hans,

It looks like the mono GC version doesn't have gctest and does nothing for make check. I suppose I'll therefore need to download the full version first. Meanwhile, I've added more debugging prints, so all the variables you mentioned are now displayed just before the segv. This output + /proc/<mono pid>/maps is attached here.

I'd also note that NPTL is not present on this box, AFAIK. Well, the base system is pretty old, basically RedHat 7.1 with only some specific packages updated, those which I was actually interested in.

--
Best regards, Nikolai Zhubr

Monday, 03 May, 2004, 22:18:09, you wrote:

This means that the collector tried to trace, i.e. look for pointers in, a memory range that was not in fact mapped. The interesting values to look at are local variables current_p and limit, as well as the output of GC_dump() and a copy of /proc/<pid>/maps.

Possible causes are:

1) The collector is confused about the location of the cold end of the main stack. You might check that GC_stackbottom looks reasonable.
2) The collector is confused about the location of a data segment.
3) The collector was mistakenly not configured for thread support.

I suspect the collector hasn't been as heavily tested with 2.6 kernels and NPTL as it should have been. I'm also trying to do some of that as we speak.

Does gctest (make check) in the gc directory work? (I assume that should still work with the Mono version of the collector.)

Once you're sure that this is not a Mono-specific issue, it's good to copy the [EMAIL PROTECTED] mailing list.

Hans

-----Original Message-----
From: Nikolai Zhubr [mailto:[EMAIL PROTECTED]
Sent: Sunday, May 02, 2004 3:54 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: GC issue in mono 0.31