[Mono-devel-list] RE: [Gc] [PATCH] Race condition when restarting threads

2005-08-04 Thread Boehm, Hans
Sorry about the long delay.

I don't quite understand the problem here.  If GC_stop_count has
just been incremented, then I'm about to send another suspend
signal to the thread, and it will have to stop again before
we think the world is stopped.  

Can you be a bit more specific about the race here?

Thanks.

Hans

 -----Original Message-----
 From: Ben Maurer [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, July 12, 2005 11:29 PM
 To: Boehm, Hans
 Cc: [EMAIL PROTECTED]; mono-devel-list@lists.ximian.com
 Subject: RE: [Gc] [PATCH] Race condition when restarting threads
 
 
 On Tue, 2005-07-12 at 11:42 -0700, Boehm, Hans wrote:
 > > Your patch had the fields set as volatile, so shouldn't the
 > > compiler ensure that the cpu does not reorder the stores?
 > We had a long discussion of that on the C++ memory model list. The 
 > answer is architecture dependent.  Volatile will generally prevent 
 > compiler reordering.  It usually introduces the necessary hardware 
 > barriers on Itanium, but not, for example, on PowerPC.
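
As an illustration of the volatile-versus-barrier point, here is a minimal sketch using C11 atomics, which postdate this thread. The names stop_count, world_is_stopped, begin_stop and observe_stop are hypothetical stand-ins, not the collector's actual code. The sketch assumes the goal is that a thread which sees the "stopped" flag also sees the incremented counter; a release store paired with an acquire load expresses that ordering on every architecture, which plain volatile does not.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the collector's shared state. */
static atomic_uint stop_count;
static atomic_bool world_is_stopped;

/* Collector side: announce a new stop request. */
static void begin_stop(void)
{
    atomic_fetch_add_explicit(&stop_count, 1, memory_order_relaxed);
    /* Release store: a thread whose acquire load reads true here is
       guaranteed to also see the counter update made above. */
    atomic_store_explicit(&world_is_stopped, true, memory_order_release);
}

/* Mutator side: returns true and fills *count if a stop is in progress. */
static bool observe_stop(unsigned *count)
{
    if (!atomic_load_explicit(&world_is_stopped, memory_order_acquire))
        return false;
    *count = atomic_load_explicit(&stop_count, memory_order_relaxed);
    return true;
}
```

Calling begin_stop() in one thread and observe_stop() in another would then never report a stop together with a stale counter value. Whether the real handler needs this exact pairing is an assumption of the sketch.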
 
 I think your version has a race if this is the case:
 
 > +sigsuspend(&suspend_handler_mask);/* Wait for signal */
 > +while (GC_world_is_stopped && GC_stop_count == my_stop_count) {
 
 Imagine that this thread gets a spurious signal. The 
 GC_stop_count++ statement has already taken effect, but the 
 GC_world_is_stopped = TRUE has not, the thread would bypass 
 the wait, causing the world not to be stopped.
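
The window described here can be made concrete with a small deterministic model. This is only a sketch: struct gc_state and keep_waiting are hypothetical names, and the struct stands for whatever values of GC_world_is_stopped and GC_stop_count a thread happens to observe at the moment it wakes up. It models the loop condition from the patch and nothing else.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the shared state as seen by a woken thread. */
struct gc_state {
    bool world_is_stopped;   /* models GC_world_is_stopped */
    unsigned stop_count;     /* models GC_stop_count */
};

/* The loop condition from the patch: the thread goes back to
   sigsuspend() only while this returns true. */
static bool keep_waiting(const struct gc_state *s, unsigned my_stop_count)
{
    return s->world_is_stopped && s->stop_count == my_stop_count;
}
```

With a snapshot of { false, 6 } and my_stop_count == 6 (the increment is visible, the flag store is not yet), keep_waiting returns false, so a spurious wakeup at that instant would let the thread leave the handler. With { true, 6 } it returns true and the thread keeps waiting. Whether the first case can actually be observed depends on the store ordering being discussed in this thread.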
 
 In fact, how do we know that my_stop_count is correct?
 
 When I put this in the mono tree, I'd really like to have something 
 with barriers in it and use the version that I suggested. 
 Without the barriers, I am a bit worried about the correctness 
 (especially since the issues happen on platforms I am not 
 testing with).
 
 -- Ben
 
 
___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


[Mono-list] RE: GC issue in mono 0.31

2004-05-05 Thread Boehm, Hans
I assume nothing is prelinked?  I don't think that was even an option
on RedHat 7.1.

It would be helpful to set a breakpoint in GC_add_roots_inner, and
verify that it's actually being called with (DATAEND, which is
defined to be) _end as its middle argument.
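
As a complement to the breakpoint, a tiny standalone program can show where the linker places _end for an ordinary executable. This is a sketch that assumes Linux with an ELF toolchain such as GNU ld, where _end marks the end of the program's bss; sample_global, below_main_end and print_end are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

extern char _end[];            /* provided by the linker (ELF/Linux assumption) */

static int sample_global = 1;  /* lives in this executable's data segment */

/* Returns nonzero if addr lies below this program's _end. */
static int below_main_end(const void *addr)
{
    return (uintptr_t)addr < (uintptr_t)_end;
}

/* Prints both addresses so they can be compared with /proc/<pid>/maps. */
static void print_end(void)
{
    printf("_end = %p, &sample_global = %p\n",
           (void *)_end, (void *)&sample_global);
}
```

In a correctly linked main program, below_main_end(&sample_global) should hold. Printing the same symbol from code inside the shared library would show which definition of _end that library actually resolves to.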

It looks like both mono and libmono define _end.  By the normal ELF
default symbol lookup rules, I believe libmono references to _end should see the
definition in the main program.  If this is indeed not happening, and if the
libmono developers aren't aware of other relevant issues, I would ask
on a binutils mailing list for ideas.

Hans

 -----Original Message-----
 From: Nikolai Zhubr [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, May 04, 2004 12:53 PM
 To: Boehm, Hans
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: GC issue in mono 0.31
 
 
 Hello Hans,
 I think my binutils are just the regular ones from Red Hat 7.1 Linux:
 binutils-2.10.91.0.2-3
 GNU ld 2.10.91
 Copyright 2001 Free Software Foundation, Inc.
 I've just checked, and there are no occurrences of Bsymbolic
 anywhere within the mono build tree.
 The output of nm is attached.
 -- 
 Best regards,
  Nikolai Zhubr
 
___
Mono-list maillist  -  [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/mono-list


[Mono-list] RE: GC issue in mono 0.31

2004-05-04 Thread Boehm, Hans
This means that the collector tried to trace, i.e. look for pointers in,
a memory range that was not in fact mapped.  The interesting values
to look at are the local variables current_p and limit, as well as the output
of GC_dump() and a copy of /proc/<pid>/maps.
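
One way to compare those values against the maps file is a small helper like the one below. It is a sketch, not part of the collector: range_is_mapped is a hypothetical name, it takes the text of a maps file as a string, and it only recognises a range that fits inside a single mapping (adjacent mappings are not merged).

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if [start, end) lies entirely inside one mapping listed in
   maps_text (text in the format of /proc/<pid>/maps), 0 otherwise. */
static int range_is_mapped(const char *maps_text,
                           unsigned long start, unsigned long end)
{
    const char *line = maps_text;

    while (line && *line) {
        unsigned long lo, hi;

        /* Each line begins with "lo-hi" in hexadecimal. */
        if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 &&
            start >= lo && end <= hi)
            return 1;

        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line)
            line++;
    }
    return 0;
}
```

Passing the contents of the maps file together with the values of current_p and limit tells whether the range being traced falls into a gap between mappings.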

Possible causes are:

1) The collector is confused about the location of the cold end of
the main stack.  You might check that GC_stackbottom looks reasonable.

2) The collector is confused about the location of a data segment.

3) The collector was mistakenly not configured for thread support.

I suspect the collector hasn't been as heavily tested with 2.6 kernels
and NPTL as it should have been.  I'm also trying to do some of that
as we speak.

Does gctest (make check) in the gc directory work?  (I assume that
should still work with the Mono version of the collector.)

Once you're sure that this is not a Mono-specific issue, it's good to
copy the [EMAIL PROTECTED] mailing list.

Hans

 -----Original Message-----
 From: Nikolai Zhubr [mailto:[EMAIL PROTECTED]
 Sent: Sunday, May 02, 2004 3:54 PM
 To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: GC issue in mono 0.31
 
 
 Hello,
 I have a problem running mono 0.31 - it segfaults almost
 immediately in GC startup code, as far as I can see. Here
 is the call sequence:
 GC_malloc ->
  GC_generic_malloc_inner ->
   GC_init_inner ->
    GC_try_to_collect_inner ->
     GC_stopped_mark ->
      GC_mark_some ->
       GC_mark_from ->
        mark.c:line 769: deferred = *limit; <=== segfault.
 Note: GC_mark_from() executes successfully a few times,
 then segfaults. The assembly file specified as an argument
 doesn't seem to be relevant. Starting with no arguments
 displays the help screen normally.
 I'm using 686 linux 2.6.5, gcc 3.3.1. Let me know if I should
 provide more details.
 Thank you.
 (I'm not on the ML, so please CC me so I get the reply)
 -- 
 Best regards,
  Nikolai Zhubr
 
 


[Mono-list] RE: GC issue in mono 0.31

2004-05-04 Thread Boehm, Hans
The problem is pretty clear:  The GC is treating the region
between __data_start (0x80497b0) and the end of the libmono data segment
(0x401851a0) as a single traceable data region.  That probably means
_end is somehow defined by the linker to be the end of the libmono
data segment instead of the end of the main data segment.  This
is not how Linux linkers are supposed to behave, though I've
seen similar behavior on other operating systems.  (This assumes that
libmono doesn't use some other mechanism for setting DATAEND.)

If your binutils and linker script came from a standard Linux distribution,
it would be nice to track down how this happened.  If not, that's almost
certainly the source of the problem.

Another possible cause of the problem might be if the gc is linked into
libmono, and that's linked with -Bsymbolic.  (That's probably
not an unreasonable thing to do.  If so, there's probably a way to
work around this.)

The output of nm on the main executable and libmono might be useful
in tracking this down further.

Hans

 -----Original Message-----
 From: Nikolai Zhubr [mailto:[EMAIL PROTECTED]
 Sent: Monday, May 03, 2004 2:09 PM
 To: Boehm, Hans
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: GC issue in mono 0.31
 
 
 Hello Hans,
 It looks like the mono GC version doesn't have gctest and does
 nothing for make check. I suppose I'll therefore need to
 download the full version first.
 Meanwhile, I've added more debugging prints, so all the
 variables you mentioned are now displayed just before the segv.
 This output + /proc/<mono pid>/maps is attached here.
 I'd also note that NPTL is not present on this box, AFAIK.
 Well, the base system is pretty old, basically RedHat 7.1
 with only some specific packages updated, those which I was
 actually interested in.
 -- 
 Best regards,
  Nikolai Zhubr
 