Re: D Concurrent GC

2010-10-10 Thread Leandro Lucarella
Leandro Lucarella, el  8 de octubre a las 01:44 me escribiste:
 Denis Koroskin, el  8 de octubre a las 05:14 me escribiste:
  I tried using your GC under D2/Windows, and unfortunately it crashes
  with Access Violation (I used a version modified by Sean as a
  starting point with little changes to fix compilation error).
  
  Are there any plans supporting it? I'm willing to help with testing
  it as much as I can.
 
 As I told to Sean, I tried my best to not break Windows support very
 much, but I don't have a Windows environment to test with (and to very
 interested on creating one to be honest, sorry :S). But I am very
 interested on accepting patches to make Windows work (and improvements).
 
 Testing the GC on Windows might not be very tempting though, as the main
 improvement is based on the fork(2) system call, which is not available
 on Windows, so all the fun options will be disabled there. The only
 thing that might worth trying in Windows is the precise scanning (well,
 there are some other minor optimizations that proved useful).

BTW, I've put up a small (Linux/POSIX) HOWTO to easily try CDGC. People
interested on trying CDGC might find it useful:

http://llucax.com.ar/blog/blog/post/-4d1fefb4


-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
--
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
--
Cuando el Mártir estaba siendo perseguido y aglutinado por los
citronetos, aquellos perversos que pretendian, en su maldad, piononizar
las enseñanzas de Peperino.
-- Peperino Pómoro


Re: D Concurrent GC

2010-10-07 Thread Leandro Lucarella
Leandro Lucarella, el 10 de septiembre a las 09:26 me escribiste:
 Bernard Helyer, el 10 de septiembre a las 04:49 me escribiste:
  Very nice. I've been reading your posts on this with interest.
  
  How much work would be involved in porting this to druntime?
 
 Is hard to tell since I didn't followed druntime development very
 closely lately, but both are based on the same code, so probably not too
 much work should be involved.

BTW, I just saw there were quite a few replies to this threads while my
internet connection was down for a couple of weeks (just after I posted
this), that I didn't received.

I'm sorry about the silence, I'll try to reply the things that might
still be of interest, and I'll take this opportunity to say that CDGC is
finished, see the blog post for (a few) more details:
http://llucax.com.ar/blog/blog/post/-7454ab1d


Now the replies.

David Simcha asked:

 Been meaning to ask, what does this GC do with regard to precise
 scanning?  Is it substantially less conservative than the current D2
 GC?

Yes, if the compiler support is there. It uses the patch to DMD posted
by wm4 to scan the heap precisely. See bug 3463 for details:
http://d.puremagic.com/issues/show_bug.cgi?id=3463


Andrei Alexandrescu said:

 There's a discussion on digitalmars.D about a D program being slower
 than the equivalent Python script because of the GC. Would be a good
 test bed!

 http://rounin.livejournal.com/21815.html

I'll try it when I have some time. I wouldn't expect much improvement
because it only uses AAs that doesn't get scanned precisely yet (IIRC
from wm4 patch), but there are a few optimizations that could help with
dynamic arrays (which are also used frequently). We'll see :)

OTOH, Dil does a lot of string processing using too AAs and dynamic
arrays (AFAIK) and the improvement is huge using CGGC :). As a sample,
attached are the results for a Dil run generating all the Tango docs,
using 1 CPU (using more CPUs is a little faster), for the Tango basic GC
(TBGC) and several configurations of the CDGC. The file named time
measures the total run time (i.e. the throughput), the one named stw
measures the maximum stop-the-world pause (i.e. the time no thread can
run) and the file named pause measures the maximum time the GC will
actually block the program (i.e. the maximum real latency). All in
seconds, for 50 runs (the run time) and 20 runs (the pause time),
minimum is in black, maximum in white, and the mean is centered between
2 std deviations (in grey).


-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
--
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
--
Hey you, out there on your own
Sitting naked by the phone
Would you touch me?


pause-dil-1cpu.pdf
Description: Adobe PDF document


stw-dil-1cpu.pdf
Description: Adobe PDF document


time-dil-1cpu.pdf
Description: Adobe PDF document


Re: D Concurrent GC

2010-10-07 Thread Denis Koroskin
On Fri, 08 Oct 2010 04:25:30 +0400, Leandro Lucarella l...@llucax.com.ar  
wrote:



Leandro Lucarella, el 10 de septiembre a las 09:26 me escribiste:

Bernard Helyer, el 10 de septiembre a las 04:49 me escribiste:
 Very nice. I've been reading your posts on this with interest.

 How much work would be involved in porting this to druntime?

Is hard to tell since I didn't followed druntime development very
closely lately, but both are based on the same code, so probably not too
much work should be involved.


BTW, I just saw there were quite a few replies to this threads while my
internet connection was down for a couple of weeks (just after I posted
this), that I didn't received.

I'm sorry about the silence, I'll try to reply the things that might
still be of interest, and I'll take this opportunity to say that CDGC is
finished, see the blog post for (a few) more details:
http://llucax.com.ar/blog/blog/post/-7454ab1d


Now the replies.

David Simcha asked:


Been meaning to ask, what does this GC do with regard to precise
scanning?  Is it substantially less conservative than the current D2
GC?


Yes, if the compiler support is there. It uses the patch to DMD posted
by wm4 to scan the heap precisely. See bug 3463 for details:
http://d.puremagic.com/issues/show_bug.cgi?id=3463


Andrei Alexandrescu said:


There's a discussion on digitalmars.D about a D program being slower
than the equivalent Python script because of the GC. Would be a good
test bed!

http://rounin.livejournal.com/21815.html


I'll try it when I have some time. I wouldn't expect much improvement
because it only uses AAs that doesn't get scanned precisely yet (IIRC
from wm4 patch), but there are a few optimizations that could help with
dynamic arrays (which are also used frequently). We'll see :)

OTOH, Dil does a lot of string processing using too AAs and dynamic
arrays (AFAIK) and the improvement is huge using CGGC :). As a sample,
attached are the results for a Dil run generating all the Tango docs,
using 1 CPU (using more CPUs is a little faster), for the Tango basic GC
(TBGC) and several configurations of the CDGC. The file named time
measures the total run time (i.e. the throughput), the one named stw
measures the maximum stop-the-world pause (i.e. the time no thread can
run) and the file named pause measures the maximum time the GC will
actually block the program (i.e. the maximum real latency). All in
seconds, for 50 runs (the run time) and 20 runs (the pause time),
minimum is in black, maximum in white, and the mean is centered between
2 std deviations (in grey).




Hi, Leandro!

I tried using your GC under D2/Windows, and unfortunately it crashes with  
Access Violation (I used a version modified by Sean as a starting point  
with little changes to fix compilation error).


Are there any plans supporting it? I'm willing to help with testing it as  
much as I can.


Thanks!


Re: D Concurrent GC

2010-10-07 Thread Leandro Lucarella
Denis Koroskin, el  8 de octubre a las 05:14 me escribiste:
 I tried using your GC under D2/Windows, and unfortunately it crashes
 with Access Violation (I used a version modified by Sean as a
 starting point with little changes to fix compilation error).
 
 Are there any plans supporting it? I'm willing to help with testing
 it as much as I can.

As I told to Sean, I tried my best to not break Windows support very
much, but I don't have a Windows environment to test with (and to very
interested on creating one to be honest, sorry :S). But I am very
interested on accepting patches to make Windows work (and improvements).

Testing the GC on Windows might not be very tempting though, as the main
improvement is based on the fork(2) system call, which is not available
on Windows, so all the fun options will be disabled there. The only
thing that might worth trying in Windows is the precise scanning (well,
there are some other minor optimizations that proved useful).

-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
--
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
--
Yo soy Peperino, mártir latino, venid al asado pero traed el vino.
-- Peperino Pómoro


Re: D Concurrent GC

2010-09-16 Thread Andrei Alexandrescu

On 9/15/10 22:22 CDT, Andrei Alexandrescu wrote:

On 09/15/2010 07:00 PM, Sean Kelly wrote:

Lutger Wrote:


Sean Kelly wrote:


Sean Kelly Wrote:


Leandro Lucarella Wrote:


Not quite ready for prime-time yet, but I think it's in a stage
when it
could be interesting anyway (at least for developers or people
that want
to experiment):
http://llucax.com.ar/blog/blog/post/-1a4bdfba


Nice work! I've gotten this to compile as the GC for druntime using
D2 but
have ran into a snag.


Okay, all fixed. Porting the GC to D2 was more work than porting it to
druntime specifically. There are still a few hacks in place to work
around
const issues and I faked the PointerMap stuff, but the GC seems to
work just
great. I'll have to find a reasonable performance test for
comparison. Oh,
I'll send you the modified code as well.


Are you going to integrate it in druntime?


Dunno. For now, I've just sent my modified copy to Leandro and the
druntime list for people to play with. It seems quite promising though.


There's a discussion on digitalmars.D about a D program being slower
than the equivalent Python script because of the GC. Would be a good
test bed!


This is what I was referring to.

http://rounin.livejournal.com/21815.html

The author makes quite a few good points comparing D and Python in 
expressiveness and performance.



Andrei



Re: D Concurrent GC

2010-09-16 Thread Sean Kelly
Andrei Alexandrescu Wrote:
 
 There's a discussion on digitalmars.D about a D program being slower 
 than the equivalent Python script because of the GC. Would be a good 
 test bed!

I was playing with that yesterday, but there was too much setup required for a 
quick test (it compared an MD5 has of a directory tree to the current directory 
tree).  I'll revisit it though, since it seems a good starting point.


Re: D Concurrent GC

2010-09-16 Thread Sean Kelly
dsimcha Wrote:

 == Quote from Sean Kelly (s...@invisibleduck.org)'s article

  Dunno.  For now, I've just sent my modified copy to Leandro and the druntime
 list for people to play with.  It seems quite promising though.
 
 Been meaning to ask, what does this GC do with regard to precise scanning?  
 Is it
 substantially less conservative than the current D2 GC?

Its gc_malloc, etc, accept a PointerMap type which I assume is used for precise 
scanning, but I couldn't find a definition of PointerMap in the GC code or 
other obvious locations so I disabled it for now.  I assume it's from the 
precise scanning patch and that I can get whatever else is needed from there.  
For now, I just wanted a functional port to test.


Re: D Concurrent GC

2010-09-15 Thread Steven Schveighoffer
On Tue, 14 Sep 2010 19:09:18 -0400, Sean Kelly s...@invisibleduck.org  
wrote:



Leandro Lucarella Wrote:


Not quite ready for prime-time yet, but I think it's in a stage when it
could be interesting anyway (at least for developers or people that want
to experiment):
http://llucax.com.ar/blog/blog/post/-1a4bdfba


Nice work!  I've gotten this to compile as the GC for druntime using D2  
but have ran into a snag.  I'm using OSX (ie. no usable debug info) but  
near as I can tell the issue is:


private T locked(T, alias Code)()
{
if (thread_needLock())
synchronized (gc.lock) return Code();
else
   return Code();
}

void* gc_malloc(size_t size, uint attrs = 0)
{
if (size == 0)
return null;
return locked!(void*, () {
assert (Invariant()); scope (exit) assert (Invariant());
return malloc(size, attrs, null);
})();
}

In the code above, it appears that the anonymous delegate being passed  
as an alias to locked() is having its stack data dynamically allocated  
(ie. as a dynamic closure).  For normal delegate calls this can be  
avoided by accepting scope delegate as the function parameter, but I  
haven't found an analog when using the alias approach.  Obviously, what  
happens is that a call to gc_malloc() ends up needing GCed memory, so  
gc_malloc() is recursively called, and on until the stack explodes.   
I'll see if I can come up with a workaround that continues using the  
alias template parameter.


What if you passed it through a scope delegate function?

i.e.

void *lockedproxy(scope void* delegate() dg)
{
  return locked!(void *, dg);
}

-Steve


Re: D Concurrent GC

2010-09-15 Thread Sean Kelly
Steven Schveighoffer schvei...@yahoo.com wrote:
 On Tue, 14 Sep 2010 19:09:18 -0400, Sean Kelly
 s...@invisibleduck.org  wrote:
 
 Leandro Lucarella Wrote:
 
 Not quite ready for prime-time yet, but I think it's in a stage when
   it
 could be interesting anyway (at least for developers or people that
   want
 to experiment):
 http://llucax.com.ar/blog/blog/post/-1a4bdfba
 
 Nice work!  I've gotten this to compile as the GC for druntime using
  D2   but have ran into a snag.  I'm using OSX (ie. no usable debug
  info) but   near as I can tell the issue is:
 
 private T locked(T, alias Code)()
 {
 if (thread_needLock())
 synchronized (gc.lock) return Code();
 else
return Code();
 }
 
 void* gc_malloc(size_t size, uint attrs = 0)
 {
 if (size == 0)
 return null;
 return locked!(void*, () {
 assert (Invariant()); scope (exit) assert (Invariant());
 return malloc(size, attrs, null);
 })();
 }
 
 In the code above, it appears that the anonymous delegate being
  passed   as an alias to locked() is having its stack data
  dynamically allocated   (ie. as a dynamic closure).  For normal
  delegate calls this can be   avoided by accepting scope delegate
  as the function parameter, but I   haven't found an analog when
  using the alias approach.  Obviously, what   happens is that a call
  to gc_malloc() ends up needing GCed memory, so   gc_malloc() is
  recursively called, and on until the stack explodes.I'll see if
  I can come up with a workaround that continues using the   alias
  template parameter.
 
 What if you passed it through a scope delegate function?
 
 i.e.
 
 void *lockedproxy(scope void* delegate() dg)
 {
   return locked!(void *, dg);
 }

I could simply change it to a function that accepts a scope delegate. It
just isn't as efficient as the alias approach.  But an indirect call is
pretty small potatoes in this case anyway. I tried this and it worked,
and the I ran into an issue where the object used for the lock
(classinfo for GCLock) was apparently null. I'll look into that today.


Re: D Concurrent GC

2010-09-15 Thread Sean Kelly
Sean Kelly Wrote:

 Leandro Lucarella Wrote:
 
  Not quite ready for prime-time yet, but I think it's in a stage when it
  could be interesting anyway (at least for developers or people that want
  to experiment):
  http://llucax.com.ar/blog/blog/post/-1a4bdfba
 
 Nice work!  I've gotten this to compile as the GC for druntime using D2 but 
 have ran into a snag.

Okay, all fixed.  Porting the GC to D2 was more work than porting it to 
druntime specifically.  There are still a few hacks in place to work around 
const issues and I faked the PointerMap stuff, but the GC seems to work just 
great.  I'll have to find a reasonable performance test for comparison.  Oh, 
I'll send you the modified code as well.


Re: D Concurrent GC

2010-09-15 Thread Lutger
Sean Kelly wrote:

 Sean Kelly Wrote:
 
 Leandro Lucarella Wrote:
 
  Not quite ready for prime-time yet, but I think it's in a stage when it
  could be interesting anyway (at least for developers or people that want
  to experiment):
  http://llucax.com.ar/blog/blog/post/-1a4bdfba
 
 Nice work!  I've gotten this to compile as the GC for druntime using D2 but
 have ran into a snag.
 
 Okay, all fixed.  Porting the GC to D2 was more work than porting it to
 druntime specifically.  There are still a few hacks in place to work around
 const issues and I faked the PointerMap stuff, but the GC seems to work just
 great.  I'll have to find a reasonable performance test for comparison.  Oh,
 I'll send you the modified code as well.

Are you going to integrate it in druntime? 


Re: D Concurrent GC

2010-09-15 Thread Sean Kelly
Lutger Wrote:

 Sean Kelly wrote:
 
  Sean Kelly Wrote:
  
  Leandro Lucarella Wrote:
  
   Not quite ready for prime-time yet, but I think it's in a stage when it
   could be interesting anyway (at least for developers or people that want
   to experiment):
   http://llucax.com.ar/blog/blog/post/-1a4bdfba
  
  Nice work!  I've gotten this to compile as the GC for druntime using D2 but
  have ran into a snag.
  
  Okay, all fixed.  Porting the GC to D2 was more work than porting it to
  druntime specifically.  There are still a few hacks in place to work around
  const issues and I faked the PointerMap stuff, but the GC seems to work just
  great.  I'll have to find a reasonable performance test for comparison.  Oh,
  I'll send you the modified code as well.
 
 Are you going to integrate it in druntime? 

Dunno.  For now, I've just sent my modified copy to Leandro and the druntime 
list for people to play with.  It seems quite promising though.


Re: D Concurrent GC

2010-09-15 Thread dsimcha
== Quote from Sean Kelly (s...@invisibleduck.org)'s article
 Lutger Wrote:
  Sean Kelly wrote:
 
   Sean Kelly Wrote:
  
   Leandro Lucarella Wrote:
  
Not quite ready for prime-time yet, but I think it's in a stage when it
could be interesting anyway (at least for developers or people that 
want
to experiment):
http://llucax.com.ar/blog/blog/post/-1a4bdfba
  
   Nice work!  I've gotten this to compile as the GC for druntime using D2 
   but
   have ran into a snag.
  
   Okay, all fixed.  Porting the GC to D2 was more work than porting it to
   druntime specifically.  There are still a few hacks in place to work 
   around
   const issues and I faked the PointerMap stuff, but the GC seems to work 
   just
   great.  I'll have to find a reasonable performance test for comparison.  
   Oh,
   I'll send you the modified code as well.
 
  Are you going to integrate it in druntime?
 Dunno.  For now, I've just sent my modified copy to Leandro and the druntime
list for people to play with.  It seems quite promising though.

Been meaning to ask, what does this GC do with regard to precise scanning?  Is 
it
substantially less conservative than the current D2 GC?


Re: D Concurrent GC

2010-09-15 Thread Andrei Alexandrescu

On 09/15/2010 07:00 PM, Sean Kelly wrote:

Lutger Wrote:


Sean Kelly wrote:


Sean Kelly Wrote:


Leandro Lucarella Wrote:


Not quite ready for prime-time yet, but I think it's in a stage when it
could be interesting anyway (at least for developers or people that want
to experiment):
http://llucax.com.ar/blog/blog/post/-1a4bdfba


Nice work!  I've gotten this to compile as the GC for druntime using D2 but
have ran into a snag.


Okay, all fixed.  Porting the GC to D2 was more work than porting it to
druntime specifically.  There are still a few hacks in place to work around
const issues and I faked the PointerMap stuff, but the GC seems to work just
great.  I'll have to find a reasonable performance test for comparison.  Oh,
I'll send you the modified code as well.


Are you going to integrate it in druntime?


Dunno.  For now, I've just sent my modified copy to Leandro and the druntime 
list for people to play with.  It seems quite promising though.


There's a discussion on digitalmars.D about a D program being slower 
than the equivalent Python script because of the GC. Would be a good 
test bed!


Andrei


Re: D Concurrent GC

2010-09-14 Thread Sean Kelly
Leandro Lucarella Wrote:

 Not quite ready for prime-time yet, but I think it's in a stage when it
 could be interesting anyway (at least for developers or people that want
 to experiment):
 http://llucax.com.ar/blog/blog/post/-1a4bdfba

Nice work!  I've gotten this to compile as the GC for druntime using D2 but 
have ran into a snag.  I'm using OSX (ie. no usable debug info) but near as I 
can tell the issue is:

private T locked(T, alias Code)()
{
if (thread_needLock())
synchronized (gc.lock) return Code();
else
   return Code();
}

void* gc_malloc(size_t size, uint attrs = 0)
{
if (size == 0)
return null;
return locked!(void*, () {
assert (Invariant()); scope (exit) assert (Invariant());
return malloc(size, attrs, null);
})();
}

In the code above, it appears that the anonymous delegate being passed as an 
alias to locked() is having its stack data dynamically allocated (ie. as a 
dynamic closure).  For normal delegate calls this can be avoided by accepting 
scope delegate as the function parameter, but I haven't found an analog when 
using the alias approach.  Obviously, what happens is that a call to 
gc_malloc() ends up needing GCed memory, so gc_malloc() is recursively called, 
and on until the stack explodes.  I'll see if I can come up with a workaround 
that continues using the alias template parameter.


Re: D Concurrent GC

2010-09-10 Thread Leandro Lucarella
Bernard Helyer, el 10 de septiembre a las 04:49 me escribiste:
 Very nice. I've been reading your posts on this with interest.
 
 How much work would be involved in porting this to druntime?

Is hard to tell since I didn't followed druntime development very
closely lately, but both are based on the same code, so probably not too
much work should be involved.

-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
--
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
--
In a world without fences, who need gates?


Re: D Concurrent GC

2010-09-09 Thread Bernard Helyer
Very nice. I've been reading your posts on this with interest.

How much work would be involved in porting this to druntime?