Re: D Concurrent GC
Leandro Lucarella, el 8 de octubre a las 01:44 me escribiste: Denis Koroskin, el 8 de octubre a las 05:14 me escribiste: I tried using your GC under D2/Windows, and unfortunately it crashes with Access Violation (I used a version modified by Sean as a starting point with little changes to fix compilation error). Are there any plans supporting it? I'm willing to help with testing it as much as I can. As I told to Sean, I tried my best to not break Windows support very much, but I don't have a Windows environment to test with (and to very interested on creating one to be honest, sorry :S). But I am very interested on accepting patches to make Windows work (and improvements). Testing the GC on Windows might not be very tempting though, as the main improvement is based on the fork(2) system call, which is not available on Windows, so all the fun options will be disabled there. The only thing that might worth trying in Windows is the precise scanning (well, there are some other minor optimizations that proved useful). BTW, I've put up a small (Linux/POSIX) HOWTO to easily try CDGC. People interested on trying CDGC might find it useful: http://llucax.com.ar/blog/blog/post/-4d1fefb4 -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ -- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) -- Cuando el Mártir estaba siendo perseguido y aglutinado por los citronetos, aquellos perversos que pretendian, en su maldad, piononizar las enseñanzas de Peperino. -- Peperino Pómoro
Re: D Concurrent GC
Leandro Lucarella, el 10 de septiembre a las 09:26 me escribiste: Bernard Helyer, el 10 de septiembre a las 04:49 me escribiste: Very nice. I've been reading your posts on this with interest. How much work would be involved in porting this to druntime? Is hard to tell since I didn't followed druntime development very closely lately, but both are based on the same code, so probably not too much work should be involved. BTW, I just saw there were quite a few replies to this threads while my internet connection was down for a couple of weeks (just after I posted this), that I didn't received. I'm sorry about the silence, I'll try to reply the things that might still be of interest, and I'll take this opportunity to say that CDGC is finished, see the blog post for (a few) more details: http://llucax.com.ar/blog/blog/post/-7454ab1d Now the replies. David Simcha asked: Been meaning to ask, what does this GC do with regard to precise scanning? Is it substantially less conservative than the current D2 GC? Yes, if the compiler support is there. It uses the patch to DMD posted by wm4 to scan the heap precisely. See bug 3463 for details: http://d.puremagic.com/issues/show_bug.cgi?id=3463 Andrei Alexandrescu said: There's a discussion on digitalmars.D about a D program being slower than the equivalent Python script because of the GC. Would be a good test bed! http://rounin.livejournal.com/21815.html I'll try it when I have some time. I wouldn't expect much improvement because it only uses AAs that doesn't get scanned precisely yet (IIRC from wm4 patch), but there are a few optimizations that could help with dynamic arrays (which are also used frequently). We'll see :) OTOH, Dil does a lot of string processing using too AAs and dynamic arrays (AFAIK) and the improvement is huge using CGGC :). As a sample, attached are the results for a Dil run generating all the Tango docs, using 1 CPU (using more CPUs is a little faster), for the Tango basic GC (TBGC) and several configurations of the CDGC. The file named time measures the total run time (i.e. the throughput), the one named stw measures the maximum stop-the-world pause (i.e. the time no thread can run) and the file named pause measures the maximum time the GC will actually block the program (i.e. the maximum real latency). All in seconds, for 50 runs (the run time) and 20 runs (the pause time), minimum is in black, maximum in white, and the mean is centered between 2 std deviations (in grey). -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ -- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) -- Hey you, out there on your own Sitting naked by the phone Would you touch me? pause-dil-1cpu.pdf Description: Adobe PDF document stw-dil-1cpu.pdf Description: Adobe PDF document time-dil-1cpu.pdf Description: Adobe PDF document
Re: D Concurrent GC
On Fri, 08 Oct 2010 04:25:30 +0400, Leandro Lucarella l...@llucax.com.ar wrote: Leandro Lucarella, el 10 de septiembre a las 09:26 me escribiste: Bernard Helyer, el 10 de septiembre a las 04:49 me escribiste: Very nice. I've been reading your posts on this with interest. How much work would be involved in porting this to druntime? Is hard to tell since I didn't followed druntime development very closely lately, but both are based on the same code, so probably not too much work should be involved. BTW, I just saw there were quite a few replies to this threads while my internet connection was down for a couple of weeks (just after I posted this), that I didn't received. I'm sorry about the silence, I'll try to reply the things that might still be of interest, and I'll take this opportunity to say that CDGC is finished, see the blog post for (a few) more details: http://llucax.com.ar/blog/blog/post/-7454ab1d Now the replies. David Simcha asked: Been meaning to ask, what does this GC do with regard to precise scanning? Is it substantially less conservative than the current D2 GC? Yes, if the compiler support is there. It uses the patch to DMD posted by wm4 to scan the heap precisely. See bug 3463 for details: http://d.puremagic.com/issues/show_bug.cgi?id=3463 Andrei Alexandrescu said: There's a discussion on digitalmars.D about a D program being slower than the equivalent Python script because of the GC. Would be a good test bed! http://rounin.livejournal.com/21815.html I'll try it when I have some time. I wouldn't expect much improvement because it only uses AAs that doesn't get scanned precisely yet (IIRC from wm4 patch), but there are a few optimizations that could help with dynamic arrays (which are also used frequently). We'll see :) OTOH, Dil does a lot of string processing using too AAs and dynamic arrays (AFAIK) and the improvement is huge using CGGC :). As a sample, attached are the results for a Dil run generating all the Tango docs, using 1 CPU (using more CPUs is a little faster), for the Tango basic GC (TBGC) and several configurations of the CDGC. The file named time measures the total run time (i.e. the throughput), the one named stw measures the maximum stop-the-world pause (i.e. the time no thread can run) and the file named pause measures the maximum time the GC will actually block the program (i.e. the maximum real latency). All in seconds, for 50 runs (the run time) and 20 runs (the pause time), minimum is in black, maximum in white, and the mean is centered between 2 std deviations (in grey). Hi, Leandro! I tried using your GC under D2/Windows, and unfortunately it crashes with Access Violation (I used a version modified by Sean as a starting point with little changes to fix compilation error). Are there any plans supporting it? I'm willing to help with testing it as much as I can. Thanks!
Re: D Concurrent GC
Denis Koroskin, el 8 de octubre a las 05:14 me escribiste: I tried using your GC under D2/Windows, and unfortunately it crashes with Access Violation (I used a version modified by Sean as a starting point with little changes to fix compilation error). Are there any plans supporting it? I'm willing to help with testing it as much as I can. As I told to Sean, I tried my best to not break Windows support very much, but I don't have a Windows environment to test with (and to very interested on creating one to be honest, sorry :S). But I am very interested on accepting patches to make Windows work (and improvements). Testing the GC on Windows might not be very tempting though, as the main improvement is based on the fork(2) system call, which is not available on Windows, so all the fun options will be disabled there. The only thing that might worth trying in Windows is the precise scanning (well, there are some other minor optimizations that proved useful). -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ -- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) -- Yo soy Peperino, mártir latino, venid al asado pero traed el vino. -- Peperino Pómoro
Re: D Concurrent GC
On 9/15/10 22:22 CDT, Andrei Alexandrescu wrote: On 09/15/2010 07:00 PM, Sean Kelly wrote: Lutger Wrote: Sean Kelly wrote: Sean Kelly Wrote: Leandro Lucarella Wrote: Not quite ready for prime-time yet, but I think it's in a stage when it could be interesting anyway (at least for developers or people that want to experiment): http://llucax.com.ar/blog/blog/post/-1a4bdfba Nice work! I've gotten this to compile as the GC for druntime using D2 but have ran into a snag. Okay, all fixed. Porting the GC to D2 was more work than porting it to druntime specifically. There are still a few hacks in place to work around const issues and I faked the PointerMap stuff, but the GC seems to work just great. I'll have to find a reasonable performance test for comparison. Oh, I'll send you the modified code as well. Are you going to integrate it in druntime? Dunno. For now, I've just sent my modified copy to Leandro and the druntime list for people to play with. It seems quite promising though. There's a discussion on digitalmars.D about a D program being slower than the equivalent Python script because of the GC. Would be a good test bed! This is what I was referring to. http://rounin.livejournal.com/21815.html The author makes quite a few good points comparing D and Python in expressiveness and performance. Andrei
Re: D Concurrent GC
Andrei Alexandrescu Wrote: There's a discussion on digitalmars.D about a D program being slower than the equivalent Python script because of the GC. Would be a good test bed! I was playing with that yesterday, but there was too much setup required for a quick test (it compared an MD5 has of a directory tree to the current directory tree). I'll revisit it though, since it seems a good starting point.
Re: D Concurrent GC
dsimcha Wrote: == Quote from Sean Kelly (s...@invisibleduck.org)'s article Dunno. For now, I've just sent my modified copy to Leandro and the druntime list for people to play with. It seems quite promising though. Been meaning to ask, what does this GC do with regard to precise scanning? Is it substantially less conservative than the current D2 GC? Its gc_malloc, etc, accept a PointerMap type which I assume is used for precise scanning, but I couldn't find a definition of PointerMap in the GC code or other obvious locations so I disabled it for now. I assume it's from the precise scanning patch and that I can get whatever else is needed from there. For now, I just wanted a functional port to test.
Re: D Concurrent GC
On Tue, 14 Sep 2010 19:09:18 -0400, Sean Kelly s...@invisibleduck.org wrote: Leandro Lucarella Wrote: Not quite ready for prime-time yet, but I think it's in a stage when it could be interesting anyway (at least for developers or people that want to experiment): http://llucax.com.ar/blog/blog/post/-1a4bdfba Nice work! I've gotten this to compile as the GC for druntime using D2 but have ran into a snag. I'm using OSX (ie. no usable debug info) but near as I can tell the issue is: private T locked(T, alias Code)() { if (thread_needLock()) synchronized (gc.lock) return Code(); else return Code(); } void* gc_malloc(size_t size, uint attrs = 0) { if (size == 0) return null; return locked!(void*, () { assert (Invariant()); scope (exit) assert (Invariant()); return malloc(size, attrs, null); })(); } In the code above, it appears that the anonymous delegate being passed as an alias to locked() is having its stack data dynamically allocated (ie. as a dynamic closure). For normal delegate calls this can be avoided by accepting scope delegate as the function parameter, but I haven't found an analog when using the alias approach. Obviously, what happens is that a call to gc_malloc() ends up needing GCed memory, so gc_malloc() is recursively called, and on until the stack explodes. I'll see if I can come up with a workaround that continues using the alias template parameter. What if you passed it through a scope delegate function? i.e. void *lockedproxy(scope void* delegate() dg) { return locked!(void *, dg); } -Steve
Re: D Concurrent GC
Steven Schveighoffer schvei...@yahoo.com wrote: On Tue, 14 Sep 2010 19:09:18 -0400, Sean Kelly s...@invisibleduck.org wrote: Leandro Lucarella Wrote: Not quite ready for prime-time yet, but I think it's in a stage when it could be interesting anyway (at least for developers or people that want to experiment): http://llucax.com.ar/blog/blog/post/-1a4bdfba Nice work! I've gotten this to compile as the GC for druntime using D2 but have ran into a snag. I'm using OSX (ie. no usable debug info) but near as I can tell the issue is: private T locked(T, alias Code)() { if (thread_needLock()) synchronized (gc.lock) return Code(); else return Code(); } void* gc_malloc(size_t size, uint attrs = 0) { if (size == 0) return null; return locked!(void*, () { assert (Invariant()); scope (exit) assert (Invariant()); return malloc(size, attrs, null); })(); } In the code above, it appears that the anonymous delegate being passed as an alias to locked() is having its stack data dynamically allocated (ie. as a dynamic closure). For normal delegate calls this can be avoided by accepting scope delegate as the function parameter, but I haven't found an analog when using the alias approach. Obviously, what happens is that a call to gc_malloc() ends up needing GCed memory, so gc_malloc() is recursively called, and on until the stack explodes.I'll see if I can come up with a workaround that continues using the alias template parameter. What if you passed it through a scope delegate function? i.e. void *lockedproxy(scope void* delegate() dg) { return locked!(void *, dg); } I could simply change it to a function that accepts a scope delegate. It just isn't as efficient as the alias approach. But an indirect call is pretty small potatoes in this case anyway. I tried this and it worked, and the I ran into an issue where the object used for the lock (classinfo for GCLock) was apparently null. I'll look into that today.
Re: D Concurrent GC
Sean Kelly Wrote: Leandro Lucarella Wrote: Not quite ready for prime-time yet, but I think it's in a stage when it could be interesting anyway (at least for developers or people that want to experiment): http://llucax.com.ar/blog/blog/post/-1a4bdfba Nice work! I've gotten this to compile as the GC for druntime using D2 but have ran into a snag. Okay, all fixed. Porting the GC to D2 was more work than porting it to druntime specifically. There are still a few hacks in place to work around const issues and I faked the PointerMap stuff, but the GC seems to work just great. I'll have to find a reasonable performance test for comparison. Oh, I'll send you the modified code as well.
Re: D Concurrent GC
Sean Kelly wrote: Sean Kelly Wrote: Leandro Lucarella Wrote: Not quite ready for prime-time yet, but I think it's in a stage when it could be interesting anyway (at least for developers or people that want to experiment): http://llucax.com.ar/blog/blog/post/-1a4bdfba Nice work! I've gotten this to compile as the GC for druntime using D2 but have ran into a snag. Okay, all fixed. Porting the GC to D2 was more work than porting it to druntime specifically. There are still a few hacks in place to work around const issues and I faked the PointerMap stuff, but the GC seems to work just great. I'll have to find a reasonable performance test for comparison. Oh, I'll send you the modified code as well. Are you going to integrate it in druntime?
Re: D Concurrent GC
Lutger Wrote: Sean Kelly wrote: Sean Kelly Wrote: Leandro Lucarella Wrote: Not quite ready for prime-time yet, but I think it's in a stage when it could be interesting anyway (at least for developers or people that want to experiment): http://llucax.com.ar/blog/blog/post/-1a4bdfba Nice work! I've gotten this to compile as the GC for druntime using D2 but have ran into a snag. Okay, all fixed. Porting the GC to D2 was more work than porting it to druntime specifically. There are still a few hacks in place to work around const issues and I faked the PointerMap stuff, but the GC seems to work just great. I'll have to find a reasonable performance test for comparison. Oh, I'll send you the modified code as well. Are you going to integrate it in druntime? Dunno. For now, I've just sent my modified copy to Leandro and the druntime list for people to play with. It seems quite promising though.
Re: D Concurrent GC
== Quote from Sean Kelly (s...@invisibleduck.org)'s article Lutger Wrote: Sean Kelly wrote: Sean Kelly Wrote: Leandro Lucarella Wrote: Not quite ready for prime-time yet, but I think it's in a stage when it could be interesting anyway (at least for developers or people that want to experiment): http://llucax.com.ar/blog/blog/post/-1a4bdfba Nice work! I've gotten this to compile as the GC for druntime using D2 but have ran into a snag. Okay, all fixed. Porting the GC to D2 was more work than porting it to druntime specifically. There are still a few hacks in place to work around const issues and I faked the PointerMap stuff, but the GC seems to work just great. I'll have to find a reasonable performance test for comparison. Oh, I'll send you the modified code as well. Are you going to integrate it in druntime? Dunno. For now, I've just sent my modified copy to Leandro and the druntime list for people to play with. It seems quite promising though. Been meaning to ask, what does this GC do with regard to precise scanning? Is it substantially less conservative than the current D2 GC?
Re: D Concurrent GC
On 09/15/2010 07:00 PM, Sean Kelly wrote: Lutger Wrote: Sean Kelly wrote: Sean Kelly Wrote: Leandro Lucarella Wrote: Not quite ready for prime-time yet, but I think it's in a stage when it could be interesting anyway (at least for developers or people that want to experiment): http://llucax.com.ar/blog/blog/post/-1a4bdfba Nice work! I've gotten this to compile as the GC for druntime using D2 but have ran into a snag. Okay, all fixed. Porting the GC to D2 was more work than porting it to druntime specifically. There are still a few hacks in place to work around const issues and I faked the PointerMap stuff, but the GC seems to work just great. I'll have to find a reasonable performance test for comparison. Oh, I'll send you the modified code as well. Are you going to integrate it in druntime? Dunno. For now, I've just sent my modified copy to Leandro and the druntime list for people to play with. It seems quite promising though. There's a discussion on digitalmars.D about a D program being slower than the equivalent Python script because of the GC. Would be a good test bed! Andrei
Re: D Concurrent GC
Leandro Lucarella Wrote: Not quite ready for prime-time yet, but I think it's in a stage when it could be interesting anyway (at least for developers or people that want to experiment): http://llucax.com.ar/blog/blog/post/-1a4bdfba Nice work! I've gotten this to compile as the GC for druntime using D2 but have ran into a snag. I'm using OSX (ie. no usable debug info) but near as I can tell the issue is: private T locked(T, alias Code)() { if (thread_needLock()) synchronized (gc.lock) return Code(); else return Code(); } void* gc_malloc(size_t size, uint attrs = 0) { if (size == 0) return null; return locked!(void*, () { assert (Invariant()); scope (exit) assert (Invariant()); return malloc(size, attrs, null); })(); } In the code above, it appears that the anonymous delegate being passed as an alias to locked() is having its stack data dynamically allocated (ie. as a dynamic closure). For normal delegate calls this can be avoided by accepting scope delegate as the function parameter, but I haven't found an analog when using the alias approach. Obviously, what happens is that a call to gc_malloc() ends up needing GCed memory, so gc_malloc() is recursively called, and on until the stack explodes. I'll see if I can come up with a workaround that continues using the alias template parameter.
Re: D Concurrent GC
Bernard Helyer, el 10 de septiembre a las 04:49 me escribiste: Very nice. I've been reading your posts on this with interest. How much work would be involved in porting this to druntime? Is hard to tell since I didn't followed druntime development very closely lately, but both are based on the same code, so probably not too much work should be involved. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ -- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) -- In a world without fences, who need gates?
Re: D Concurrent GC
Very nice. I've been reading your posts on this with interest. How much work would be involved in porting this to druntime?