Re: State of and plans for the garbage collector
On Thu, 15 Jul 2010 00:18:38 -0700, Jonathan M Davis wrote: > Okay. I really don't know much about garbage collectors, how they work, > or what makes one particularly good or bad Me too. But its a central part of the D langauge, its design and runtime. Even though D programmmers can replace allocators and do manual memory management, I would like to know if anyone has tried to do this in a big way. Without GC, programming with the DPL would be a different beast. It would seem to me that type information, GC design, memory allocation schemes are by dependence necessarily integrated together to try to get efficient performance. There must be loads of academic research projects done on this already, that would show what the trade-offs are. The first thing I notice is that memory blocks allocated by GC are either marked as either needing to be checked or are not checked at all. As far as I know there is no futher refining meta data, for instance a list of 32/64 bit offsets which can be excluded or included for being pointers to data. Such a list would be of the form [offset-array length, exclusive/inclusive], offset1, offset2, offsetn. Its slower code to walk through such list than the method of the current GC, which from my remembrance of an earlier newsgroup post, just bit-ands every value to see if looks like a memory aligned address or not. Having the option of a more careful check may resolve the residual cases. There is a trade off between the amount of information available to the GC, and how effective it will be. More information also means a cost of attaching another pointer for the information to each memory block type, with implications on how the memory blocks might be pooled. I read that the .NET garbage collector uses much more complete metadata stored in the assembly to know what part of objects contain addresses. I am sure its a state of the art GC, but I know nothing of its guts. So the GC depends on how the D compiler may compile in a accessible memory descriptor it to each typeinfo, type allocation pool, and integrate this into the runtime allocation design and memory block scanning. This gives the GC an option of resolving the harder cases of false pointers for objects not released by the conservative scan which does not use type information. Without the type information, the GC and memory allocation experimenter is limited to a reduced subset of models and algorithms, and so the possibility of having a choice of something better than the Boehm conservative collector (I think it actually can use some form of meta data), is restricted. -->= -->=
Re: State of and plans for the garbage collector
== Quote from Leandro Lucarella (l...@llucax.com.ar)'s article > dsimcha, el 15 de julio a las 19:23 me escribiste: > > == Quote from Bane (branimir.milosavlje...@gmail.com)'s article > > > Anyway, I'm here bitching myself :) Just want to say that idea to have > > > more than > > one GC type to chose when compiling would be very interesting thing, if > > single > > implementation can't be good for all cases. > > > > If I had to chose one topic with most bitchin' on this newsgroup I have > > impression it would be the one about GC. They usually goes from 'GC managed > > programs are slow, D ain't good enough', to 'language X has better GC than > > D', to > > ' GC that D has is bad at Z'. > > > > > > > > Why not make D summer of code - write your own GC optimized for special > > > > case > > of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. > > That is > > only way to really compare what is best for special case. > > > > > > > > If/when we have enough manpower to write/maintain multiple GC's, here are > > some GC > > designs that can be either optimizations or pessimizations depending on use > > case: > > > > 1. Precise heap scanning (i.e. in the absence of copying/heap compaction). > > If > > you allocate mostly small objects, you're probably very seldom bitten by > > false > > pointers and the O(1) overhead per block needed to store type info may be > > significant. If you allocate mostly large objects, you've probably been > > eaten > > alive by false pointers and the O(1) per block overhead is negligible to > > you. > > > > 2. Concurrent collection. If you use threads for concurreny/latency > > reasons, > > this can be a boon. If you use threads for parallelism/throughput reasons, > > or > > write single-threaded code, this is a complete waste. > Not completely, if you have a multi-core, suddenly your program becomes > multi-threaded and uses more than 1 core. In that case, a concurrent GC > could improve the overall performance of the application. In theory. I thought that in practice concurrent GC's (I'm not talking about the case of thread-local heaps) are always slower throughput-wise than stop-the-world.
Re: State of and plans for the garbage collector
dsimcha, el 15 de julio a las 19:23 me escribiste: > == Quote from Bane (branimir.milosavlje...@gmail.com)'s article > > Anyway, I'm here bitching myself :) Just want to say that idea to have more > > than > one GC type to chose when compiling would be very interesting thing, if single > implementation can't be good for all cases. > > > If I had to chose one topic with most bitchin' on this newsgroup I have > impression it would be the one about GC. They usually goes from 'GC managed > programs are slow, D ain't good enough', to 'language X has better GC than > D', to > ' GC that D has is bad at Z'. > > > > > > Why not make D summer of code - write your own GC optimized for special > > > case > of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That > is > only way to really compare what is best for special case. > > > > > If/when we have enough manpower to write/maintain multiple GC's, here are > some GC > designs that can be either optimizations or pessimizations depending on use > case: > > 1. Precise heap scanning (i.e. in the absence of copying/heap compaction). > If > you allocate mostly small objects, you're probably very seldom bitten by false > pointers and the O(1) overhead per block needed to store type info may be > significant. If you allocate mostly large objects, you've probably been eaten > alive by false pointers and the O(1) per block overhead is negligible to you. > > 2. Concurrent collection. If you use threads for concurreny/latency reasons, > this can be a boon. If you use threads for parallelism/throughput reasons, or > write single-threaded code, this is a complete waste. Not completely, if you have a multi-core, suddenly your program becomes multi-threaded and uses more than 1 core. In that case, a concurrent GC could improve the overall performance of the application. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ -- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) -- - Tata Dios lo creó a usté solamente pa despertar al pueblo y fecundar las gayinas. - Otro constrasentido divino... Quieren que yo salga de joda con las hembras y después quieren que madrugue. -- Inodoro Pereyra y un gallo
Re: State of and plans for the garbage collector
Bane, el 15 de julio a las 14:34 me escribiste: > If I had to chose one topic with most bitchin' on this newsgroup > I have impression it would be the one about GC. They usually goes from > 'GC managed programs are slow, D ain't good enough', to 'language > X has better GC than D', to ' GC that D has is bad at Z'. > > Why not make D summer of code - write your own GC optimized for > special case of XYZ, send it, bundle all up in D with compiler switch > '--useGC=XYZ'. That is only way to really compare what is best for > special case. The GC I'm working on is configurable at startup (runtime) via environment variables. The idea is to be able to tune the GC for different programs without even recompiling. I already made the actual compile time options for memory stomping and sentinel configurable at runtime and it works great, there is no noticeable performance penalty when you don't use them either. I think this is really the way to go. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ -- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) -- Los pobres buscan su destino. Acá está; ¿no lo ven? -- Emilio Vaporeso. Marzo de 1914
Re: State of and plans for the garbage collector
== Quote from Bane (branimir.milosavlje...@gmail.com)'s article > Anyway, I'm here bitching myself :) Just want to say that idea to have more > than one GC type to chose when compiling would be very interesting thing, if single implementation can't be good for all cases. > > If I had to chose one topic with most bitchin' on this newsgroup I have impression it would be the one about GC. They usually goes from 'GC managed programs are slow, D ain't good enough', to 'language X has better GC than D', to ' GC that D has is bad at Z'. > > > > Why not make D summer of code - write your own GC optimized for special case of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That is only way to really compare what is best for special case. > > If/when we have enough manpower to write/maintain multiple GC's, here are some GC designs that can be either optimizations or pessimizations depending on use case: 1. Precise heap scanning (i.e. in the absence of copying/heap compaction). If you allocate mostly small objects, you're probably very seldom bitten by false pointers and the O(1) overhead per block needed to store type info may be significant. If you allocate mostly large objects, you've probably been eaten alive by false pointers and the O(1) per block overhead is negligible to you. 2. Concurrent collection. If you use threads for concurreny/latency reasons, this can be a boon. If you use threads for parallelism/throughput reasons, or write single-threaded code, this is a complete waste. 3. Thread local allocators. Quite simply, it's a space-speed tradeoff, and it depends which is more important to you.
Re: State of and plans for the garbage collector
On Thu, 15 Jul 2010 21:34:36 +0300, Bane wrote: Why not make D summer of code - write your own GC optimized for special case of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That is only way to really compare what is best for special case. In D1, the garbage collector is actually compiled to a stand-alone library, but then it's statically linked into Phobos. If you edit the Phobos makefile a bit, you should then be able to specify the GC .lib to link to on the compiler/linker command-line. In D2 it looks like the GC is clumped together with the runtime. -- Best regards, Vladimirmailto:vladi...@thecybershadow.net
Re: State of and plans for the garbage collector
Anyway, I'm here bitching myself :) Just want to say that idea to have more than one GC type to chose when compiling would be very interesting thing, if single implementation can't be good for all cases. > If I had to chose one topic with most bitchin' on this newsgroup I have > impression it would be the one about GC. They usually goes from 'GC managed > programs are slow, D ain't good enough', to 'language X has better GC than > D', to ' GC that D has is bad at Z'. > > Why not make D summer of code - write your own GC optimized for special case > of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That > is only way to really compare what is best for special case. >
Re: State of and plans for the garbage collector
If I had to chose one topic with most bitchin' on this newsgroup I have impression it would be the one about GC. They usually goes from 'GC managed programs are slow, D ain't good enough', to 'language X has better GC than D', to ' GC that D has is bad at Z'. Why not make D summer of code - write your own GC optimized for special case of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That is only way to really compare what is best for special case. > Okay. I really don't know much about garbage collectors, how they work, or > what > makes one particularly good or bad (other than the fact that it needs to be > efficient execution-wise and manage memory wisely so that you don't use too > much > of it or do anything else that would be an overall negative for performance). > However, from the comments here - both recent and in the past - it's pretty > clear that D's garbage collector is fairly outdated. I would assume that that > would be negative for performance - certainly it would mean that significant > improvements could be made. > > So, my question is this: what are the plans for the garbage collector? Is the > intention to continue to improve it bit by bit, to give it a major overhaul > at > some point, to outright replace it at a later date, or something else > entirely? > > If D is going to compete with C and C++, it needs to be highly efficient, and > if > the garbage collector isn't up to snuff, that's going to be a big problem. > I'm > not looking to complain about the current garbage collector - I really don't > know how good or bad it is - but if it is rather poor (as I've gotten the > impresison that it is - at least in some respects - from various discussions > on > it here), then I'd assume that it needs a major overhaul or replacement at > some > point. So, are there any specific plans with regards to that, or is that just > something that may be considered in the future? > > - Jonathan M Davis
Re: State of and plans for the garbage collector
On Thu, 15 Jul 2010 04:28:43 -0400, Vladimir Panteleev wrote: On Thu, 15 Jul 2010 10:18:38 +0300, Jonathan M Davis wrote: Okay. I really don't know much about garbage collectors, how they work, or what makes one particularly good or bad (other than the fact that it needs to be efficient execution-wise and manage memory wisely so that you don't use too much of it or do anything else that would be an overall negative for performance). However, from the comments here - both recent and in the past - it's pretty clear that D's garbage collector is fairly outdated. I would assume that that would be negative for performance - certainly it would mean that significant improvements could be made. IMO the D GC isn't bad, it's just mediocre. (It could have been way worse.) I would like to use this opportunity to bring up a GC implementation written by Jeremie Pelletier, which seems to have gone mostly unnoticed when it was posted as a reply to D.announce: http://pastebin.com/f7a3b4c4a Aside from multiple optimizations across the board, this GC employs an interesting different strategy. The gist of it is that it iteratively destroys only objects that have no immediate references. In the case of long linked lists, this trades destruction complexity with scan complexity, which is a very good change - most times deeply-nested structures such as linked lists survive multiple generational cycles. Wouldn't that mean it can't handle cycles? Jeremie, if you're reading this: how goes your D2 runtime project? (I also have an unfinished generational GC lying around, which is still unknown if it's viable performance-wise - I should really try to finish it one day.)
Re: State of and plans for the garbage collector
Jonathan M Davis, el 15 de julio a las 00:18 me escribiste: > Okay. I really don't know much about garbage collectors, how they work, or > what > makes one particularly good or bad (other than the fact that it needs to be > efficient execution-wise and manage memory wisely so that you don't use too > much > of it or do anything else that would be an overall negative for performance). > However, from the comments here - both recent and in the past - it's pretty > clear that D's garbage collector is fairly outdated. I would assume that that > would be negative for performance - certainly it would mean that significant > improvements could be made. > > So, my question is this: what are the plans for the garbage collector? Is the > intention to continue to improve it bit by bit, to give it a major overhaul > at > some point, to outright replace it at a later date, or something else > entirely? > > If D is going to compete with C and C++, it needs to be highly efficient, and > if > the garbage collector isn't up to snuff, that's going to be a big problem. > I'm > not looking to complain about the current garbage collector - I really don't > know how good or bad it is - but if it is rather poor (as I've gotten the > impresison that it is - at least in some respects - from various discussions > on > it here), then I'd assume that it needs a major overhaul or replacement at > some > point. So, are there any specific plans with regards to that, or is that just > something that may be considered in the future? I'm working on a concurrent GC, things are going really slow, but I plan to give it more attention this months, and I hope it will be finished (finished in my own terms, as it is my thesis) by the end of the year. I would like to explore merging the precise scanning patch and some other optimizations, collection strategies, but I'm not sure I will have the time (probably not). I'm not sure how it would turn out, even when I'm doing regular benchmarks to ensure, at least, that the current performance is not degraded (I didn't started doing the collection concurrently). One more note: I'm working with D1, but using the Tango runtime, so I guess it should be not to hard to port to D2. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ -- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) -- Every day 21 new born babies will be given to the wrong parents
Re: State of and plans for the garbage collector
On Thu, 15 Jul 2010 10:18:38 +0300, Jonathan M Davis wrote: Okay. I really don't know much about garbage collectors, how they work, or what makes one particularly good or bad (other than the fact that it needs to be efficient execution-wise and manage memory wisely so that you don't use too much of it or do anything else that would be an overall negative for performance). However, from the comments here - both recent and in the past - it's pretty clear that D's garbage collector is fairly outdated. I would assume that that would be negative for performance - certainly it would mean that significant improvements could be made. IMO the D GC isn't bad, it's just mediocre. (It could have been way worse.) I would like to use this opportunity to bring up a GC implementation written by Jeremie Pelletier, which seems to have gone mostly unnoticed when it was posted as a reply to D.announce: http://pastebin.com/f7a3b4c4a Aside from multiple optimizations across the board, this GC employs an interesting different strategy. The gist of it is that it iteratively destroys only objects that have no immediate references. In the case of long linked lists, this trades destruction complexity with scan complexity, which is a very good change - most times deeply-nested structures such as linked lists survive multiple generational cycles. Jeremie, if you're reading this: how goes your D2 runtime project? (I also have an unfinished generational GC lying around, which is still unknown if it's viable performance-wise - I should really try to finish it one day.) -- Best regards, Vladimirmailto:vladi...@thecybershadow.net