Re: State of and plans for the garbage collector

2010-07-18 Thread Michael Rynn
On Thu, 15 Jul 2010 00:18:38 -0700, Jonathan M Davis wrote:

> Okay. I really don't know much about garbage collectors, how they work,
> or what makes one particularly good or bad

Me too.

But its a central part of the D langauge, its design and runtime.
Even though D programmmers can replace allocators and do manual memory 
management, I would like to know if anyone has tried to do this in a big 
way.

Without GC, programming with the DPL would be a different beast.

It would seem to me that type information, GC design, memory allocation 
schemes are by dependence necessarily integrated together to try to get 
efficient performance. There must be loads of academic research projects 
done on this already, that would show what the trade-offs are.

The first thing I notice is that memory blocks allocated by GC are
either marked as either needing to be checked or are not checked at all.  
As far as I know there is no futher refining meta data, for instance a 
list of 32/64 bit offsets which can be excluded or included for being 
pointers to data.  Such a list would be of the form

[offset-array length, exclusive/inclusive], offset1, offset2, offsetn.


Its slower code to walk through such list than the method of the current 
GC, which from my remembrance of an earlier newsgroup post, just bit-ands 
every value to see if looks like a memory aligned address or not.

Having the option of a more careful check may resolve the residual cases.

There is a trade off between the amount of information available to the 
GC, and how effective it will be.  More information also means a cost of 
attaching another pointer for the information to each memory block type,
with implications on how the memory blocks might be pooled.

I read that the .NET garbage collector uses much more complete metadata 
stored in the assembly to know what part of objects contain addresses.
I am sure its a state of the art GC, but I know nothing of its guts.

So the GC depends on how the D compiler may compile in a accessible 
memory descriptor  it to each typeinfo, type allocation pool, and 
integrate this into the runtime allocation design and memory block 
scanning.

This gives the GC an option of resolving the harder cases of false 
pointers for objects not released by the conservative scan which does not 
use type information. 

Without the type information, the GC and memory allocation experimenter 
is limited to a reduced subset of models and algorithms, and so the 
possibility of having a choice of something better than the Boehm 
conservative collector (I think it actually can use some form of meta 
data), is restricted.

-->= -->=









 


Re: State of and plans for the garbage collector

2010-07-15 Thread dsimcha
== Quote from Leandro Lucarella (l...@llucax.com.ar)'s article
> dsimcha, el 15 de julio a las 19:23 me escribiste:
> > == Quote from Bane (branimir.milosavlje...@gmail.com)'s article
> > > Anyway, I'm here bitching myself :) Just want to say that idea to have 
> > > more than
> > one GC type to chose when compiling would be very interesting thing, if 
> > single
> > implementation can't be good for all cases.
> > > > If I had to chose one topic with most bitchin' on this newsgroup I have
> > impression it would be the one about GC. They usually goes from 'GC managed
> > programs are slow, D ain't good enough', to 'language X has better GC than 
> > D', to
> > ' GC that D has is bad at Z'.
> > > >
> > > > Why not make D summer of code - write your own GC optimized for special 
> > > > case
> > of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. 
> > That is
> > only way to really compare what is best for special case.
> > > >
> >
> > If/when we have enough manpower to write/maintain multiple GC's, here are 
> > some GC
> > designs that can be either optimizations or pessimizations depending on use 
> > case:
> >
> > 1.  Precise heap scanning (i.e. in the absence of copying/heap compaction). 
> >  If
> > you allocate mostly small objects, you're probably very seldom bitten by 
> > false
> > pointers and the O(1) overhead per block needed to store type info may be
> > significant.  If you allocate mostly large objects, you've probably been 
> > eaten
> > alive by false pointers and the O(1) per block overhead is negligible to 
> > you.
> >
> > 2.  Concurrent collection.  If you use threads for concurreny/latency 
> > reasons,
> > this can be a boon.  If you use threads for parallelism/throughput reasons, 
> > or
> > write single-threaded code, this is a complete waste.
> Not completely, if you have a multi-core, suddenly your program becomes
> multi-threaded and uses more than 1 core. In that case, a concurrent GC
> could improve the overall performance of the application.

In theory.  I thought that in practice concurrent GC's (I'm not talking about 
the
case of thread-local heaps) are always slower throughput-wise than 
stop-the-world.


Re: State of and plans for the garbage collector

2010-07-15 Thread Leandro Lucarella
dsimcha, el 15 de julio a las 19:23 me escribiste:
> == Quote from Bane (branimir.milosavlje...@gmail.com)'s article
> > Anyway, I'm here bitching myself :) Just want to say that idea to have more 
> > than
> one GC type to chose when compiling would be very interesting thing, if single
> implementation can't be good for all cases.
> > > If I had to chose one topic with most bitchin' on this newsgroup I have
> impression it would be the one about GC. They usually goes from 'GC managed
> programs are slow, D ain't good enough', to 'language X has better GC than 
> D', to
> ' GC that D has is bad at Z'.
> > >
> > > Why not make D summer of code - write your own GC optimized for special 
> > > case
> of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That 
> is
> only way to really compare what is best for special case.
> > >
> 
> If/when we have enough manpower to write/maintain multiple GC's, here are 
> some GC
> designs that can be either optimizations or pessimizations depending on use 
> case:
> 
> 1.  Precise heap scanning (i.e. in the absence of copying/heap compaction).  
> If
> you allocate mostly small objects, you're probably very seldom bitten by false
> pointers and the O(1) overhead per block needed to store type info may be
> significant.  If you allocate mostly large objects, you've probably been eaten
> alive by false pointers and the O(1) per block overhead is negligible to you.
> 
> 2.  Concurrent collection.  If you use threads for concurreny/latency reasons,
> this can be a boon.  If you use threads for parallelism/throughput reasons, or
> write single-threaded code, this is a complete waste.

Not completely, if you have a multi-core, suddenly your program becomes
multi-threaded and uses more than 1 core. In that case, a concurrent GC
could improve the overall performance of the application.

-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
--
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
--
- Tata Dios lo creó a usté solamente pa despertar al pueblo y fecundar
  las gayinas.
- Otro constrasentido divino... Quieren que yo salga de joda con las
  hembras y después quieren que madrugue.
-- Inodoro Pereyra y un gallo


Re: State of and plans for the garbage collector

2010-07-15 Thread Leandro Lucarella
Bane, el 15 de julio a las 14:34 me escribiste:
> If I had to chose one topic with most bitchin' on this newsgroup
> I have impression it would be the one about GC. They usually goes from
> 'GC managed programs are slow, D ain't good enough', to 'language
> X has better GC than D', to ' GC that D has is bad at Z'.
> 
> Why not make D summer of code - write your own GC optimized for
> special case of XYZ, send it, bundle all up in D with compiler switch
> '--useGC=XYZ'. That is only way to really compare what is best for
> special case.

The GC I'm working on is configurable at startup (runtime) via
environment variables. The idea is to be able to tune the GC for
different programs without even recompiling.

I already made the actual compile time options for memory stomping and
sentinel configurable at runtime and it works great, there is no
noticeable performance penalty when you don't use them either.

I think this is really the way to go.

-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
--
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
--
Los pobres buscan su destino. Acá está; ¿no lo ven?
-- Emilio Vaporeso. Marzo de 1914


Re: State of and plans for the garbage collector

2010-07-15 Thread dsimcha
== Quote from Bane (branimir.milosavlje...@gmail.com)'s article
> Anyway, I'm here bitching myself :) Just want to say that idea to have more 
> than
one GC type to chose when compiling would be very interesting thing, if single
implementation can't be good for all cases.
> > If I had to chose one topic with most bitchin' on this newsgroup I have
impression it would be the one about GC. They usually goes from 'GC managed
programs are slow, D ain't good enough', to 'language X has better GC than D', 
to
' GC that D has is bad at Z'.
> >
> > Why not make D summer of code - write your own GC optimized for special case
of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That is
only way to really compare what is best for special case.
> >

If/when we have enough manpower to write/maintain multiple GC's, here are some 
GC
designs that can be either optimizations or pessimizations depending on use 
case:

1.  Precise heap scanning (i.e. in the absence of copying/heap compaction).  If
you allocate mostly small objects, you're probably very seldom bitten by false
pointers and the O(1) overhead per block needed to store type info may be
significant.  If you allocate mostly large objects, you've probably been eaten
alive by false pointers and the O(1) per block overhead is negligible to you.

2.  Concurrent collection.  If you use threads for concurreny/latency reasons,
this can be a boon.  If you use threads for parallelism/throughput reasons, or
write single-threaded code, this is a complete waste.

3.  Thread local allocators.  Quite simply, it's a space-speed tradeoff, and it
depends which is more important to you.


Re: State of and plans for the garbage collector

2010-07-15 Thread Vladimir Panteleev
On Thu, 15 Jul 2010 21:34:36 +0300, Bane  
 wrote:


Why not make D summer of code - write your own GC optimized for special  
case of XYZ, send it, bundle all up in D with compiler switch  
'--useGC=XYZ'. That is only way to really compare what is best for  
special case.


In D1, the garbage collector is actually compiled to a stand-alone  
library, but then it's statically linked into Phobos. If you edit the  
Phobos makefile a bit, you should then be able to specify the GC .lib to  
link to on the compiler/linker command-line.


In D2 it looks like the GC is clumped together with the runtime.

--
Best regards,
 Vladimirmailto:vladi...@thecybershadow.net


Re: State of and plans for the garbage collector

2010-07-15 Thread Bane
Anyway, I'm here bitching myself :) Just want to say that idea to have more 
than one GC type to chose when compiling would be very interesting thing, if 
single implementation can't be good for all cases.

> If I had to chose one topic with most bitchin' on this newsgroup I have 
> impression it would be the one about GC. They usually goes from 'GC managed 
> programs are slow, D ain't good enough', to 'language X has better GC than 
> D', to ' GC that D has is bad at Z'.
> 
> Why not make D summer of code - write your own GC optimized for special case 
> of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That 
> is only way to really compare what is best for special case.
> 


Re: State of and plans for the garbage collector

2010-07-15 Thread Bane
If I had to chose one topic with most bitchin' on this newsgroup I have 
impression it would be the one about GC. They usually goes from 'GC managed 
programs are slow, D ain't good enough', to 'language X has better GC than D', 
to ' GC that D has is bad at Z'.

Why not make D summer of code - write your own GC optimized for special case of 
XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That is 
only way to really compare what is best for special case.

> Okay. I really don't know much about garbage collectors, how they work, or 
> what 
> makes one particularly good or bad (other than the fact that it needs to be 
> efficient execution-wise and manage memory wisely so that you don't use too 
> much 
> of it or do anything else that would be an overall negative for performance). 
> However, from the comments here - both recent and in the past - it's pretty 
> clear that D's garbage collector is fairly outdated. I would assume that that 
> would be negative for performance - certainly it would mean that significant 
> improvements could be made.
> 
> So, my question is this: what are the plans for the garbage collector? Is the 
> intention to continue to improve it bit by bit, to give it a major overhaul 
> at 
> some point, to outright replace it at a later date, or something else 
> entirely?
> 
> If D is going to compete with C and C++, it needs to be highly efficient, and 
> if 
> the garbage collector isn't up to snuff, that's going to be a big problem. 
> I'm 
> not looking to complain about the current garbage collector - I really don't 
> know how good or bad it is - but if it is rather poor (as I've gotten the 
> impresison that it is - at least in some respects - from various discussions 
> on 
> it here), then I'd assume that it needs a major overhaul or replacement at 
> some 
> point. So, are there any specific plans with regards to that, or is that just 
> something that may be considered in the future?
> 
> - Jonathan M Davis



Re: State of and plans for the garbage collector

2010-07-15 Thread Robert Jacques
On Thu, 15 Jul 2010 04:28:43 -0400, Vladimir Panteleev  
 wrote:


On Thu, 15 Jul 2010 10:18:38 +0300, Jonathan M Davis  
 wrote:


Okay. I really don't know much about garbage collectors, how they work,  
or what makes one particularly good or bad (other than the fact that it  
needs to be efficient execution-wise and manage memory wisely so that  
you don't use too much of it or do anything else that would be an  
overall negative for performance). However, from the comments here -  
both recent and in the past - it's pretty clear that D's garbage  
collector is fairly outdated. I would assume that that would be  
negative for performance - certainly it would mean that significant  
improvements could be made.


IMO the D GC isn't bad, it's just mediocre. (It could have been way  
worse.)


I would like to use this opportunity to bring up a GC implementation  
written by Jeremie Pelletier, which seems to have gone mostly unnoticed  
when it was posted as a reply to D.announce:

http://pastebin.com/f7a3b4c4a

Aside from multiple optimizations across the board, this GC employs an  
interesting different strategy. The gist of it is that it iteratively  
destroys only objects that have no immediate references. In the case of  
long linked lists, this trades destruction complexity with scan  
complexity, which is a very good change - most times deeply-nested  
structures such as linked lists survive multiple generational cycles.


Wouldn't that mean it can't handle cycles?


Jeremie, if you're reading this: how goes your D2 runtime project?

(I also have an unfinished generational GC lying around, which is still  
unknown if it's viable performance-wise - I should really try to finish  
it one day.)


Re: State of and plans for the garbage collector

2010-07-15 Thread Leandro Lucarella
Jonathan M Davis, el 15 de julio a las 00:18 me escribiste:
> Okay. I really don't know much about garbage collectors, how they work, or 
> what 
> makes one particularly good or bad (other than the fact that it needs to be 
> efficient execution-wise and manage memory wisely so that you don't use too 
> much 
> of it or do anything else that would be an overall negative for performance). 
> However, from the comments here - both recent and in the past - it's pretty 
> clear that D's garbage collector is fairly outdated. I would assume that that 
> would be negative for performance - certainly it would mean that significant 
> improvements could be made.
> 
> So, my question is this: what are the plans for the garbage collector? Is the 
> intention to continue to improve it bit by bit, to give it a major overhaul 
> at 
> some point, to outright replace it at a later date, or something else 
> entirely?
> 
> If D is going to compete with C and C++, it needs to be highly efficient, and 
> if 
> the garbage collector isn't up to snuff, that's going to be a big problem. 
> I'm 
> not looking to complain about the current garbage collector - I really don't 
> know how good or bad it is - but if it is rather poor (as I've gotten the 
> impresison that it is - at least in some respects - from various discussions 
> on 
> it here), then I'd assume that it needs a major overhaul or replacement at 
> some 
> point. So, are there any specific plans with regards to that, or is that just 
> something that may be considered in the future?

I'm working on a concurrent GC, things are going really slow, but I plan
to give it more attention this months, and I hope it will be finished
(finished in my own terms, as it is my thesis) by the end of the year.
I would like to explore merging the precise scanning patch and some
other optimizations, collection strategies, but I'm not sure I will have
the time (probably not).

I'm not sure how it would turn out, even when I'm doing regular
benchmarks to ensure, at least, that the current performance is not
degraded (I didn't started doing the collection concurrently).

One more note: I'm working with D1, but using the Tango runtime, so
I guess it should be not to hard to port to D2.

-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
--
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
--
Every day 21 new born babies will be given to the wrong parents


Re: State of and plans for the garbage collector

2010-07-15 Thread Vladimir Panteleev
On Thu, 15 Jul 2010 10:18:38 +0300, Jonathan M Davis  
 wrote:


Okay. I really don't know much about garbage collectors, how they work,  
or what makes one particularly good or bad (other than the fact that it  
needs to be efficient execution-wise and manage memory wisely so that  
you don't use too much of it or do anything else that would be an  
overall negative for performance). However, from the comments here -  
both recent and in the past - it's pretty clear that D's garbage  
collector is fairly outdated. I would assume that that would be negative  
for performance - certainly it would mean that significant improvements  
could be made.


IMO the D GC isn't bad, it's just mediocre. (It could have been way worse.)

I would like to use this opportunity to bring up a GC implementation  
written by Jeremie Pelletier, which seems to have gone mostly unnoticed  
when it was posted as a reply to D.announce:

http://pastebin.com/f7a3b4c4a

Aside from multiple optimizations across the board, this GC employs an  
interesting different strategy. The gist of it is that it iteratively  
destroys only objects that have no immediate references. In the case of  
long linked lists, this trades destruction complexity with scan  
complexity, which is a very good change - most times deeply-nested  
structures such as linked lists survive multiple generational cycles.


Jeremie, if you're reading this: how goes your D2 runtime project?

(I also have an unfinished generational GC lying around, which is still  
unknown if it's viable performance-wise - I should really try to finish it  
one day.)


--
Best regards,
 Vladimirmailto:vladi...@thecybershadow.net