Re: GC conservatism -- again

2011-01-02 Thread Ulrik Mikaelsson
> Sorry for not replying, I've had some personal issues recently that have 
> taken up all of my time.  Your suggestion seemed workable for solving the 
> dereferencing issue, but leaving the destroyed objects in an invalid state 
> seems likely to cause weird bugs.  And the objects can't safely be reset to 
> their initial state (a la clear) after destruction for concurrency reasons.  
> So I'm not sure it will really help all that much in practice.  It wouldn't 
> be a tremendous amount of work to try this out though.
On the druntime-list, it was also pointed out that the assumption that
related objects are collected either in the same run or in
reference-order is not true for all GC-types, e.g. not for
incremental GC:s. Therefore it seems like a bad semantic to impose on
the GC, in case someone wants other GC:s some day.

>> The problem I've encountered in D, is that support for complementary
>> predictable allocation schemes which could alleviate some of the
>> problems (i.e. reference-counting and tree-allocation), is quite weak.
>> By weak, I mean undocumented and no supporting framework from the
>> stdlib. In a perfect world, this could even work hand-in-hand with the
>> GC, such that references to refcounted-object could be voided from the
>> destruction of the reference-counted object.
> There's been some talk of doing this, but other things have had priority.
>
>> Is this a discussion that should be held in Phobos/Tango, druntime, or
>> on this list?
> The druntime list would be most appropriate, but it has very few members 
> (partially because so few people are interested in this level of code, I 
> suspect).  The most expedient solution would be to just write the code and 
> submit a patch for evaluation.
>
FYI: I don't have access to a build environment for D2/druntime, but I've
created an implementation of RefCounting for Tango, in
http://www.dsource.org/projects/tango/ticket/2024. It might be of
interest to port it to druntime, with the mentioned D2 improvements to
SmartRef.


Re: GC conservatism -- again

2010-12-30 Thread Simen kjaeraas

Andrei Alexandrescu  wrote:

I'm thinking of an API that allows people to dynamically add "to do"  
stuff, a la C's atexit(). Yours above is dynamic (because client code  
can append to toFree) but is hand-written by the client.


So like this:


import std.stdio;

void delegate()[] terminators;

void atExit( void delegate() dg ) {
terminators ~= dg;
}

static ~this( ) {
foreach ( t; terminators ) {
t( );
}
}

void main( ) {
atExit( {writeln("OH HAI! I BROKE UR THREAD");} );
}


--
Simen


Re: GC conservatism -- again

2010-12-30 Thread Walter Bright

Steven Schveighoffer wrote:
This doesn't help.  The requirement is that it is one contiguous piece 
of memory.


I don't know if we can solve the GC problems (although it seems that is 
all people want to talk about here), I was wondering if we can provide a 
library/language supported solution to enable better manual memory 
management in these extreme circumstances.


I suspect that in general when one has such gigantic allocations, there are only 
a couple of them in a program, and they can be managed manually using malloc/free.
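
For the couple-of-huge-blocks case described here, a minimal sketch of the manual 
approach might look like this (illustrative only; the slice must not outlive the 
buffer):

import core.stdc.stdlib : malloc, free;
import core.exception : OutOfMemoryError;

void processBigData()
{
    enum size_t size = 200 * 1024 * 1024;   // 200 MB, allocated outside the GC heap
    auto buf = cast(ubyte*) malloc(size);
    if (buf is null)
        throw new OutOfMemoryError();
    scope(exit) free(buf);                  // deterministic release, no collection needed

    ubyte[] data = buf[0 .. size];          // convenient D slice view over the C block
    // ... fill and process data ...
}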


Re: GC conservatism -- again

2010-12-30 Thread Sean Kelly
Andrei Alexandrescu Wrote:

> On 12/30/10 12:38 PM, Sean Kelly wrote:
> > Andrei Alexandrescu Wrote:
> >
> >> On 12/30/10 11:19 AM, Sean Kelly wrote:
> >>> Adam Ruppe Wrote:
> >>>
>  On 12/27/10, Steven Schveighoffer   wrote:
> > What about tools to make deallocation easier?  For example, we have
> > scope(exit) that you could potentially use to ensure a memory block is
> > deallocated on exit from a scope, what about a thread exit?
> 
>  It seems to me that the simplest thing might simply be a list of 
>  delegates stored
>  with the thread:
> 
>  thread.onexit ~= { free(whatever); };
> >>>
> >>> Already possible via static dtors.
> >>
> >> Not dynamically...
> >
> > void*[] toFree;
> >
> > static ~this() {
> >  foreach(e; toFree)
> >  free(e);
> > }
> >
> > What am I missing?
> 
> I'm thinking of an API that allows people to dynamically add "to do" 
> stuff, a la C's atexit(). Yours above is dynamic (because client code 
> can append to toFree) but is hand-written by the client.

Hand-written, but trivial.  It would be easy enough to add the feature to 
core.runtime or core.thread, though.


Re: GC conservatism -- again

2010-12-30 Thread Vladimir Panteleev
On Thu, 30 Dec 2010 18:38:00 +0200, Adam Ruppe   
wrote:



Something along these lines:


That code is "something along the lines" of data.d (but for D2) :)
https://github.com/CyberShadow/data.d

--
Best regards,
 Vladimir <vladi...@thecybershadow.net>


Re: GC conservatism -- again

2010-12-30 Thread Andrei Alexandrescu

On 12/30/10 12:38 PM, Sean Kelly wrote:

Andrei Alexandrescu Wrote:


On 12/30/10 11:19 AM, Sean Kelly wrote:

Adam Ruppe Wrote:


On 12/27/10, Steven Schveighoffer   wrote:

What about tools to make deallocation easier?  For example, we have
scope(exit) that you could potentially use to ensure a memory block is
deallocated on exit from a scope, what about a thread exit?


It seems to me that the simplest thing might simply be a list of delegates 
stored
with the thread:

thread.onexit ~= { free(whatever); };


Already possible via static dtors.


Not dynamically...


void*[] toFree;

static ~this() {
 foreach(e; toFree)
 free(e);
}

What am I missing?


I'm thinking of an API that allows people to dynamically add "to do" 
stuff, a la C's atexit(). Yours above is dynamic (because client code 
can append to toFree) but is hand-written by the client.


Andrei


Re: GC conservatism -- again

2010-12-30 Thread Sean Kelly
Andrei Alexandrescu Wrote:

> On 12/30/10 11:19 AM, Sean Kelly wrote:
> > Adam Ruppe Wrote:
> >
> >> On 12/27/10, Steven Schveighoffer  wrote:
> >>> What about tools to make deallocation easier?  For example, we have
> >>> scope(exit) that you could potentially use to ensure a memory block is
> >>> deallocated on exit from a scope, what about a thread exit?
> >>
> >> It seems to me that the simplest thing might simply be a list of delegates 
> >> stored
> >> with the thread:
> >>
> >> thread.onexit ~= { free(whatever); };
> >
> > Already possible via static dtors.
> 
> Not dynamically...

void*[] toFree;

static ~this() {
foreach(e; toFree)
free(e);
}

What am I missing?


Re: GC conservatism -- again

2010-12-30 Thread Andrei Alexandrescu

On 12/30/10 11:19 AM, Sean Kelly wrote:

Adam Ruppe Wrote:


On 12/27/10, Steven Schveighoffer  wrote:

What about tools to make deallocation easier?  For example, we have
scope(exit) that you could potentially use to ensure a memory block is
deallocated on exit from a scope, what about a thread exit?


It seems to me that the simplest thing might simply be a list of delegates 
stored
with the thread:

thread.onexit ~= { free(whatever); };


Already possible via static dtors.


Not dynamically...

Andrei


Re: GC conservatism -- again

2010-12-30 Thread Sean Kelly
Adam Ruppe Wrote:

> On 12/27/10, Steven Schveighoffer  wrote:
> > What about tools to make deallocation easier?  For example, we have
> > scope(exit) that you could potentially use to ensure a memory block is
> > deallocated on exit from a scope, what about a thread exit?
> 
> It seems to me that the simplest thing might simply be a list of delegates 
> stored
> with the thread:
> 
> thread.onexit ~= { free(whatever); };

Already possible via static dtors.



Re: GC conservatism -- again

2010-12-30 Thread Adam Ruppe
On 12/27/10, Steven Schveighoffer  wrote:
> What about tools to make deallocation easier?  For example, we have
> scope(exit) that you could potentially use to ensure a memory block is
> deallocated on exit from a scope, what about a thread exit?

It seems to me that the simplest thing might simply be a list of delegates 
stored
with the thread:

thread.onexit ~= { free(whatever); };

> Any other ideas?

Perhaps a new array type that behaves mostly like a built-in one (offering slices,
appending, etc.), except it is built on C's malloc and realloc and thus can and
must be manually freed. If the contents are fancy, they can still be garbage
collected, but the array itself is manual.

I guess you could also use D's own new and delete on arrays, but delete is
discouraged, so sticking to C's functions for that is probably for the
best. Besides, with a custom struct, we can disable some
operations that might make manual management hard, so you don't
accidentally keep bad pointers around.

Something along these lines:


import std.traits;
import core.stdc.stdlib;
import core.exception : OutOfMemoryError;

ManualArray!(T) manual(T)(size_t size, size_t capacity = 0) if( !hasAliasing!(T) ) {
    if(capacity < size)
        capacity = size; // or whatever a sane default is

    // the array and room for length and capacity right before it
    ManualArray!(T) arr;

    arr.length = size;
    arr.capacity = capacity;
    arr.payload = cast(T*) malloc(T.sizeof * capacity);

    if(arr.payload is null)
        throw new OutOfMemoryError();

    return arr;
}

// FIXME: if it does have aliasing, add this block as a gc root too so the
// objects within get regular treatment

struct ManualArray(T) {
    // @disable this() {} // if we could disable the default constructor...
    @disable this(this); // don't copy me without specific intention

    typeof(this) opBinary(string op)(T rhs) if( op == "~" ) {
        static assert(0, "This kind of concat might get messy so it is disabled.");
    }

    ref typeof(this) opOpAssign(string op)(T rhs) if( op == "~" ) {
        if(length + 1 > capacity) {
            capacity = capacity ? capacity * 2 : 4; // or whatever is sane
            auto npayload = cast(T*) realloc(payload, capacity * T.sizeof);
            if(npayload is null)
                throw new OutOfMemoryError();

            payload = npayload;
        }

        payload[ length++ ] = rhs;

        return this;
    }

    // flesh out other ops so this thing doesn't suck, like append array,
    // append typeof(this), slicing, index, dup, etc. If you take a slice of
    // the whole array, you'd be able to pass that around as a D array.
    // Use care if you do.

    size_t length;
    size_t capacity;

    T* payload; // a C style array
}

void free(T)(ref ManualArray!(T) item) {
    core.stdc.stdlib.free(item.payload);
    item.length = item.capacity = 0;
    item.payload = null;
}



That's only a rough sketch, but you can see where I'm going - give a C-style
array some D niceties, while it is still managed as in C, with the exception
that some operations are disabled in an attempt to keep unintended aliasing
to a minimum. Naturally, slicing would work against that, but it's just too
useful not to provide. Maybe a const slice is a good compromise - we don't
want anyone appending to a slice, since I don't know how that would break things,
being on the C heap and all - but you'd still have access to the contents for D
functions.
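
For what it's worth, hypothetical usage of the sketch above could read like this 
(assuming the fixed-up manual/free/ManualArray definitions compile as shown):

void example()
{
    auto arr = manual!int(0, 16);   // empty, with room for 16 ints on the C heap
    scope(exit) free(arr);          // nothing frees this for you

    foreach (i; 0 .. 100)
        arr ~= i;                   // grows via realloc, never touches the GC

    assert(arr.length == 100 && arr.payload[99] == 99);

    int[] view = arr.payload[0 .. arr.length]; // D slice view; don't let it escape
}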


Re: GC conservatism -- again

2010-12-30 Thread Steven Schveighoffer
On Wed, 29 Dec 2010 19:45:54 -0500, Walter Bright  
 wrote:



Steven Schveighoffer wrote:

Any other ideas?


Garbage collection tends to break down when you have enormous blocks of  
memory allocated - 200 MB certainly qualifies.


I suggest breaking up the data structure into smaller pieces, like  
making it an array of arrays.


This doesn't help.  The requirement is that it is one contiguous piece of  
memory.


I don't know if we can solve the GC problems (although it seems that is  
all people want to talk about here), I was wondering if we can provide a  
library/language supported solution to enable better manual memory  
management in these extreme circumstances.


-Steve


Re: GC conservatism -- again

2010-12-29 Thread Walter Bright

Steven Schveighoffer wrote:

Any other ideas?


Garbage collection tends to break down when you have enormous blocks of memory 
allocated - 200 MB certainly qualifies.


I suggest breaking up the data structure into smaller pieces, like making it an 
array of arrays.


Re: GC conservatism -- again

2010-12-29 Thread Robert Jacques
On Wed, 29 Dec 2010 12:27:02 -0700, Steven Schveighoffer  
 wrote:
On Wed, 29 Dec 2010 14:00:17 -0500, Robert Jacques   
wrote:
On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer  
 wrote:
On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques   
wrote:
Second, the false pointer problem disappears (for practical purposes)  
when you move to 64-bit.


I'm not sure I like this "solution", but you are correct.  This is  
somewhat mitigated however by the way memory is allocated (I'm  
assuming not sparsely throughout the address space, and also low in  
the address space).  It certainly makes it less likely that a 64-bit  
random long points at data, but it's not inconceivable to have 32-bits  
of 0 interspersed with non-zero data.  It might be likely to have a  
struct with two ints back to back, where one int is frequently 0.


Ah, but the GC can allocate ram in any section of the address space it  
wants, so it would be easy for the upper 32-bits to be always non-zero  
by design.


huh?  How does the GC control whether you set one int to 0 and the other  
not?


Hmm, perhaps an example would be best. Consider a 64-bit thread-local GC.  
It might allocate ram in the following pattern:

[2-bit Shared/Local][16-bit thread ID][6-bit tag][40-bit pointer]
So first, the address space is divided up into 4 regions, [11...] and  
[00...] are left free for user use (external allocation, memory-mapped  
files, etc), and [01...]/[10...] are used to denote shared/thread-local.
Next you have the thread ID, which is one way to support thread-local  
allocation/thread-local GCs.
Then you might have a 6-bit region that lock-free algorithms could use as  
a tag (and would be ignored by the shared GC), or local GCs could use for  
internal purposes.
Finally, you have a 40-bit region with what we normally think of as a  
'pointer'.


The thing to understand is that 64-bit computers are really 40-bit  
computers, currently. And given that 40 bits will hold us until we get to  
petabytes, we should have 24 bits to add meta-info to our pointers for  
some time to come. So, as long as we choose this meta-info carefully, we  
can avoid common bit patterns and the associated false pointers.
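
To make that layout concrete, here is a rough sketch of how such a tagged pointer 
could be packed and unpacked (purely illustrative; no existing D GC uses this 
encoding):

struct TaggedPtr
{
    ulong bits;

    enum ulong ptrMask    = (1UL << 40) - 1;   // low 40 bits: the address
    enum uint  tagShift   = 40;                // 6-bit tag
    enum uint  tidShift   = 46;                // 16-bit thread ID
    enum uint  shareShift = 62;                // 2-bit shared/local marker

    static TaggedPtr make(void* p, ubyte tag, ushort tid, ubyte space)
    {
        TaggedPtr t;
        t.bits = (cast(ulong) p & ptrMask)
               | ((cast(ulong) tag & 0x3F) << tagShift)
               | ((cast(ulong) tid) << tidShift)
               | ((cast(ulong) space & 0x3) << shareShift);
        return t;
    }

    void*  ptr()      { return cast(void*)(bits & ptrMask); }
    ubyte  tag()      { return cast(ubyte)((bits >> tagShift) & 0x3F); }
    ushort threadId() { return cast(ushort)((bits >> tidShift) & 0xFFFF); }
    ubyte  region()   { return cast(ubyte)((bits >> shareShift) & 0x3); }
}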




Third, modern GCs (i.e. thread-local GCs) can further reduce the  
false pointer issue.


I'd rather have precise scanning :)  There are issues with  
thread-local GCs.


The only issue with thread-local GCs is that you can't cast to  
immutable and then share the result across threads. And eventually,  
we'll have a) better ways of constructing immutable and b) a deep idup,  
to mitigate this.


I'm talking about collection cycles -- they necessarily need to scan  
both thread local and shared heaps because of the possibility of  
cross-heap pointing.  Which means you gain very little for thread-local  
heaps.  The only thing it buys you is you can assume the shared heap has  
no pointers into local heaps, and local heaps have no pointers into  
other local heaps.  But if you can't run a collection on just one heap,  
there is no point in separating them.


The defining feature of thread-local GCs is that they _can_ run collection  
cycles independently from each other.


Plus it's easy to cast without moving the data, which is not undefined  
currently if you take the necessary precautions, but would cause large  
problems with separate heaps.


Casting from immutable/shared would be memory valid, it's casting to  
immutable and shared where movement has to occur. As casting between  
immutable/shared and  mutable is logically invalid, I'd expect that these  
cases would be rare (once we solve the problem of safely constructing  
complex immutable data)




If we have the typeinfo of a memory block for the GC to parse, you can  
also rule out cross-thread pointers without thread-local GCs (for  
unshared data).


Which basically creates all the problems of thread-local GC, with very  
few of the advantages.


What are the advantages?  I'm not being sarcastic, I really don't know.

-Steve


The major advantage is that they match or out-perform the modern  
generational/concurrent GCs, even when backed by conservative mark-sweep  
collectors (according to Apple). (TLGCs can be backed by modern  
collectors) Essentially, TLGCs work by separately allocating and  
collecting objects which are not-shared between threads. Since these  
collectors don't have to be thread safe, they can be more efficient in  
their implementation, and collections only have to deal with their subset  
of the heap and their own stack. This reduces pause times and false  
pointers, etc. TLGCs are inherently parallel and have interleaved pause  
times, which can greatly reduce the "embarrassing pause" effect (i.e. your  
worker thread running out of ram won't pause your GUI, or stop other  
workers from fulfilling http requests).


In D, they become even more important, since to the best of my knowledge D  
can never support generational GCs and only supports concu

Re: GC conservatism -- again

2010-12-29 Thread Steven Schveighoffer
On Wed, 29 Dec 2010 14:00:17 -0500, Robert Jacques   
wrote:


On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer  
 wrote:


On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques   
wrote:


First, I'd like to point out that precise scanning of the heap (and  
I'll assume this can be extended to globals), is a long standing  
enhancement request.


Yes, I know.  Does it also do precise scanning of the stack and  
global/TLS data?  Because that also needs to happen (I think you need a  
lot more compiler support for that) to really fix this problem.


Globals and TLS should be possible, but the stack isn't without some  
major architectural changes (tagging+filtering, dual-stacks, etc).


That is good, everything but stack is a step forward.



Second, the false pointer problem disappears (for practical purposes)  
when you move to 64-bit.


I'm not sure I like this "solution", but you are correct.  This is  
somewhat mitigated however by the way memory is allocated (I'm assuming  
not sparsely throughout the address space, and also low in the address  
space).  It certainly makes it less likely that a 64-bit random long  
points at data, but it's not inconceivable to have 32-bits of 0  
interspersed with non-zero data.  It might be likely to have a struct  
with two ints back to back, where one int is frequently 0.


Ah, but the GC can allocate ram in any section of the address space it  
wants, so it would be easy for the upper 32-bits to be always non-zero  
by design.


huh?  How does the GC control whether you set one int to 0 and the other  
not?




Third, modern GCs (i.e. thread-local GCs) can further reduce the false  
pointer issue.


I'd rather have precise scanning :)  There are issues with thread-local  
GCs.


The only issue with thread-local GCs is that you can't cast to immutable  
and then share the result across threads. And eventually, we'll have a)  
better ways of constructing immutable and b) a deep idup, to mitigate  
this.


I'm talking about collection cycles -- they necessarily need to scan both  
thread local and shared heaps because of the possibility of cross-heap  
pointing.  Which means you gain very little for thread-local heaps.  The  
only thing it buys you is you can assume the shared heap has no pointers  
into local heaps, and local heaps have no pointers into other local  
heaps.  But if you can't run a collection on just one heap, there is no  
point in separating them.


Plus it's easy to cast without moving the data, which is not undefined  
currently if you take the necessary precautions, but would cause large  
problems with separate heaps.




If we have the typeinfo of a memory block for the GC to parse, you can  
also rule out cross-thread pointers without thread-local GCs (for  
unshared data).


Which basically creates all the problems of thread-local GC, with very  
few of the advantages.


What are the advantages?  I'm not being sarcastic, I really don't know.

-Steve


Re: GC conservatism -- again

2010-12-29 Thread Robert Jacques
On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer  
 wrote:


On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques   
wrote:


First, I'd like to point out that precise scanning of the heap (and  
I'll assume this can be extended to globals), is a long standing  
enhancement request.


Yes, I know.  Does it also do precise scanning of the stack and  
global/TLS data?  Because that also needs to happen (I think you need a  
lot more compiler support for that) to really fix this problem.


Globals and TLS should be possible, but the stack isn't without some major  
architectural changes (tagging+filtering, dual-stacks, etc).


Second, the false pointer problem disappears (for practical purposes)  
when you move to 64-bit.


I'm not sure I like this "solution", but you are correct.  This is  
somewhat mitigated however by the way memory is allocated (I'm assuming  
not sparsely throughout the address space, and also low in the address  
space).  It certainly makes it less likely that a 64-bit random long  
points at data, but it's not inconceivable to have 32-bits of 0  
interspersed with non-zero data.  It might be likely to have a struct  
with two ints back to back, where one int is frequently 0.


Ah, but the GC can allocate ram in any section of the address space it  
wants, so it would be easy for the upper 32-bits to be always non-zero by  
design.


Third, modern GCs (i.e. thread-local GCs) can further reduce the false  
pointer issue.


I'd rather have precise scanning :)  There are issues with thread-local  
GCs.


The only issue with thread-local GCs is that you can't cast to immutable  
and then share the result across threads. And eventually, we'll have a)  
better ways of constructing immutable and b) a deep idup, to mitigate this.


If we have the typeinfo of a memory block for the GC to parse, you can  
also rule out cross-thread pointers without thread-local GCs (for  
unshared data).


Which basically creates all the problems of thread-local GC, with very few  
of the advantages.


For clarity's sake, I assume that thread-local GCs would be used in  
conjunction with a standard shared GC, in order to handle immutable and  
shared heap data correctly.


Re: GC conservatism -- again

2010-12-29 Thread bearophile
Vladimir Panteleev:

> Yes, this was the main motivation behind my data.d module :) If you need  
> to manage large blocks of plain data, the simplest solution at the moment  
> is not to keep it in the managed heap.

Having to manually manage large chunks of boring memory, like very large arrays 
of numbers, is not a shame :-) Sometimes you have to do that even with very old 
CLisp compilers (see ITA Software). But if I have to manage memory manually, I'd 
like to have some helpers in Phobos, like a hierarchical allocator (I have used 
hierarchical allocators in C and I like them).
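
For reference, the hierarchical idea is roughly this (a bare-bones sketch in the 
spirit of C's talloc, not a proposed Phobos API): freeing a parent arena releases 
everything allocated under it.

import core.stdc.stdlib : malloc, free;

final class Arena
{
    private void*[] blocks;     // bookkeeping lives on the GC heap; payloads do not
    private Arena[] children;

    void* alloc(size_t n)
    {
        auto p = malloc(n);
        if (p !is null) blocks ~= p;
        return p;
    }

    Arena child()
    {
        auto c = new Arena;
        children ~= c;
        return c;
    }

    void release()              // frees this arena's blocks and all descendants
    {
        foreach (c; children) c.release();
        foreach (b; blocks) free(b);
        children = null;
        blocks = null;
    }
}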

Bye,
bearophile


Re: GC conservatism -- again

2010-12-29 Thread Vladimir Panteleev
On Wed, 29 Dec 2010 16:28:03 +0200, Steven Schveighoffer  
 wrote:


On Wed, 29 Dec 2010 01:00:01 -0500, Vladimir Panteleev  
 wrote:


On Wed, 29 Dec 2010 05:13:10 +0200, Vladimir Panteleev  
 wrote:



when the total size of pointers in the managed heap


Correction: total size of *stray* pointers. This is a very important  
distinction. If you have over 300 MB of stray pointers in your managed  
heap, then it's almost certain that some binary data is being marked as  
having pointers. Common culprits are void[] allocations, and large  
structs/classes that mix pointers and non-pointers.


This is cool stuff.  This is what I normally was considering to be the  
problem with conservative GC.  However, I discovered that the reverse  
relationship also causes problems -- if you have a large block *not*  
containing pointers (say, a char[]), the chances that some 4-byte part  
of a pointer-containing struct "points" to it is relatively high as  
well.  The probability of a random integer pointing to a 200MB block is  
1/20 (logically, 200MB is 1/20 of 2^32).  I think this might be a worse  
problem than the unlikely scenario that you allocate 200 or 300 MB of  
void[] data, since that isn't very common.  Essentially, you have no  
recourse but to manually manage that memory, there isn't a magic bullet  
(such as reallocating as ubyte[]).


Especially when it is frowned upon to forcefully deallocate memory...

-Steve


Yes, this was the main motivation behind my data.d module :) If you need  
to manage large blocks of plain data, the simplest solution at the moment  
is not to keep it in the managed heap.


--
Best regards,
 Vladimir <vladi...@thecybershadow.net>


Re: GC conservatism -- again

2010-12-29 Thread Steven Schveighoffer
On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques   
wrote:


First, I'd like to point out that precise scanning of the heap (and I'll  
assume this can be extended to globals), is a long standing enhancement  
request.


Yes, I know.  Does it also do precise scanning of the stack and global/TLS  
data?  Because that also needs to happen (I think you need a lot more  
compiler support for that) to really fix this problem.


Second, the false pointer problem disappears (for practical purposes)  
when you move to 64-bit.


I'm not sure I like this "solution", but you are correct.  This is  
somewhat mitigated however by the way memory is allocated (I'm assuming  
not sparsely throughout the address space, and also low in the address  
space).  It certainly makes it less likely that a 64-bit random long  
points at data, but it's not inconceivable to have 32-bits of 0  
interspersed with non-zero data.  It might be likely to have a struct with  
two ints back to back, where one int is frequently 0.


Third, modern GCs (i.e. thread-local GCs) can further reduce the false  
pointer issue.


I'd rather have precise scanning :)  There are issues with thread-local  
GCs.


If we have the typeinfo of a memory block for the GC to parse, you can  
also rule out cross-thread pointers without thread-local GCs (for unshared  
data).


-Steve


Re: GC conservatism -- again

2010-12-29 Thread Steven Schveighoffer
On Wed, 29 Dec 2010 01:00:01 -0500, Vladimir Panteleev  
 wrote:


On Wed, 29 Dec 2010 05:13:10 +0200, Vladimir Panteleev  
 wrote:



when the total size of pointers in the managed heap


Correction: total size of *stray* pointers. This is a very important  
distinction. If you have over 300 MB of stray pointers in your managed  
heap, then it's almost certain that some binary data is being marked as  
having pointers. Common culprits are void[] allocations, and large  
structs/classes that mix pointers and non-pointers.


This is cool stuff.  This is what I normally was considering to be the  
problem with conservative GC.  However, I discovered that the reverse  
relationship also causes problems -- if you have a large block *not*  
containing pointers (say, a char[]), the chances that some 4-byte part of  
a pointer-containing struct "points" to it is relatively high as well.   
The probability of a random integer pointing to a 200MB block is 1/20  
(logically, 200MB is 1/20 of 2^32).  I think this might be a worse problem  
than the unlikely scenario that you allocate 200 or 300 MB of void[] data,  
since that isn't very common.  Essentially, you have no recourse but to  
manually manage that memory, there isn't a magic bullet (such as  
reallocating as ubyte[]).


Especially when it is frowned upon to forcefully deallocate memory...

-Steve


Re: GC conservatism -- again

2010-12-28 Thread Vladimir Panteleev
On Wed, 29 Dec 2010 05:13:10 +0200, Vladimir Panteleev  
 wrote:



when the total size of pointers in the managed heap


Correction: total size of *stray* pointers. This is a very important  
distinction. If you have over 300 MB of stray pointers in your managed  
heap, then it's almost certain that some binary data is being marked as  
having pointers. Common culprits are void[] allocations, and large  
structs/classes that mix pointers and non-pointers.
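
The usual workaround today is to make sure bulk data carries no "has pointers" 
bit at all, e.g. (a sketch using the existing core.memory API; sizes are 
arbitrary):

import core.memory : GC;

void allocatePlainData()
{
    enum size_t size = 256 * 1024 * 1024;

    // A ubyte[] (or any pointer-free element type) is marked NO_SCAN automatically:
    auto bytes = new ubyte[](size);

    // Equivalent, via the lower-level GC interface:
    auto raw = (cast(ubyte*) GC.malloc(size, GC.BlkAttr.NO_SCAN))[0 .. size];
}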


--
Best regards,
 Vladimir <vladi...@thecybershadow.net>


Re: GC conservatism -- again

2010-12-28 Thread Vladimir Panteleev
On Wed, 29 Dec 2010 02:01:14 +0200, Sean Kelly   
wrote:


The lists are more topical and receive SVN commit messages and such.  I  
guess this could all be done via usenet, but somehow it seems overkill.


FWIW, Gmane offers NNTP access to the list, and from personal experience  
it seems to work just fine (including posting).


news://news.gmane.org/gmane.comp.lang.d.runtime

--
Best regards,
 Vladimir <vladi...@thecybershadow.net>


Re: GC conservatism -- again

2010-12-28 Thread Vladimir Panteleev
On Tue, 28 Dec 2010 20:07:03 +0200, Ulrik Mikaelsson  
 wrote:



Considering the "don't count on collection"-principle I mentioned
above, is this approach safe, especially if other parts of the program
grow large in address space?


There is a simple formula with which you can calculate the probability of  
a stray pointer keeping your object uncollected. The probability that an  
object of size M containing random data holds a pointer into an object of  
size N is:

P(M,N) = 1 - (1 - N/2^32)^(M/4)

Let's suppose we want to calculate at which point there is a real danger of a  
Data object erroneously not being collected. Approximating a DataWrapper plus  
a Data object as about 32 bytes of heap, the probability goes over 0.5 when  
the total size of pointers in the managed heap exceeds 372 MB. If you have  
over 300 MB of pointers in your managed heap, your program may need a bit of  
restructuring anyway :)


Some more details about the formula in my graduation thesis paper, page 30:
http://thecybershadow.net/d/Memory_Management_in_the_D_Programming_Language.pdf
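
As a quick sanity check of those numbers (a throwaway snippet, using the same 
32-byte object size and 32-bit address space as above):

import std.math;   // needed for the floating-point ^^ operator
import std.stdio;

double falsePointerChance(double M, double N)
{
    // P(M,N) = 1 - (1 - N/2^32)^(M/4)
    return 1.0 - (1.0 - N / 2.0 ^^ 32) ^^ (M / 4.0);
}

void main()
{
    // ~372 MB of pointer-sized words vs. a 32-byte object: crosses 0.5
    writefln("%.3f", falsePointerChance(372.0 * 2 ^^ 20, 32));

    // one random word vs. a 200 MB block: roughly 1/20, as discussed earlier
    writefln("%.3f", falsePointerChance(4, 200.0 * 2 ^^ 20));
}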

--
Best regards,
 Vladimir <vladi...@thecybershadow.net>


Re: GC conservatism -- again

2010-12-28 Thread Sean Kelly
Johan Granberg Wrote:
> 
> Actually I think one reason for low interest on the druntime list is that
> people simply have not heard of it. I for one hadn't. I personally think
> more people would read it if it were just another newsgroup on the
> digitalmars server. There might be other benefits to doing it the current
> way, of course.

The lists are more topical and receive SVN commit messages and such.  I guess 
this could all be done via usenet, but somehow it seems overkill.  For what 
it's worth, the druntime list is available on the same page as the Phobos list.


Re: GC conservatism -- again

2010-12-28 Thread Johan Granberg
Sean Kelly wrote:

> Ulrik Mikaelsson Wrote:
> 
>> > I have posted about this problem several times. Never got any replies.
>> > I think memory management is D's "elephant in the room" in this regard.
>> I'm sorry to agree. I recently brought up this general question in a
>> thread called "Mixing GC and non-GC in D." in both druntime and here.
>> Only got one answer each in both druntime and this group. The problem
>> basically, from all I've read about GC:s in general and the D GC in
>> particular, is the problem of non-deterministic collection, both when
>> and more importantly IF the object will ever be collected. This makes
>> the GC unsuitable for "expensive" resources, such as file descriptors,
>> or very big chunks of memory.
> 
> Sorry for not replying, I've had some personal issues recently that have
> taken up all of my time.  Your suggestion seemed workable for solving the
> dereferencing issue, but leaving the destroyed objects in an invalid state
> seems likely to cause weird bugs.  And the objects can't safely be reset
> to their initial state (a la clear) after destruction for concurrency
> reasons.  So I'm not sure it will really help all that much in practice. 
> It wouldn't be a tremendous amount of work to try this out though.
> 
>> The problem I've encountered in D, is that support for complementary
>> predictable allocation schemes which could alleviate some of the
>> problems (i.e. reference-counting and tree-allocation), is quite weak.
>> By weak, I mean undocumented and no supporting framework from the
>> stdlib. In a perfect world, this could even work hand-in-hand with the
>> GC, such that references to refcounted-object could be voided from the
>> destruction of the reference-counted object.
> 
> There's been some talk of doing this, but other things have had priority.
> 
>> Is this a discussion that should be held in Phobos/Tango, druntime, or
>> on this list?
> 
> The druntime list would be most appropriate, but it has very few members
> (partially because so few people are interested in this level of code, I
> suspect).  The most expedient solution would be to just write the code and
> submit a patch for evaluation.

Actually I think one reason for low interest on the druntime list is that
people simply have not heard of it. I for one hadn't. I personally think
more people would read it if it were just another newsgroup on the
digitalmars server. There might be other benefits to doing it the current
way, of course.


Re: GC conservatism -- again

2010-12-28 Thread Sean Kelly
Ulrik Mikaelsson Wrote:

> > I have posted about this problem several times. Never got any replies.
> > I think memory management is D's "elephant in the room" in this regard.
> I'm sorry to agree. I recently brought up this general question in a
> thread called "Mixing GC and non-GC in D." in both druntime and here.
> Only got one answer each in both druntime and this group. The problem
> basically, from all I've read about GC:s in general and the D GC in
> particular, is the problem of non-deterministic collection, both when
> and more importantly IF the object will ever be collected. This makes
> the GC unsuitable for "expensive" resources, such as file descriptors,
> or very big chunks of memory.

Sorry for not replying, I've had some personal issues recently that have taken 
up all of my time.  Your suggestion seemed workable for solving the 
dereferencing issue, but leaving the destroyed objects in an invalid state 
seems likely to cause weird bugs.  And the objects can't safely be reset to 
their initial state (a la clear) after destruction for concurrency reasons.  So 
I'm not sure it will really help all that much in practice.  It wouldn't be a 
tremendous amount of work to try this out though.

> The problem I've encountered in D, is that support for complementary
> predictable allocation schemes which could alleviate some of the
> problems (i.e. reference-counting and tree-allocation), is quite weak.
> By weak, I mean undocumented and no supporting framework from the
> stdlib. In a perfect world, this could even work hand-in-hand with the
> GC, such that references to refcounted-object could be voided from the
> destruction of the reference-counted object.

There's been some talk of doing this, but other things have had priority.

> Is this a discussion that should be held in Phobos/Tango, druntime, or
> on this list?

The druntime list would be most appropriate, but it has very few members 
(partially because so few people are interested in this level of code, I 
suspect).  The most expedient solution would be to just write the code and 
submit a patch for evaluation.


Re: GC conservatism -- again

2010-12-28 Thread Ulrik Mikaelsson
> I have posted about this problem several times. Never got any replies.
> I think memory management is D's "elephant in the room" in this regard.
I'm sorry to agree. I recently brought up this general question in a
thread called "Mixing GC and non-GC in D." in both druntime and here.
Only got one answer each in both druntime and this group. The problem
basically, from all I've read about GC:s in general and the D GC in
particular, is the problem of non-deterministic collection, both when
and more importantly IF the object will ever be collected. This makes
the GC unsuitable for "expensive" resources, such as file descriptors,
or very big chunks of memory.

The problem I've encountered in D, is that support for complementary
predictable allocation schemes which could alleviate some of the
problems (i.e. reference-counting and tree-allocation), is quite weak.
By weak, I mean undocumented and no supporting framework from the
stdlib. In a perfect world, this could even work hand-in-hand with the
GC, such that references to refcounted-object could be voided from the
destruction of the reference-counted object.

I understand these schemes are possible with some manual allocator and
GC-root trickery, but this is really something I, as a fairly new
D programmer, expect from the stdlib/runtime of a language like D.
Implementing this so that memory is guarded "correctly" requires quite a lot
of understanding of destructors and the GC that productive developers
should not need. For refcounting (which is the scheme I've had time to
really think about), I propose the following (a rough sketch follows the list):
 - an interface "IRefCounted", for any resource that should be refcounted.
 - a ready implementation for memory allocation "RefCounted", which
allocates from a GC-scanned but non-GC-controlled pool
 - for D2 (not possible for D1 due to missing struct-dtor:s) a
C++-style "SmartRef" template-struct, which encapsulates a reference
to a typed IRefCounted, automatically releasing on reference-change or
struct-destruction.
 - Documentation on how to use the "RefCounted" implementation, as
well as implementing other resource-guards (e.g. for opening/closing
FD:s)
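
To make the SmartRef item concrete, a rough D2 sketch of the proposed interface 
and wrapper could look like this (an illustration of the proposal only, not an 
existing library):

interface IRefCounted
{
    void incRef();
    void decRef();   // expected to release the resource when the count hits zero
}

struct SmartRef(T : IRefCounted)
{
    private T obj;

    this(T o) { obj = o; if (obj !is null) obj.incRef(); }
    this(this) { if (obj !is null) obj.incRef(); }   // copying adds a reference
    ~this() { if (obj !is null) obj.decRef(); }      // struct destruction releases it

    void opAssign(T o)
    {
        if (o !is null) o.incRef();    // inc first, so self-assignment is safe
        if (obj !is null) obj.decRef();
        obj = o;
    }

    @property T get() { return obj; }
    alias get this;   // allow transparent use of the wrapped reference
}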

Is this a discussion that should be held in Phobos/Tango, druntime, or
on this list?

> 1) Diamond, the memory debugger that helps trace leaks to their source. I
> recently added a feature long overdue - show the exact shortest path of
> pointers / heap objects that cause a specified object to remain uncollected.
Great! I've been unsuccessfully looking for such a thing for D, good
to know it exists.

> 2) data.d - a module to allow simple handling of large amounts of plain
> data. The gist is that a simple proxy object keeps a pointer to a malloc'd
> region, which is free'd when the proxy object is collected. Works like a
> charm - practically no leaks (the proxy object is tiny), and keeps the
> managed heap small so collections run quickly.
Considering the "don't count on collection"-principle I mentioned
above, is this approach safe, especially if other parts of the program
grow large in address space?


Re: GC conservatism -- again

2010-12-27 Thread Robert Jacques
On Mon, 27 Dec 2010 09:12:53 -0700, Steven Schveighoffer  
 wrote:
While fixing a design issue in druntime, I re-discovered how crappy the  
conservative GC can be in certain situations.


The issue involves the array appending cache, which is used to  
significantly speed up array appends.  Essentially, since the array  
appending cache was scanned as containing pointers, it would 'hold  
hostage' any arrays that were appended to.  As it turns out, this was  
necessary, because there was no mechanism to update the cache when the  
blocks were collected by the GC.  I added this mechanism, and discovered  
-- it didn't help much :)


The test case I was using was posted to the newsgroup a few months  
back.  Basically, the test was to append to an array until it consumed  
at least 200MB.  A single test takes a while, but what's more disturbing  
is, if you run the same test again, the memory used for the first test  
*isn't released*.  My first thought was that the array append cache was  
holding this data hostage, but that was not the problem.


The problem is that when you allocate 1/20th the address space of the  
process to one contiguous memory block, the chances that the  
conservative GC will detect a false pointer into that block are very  
high.  What's worse, if the 'pointer' into the block is somewhere in  
TLS, global data, or high on the stack, that block is stuck for pretty  
much the life of the program or thread.


So I was thinking of possible ways to solve this problem.  Solving it  
perfectly is not really possible unless we implement precise scanning in  
all areas of memory (heap, stack, global data).  While that certainly  
*could* be a possibility, it's not likely to happen any time soon.


What about tools to make deallocation easier?  For example, we have  
scope(exit) that you could potentially use to ensure a memory block is  
deallocated on exit from a scope, what about a thread exit?  What about  
declaring a scope object at a high level that nested scopes could use to  
deallocate from?  Making this a bit easier might be a good alternative  
while precise scanning hasn't been adopted yet.


Any other ideas?

-Steve


First, I'd like to point out that precise scanning of the heap (and I'll  
assume this can be extended to globals), is a long standing enhancement  
request. It's issue 3463  
(http://d.puremagic.com/issues/show_bug.cgi?id=3463). It does have a  
patch, but it's now out of date and needs someone to update it (hint,  
hint). Second, the false pointer problem disappears (for practical  
purposes) when you move to 64-bit. Third, modern GCs (i.e. thread-local  
GCs) can further reduce the false pointer issue.


Re: GC conservatism -- again

2010-12-27 Thread Vladimir Panteleev
On Tue, 28 Dec 2010 05:25:59 +0200, Vladimir Panteleev  
 wrote:


the entire program - including the standard library, libc, etc. - are  
written in Safe-D


Sorry, that made no sense.

--
Best regards,
 Vladimir <vladi...@thecybershadow.net>


Re: GC conservatism -- again

2010-12-27 Thread Vladimir Panteleev
On Mon, 27 Dec 2010 18:12:53 +0200, Steven Schveighoffer  
 wrote:


While fixing a design issue in druntime, I re-discovered how crappy the  
conservative GC can be in certain situations.


I have posted about this problem several times. Never got any replies. I  
think memory management is D's "elephant in the room" in this regard.


I firmly believe that precise scanning is a pipe dream, unless the entire  
program - including the standard library, libc, etc. - are written in  
Safe-D (or at least just D). Precise scanning also leads to a further  
decrease of GC performance, and I'm not very enthusiastic about that.


Instead, I have created other solutions for this problem:

1) Diamond, the memory debugger that helps trace leaks to their source. I  
recently added a feature long overdue - show the exact shortest path of  
pointers / heap objects that cause a specified object to remain  
uncollected.


https://github.com/CyberShadow/Diamond

2) data.d - a module to allow simple handling of large amounts of plain  
data. The gist is that a simple proxy object keeps a pointer to a malloc'd  
region, which is free'd when the proxy object is collected. Works like a  
charm - practically no leaks (the proxy object is tiny), and keeps the  
managed heap small so collections run quickly.


https://github.com/CyberShadow/data.d
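
The proxy idea is roughly this (a simplified sketch, not the actual data.d API; 
see the repository link above for the real code):

import core.stdc.stdlib : malloc, free;
import core.exception : OutOfMemoryError;

final class DataProxy
{
    void*  ptr;
    size_t size;

    this(size_t n)
    {
        ptr = malloc(n);            // the bulk data lives outside the managed heap
        if (ptr is null)
            throw new OutOfMemoryError();
        size = n;
    }

    ~this() { free(ptr); }          // the tiny proxy is cheap to collect; its dtor frees the region

    ubyte[] data() { return (cast(ubyte*) ptr)[0 .. size]; }  // view; must not outlive the proxy
}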

--
Best regards,
 Vladimir <vladi...@thecybershadow.net>


GC conservatism -- again

2010-12-27 Thread Steven Schveighoffer
While fixing a design issue in druntime, I re-discovered how crappy the  
conservative GC can be in certain situations.


The issue involves the array appending cache, which is used to  
significantly speed up array appends.  Essentially, since the array  
appending cache was scanned as containing pointers, it would 'hold  
hostage' any arrays that were appended to.  As it turns out, this was  
necessary, because there was no mechanism to update the cache when the  
blocks were collected by the GC.  I added this mechanism, and discovered  
-- it didn't help much :)


The test case I was using was posted to the newsgroup a few months back.   
Basically, the test was to append to an array until it consumed at least  
200MB.  A single test takes a while, but what's more disturbing is, if you  
run the same test again, the memory used for the first test *isn't  
released*.  My first thought was that the array append cache was holding  
this data hostage, but that was not the problem.


The problem is that when you allocate 1/20th the address space of the  
process to one contiguous memory block, the chances that the conservative  
GC will detect a false pointer into that block are very high.  What's  
worse, if the 'pointer' into the block is somewhere in TLS, global data,  
or high on the stack, that block is stuck for pretty much the life of the  
program or thread.


So I was thinking of possible ways to solve this problem.  Solving it  
perfectly is not really possible unless we implement precise scanning in  
all areas of memory (heap, stack, global data).  While that certainly  
*could* be a possibility, it's not likely to happen any time soon.


What about tools to make deallocation easier?  For example, we have  
scope(exit) that you could potentially use to ensure a memory block is  
deallocated on exit from a scope, what about a thread exit?  What about  
declaring a scope object at a high level that nested scopes could use to  
deallocate from?  Making this a bit easier might be a good alternative  
while precise scanning hasn't been adopted yet.
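
For the scope part, something along these lines already works today (a sketch; 
GC.free forcibly releases the block, so it is only safe if nothing else still 
references it):

import core.memory : GC;

void crunch()
{
    auto buf = new ubyte[](200 * 1024 * 1024);
    scope(exit) GC.free(buf.ptr);   // don't rely on a collection that may never free it
    // ... use buf ...
}

// For thread exit, a per-thread static destructor can drain a thread-local
// list of cleanup delegates (see Simen's atExit sketch elsewhere in this thread).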


Any other ideas?

-Steve