Re: Precise GC state

2018-02-03 Thread Temtaime via Digitalmars-d
And 2018 passes, and it seems there will be no precise GC this 
year.

Great D language. Better to think about moving away from it.


Re: Precise GC state

2017-11-27 Thread Ola Fosheim Grostad via Digitalmars-d
On Monday, 27 November 2017 at 18:32:39 UTC, Ola Fosheim Grøstad 
wrote:

You get this:

shared_ptr -> control_block -> object



Actually, seems like the common implementation uses 16 bytes, so 
that it has a direct pointer as well. So twice the size of 
unique_ptr.





Re: Precise GC state

2017-11-27 Thread Ola Fosheim Grøstad via Digitalmars-d
On Monday, 27 November 2017 at 20:13:35 UTC, Dmitry Olshansky 
wrote:
I’ve seen a tech giant that works on uber high-performance 
things making heavy use of STL, and being fond of C++14 
“high-level” features.


Look, I am not against "high level" features, but shared_ptr is 
nothing like the thread-local ref-counting you were talking 
about, which was introduced by Walter and Andrei as, at least, a 
competing solution to Rust's tracking of borrowed pointers.



That must be why you seriously have no idea how people use it. 
Otherwise you’d know that nobody shares a reference to a shared 
pointer but rather a copy. The whole point of a smart pointer 
is to avoid naked references.


I have an idea of how people use it… why all the ad hominem?

I just don't find it very useful. In practice I'm usually better 
off exchanging through a facade and having the exchanged resource 
wrapped up in a custom type.


It is more maintainable and easy to deal with in terms of 
correctness.



They are if you use them as intended - value types that 
pretend to be pointers.


Sure. Not that this is how they are defined to be used. It is a 
very limited use case.



a shared control block is. (Technically you share a pointer, 
not an object of type T, but that is a minor detail.)


No, you actually share an object and the last one to decrement 
the ref will destroy it. Now that is actually thread-safe 
thanks to atomic counter.


Ok, sure, you share an ownership relation to an object, but it 
isn't general enough, which is what I tried to suggest. For 
instance, a resource on a GPU with an integer ID is also a 
resource, but shared_ptr does not work for it (unless you 
accept extra indirections).


So, I'd rather see an owned<T>, which also can handle 
non-pointed-to objects. Then one might ask, why not just have 
"unique<owned<T>>" or "shared<owned<T>>" that also would 
work with "unique<T>" and "shared<T>"?


Then you can build generic ADTs with a generic notion of 
ownership.


The basic idea is that an owned resource shouldn't have to go 
through a pointer; requiring that makes the type less generic 
than it could have been.




That is my original point, which you now violently agree with :)


Ok, then I didn't get your original point.


Just underlines that you don’t understand how it is supposed to 
be used.


I know how it can be used. That doesn't mean it is as generally 
useful as it would need to be for what I want to do.



I also don’t understand what are you trying to prove. My point 
was: C++ has to do atomic counting, because it has no concept 
of shared vs local.


That I disagree with. In C++, correct syncing is entirely 
dependent on the programmer. Even if atomic_shared_ptr provides 
some guarantees, those guarantees will not hold if you depend on 
more than one entity. So there is no mechanism in C++ that 
ensures correctness when it comes to concurrency, i.e. you 
totally depend on the programmer's ability.


You can create a concept of D-style local/shared by just creating 
a wrapper type and use a lint-style verification. This is the 
route C++ is heading in general for various properties related to 
pointer types it seems. (e.g. wrap with borrowed, owned, nullable 
etc).


Which is also basically what D has, except shared is a builtin; 
semantically I see no difference. To get a semantic difference 
that matters, you need concurrency to be part of the language 
itself. If you cannot express propositions about concurrency in 
the language, then there is very limited ability to do something 
interesting with it.



You did imply it’s useless on a single thread and not used in 
concurrency because you need manual sync. I’d argue you don’t 
need a sync to access shared_ptr, you’d need it for the object 
it points to.


Listen, if you build a shared graph with shared_ptrs then you are 
in trouble. So you would need atomic_shared_ptrs. That's a 
perfectly reasonable use case.


You are presuming that the shared_ptr objects are thread-local 
and that they are not embedded in other objects that are 
reachable by another thread.


Those are very limiting presumptions.


It still solves the ownership and deterministic destruction in 
the presence of concurrent shared_ptrs to the same object.


But it doesn't solve correctness. You still rely on the 
programmer to get correctness. Which is a tall order when you use 
encapsulation.


That only works if you never embed shared_ptr in a class.


Copy & destruction is actually fine, which you seem to ignore. 
Also accessing const methods of payload is fine. New C++ 
implies const == thread safe btw, at least in all of STL.


As far as I can see from the description at cppreference.com, 
there is no guarantee of freedom from races for anything if both 
threads reference the same shared_ptr object (not the control 
block, but the object pointing to the control block).


You can obviously transmit it to another thread using a 
mutex-like/messaging-like setup if that is what 

Re: Precise GC state

2017-11-27 Thread Dmitry Olshansky via Digitalmars-d
On Monday, 27 November 2017 at 18:29:56 UTC, Ola Fosheim Grøstad 
wrote:
On Monday, 27 November 2017 at 17:16:50 UTC, Dmitry Olshansky 
wrote:
Really, shared_ptr is the most contagious primitive of modern 
C++.


Not really. Unique_ptr is, though.

To quote the MS STL guy: “I’m surprised we had no non-intrusive 
ref-counted ptr in std lib for so long”.

Going Native videos are full of questions on that.


Yeah, they tend to pretend that C++ is a high level language, 
it makes for good talks.


Let me put it this way, C++ is an acceptable low level 
language, but it is a rather poor high level language.


So if you talk with people who use C++ as a high level language 
you probably see different usage patterns.


I’ve seen a tech giant that works on uber high-performance things 
making heavy use of STL, and being fond of C++14 “high-level” 
features. I almost took their job offer, it is a nice place.


Comparatively, I’ve worked at Google, where none of the new 
goodies are in use. There are some great things in their “std”, 
but most have horrible interfaces, and people routinely discuss 
how having at least STL-like primitives would be a huge boon.


Err... That was my point... Only assignment and reset are 
protected in shared_ptr; all other methods require manual 
sync.


For the benefit of others, let me destroy that:


Oh well, looks like reset and assignment aren't required per the 
spec to be protected against races either. That doesn't weaken 
my point: the control block is protected, but not the methods of 
shared_ptr.


I hardly ever use shared_ptr. In practical programming you 
usually need to exchange more than a single entity so you need 
manual sync anyway.


That must be why you seriously have no idea how people use it. 
Otherwise you’d know that nobody shares a reference to a shared 
pointer but rather a copy. The whole point of a smart pointer is 
to avoid naked references.


- shared_ptr allows to share T with thread-safe ownership, 
ref-counts are accounted atomically (sharing copies of 
shared_ptr pointing to the same block). Copy of shared_ptr is 
thread safe and does the count.


Not sure if I follow that description. Shared shared_ptr's are 
not thread safe


They are if you use them as intended - value types that pretend 
to be pointers.


a shared control block is. (Technically you share a pointer, 
not an object of type T, but that is a minor detail.)


No, you actually share an object and the last one to decrement 
the ref will destroy it. Now that is actually thread-safe thanks 
to atomic counter.


shared_ptr is pointing to a shared control block that in turn 
has a pointer that points to the resource. This control block 
contains two counters. These counters are incremented and 
decremented atomically.


That is my original point, which you now violently agree with :)

If you access the same shared_ptr from two threads then you 
have a potential race condition per the spec.


Just underlines that you don’t understand how it is supposed to 
be used.


- atomic_shared_ptr would also allow one to initialize a 
shared variable (e.g. a global) of type shared_ptr safely from 
multiple threads


The main point is that the methods of atomic_shared_ptr are 
protected against races.


It is needed because you usually would have ownership 
pointers embedded in another shared object.


So having a protected control-block is not sufficient outside 
the trivial toy-program.


Sufficient for what? Smart pointers are about ownership not 
protecting from races.


I also don’t understand what are you trying to prove. My point 
was: C++ has to do atomic counting, because it has no concept of 
shared vs local.


All the things you’ve said just prove it, or are talking about 
something I’m not discussing here.




The manual synchronization part comes in if you try to work 
with payload T itself. THAT is manual.


No, none of the methods on shared_ptr guarantees that races 
won't happen.


Just as you quoted below - all const are, including copy & 
destructor.


Since C++ doesn’t see a difference at the type level between 
shared and local, there is no thread-local variation of 
shared_ptr. It would be too unsafe even for those guys, contrary 
to what Ola’s responses imply.


Sigh. I implied that shared_ptr on a single thread is mostly 
useless.


You did imply it’s useless on a single thread and not used in 
concurrency because you need manual sync. I’d argue you don’t 
need a sync to access shared_ptr, you’d need it for the object 
it points to.


It still solves the ownership and deterministic destruction in 
the presence of concurrent shared_ptrs to the same object.


But again - you countered a different point altogether, see 
above.




But this:

struct OurSharedCache {
   shared_ptr<Something> something;
   shared_ptr<SomethingElse> somethingelse;
};


Not that I talked about it. This has nothing to do with 
shared_ptr and/or things I stated originally.


Well I can clearly see misinformation and describe it as such. 
See your point about atomic_shared_ptr.


What about it?

«If multiple threads of execution access the same shared_ptr 
without synchronization and any of those accesses uses a 
non-const member function of shared_ptr then a data race will 
occur»

Re: Precise GC state

2017-11-27 Thread Ola Fosheim Grøstad via Digitalmars-d
On Monday, 27 November 2017 at 18:00:59 UTC, Jonathan M Davis 
wrote:
I don't understand this. I would expect most modern C++ 
programs to be using shared_ptr as the default for most 
pointers and thus use it heavily.


You get this:

shared_ptr -> control_block -> object

Instead of this:

unique_ptr -> object
my_ref_ptr ->object



shared_ptr is a _huge_ boon for avoiding memory problems.


Yes, if you don't mind the inefficiency, i.e. if you are doing 
high-level programming in C++.





Re: Precise GC state

2017-11-27 Thread Ola Fosheim Grøstad via Digitalmars-d
On Monday, 27 November 2017 at 17:16:50 UTC, Dmitry Olshansky 
wrote:
Really, shared_ptr is the most contagious primitive of modern 
C++.


Not really. Unique_ptr is, though.

To quote the MS STL guy: “I’m surprised we had no non-intrusive 
ref-counted ptr in std lib for so long”.

Going Native videos are full of questions on that.


Yeah, they tend to pretend that C++ is a high level language, it 
makes for good talks.


Let me put it this way, C++ is an acceptable low level language, 
but it is a rather poor high level language.


So if you talk with people who use C++ as a high level language 
you probably see different usage patterns.


Err... That was my point... Only assignment and reset are 
protected in shared_ptr; all other methods require manual sync.


For the benefit of others, let me destroy that:


Oh well, looks like reset and assignment aren't required per the 
spec to be protected against races either. That doesn't weaken my 
point: the control block is protected, but not the methods of 
shared_ptr.


I hardly ever use shared_ptr. In practical programming you 
usually need to exchange more than a single entity so you need 
manual sync anyway.



- shared_ptr allows to share T with thread-safe ownership, 
ref-counts are accounted atomically (sharing copies of 
shared_ptr pointing to the same block). Copy of shared_ptr is 
thread safe and does the count.


Not sure if I follow that description. Shared shared_ptr's are 
not thread safe, a shared control block is. (Technically you 
share a pointer, not an object of type T, but that is a minor 
detail.)


shared_ptr is pointing to a shared control block that in turn has 
a pointer that points to the resource. This control block 
contains two counters. These counters are incremented and 
decremented atomically.


If you access the same shared_ptr from two threads then you have 
a potential race condition per the spec.


- atomic_shared_ptr would also allow one to initialize a shared 
variable (e.g. a global) of type shared_ptr safely from multiple 
threads


The main point is that the methods of atomic_shared_ptr are 
protected against races.


It is needed because you usually would have ownership 
pointers embedded in another shared object.


So having a protected control-block is not sufficient outside the 
trivial toy-program.


The manual synchronization part comes in if you try to work 
with payload T itself. THAT is manual.


No, none of the methods on shared_ptr guarantees that races won't 
happen. And in general you usually want to access more than one 
unit in a critical section.


So in C++ you typically need manual sync.

There are no suitable language features to get around that in C++ 
and library features are inadequate in the general sense.


Go and Pony have language features that are helpful. C++ doesn't.

Since C++ doesn’t see a difference at the type level between 
shared and local, there is no thread-local variation of 
shared_ptr. It would be too unsafe even for those guys, contrary 
to what Ola’s responses imply.


Sigh. I implied that shared_ptr on a single thread is mostly 
useless.


But this:

struct OurSharedCache {
   shared_ptr<Something> something;
   shared_ptr<SomethingElse> somethingelse;
};

Not safe.


Well I can clearly see misinformation and describe it as such. 
See your point about atomic_shared_ptr.


What about it?

«If multiple threads of execution access the same shared_ptr 
without synchronization and any of those accesses uses a 
non-const member function of shared_ptr then a data race will 
occur»


http://en.cppreference.com/w/cpp/memory/shared_ptr

«The class template atomic_shared_ptr provides thread-safe atomic 
pointer operations over a std::shared_ptr.»


http://en.cppreference.com/w/cpp/experimental/atomic_shared_ptr



Re: Precise GC state

2017-11-27 Thread Jonathan M Davis via Digitalmars-d
On Monday, November 27, 2017 15:56:09 Ola Fosheim Grostad via Digitalmars-d 
wrote:
> On Monday, 27 November 2017 at 14:35:03 UTC, Dmitry Olshansky
>
> wrote:
> > Then watch Herb’s Sutter recent talk “Leak freedom by default”.
> > Now THAT guy must be out of his mind :)
>
> He could be, I haven't seen it... Shared_ptr isn't frequently used,
> it is a last resort,

I don't understand this. I would expect most modern C++ programs to be using
shared_ptr as the default for most pointers and thus use it heavily. Any
time that I've been able to use at least TR1 for C++, I've used it
everywhere, and when I've been stuck with C++98, I've used a home-grown
equivalent. shared_ptr is a _huge_ boon for avoiding memory problems.

- Jonathan M Davis




Re: Precise GC state

2017-11-27 Thread Dmitry Olshansky via Digitalmars-d
On Monday, 27 November 2017 at 15:56:09 UTC, Ola Fosheim Grostad 
wrote:
On Monday, 27 November 2017 at 14:35:03 UTC, Dmitry Olshansky 
wrote:
Then watch Herb’s Sutter recent talk “Leak freedom by 
default”. Now THAT guy must be out of his mind :)


He could be, I haven't seen it... Shared_ptr isn't frequently 
used, it is a last resort,


Ahhhaaahhha.
Really, shared_ptr is the most contagious primitive of modern C++.

To quote the MS STL guy: “I’m surprised we had no non-intrusive 
ref-counted ptr in std lib for so long”.

Going Native videos are full of questions on that.



atomic_shared_ptr is nothing like what you seem to imply. 
It’s not manual sync, for one.


Err... That was my point... Only assignment and reset are 
protected in shared_ptr; all other methods require manual sync.


For the benefit of others, let me destroy that:

- shared_ptr allows to share T with thread-safe ownership, 
ref-counts are accounted atomically (sharing copies of shared_ptr 
pointing to the same block). Copy of shared_ptr is thread safe 
and does the count.


- atomic_shared_ptr would also allow one to initialize a shared 
variable (e.g. a global) of type shared_ptr safely from multiple 
threads


The manual synchronization part comes in if you try to work with 
payload T itself. THAT is manual.


Since C++ doesn’t see a difference at the type level between 
shared and local, there is no thread-local variation of 
shared_ptr. It would be too unsafe even for those guys, contrary 
to what Ola’s responses imply.




You keep spreading FUD on this forum, I’m still not sure of 
your reasons though.


And there we go ad hominem as usual... With no argument to back 
it up... Bye.


Well I can clearly see misinformation and describe it as such. 
See your point about atomic_shared_ptr.







Re: Precise GC state

2017-11-27 Thread Ola Fosheim Grostad via Digitalmars-d
Btw, it would improve the discourse if people tried to 
distinguish between language constructs and library constructs...


Re: Precise GC state

2017-11-27 Thread Ola Fosheim Grostad via Digitalmars-d
On Monday, 27 November 2017 at 14:35:03 UTC, Dmitry Olshansky 
wrote:
Then watch Herb’s Sutter recent talk “Leak freedom by default”. 
Now THAT guy must be out of his mind :)


He could be, I haven't seen it... Shared_ptr isn't frequently 
used, it is a last resort,


atomic_shared_ptr is nothing like what you seem to imply. 
It’s not manual sync, for one.


Err... That was my point... Only assignment and reset are 
protected in shared_ptr; all other methods require manual sync.


You keep spreading FUD on this forum, I’m still not sure of 
your reasons though.


And there we go ad hominem as usual... With no argument to back 
it up... Bye.





Re: Precise GC state

2017-11-27 Thread Dmitry Olshansky via Digitalmars-d
On Monday, 27 November 2017 at 07:03:01 UTC, Ola Fosheim Grostad 
wrote:
On Monday, 27 November 2017 at 06:47:00 UTC, Dmitry Olshansky 
wrote:
Last time I checked, shared_ptr can be safely shared across 
threads, hence RC is tackling synchronization, and most likely 
with atomics, since locks won’t be any better.


The control block can, but it is crazy to use shared_ptr for 
anything more than high-level ownership. It is a general 
solution with weak pointers and extra indirection, not a 
typical RC implementation for data structures.


Then watch Herb’s Sutter recent talk “Leak freedom by default”. 
Now THAT guy must be out of his mind :)


I have no idea what are your typical C++ programmers.




In C++ sync is manual, which is the only efficient way to do


??? shared_ptr is in no way manual.


There is an upcoming atomic_shared_ptr, but it is not in the 
standard yet.


atomic_shared_pointer is nothing but what you seem to imply. It’s 
not manual sync for one.




My post is about a particular primitive in the C++ std; what 
could be done instead or in addition is not important.


Oh, but it is.


You keep spreading FUD on this forum, I’m still not sure of your 
reasons though.




1. D currently does not provide what you say it does.


RefCounted!T



2. Sane C++ programmers rarely use shared_ptr for more than 
exchanging ownership (suitable for sharing things like bitmap 
textures).


Including across threads btw.

There are plenty of other RC implementations for tracking 
memory.


Like what? In standard C++ all you’ve got is unique_ptr and 
shared_ptr (and weak_ptr, but that’s beside the point). There 
are maybe a ton of reimplementations of said things.





Re: Precise GC state

2017-11-27 Thread Ola Fosheim Grostad via Digitalmars-d

On Monday, 27 November 2017 at 10:13:41 UTC, codephantom wrote:
But in a discussion about GC, some technical details might 
prove to be very useful to those of us following this 
discussion.


Precise scanning of pointers makes sense when you have many 
cache lines in the GC heap with no pointers in them. But if you 
mostly have pointers (a large graph or a tree) then it makes 
little difference.


You need to add a more extensive whole-program type analysis 
where you prove that the GC memory heap isn't reachable from a 
type... I.e. "pointers reachable through class T can provably 
never point into the GC heap in this specific program, so 
therefore we can ignore all pointers to T".





Re: Precise GC state

2017-11-27 Thread codephantom via Digitalmars-d

On Monday, 27 November 2017 at 09:38:52 UTC, Temtaime wrote:


Current GC in D is shit


Can you elaborate?

"D is totally useless"..."Dlang is a toy in outer space"... "GC 
in D is shit" ..


I'm very open-minded to these different argument styles, and 
occasionally make use of them myself. But in a discussion about 
GC, some technical details might prove to be very useful to those 
of us following this discussion.


I encourage you to further refine your argument...  ;-)

https://en.wikipedia.org/wiki/Argument



Re: Precise GC state

2017-11-27 Thread Ola Fosheim Grostad via Digitalmars-d

On Monday, 27 November 2017 at 09:38:52 UTC, Temtaime wrote:

Please stop this flame


There is no flaming.

Current GC in D is shit and all this speaking won't improve 
situation.


If so, why are you here? But you are fundamentally wrong. Precise 
GC will not bring a general improvement; for that you need 
advanced pointer analysis. So you need a change of philosophy to 
get a performant GC: semantic changes.





Re: Precise GC state

2017-11-27 Thread Temtaime via Digitalmars-d
Please stop this flame and make first real step into bringing 
precise GC to us.
Current GC in D is shit and all this speaking won't improve 
situation.

The PR is not merged although it passed all the tests.


Re: Precise GC state

2017-11-27 Thread Ola Fosheim Grøstad via Digitalmars-d
On Monday, 27 November 2017 at 07:09:25 UTC, Ola Fosheim Grostad 
wrote:
But it kinda misses the point that if it is only in a 
single thread then it would typically have only one 
assignment. Shared_ptr is for holding a resource, not for using 
it...


Just to expand a bit on this: What is lost here is that what has 
been proposed for D is to have a RC solution to solve what Rust 
does with borrowed pointers. Not unlike Swift.


In C++ lifetime management of borrowed pointers is fully manual, 
so it relies on algorithmic design rather than a programming 
mechanism. Although for C++ there are upcoming wrapper types to 
allow for separate static-analysis tooling that is comparable to 
Rust.


The intended usage is not comparable. (In C++ you typically will 
have per-type or per-application RC, wrapped up where you need it 
by encapsulation and move-semantics.)




Re: Precise GC state

2017-11-26 Thread Ola Fosheim Grostad via Digitalmars-d
On Monday, 27 November 2017 at 06:59:30 UTC, Petar Kirov 
[ZombineDev] wrote:
the shared_ptr itself) and you can't opt out of that even if 
you're not sharing the shared_ptr with other threads.


Well, the compiler can in theory elide atomics if it can prove 
that the memory cannot be accessed by another thread.


But it kinda misses the point that if it is only in a single 
thread then it would typically have only one assignment. 
Shared_ptr is for holding a resource, not for using it...




Re: Precise GC state

2017-11-26 Thread Ola Fosheim Grostad via Digitalmars-d
On Monday, 27 November 2017 at 06:47:00 UTC, Dmitry Olshansky 
wrote:
Last time I checked, shared_ptr can be safely shared across 
threads, hence RC is tackling synchronization, and most likely 
with atomics, since locks won’t be any better.


The control block can, but it is crazy to use shared_ptr for 
anything more than high-level ownership. It is a general solution 
with weak pointers and extra indirection, not a typical RC 
implementation for data structures.



In C++ sync is manual, which is the only efficient way to do


??? shared_ptr is in no way manual.


There is an upcoming atomic_shared_ptr, but it is not in the 
standard yet.


My post is about a particular primitive in the C++ std; what 
could be done instead or in addition is not important.


Oh, but it is.

1. D currently does not provide what you say it does.

2. Sane C++ programmers rarely use shared_ptr for more than 
exchanging ownership (suitable for sharing things like bitmap 
textures). There are plenty of other RC implementations for 
tracking memory. So you compare apples and oranges.






Re: Precise GC state

2017-11-26 Thread Petar via Digitalmars-d
On Monday, 27 November 2017 at 06:36:27 UTC, Ola Fosheim Grostad 
wrote:
On Monday, 27 November 2017 at 05:47:49 UTC, Dmitry Olshansky 
wrote:
likely via RAII. Not to mention cheap (thread-local) Ref 
Counting; C++ and many other languages have to use atomics, 
which makes RC costly.


No, you don't. Nobody in their right mind would do so in C++ as 
a general solution. Seems there is a trend in doing D-advocacy 
based on the assumption that programmers using other languages 
are crazy these days.


In C++ sync is manual, which is the only efficient way to do 
it. Proving correctness for an efficient general solution is an 
unsolved theoretical problem. You can do it for high level 
mechanisms, but not low level atm.


Rust and Pony claim to have solutions, but they are not 
general. D most certainly does not have it and never will.


When threading is a library type you cannot achieve more in 
D than you can achieve in C++, i.e. shared is not going to do 
more than a C++ library type with a separate static analysis 
tool.


What I think Dmitry meant is that shared_ptr uses atomic 
instructions for the reference counting (though you need to use 
atomic_shared_ptr, or the atomic_* functions, if you want to 
modify the shared_ptr itself), and you can't opt out of that even 
if you're not sharing the shared_ptr with other threads. On the 
other hand, in D a properly designed SharedPtr(T) will use atomic 
instructions for reference counting only if its payload type T 
has the 'shared' type qualifier. And if you have 
'shared(SharedPtr(T))', only then will you have atomic 
instructions for the SharedPtr struct itself, but unlike 
shared_ptr, you won't have access to the non-thread-safe methods.


Re: Precise GC state

2017-11-26 Thread Dmitry Olshansky via Digitalmars-d
On Monday, 27 November 2017 at 06:36:27 UTC, Ola Fosheim Grostad 
wrote:
On Monday, 27 November 2017 at 05:47:49 UTC, Dmitry Olshansky 
wrote:
likely via RAII. Not to mention cheap (thread-local) Ref 
Counting; C++ and many other languages have to use atomics, 
which makes RC costly.


No, you don't. Nobody in their right mind would do so in C++ as 
a general solution. Seems there is a trend in doing D-advocacy 
based on the assumption that programmers using other languages 
are crazy these days.


Last time I checked, shared_ptr can be safely shared across 
threads, hence RC is tackling synchronization, and most likely 
with atomics, since locks won’t be any better.




In C++ sync is manual, which is the only efficient way to do


??? shared_ptr is in no way manual.

Sorry, but the rest of the post takes a direction into the 
fog that I don’t want to follow.


My post is about a particular primitive in the C++ std; what 
could be done instead or in addition is not important.





Re: Precise GC state

2017-11-26 Thread Ola Fosheim Grostad via Digitalmars-d
On Monday, 27 November 2017 at 05:47:49 UTC, Dmitry Olshansky 
wrote:
likely via RAII. Not to mention cheap (thread-local) Ref 
Counting; C++ and many other languages have to use atomics, 
which makes RC costly.


No, you don't. Nobody in their right mind would do so in C++ as a 
general solution. Seems there is a trend in doing D-advocacy 
based on the assumption that programmers using other languages 
are crazy these days.


In C++ sync is manual, which is the only efficient way to do it. 
Proving correctness for an efficient general solution is an 
unsolved theoretical problem. You can do it for high level 
mechanisms, but not low level atm.


Rust and Pony claim to have solutions, but they are not general. 
D most certainly does not have it and never will.


When threading is a library type you cannot achieve more in D 
than you can achieve in C++, i.e. shared is not going to do more 
than a C++ library type with a separate static analysis tool.


Re: Precise GC state

2017-11-26 Thread Dmitry Olshansky via Digitalmars-d

On Sunday, 26 November 2017 at 18:58:04 UTC, jmh530 wrote:
On Sunday, 26 November 2017 at 08:49:42 UTC, Dmitry Olshansky 
wrote:


If all of the code is 100% @safe (not system and not trusted) 
you have a different language where write barriers would be 
cheaper to implement.


Sadly you can’t “skip” write barriers in your @system code 
because it may run as part of larger @safe. Which is where 
they are the most costly.


I was thinking you would use a generational or precise GC for 
@safe code and then fall back to the normal GC with 
@system/@trusted code.


Wishful thinking. Memory flows through a program: things 
allocated in @safe get passed to @system (including the 
last-reference cases) and vice versa.


Also, in D the generational hypothesis may not bear as much 
value due to stack allocation and libc heap allocations for 
temporaries, likely via RAII. Not to mention cheap (thread-local) 
Ref Counting; C++ and many other languages have to use atomics, 
which makes RC costly.


Basically the GC heap is not all of the heap, so young objects 
are an even smaller part of that.


Lastly, it is telling that new low-latency concurrent GCs do 
away with explicit generations.


I’m thinking a more efficient way to tackle temporaries would be 
to add regions to the GC.
So the moment you enter a region, all GC things are allocated in 
a special segment; once you leave it, free the segment. The last 
trick is to set memory protection with guard pages over it. Boom! 
Now if somebody touches it - DanglingPointerException :)


Not a panacea but:
- libraries using too much of GC can be controlled (unless they 
build global state)

- manual memory management without memory corruption
- fast bump allocation, which is what all sensible VM 
languages do


And no need to create entire separate heap with its own 
generational schema.


Not sure if that's possible or not, but in my head it would be 
a separate heap from the @safe code.


And then I do append in @system on array allocated in @safe. Boom!


Certainly would be added complexity.





Re: Precise GC state

2017-11-26 Thread jmh530 via Digitalmars-d
On Sunday, 26 November 2017 at 19:11:08 UTC, Jonathan M Davis 
wrote:


It wouldn't work. @safe code and @system code call each other 
all the time (using @trusted where necessary), and they freely 
exchange stuff that was allocated on the GC heap. [snip]


I see. Fair enough.


Re: Precise GC state

2017-11-26 Thread Ola Fosheim Grostad via Digitalmars-d
On Sunday, 26 November 2017 at 19:11:08 UTC, Jonathan M Davis 
wrote:
We can't even have different heaps for immutable and mutable 
stuff, because it's very common to construct something as 
mutable and then cast it to immutable (either explicitly or


This is easy to fix: introduce a uniquely owned type (isolated) 
that can only transition to immutable.


So it is more about being willing to tighten up the semantics. 
Same thing with GC, but everything has a cost.


That said, Adam has a point about getting more users; it isn't 
obvious that the costs wouldn't be offset by increased interest. 
Anyway, it seems like C# and Swift are pursuing the domain D is 
in by gradually expanding into more performance-oriented 
programming mechanisms... We'll see.


Re: Precise GC state

2017-11-26 Thread Jonathan M Davis via Digitalmars-d
On Sunday, November 26, 2017 18:58:04 jmh530 via Digitalmars-d wrote:
> On Sunday, 26 November 2017 at 08:49:42 UTC, Dmitry Olshansky
>
> wrote:
> > If all of the code is 100% @safe (not system and not trusted)
> > you have a different language where write barriers would be
> > cheaper to implement.
> >
> > Sadly you can’t “skip” write barriers in your @system code
> > because it may run as part of larger @safe. Which is where they
> > are the most costly.
>
> I was thinking you would use a generational or precise GC for
> @safe code and then fall back to the normal GC with
> @system/@trusted code. Not sure if that's possible or not, but in
> my head it would be a separate heap from the @safe code.
> Certainly would be added complexity.

It wouldn't work. @safe code and @system code call each other all the time
(using @trusted where necessary), and they freely exchange stuff that was
allocated on the GC heap. Heck, it's not even possible to have per-thread
heaps (much as that would be desirable), because stuff can be passed between
threads, and you can freely cast between shared and non-shared. You do have
to be careful about it if you don't want to have problems, but just like
@safe code can call @trusted code such that it's possible for @safe code to be
calling code that was vetted for memory safety by a programmer rather than
the compiler, you can have code that casts to and from shared and works
perfectly well so long as the programmer vets it properly.

We can't even have different heaps for immutable and mutable stuff, because
it's very common to construct something as mutable and then cast it to
immutable (either explicitly or because it's returned from a pure function
which is able to do the cast implicitly, because it knows that the return
value couldn't possibly have been passed into the function and thus had to
have been allocated inside it).
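The mutable-construction-then-implicit-immutable pattern described above fits in a few lines (a minimal sketch of standard D semantics; the names are illustrative):

```d
// A strongly pure function's return value can be converted to
// immutable implicitly: the compiler knows the array must have been
// freshly allocated inside the function, so no mutable alias escapes.
int[] build(size_t n) pure
{
    auto a = new int[](n); // constructed as mutable
    foreach (i, ref e; a)
        e = cast(int) i;
    return a;
}

void main()
{
    immutable int[] xs = build(4); // implicit mutable -> immutable
    assert(xs == [0, 1, 2, 3]);
}
```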

D's type system protects you from all kinds of dumb mistakes, but it has way
too many backdoors for the kind of stuff that requires that things be locked
down like they would be inside a VM like Java has. Ultimately, if you're
talking about a GC and what it can and can't do, I expect that there's very
little difference between C and D in terms of what types of GC you can
get away with. D does protect the programmer, but ultimately, it lets you do
just about anything that C lets you do if you really want to, and any GC
that we use has to work with that.

- Jonathan M Davis




Re: Precise GC state

2017-11-26 Thread jmh530 via Digitalmars-d
On Sunday, 26 November 2017 at 08:49:42 UTC, Dmitry Olshansky 
wrote:


If all of the code is 100% @safe (not system and not trusted) 
you have a different language where write barriers would be 
cheaper to implement.


Sadly you can’t “skip” write barriers in your @system code 
because it may run as part of larger @safe. Which is where they 
are the most costly.


I was thinking you would use a generational or precise GC for 
@safe code and then fall back to the normal GC with 
@system/@trusted code. Not sure if that's possible or not, but in 
my head it would be a separate heap from the @safe code. 
Certainly would be added complexity.


Re: Precise GC state

2017-11-26 Thread Ola Fosheim Grostad via Digitalmars-d
On Sunday, 26 November 2017 at 08:49:42 UTC, Dmitry Olshansky 
wrote:
Sadly you can’t “skip” write barriers in your @system code 
because it may run as part of larger @safe. Which is where they


Well, you can if you carefully lock the GC runtime, or if you don't 
modify existing scannable pointers that point to existing 
objects (e.g. you could fill an empty array of pointers in unsafe 
code, or write pointers that point to something unscannable), but all 
unsafe code would need vetting.


So it isn't impossible technically, but it is impossible without a 
change of philosophy.


Re: Precise GC state

2017-11-26 Thread Dmitry Olshansky via Digitalmars-d

On Sunday, 26 November 2017 at 04:01:31 UTC, jmh530 wrote:
On Friday, 24 November 2017 at 05:53:37 UTC, Dmitry Olshansky 
wrote:
A better GC is a great direction. Generational one is not 
feasible unless we disallow quite a few of our features.


What about @safe?


If all of the code is 100% @safe (not system and not trusted) you 
have a different language where write barriers would be cheaper 
to implement.


Sadly you can’t “skip” write barriers in your @system code 
because it may run as part of a larger @safe function, which is 
where they are the most costly.


Re: Precise GC state

2017-11-25 Thread jmh530 via Digitalmars-d
On Friday, 24 November 2017 at 05:53:37 UTC, Dmitry Olshansky 
wrote:
A better GC is a great direction. Generational one is not 
feasible unless we disallow quite a few of our features.


What about @safe?


Re: Precise GC state

2017-11-24 Thread codephantom via Digitalmars-d
On Friday, 24 November 2017 at 07:48:03 UTC, Ola Fosheim Grøstad 
wrote:


But I am not sure if Walter's goal is to attract as many users 
as possible.


Given all the bullshit bugs I have to deal with, I'm starting to 
think it's the opposite.


Re: Precise GC state

2017-11-24 Thread Dmitry Olshansky via Digitalmars-d

On Thursday, 23 November 2017 at 20:13:31 UTC, Adam Wilson wrote:


a precise GC will enable data with isolated or immutable 
indirections to

be safely moved between threads


I think you could in any case. What a precise GC allows is to move an 
object during collection, provided there are no conservative pointers to 
the object, of course.


I would focus on a generational GC first for two reasons. The 
first is that you can delay the scans of the later gens if the 
Gen0 (nursery) has enough space, so for a lot of programs it 
would result in a significant cut-down in collection times.


This is like saying that having a large heap will allow you to 
cut down collection time. The reason Gen0 is fast is that it's mostly 
dead, and you limit the amount of memory Gen0 scans to the live set 
in the Gen0 heap plus the remembered set in the old gen.


It has nothing to do with delaying the scan; in fact, Gen0 scans 
are way more frequent than scans of a simple single-gen GC.


The second is that you still typically have to stop the 
execution of the thread on the Gen0 collection (the objects 
most likely to be hot).


If your Gen0 collector is stop-the-world, like all Java 
collectors except for the most recent ones - ZGC and Shenandoah.


Ironically both are not generational :)

So with a non-generational concurrent collector you have to 
stop the thread for the entirety of the scan,



False again. Any "mostly concurrent" GC can scan with a super 
small pause that is typically used only to mark the stacks of 
threads. See also the 2 collectors I mentioned.


Go's GC is also non-generational (and SUUUPER popular), which puts the 
last nail in the coffin of "I need a generational GC because you 
can't do X without it".


because you have no way to know which objects are hot and which 
are cold.


Why would that stop the collector? There is no need to know what is 
hot, and garbage is always cold.





Re: Precise GC state

2017-11-23 Thread Ola Fosheim Grøstad via Digitalmars-d

On Friday, 24 November 2017 at 05:34:14 UTC, Adam Wilson wrote:
RAII+stack allocations make sense when I care about WHEN an 
object is released and wish to provide some semblance of 
control over deallocation (although as Andrei has pointed out 
numerous times, you have no idea how many objects are hiding 
under that RAII pointer). But there are plenty of use cases 
where I really don't care when memory is released, or even how 
long it takes to release.


A GC makes most sense when the compiler fails to prove that an 
object cannot be part of a cycle of references that can become 
detached.


So it makes most sense for typically long lived objects. Imagine 
if you spend all the effort you would need to put into getting a 
generational GC to work well into implementing pointer analysis 
to improve automatic deallocation instead…


In order to speed up the generated code that a generational GC 
requires, you would still need a massive amount of work put into 
pointer analysis, plus all the work of building a state-of-the-art GC 
runtime.


Most people consider designing and implementing pointer analysis 
to be difficult. The regular data flow analysis that the D 
compiler has now is trivial in comparison. Might need a new IR 
for it, not sure.



Obviously my pattern isn't "wrong" or else DMD itself is 
"wrong". It's just not your definition of "correct".


Well, you could redefine the semantics of D so that you disallow 
unsafe code and possibly some other things. Then maybe a 
generational GC would be easy to implement, if you don't expect 
better performance than any other high-level language.



Another use case where RAII makes no sense is GUI apps. The 
object graph required to maintain the state of all those 
widgets can be an absolute performance nightmare on 
destruction. Closing a window can result in the destruction of 
tens-of-thousands of objects (I've worked on codebases like 
that), all from the GUI thread, causing a hang, and a bad user 
experience. Or you could use a concurrent GC and pass off the 
collection to a different thread. (See .NET)


Sounds like someone didn't do design before they started coding 
and just kept adding stuff.


Keep in mind that OS X and iOS use reference counting for all 
objects and it seems to work for them. But they have also put a 
significant effort into pointer analysis to reduce ref-count 
overhead, so it is still quite a lot more work for the compiler 
designer than plain RAII.


Your arguments are based on a presupposition that D should only 
be used a certain way;


No, it is based on what the D language semantics are and the 
stated philosophy and the required changes that it would involve.


I have no problem with D switching to a generational GC. Like you, I 
think most programs can be made to work fine with the overhead, 
but then you would need to change the philosophy that Walter is 
following. You would also need to either invest a lot into 
pointer analysis to keep a clean separation between GC references 
and non-GC references, or create a more unforgiving type system 
that ensures such separation.


I think that having a generational GC (or other high-level 
low-latency solutions) probably would be a good idea, but I don't 
see how anyone could convince Walter to change his mind on such 
issues. Especially as there are quite a few semantic flaws in D 
that would be easy to fix, which Walter will not fix because he 
likes D as it is or thinks it would be too much of a breaking 
change.


You would need to change the D philosophy from "performant with 
some convenience" to "convenience with some means to write 
performant code".


I agree with you that the latter philosophy probably would 
attract more users. It is hard to compete with C++ and Rust on 
the former.


But I am not sure if Walter's goal is to attract as many users as 
possible.




Re: Precise GC state

2017-11-23 Thread Dmitry Olshansky via Digitalmars-d

On Friday, 24 November 2017 at 05:34:14 UTC, Adam Wilson wrote:

On 11/23/17 13:40, Ola Fosheim Grøstad wrote:
On Thursday, 23 November 2017 at 20:13:31 UTC, Adam Wilson 
wrote:

I would focus on a generational GC first for two reasons. The


But generational GC only makes sense if many of your GC 
objects have a
short life span. I don't think this fits well with sensible 
use of a
language like D where you typically would try to put such 
allocations on

the stack and/or use RAII or even an arena.



Sensible to whom? The great thing about D is that it works just 
as well for low-level people who need to wring the most out of 
the metal, AND the high-level productivity minded apps people, 
who don't.




The problem with generational GC is “write barriers” - pieces of 
code the compiler needs to insert around every write of a pointer.


Now, given that we can cast a pointer to an integer and back (also 
unions), I don’t see how it would work unless we put write 
barriers everywhere, on the off-chance a write stored a pointer from 
OLD to NEW space.
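As an illustration (a hypothetical snippet, not from any real codebase), this is the kind of @system code that makes barrier insertion impractical: the pointer store happens through an integer, so no pointer-typed write exists for the compiler to instrument:

```d
// Hypothetical example: a pointer store laundered through size_t.
// The compiler sees only an integer write, so a write barrier keyed
// on pointer-typed stores would never fire for this old-to-new store.
class Node { Node next; }

void smuggle(Node parent, Node child) @system
{
    auto slot = cast(size_t*) &parent.next;    // reinterpret the field
    *slot = cast(size_t) cast(void*) child;    // pointer write, unseen
}
```

Any sound barrier scheme would therefore have to instrument plain integer stores as well, which is exactly the cost being objected to.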


A better GC is a great direction. Generational one is not 
feasible unless we disallow quite a few of our features.


Re: Precise GC state

2017-11-23 Thread Adam Wilson via Digitalmars-d

On 11/23/17 13:40, Ola Fosheim Grøstad wrote:

On Thursday, 23 November 2017 at 20:13:31 UTC, Adam Wilson wrote:

I would focus on a generational GC first for two reasons. The


But generational GC only makes sense if many of your GC objects have a
short life span. I don't think this fits well with sensible use of a
language like D where you typically would try to put such allocations on
the stack and/or use RAII or even an arena.



Sensible to whom? The great thing about D is that it works just as well 
for low-level people who need to wring the most out of the metal, AND 
the high-level productivity minded apps people, who don't.


RAII+stack allocations make sense when I care about WHEN an object is 
released and wish to provide some semblance of control over deallocation 
(although as Andrei has pointed out numerous times, you have no idea how 
many objects are hiding under that RAII pointer). But there are plenty 
of use cases where I really don't care when memory is released, or even 
how long it takes to release.


For example, all of the D apps that I've written and use in production, 
GC-allocate everything. I don't have a single struct in the code. But I 
don't care, because the program is so short lived and the working set so 
small that there will never be GC collection. And it's not like this is 
an uncommon or unwise pattern in D, DMD itself follows this exact 
pattern. Obviously my pattern isn't "wrong" or else DMD itself is 
"wrong". It's just not your definition of "correct".


Another use case where RAII makes no sense is GUI apps. The object graph 
required to maintain the state of all those widgets can be an absolute 
performance nightmare on destruction. Closing a window can result in the 
destruction of tens-of-thousands of objects (I've worked on codebases 
like that), all from the GUI thread, causing a hang, and a bad user 
experience. Or you could use a concurrent GC and pass off the collection 
to a different thread. (See .NET)



The second is that you still typically have to stop the execution of
the thread on the Gen0 collection (the objects most likely to be hot).
So with a non-generational concurrent collector you have to stop the
thread for the entirety of the scan, because you have no way to know
which objects are hot and which are cold.


How are you going to prove that references are all kept within the
generation in D? There is some very costly bookkeeping involved that
simply doesn't work well with D semantics.




Again, which semantics? If you compile with -betterC, the bookkeeping 
and its overhead simply don't exist.


Your arguments are based on a presupposition that D should only be used 
a certain way; a way that, I am sure, mirrors your own usage patterns. D 
supports a multitude of different usage patterns, some of which look 
nothing like what you are holding up as "correct". And this is what 
makes D special. To remove or dismiss as invalid those usage patterns 
would be detrimental to those of us who use them and be catastrophic to 
D in general.


As a community, can we please stop it with the subjective judgements of 
what is and is not "sensible" in D, and start supporting the people 
using it, however they are using it, even if we are sure that they are 
"wrong"?


--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: Precise GC state

2017-11-23 Thread Ola Fosheim Grøstad via Digitalmars-d

On Thursday, 23 November 2017 at 20:13:31 UTC, Adam Wilson wrote:

I would focus on a generational GC first for two reasons. The


But generational GC only makes sense if many of your GC objects 
have a short life span. I don't think this fits well with 
sensible use of a language like D where you typically would try 
to put such allocations on the stack and/or use RAII or even an 
arena.


The second is that you still typically have to stop the 
execution of the thread on the Gen0 collection (the objects 
most likely to be hot). So with a non-generational concurrent 
collector you have to stop the thread for the entirety of the 
scan, because you have no way to know which objects are hot and 
which are cold.


How are you going to prove that references are all kept within 
the generation in D? There is some very costly bookkeeping 
involved that simply doesn't work well with D semantics.





Re: Precise GC state

2017-11-23 Thread Adam Wilson via Digitalmars-d

On 11/23/17 02:47, Nordlöw wrote:

On Wednesday, 22 November 2017 at 13:44:22 UTC, Nicholas Wilson wrote:

That's a linker(?) limitation for OMF (or whatever the win32 object
file format is).


Was just fixed!

What improvements to D's concurrency model is made possible with this
precise GC?

I recall Martin Nowak saying at DConf 2016 that

a precise GC will enable data with isolated or immutable indirections to
be safely moved between threads


Rainer is awesome!

That is certainly one aspect that it will enable.

I would focus on a generational GC first for two reasons. The first is 
that you can delay the scans of the later gens if the Gen0 (nursery) has 
enough space, so for a lot of programs it would result in a significant 
cut-down in collection times.


The second is that you still typically have to stop the execution of the 
thread on the Gen0 collection (the objects most likely to be hot). So 
with a non-generational concurrent collector you have to stop the thread 
for the entirety of the scan, because you have no way to know which 
objects are hot and which are cold.


--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: Precise GC state

2017-11-23 Thread Nordlöw via Digitalmars-d
On Wednesday, 22 November 2017 at 13:44:22 UTC, Nicholas Wilson 
wrote:
That's a linker(?) limitation for OMF (or whatever the win32 
object file format is).


Was just fixed!

What improvements to D's concurrency model is made possible with 
this precise GC?


I recall Martin Nowak saying at DConf 2016 that

a precise GC will enable data with isolated or immutable 
indirections to be safely moved between threads
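For context, a minimal sketch of today's baseline (assuming `std.concurrency`): data that is fully immutable can already be sent between threads by reference, and the quoted point is that a precise GC would extend this to data with isolated indirections:

```d
// Sending immutable data between threads with std.concurrency.
// Immutable data may cross threads by reference, because neither
// side can mutate it.
import std.concurrency;

void worker()
{
    auto msg = receiveOnly!(immutable(int)[]);
    assert(msg.length == 3);
}

void main()
{
    auto tid = spawn(&worker);
    immutable(int)[] data = [1, 2, 3];
    tid.send(data); // no copy required: the slice is immutable
}
```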


Re: Precise GC state

2017-11-22 Thread Adam Wilson via Digitalmars-d

On 11/22/17 05:44, Nicholas Wilson wrote:

On Wednesday, 22 November 2017 at 13:23:54 UTC, Nordlöw wrote:

On Wednesday, 22 November 2017 at 10:53:45 UTC, Temtaime wrote:

Hi all !
https://github.com/dlang/druntime/pull/1603


Only the Win32 build fails as

Error: more than 32767 symbols in object file

What's wrong?


That's a linker(?) limitation for OMF (or whatever the win32 object
file format is).


We really should be using the MSVC linker on Windows for both x86 and x64.

I propose we change the default behavior of -m32 to point to MSVC, keep 
the -m32mscoff, but mark it as deprecated, and add a -m32omf flag to 
retain the current behavior.


--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: Precise GC state

2017-11-22 Thread Nicholas Wilson via Digitalmars-d

On Wednesday, 22 November 2017 at 13:23:54 UTC, Nordlöw wrote:

On Wednesday, 22 November 2017 at 10:53:45 UTC, Temtaime wrote:

Hi all !
https://github.com/dlang/druntime/pull/1603


Only the Win32 build fails as

Error: more than 32767 symbols in object file

What's wrong?


That's a linker(?) limitation for OMF (or whatever the win32 
object file format is).


Re: Precise GC state

2017-11-22 Thread Nordlöw via Digitalmars-d

On Wednesday, 22 November 2017 at 10:53:45 UTC, Temtaime wrote:

Hi all !
https://github.com/dlang/druntime/pull/1603


Only the Win32 build fails as

Error: more than 32767 symbols in object file

What's wrong?


Re: Precise GC state

2017-11-22 Thread Adam Wilson via Digitalmars-d

On 11/22/17 02:53, Temtaime wrote:

Hi all !
https://github.com/dlang/druntime/pull/1603

Can someone investigate and bring it to us?
4 years have passed since GSoC 2013 and there's still no precise GC.
Many apps suffer from false pointers, and bringing such a GC will help
those who are affected by it.
It seems all the tests pass except Win32, because of OPTLINK failures.
Maybe there's some chance to accelerate this PR?

Thanks all

+1

--
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;


Re: precise gc?

2012-11-11 Thread eskimo

> You could use it, and come up with something else which also 
> works on x86, etc., but I'd look into storing your weak reference 
> (or whatever) in a page with the NO_SCAN attribute set. It will 
> cause the GC to ignore it entirely.

Thanks, but that's no option for me, because the pointers to be
ignored by the GC need to be in the same page as pointers which should not
be ignored. (Actually even in the same struct, so they are really
neighbors.)



Re: precise gc?

2012-11-11 Thread David Nadlinger

On Saturday, 10 November 2012 at 22:17:10 UTC, eskimo wrote:

What is the current state? Is it enough to store a pointer in a
ptrdiff_t variable instead of a pointer for the GC to ignore it


For a precise GC (as in the thread title) yes, but not for the 
current D GC.



or is my
current trick of simply inverting its value required?


You could use it, and come up with something else which also 
works on x86, etc., but I'd look into storing your weak reference 
(or whatever) in a page with the NO_SCAN attribute set. It will 
cause the GC to ignore it entirely.
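That suggestion uses the documented `core.memory` API; a minimal sketch (the `WeakRef` struct and `makeWeakSlot` helper are hypothetical names for illustration):

```d
// Allocating a block the GC will never scan for pointers, via the
// documented core.memory API.
import core.memory : GC;

struct WeakRef
{
    void* hidden; // lives in NO_SCAN memory, so it is never scanned
}

WeakRef* makeWeakSlot()
{
    // GC.BlkAttr.NO_SCAN tells the collector this block contains no
    // pointers it should follow, so `hidden` cannot keep its target alive.
    return cast(WeakRef*) GC.malloc(WeakRef.sizeof, GC.BlkAttr.NO_SCAN);
}
```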


David


Re: precise gc?

2012-11-11 Thread eskimo
> Yeah, you are right if all pointer bits are actually used it is far too
> easy. On the other hand especially because less space is wasted for
> pointers on 32 bit, I can easily afford an extra variable to solve this
> problem (kind of). 

I guess it is a pretty safe bet to assume that the lowest 65535
addresses in memory space (mask: 0x) are not in GC memory?



Re: precise gc?

2012-11-11 Thread eskimo

> 
> I'm not sure I understand why you would hide a pointer from the GC.
As already suggested by Kapps, for weak references. I need that for my
new std.signals implementation.
> 
> > Are there memory models in use
> > where the inverted pointer value might also be in GC memory?
> > 
> 
> Yes, that can happen in 32-bit.
> 
Yeah, you are right: if all pointer bits are actually used, it is far too
easy. On the other hand, especially because less space is wasted on
pointers on 32-bit, I can easily afford an extra variable to solve this
problem (kind of). Buah, I am starting to like 64-bit architectures ;-)

Thanks!



Re: precise gc?

2012-11-10 Thread Kapps
On Saturday, 10 November 2012 at 22:52:56 UTC, Nick Sabalausky 
wrote:

On Sat, 10 Nov 2012 23:17:41 +0100
eskimo  wrote:


Hey party people!

What is the current state? Is it enough to store a pointer in a
ptrdiff_t variable instead of a pointer for the GC to ignore 
it or is

my current trick of simply inverting its value required?



I'm not sure I understand why you would hide a pointer from the 
GC.



For weak references mainly. For example, caching, weak event 
subscribers, and a fair few other things.


Re: precise gc?

2012-11-10 Thread Thiez
On Saturday, 10 November 2012 at 22:52:56 UTC, Nick Sabalausky 
wrote:

On Sat, 10 Nov 2012 23:17:41 +0100
eskimo  wrote:


Hey party people!

What is the current state? Is it enough to store a pointer in a
ptrdiff_t variable instead of a pointer for the GC to ignore 
it or is

my current trick of simply inverting its value required?



I'm not sure I understand why you would hide a pointer from the 
GC.


It might happen by accident, like when you're doing something 
'clever' like using them good old fashioned xor-linked lists...


Re: precise gc?

2012-11-10 Thread Nick Sabalausky
On Sat, 10 Nov 2012 23:17:41 +0100
eskimo  wrote:

> Hey party people!
> 
> What is the current state? Is it enough to store a pointer in a
> ptrdiff_t variable instead of a pointer for the GC to ignore it or is
> my current trick of simply inverting its value required?
> 

I'm not sure I understand why you would hide a pointer from the GC.

> Are there memory models in use
> where the inverted pointer value might also be in GC memory?
> 

Yes, that can happen in 32-bit.
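The inversion trick under discussion can be sketched as follows (a hypothetical helper, not an endorsed API; as noted above, on 32-bit the complemented value may itself alias GC memory):

```d
// Hide a pointer from a conservative GC by storing its bitwise
// complement in an integer field. The GC scanning this struct sees
// ~p, which usually does not look like a live heap address.
struct HiddenPtr(T)
{
    private size_t bits;

    void set(T* p) { bits = ~cast(size_t) p; } // hide from the GC
    T* get() const { return cast(T*) ~bits; }  // recover the address
}
```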



Re: Precise GC

2012-04-15 Thread Jacob Carlborg

On 2012-04-15 04:11, Antti-Ville Tuunainen wrote:

On Saturday, 14 April 2012 at 19:17:02 UTC, Jacob Carlborg wrote:


There is: http://prowiki.org/wiki4d/wiki.cgi?GSOC_2012_Ideas


That's the ideas list for proposals. I was asking if anyone else applied
for GSoC using one of those.


Aha, I see. The chosen projects have not been announced yet.

--
/Jacob Carlborg


Re: Precise GC

2012-04-14 Thread Antti-Ville Tuunainen

On Saturday, 14 April 2012 at 19:17:02 UTC, Jacob Carlborg wrote:


There is: http://prowiki.org/wiki4d/wiki.cgi?GSOC_2012_Ideas


That's the ideas list for proposals. I was asking if anyone else 
applied for GSoC using one of those.


Re: Precise GC

2012-04-14 Thread Jacob Carlborg

On 2012-04-14 19:46, Antti-Ville Tuunainen wrote:

On Sunday, 8 April 2012 at 02:49:34 UTC, Andrei Alexandrescu wrote:

Maybe we can get a GSoC project on that. We already have a related
proposal (lock-free GC).


That would be me.

Just thought I should wave and say hello. Are there other GSoC proposals
in the GC area?


There is: http://prowiki.org/wiki4d/wiki.cgi?GSOC_2012_Ideas

--
/Jacob Carlborg


Re: Precise GC

2012-04-14 Thread Antti-Ville Tuunainen
On Sunday, 8 April 2012 at 02:49:34 UTC, Andrei Alexandrescu 
wrote:
Maybe we can get a GSoC project on that. We already have a 
related proposal (lock-free GC).


That would be me.

Just thought I should wave and say hello. Are there other GSoC 
proposals in the GC area?





Re: Precise GC

2012-04-14 Thread Robert Jacques

On Sat, 14 Apr 2012 05:21:08 -0500, Manu  wrote:


On 13 April 2012 17:25, Kagamin  wrote:


once you prefetched the function, it will remain in the icache and be
reused from there the next time.



All depends how much you love object orientation. If you follow the C++
book and make loads of classes for everything, you'll thrash the hell out
of it. If you only have a couple of different objects, maybe they'll coexist
in the icache.
The GC is a bit of a special case though because it runs in a tight loop.
That said, the pipeline hazards still exist regardless of the state of
icache.
Conventional virtuals are worse, since during the process of executing
regular code, there's not usually such a tight loop pattern.
(note: I was answering the prior question about virtual functions in
general, not as applied to the specific use case of a GC scan)

The latest 'turbo' ARM chips (Cortex, etc) and such may store a branch
target table, they are alleged to have improved prediction, but I haven't
checked.
Prior chips (standard Arm9 and down, note: most non-top-end androids fall
in this category, and all games consoles with arms in them) don't
technically have branch prediction at all. ARM has conditional execution
bits on instructions, so it can filter opcodes based on the state of the
flags register. This is a cool design for binary branching, or performing
'select' operations, but it can't do anything to help an indirect jump.

Point is, the GC is the most fundamental performance hazard to D, and I
just think it's important to make sure the design is such that it is
possible to write a GC loop which can do its job with generated data tables
if possible, instead of requiring generated marker functions.
It would seem to me that carefully crafted tables of data could technically
perform the same function as marker functions, but without any function
calls... and the performance characteristics may be better/worse for
different architectures.



Is there any reason to assume that the GC is actually using the mark function for 
marking? The mark function is only there to provide the GC with the information it 
needs in order to mark. What it actually is is defined by the runtime. So it may 
just return a bitmask. Or it might _be_ a bitmask (for objects <= 512 bytes). Or 
a 64-bit index into some internal table. Etc. In fact, real functions only start 
making sense with very large arrays and compound structs/objects.

My actual concern with this approach is that a DLL compiled with GC A will have 
problems interacting with a program with GC B.


Re: Precise GC

2012-04-14 Thread Manu
On 13 April 2012 17:25, Kagamin  wrote:
>
> once you prefetched the function, it will remain in the icache and be
> reused from there the next time.
>

All depends how much you love object orientation. If you follow the C++
book and make loads of classes for everything, you'll thrash the hell out
of it. If you only have a couple of different objects, maybe they'll coexist
in the icache.
The GC is a bit of a special case though because it runs in a tight loop.
That said, the pipeline hazards still exist regardless of the state of
icache.
Conventional virtuals are worse, since during the process of executing
regular code, there's not usually such a tight loop pattern.
(note: I was answering the prior question about virtual functions in
general, not as applied to the specific use case of a GC scan)

The latest 'turbo' ARM chips (Cortex, etc.) and such may store a branch
target table; they are alleged to have improved prediction, but I haven't
checked.
Prior chips (standard ARM9 and down; note: most non-top-end Androids fall
in this category, and all games consoles with ARMs in them) don't
technically have branch prediction at all. ARM has conditional execution
bits on instructions, so it can filter opcodes based on the state of the
flags register. This is a cool design for binary branching, or performing
'select' operations, but it can't do anything to help an indirect jump.

Point is, the GC is the most fundamental performance hazard to D, and I
just think it's important to make sure the design is such that it is
possible to write a GC loop which can do its job with generated data tables
if possible, instead of requiring generated marker functions.
It would seem to me that carefully crafted tables of data could technically
perform the same function as marker functions, but without any function
calls... and the performance characteristics may be better/worse for
different architectures.
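The table-driven alternative Manu describes can be sketched like this (a hypothetical layout, not the actual druntime design: the `TypeMap` structure and per-word pointer bitmap are assumptions for illustration):

```d
// Sketch of table-driven marking: instead of a generated marker
// function per type, each type carries a bitmap recording which
// words hold pointers, and one generic loop walks it - no indirect
// call per object.
struct TypeMap
{
    size_t sizeInWords;      // object size in pointer-sized words
    const(size_t)[] ptrBits; // bit i set => word i holds a pointer
}

void mark(void* obj, const ref TypeMap tm,
          scope void delegate(void*) markPtr)
{
    enum wordBits = 8 * size_t.sizeof;
    auto words = cast(void**) obj;
    foreach (i; 0 .. tm.sizeInWords)
        if (tm.ptrBits[i / wordBits] & (size_t(1) << (i % wordBits)))
            markPtr(words[i]); // visit candidate pointer
}
```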


Re: Precise GC

2012-04-13 Thread Kagamin

On Friday, 13 April 2012 at 13:54:39 UTC, Manu wrote:

No other processors have branch prediction units anywhere near
the sophistication of modern x86. Any call through a function 
pointer
stalls the pipeline, pipelines are getting longer all the time, 
and PPC has

even more associated costs/hazards.
Most processors can only perform trivial binary branch 
prediction around an

'if'.
It also places a burden on the icache (unable to prefetch), and 
of course the dcache, both of which are much less sophisticated 
than on x86 as well.


Allocation of small aggregated objects usually involves 
allocating several equally small objects of different types in 
a row, so they sit one after another in the heap and the GC will 
visit them in a row, each time calling a function different from 
the previous one, so to an x86 processor this would result in 
constant misprediction: AFAIK an x86 processor caches only one 
target address per branch (ARM caches a flag?). And the icache 
should not suffer in either case: once you have prefetched the 
function, it will remain in the icache and be reused from there 
the next time.


Re: Precise GC

2012-04-13 Thread James Miller
On 2012-04-13 16:54:28 +0300 Manu  wrote:
> While I'm at it. 'final:' and 'virtual' keyword please ;)

Hmmm, I thought we decided that was a good idea. Anybody in the know:
is this going to happen or not?

--
James Miller


Re: Precise GC

2012-04-13 Thread Manu
On 13 April 2012 15:53, Kagamin  wrote:

> On Sunday, 8 April 2012 at 12:02:10 UTC, Alex Rønne Petersen wrote:
>
>> This sounds important to me. If it is also possible to do the work with
>>> generated tables, and not calling thousands of indirect functions in
>>> someone's implementation, it would be nice to reserve that possibility.
>>> Indirect function calls in hot loops make me very nervous for non-x86
>>> machines.
>>>
>>
>> Yes, I agree here. The last thing we need is a huge amount of
>> kinda-sorta-virtual function calls on ARM, MIPS, etc. It may work fine on
>> x86, but anywhere else, it's really not what you want in a GC.
>>
>
> What's the problem with virtual calls on ARM?
>

No other processors have branch prediction units anywhere near
the sophistication of modern x86. Any call through a function pointer
stalls the pipeline, pipelines are getting longer all the time, and PPC has
even more associated costs/hazards.
Most processors can only perform trivial binary branch prediction around an
'if'.
It also places a burden on the icache (unable to prefetch), and of course the
dcache, both of which are much less sophisticated than x86's as well.
The compiler can't do anything about code locality (which would improve icache
usage), since the target is unknown at compile time... there are also pipeline
stalls introduced by the sequence of indirect pointer lookups preceding any
virtual call.
Virtuals are possibly the worst hazard to modern CPUs, and the hardest to
detect/profile; since their cost is evenly spread throughout the entire
code base, you can never gauge their true impact on your performance. You
also can't easily measure the effect of icache misses on your code; suffice
to say, you will have MANY more in virtual-heavy code.

While I'm at it. 'final:' and 'virtual' keyword please ;)
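For readers unfamiliar with the 'final' request: the effect Manu is after can be shown with a C++ analogue (the names below are invented for illustration), since C++ already has the keyword. Marking a class final lets the compiler turn calls through that type into direct, inlinable calls instead of vtable dispatches.

```cpp
#include <cassert>

struct Base {
    virtual int mark() const { return 1; }
    virtual ~Base() = default;
};

// 'final' guarantees no further overrides exist, so a call through a
// Derived* or Derived& can be devirtualized into a direct (and
// inlinable) call -- the property Manu wants available by default.
struct Derived final : Base {
    int mark() const override { return 2; }
};

int callThroughBase(const Base& b) { return b.mark(); }     // indirect dispatch
int callThroughFinal(const Derived& d) { return d.mark(); } // direct call
```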


Re: Precise GC

2012-04-13 Thread Kagamin
On Sunday, 8 April 2012 at 12:02:10 UTC, Alex Rønne Petersen 
wrote:
This sounds important to me. If it is also possible to do the
work with generated tables, and not calling thousands of indirect
functions in someone's implementation, it would be nice to reserve
that possibility. Indirect function calls in hot loops make me very
nervous for non-x86 machines.


Yes, I agree here. The last thing we need is a huge amount of 
kinda-sorta-virtual function calls on ARM, MIPS, etc. It may 
work fine on x86, but anywhere else, it's really not what you 
want in a GC.


What's the problem with virtual calls on ARM?


Re: Precise GC

2012-04-10 Thread Andrei Alexandrescu

On 4/10/12 3:03 AM, deadalnix wrote:

For every type, a function template (let's call it GCscan) will be
instantiated to scan it. This function can be ANY code. ANY code includes
the possibility for GCscan!A to call GCscan!B directly, without going
back to the GC main loop and an indirect call. If inlined, you can forget about
function calls entirely (and you can force that using a mixin template, for
example, but it is likely to generate massive code bloat).


That is correct. The code bloat is equal to that generated if the end 
user sat down and wrote by hand appropriate routines for collection for 
all types.



This can't be done for references/pointers to polymorphic types, but for
any other it is an available option, and it can dramatically reduce the
number of indirect calls.


Indeed. For non-final class member variables, the template will fetch 
their TypeInfo, from that the pointer to the mark function, and will 
issue the call through the pointer.



Andrei


Re: Precise GC

2012-04-10 Thread deadalnix

Le 10/04/2012 00:39, Manu a écrit :

It is, and I still don't follow. I can't imagine there are any indirect
function calls, except for the ones introduced by this proposal, where
you may register a function to mark the pointers in complex structs.
You seem to be suggesting that another one already exists anyway? Where
is it? Why is it there?


OK, back to basics.

For every type, a function template (let's call it GCscan) will be 
instantiated to scan it. This function can be ANY code. ANY code includes 
the possibility for GCscan!A to call GCscan!B directly, without going 
back to the GC main loop and an indirect call. If inlined, you can forget about 
function calls entirely (and you can force that using a mixin template, for 
example, but it is likely to generate massive code bloat).


This can't be done for references/pointers to polymorphic types, but for 
any other it is an available option, and it can dramatically reduce the 
number of indirect calls.
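A minimal C++ rendering of this point (GCscan is the D idea above; every name below is hypothetical): when a field's type is known statically, the scan template for the outer type calls the inner type's scan directly, with no trip back through the GC main loop. Only polymorphic references would have to fall back to an indirect call through type info.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the GC's mark set.
std::vector<void*> marked;

struct B { void* payload = nullptr; };
struct A { B inlineB; B* heapB = nullptr; };

// One instantiation per type, analogous to GCscan!T.
template <class T> void scan(const T&);

template <> void scan<B>(const B& b) {
    if (b.payload) marked.push_back(b.payload);
}

template <> void scan<A>(const A& a) {
    scan(a.inlineB);               // direct, inlinable call: type known statically
    if (a.heapB) {
        marked.push_back(a.heapB); // record the heap reference...
        scan(*a.heapB);            // ...and descend directly; a polymorphic
                                   // reference would need indirect dispatch here
    }
}
```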


Re: Precise GC

2012-04-10 Thread deadalnix

Le 09/04/2012 23:27, Walter Bright a écrit :

On 4/9/2012 11:30 AM, deadalnix wrote:

On the other hand, TLS can be collected independently and only
influences the
thread that owns the data. Both are very powerful improvements, and
the design
you propose « as this » cannot provide any means to handle that. Which
is a big
missed opportunity, and will be hard to change in the future.


I think this is an orthogonal issue.


You mean an allocator/deallocator one?

I'm not sure. For instance, concurrent shared-memory scanning will 
require some magic on reference changes (it can be hooked into the 
program using page protection). In such a case, you have constraints on 
what the scanning function can and can't do.


If the function is scanning immutable data, such a constraint disappears.

In a similar way, when scanning TLS, you'll want to avoid going into the 
non-TLS world. This is currently possible only if you go back to the main 
GC code and trigger the indirect call every single time you encounter a 
pointer or a reference. This is going to be a performance killer on many 
architectures.


So this code, in one way or another, will need to be aware of the 
qualifiers. Otherwise it will either require passing every single 
pointer/reference through an indirect function call, or require 
forgetting about optimizations that the type system has been made to 
allow (in the program in general, not especially in the GC).


Re: Precise GC

2012-04-09 Thread Manu
On 10 April 2012 00:06, deadalnix  wrote:

> Le 09/04/2012 20:33, Manu a écrit :
>
>  Eh?
>> Not sure what you mean. The idea is the template would produce a
>> struct/table of data instead of being a pointer to a function, this way
>> the GC could work without calling anything. If the GC was written to
>> assume GC info in a particular format/structure, it could be written
>> without any calls.
>> I'm just saying to leave that as a possibility, and not REQUIRE an
>> indirect function call for every single allocation in the system. Some
>> GC might be able to make better use of that sort of setup.
>>
>
> If you have references to objects, you can't avoid a function call. If you
> have something you know at compile time, the generated function can
> directly call the other function that marks the pointed-to data (or can even do
> it itself, if you don't fear code bloat) without going back to the GC and
> its indirect call.
>
> So it makes no difference in the number of indirect calls you have, but the
> struct proposal is a stronger constraint on the GC than the function one.
>
> BTW, starting your answer with « Not sure what you mean. » should have been a
> red flag.
>

It is, and I still don't follow. I can't imagine there are any indirect
function calls, except for the ones introduced by this proposal, where you
may register a function to mark the pointers in complex structs.
You seem to be suggesting that another one already exists anyway? Where is
it? Why is it there?


Re: Precise GC

2012-04-09 Thread Walter Bright

On 4/9/2012 11:30 AM, deadalnix wrote:

On the other hand, TLS can be collected independently and only influences the
thread that owns the data. Both are very powerful improvements, and the design
you propose « as this » cannot provide any means to handle that. Which is a big
missed opportunity, and will be hard to change in the future.


I think this is an orthogonal issue.


Re: Precise GC

2012-04-09 Thread deadalnix

Le 09/04/2012 20:33, Manu a écrit :

Eh?
Not sure what you mean. The idea is the template would produce a
struct/table of data instead of being a pointer to a function, this way
the GC could work without calling anything. If the GC was written to
assume GC info in a particular format/structure, it could be written
without any calls.
I'm just saying to leave that as a possibility, and not REQUIRE an
indirect function call for every single allocation in the system. Some
GC might be able to make better use of that sort of setup.


If you have references to objects, you can't avoid a function call. If 
you have something you know at compile time, the generated function can 
directly call the other function that marks the pointed-to data (or can 
even do it itself, if you don't fear code bloat) without going back to 
the GC and its indirect call.


So it makes no difference in the number of indirect calls you have, but 
the struct proposal is a stronger constraint on the GC than the function 
one.


BTW, starting your answer with « Not sure what you mean. » should have 
been a red flag.


Re: Precise GC

2012-04-09 Thread Manu
On 9 April 2012 21:20, deadalnix  wrote:

> Le 08/04/2012 14:02, Alex Rønne Petersen a écrit :
>
>  On 08-04-2012 11:42, Manu wrote:
>>
>>> On 8 April 2012 11:56, Timon Gehr wrote:
>>>
>>> On 04/08/2012 10:45 AM, Timon Gehr wrote:
>>>
>>> That actually sounds like a pretty awesome idea.
>>>
>>>
>>> Make sure that the compiler does not actually rely on the fact that
>>> the template generates a function. The design should include the
>>> possibility of just generating tables. It all should be completely
>>> transparent to the compiler, if that is possible.
>>>
>>>
>>> This sounds important to me. If it is also possible to do the work with
>>> generated tables, and not calling thousands of indirect functions in
>>> someone's implementation, it would be nice to reserve that possibility.
>>> Indirect function calls in hot loops make me very nervous for non-x86
>>> machines.
>>>
>>
>> Yes, I agree here. The last thing we need is a huge amount of
>> kinda-sorta-virtual function calls on ARM, MIPS, etc. It may work fine
>> on x86, but anywhere else, it's really not what you want in a GC.
>>
>>
> Nothing prevents the generated function from itself calling other generated
> functions, when things are predictable. That avoids many indirect calls, and
> purely in the library, which is good (it can be tuned per application/platform).
>

Eh?
Not sure what you mean. The idea is the template would produce a
struct/table of data instead of being a pointer to a function, this way the
GC could work without calling anything. If the GC was written to assume GC
info in a particular format/structure, it could be written without any
calls.
I'm just saying to leave that as a possibility, and not REQUIRE an indirect
function call for every single allocation in the system. Some GC might be
able to make better use of that sort of setup.


Re: Precise GC

2012-04-09 Thread deadalnix

Le 08/04/2012 03:56, Walter Bright a écrit :

Of course, many of us have been thinking about this for a looong time,
and what is the best way to go about it. The usual technique is for the
compiler to emit some sort of table for each TypeInfo giving the layout
of the object, i.e. where the pointers are.

The general problem with these is the table is non-trivial, as it will
require things like iterated data blocks, etc. It has to be compressed
to save space, and the gc then has to execute a fair amount of code to
decode it.

It also requires some significant work on the compiler end, leading of
course to complexity, rigidity, development bottlenecks, and the usual
bugs.

An alternative Andrei and I have been talking about is to put in the
TypeInfo a pointer to a function. That function will contain customized
code to mark the pointers in an instance of that type. That custom code
will be generated by a template defined by the library. All the compiler
has to do is stupidly instantiate the template for the type, and insert
an address to the generated function.

The compiler need know NOTHING about how the marking works.

Even better, as ctRegex has demonstrated, the custom generated code can
be very, very fast compared with a runtime table-driven approach. (The
slow part will be calling the function indirectly.)

And best of all, the design is pushed out of the compiler into the
library, so various schemes can be tried out without needing compiler work.

I think this is an exciting idea, it will enable us to get a precise gc
by enabling people to work on it in parallel rather than serially
waiting for me.


This is a good idea. However, it doesn't handle type qualifiers. And 
this is important!


The D2 type system is made in such a way that most data is either thread 
local or immutable, and a small amount is shared. Both thread-local 
storage and immutability are sources of BIG improvements for the GC. 
Doing without them is a huge design error.


For instance, OCaml's GC is known to be more performant than Java's, 
because in OCaml most data is immutable, and the GC takes advantage of 
this. Immutability means 100% concurrent garbage collection.


On the other hand, TLS can be collected independently and only influences 
the thread that owns the data. Both are very powerful improvements, and 
the design you propose « as this » cannot provide any means to handle 
that. Which is a big missed opportunity, and will be hard to change in 
the future.


Re: Precise GC

2012-04-09 Thread deadalnix

Le 08/04/2012 14:02, Alex Rønne Petersen a écrit :

On 08-04-2012 11:42, Manu wrote:

On 8 April 2012 11:56, Timon Gehr wrote:

On 04/08/2012 10:45 AM, Timon Gehr wrote:

That actually sounds like a pretty awesome idea.


Make sure that the compiler does not actually rely on the fact that
the template generates a function. The design should include the
possibility of just generating tables. It all should be completely
transparent to the compiler, if that is possible.


This sounds important to me. If it is also possible to do the work with
generated tables, and not calling thousands of indirect functions in
someone's implementation, it would be nice to reserve that possibility.
Indirect function calls in hot loops make me very nervous for non-x86
machines.


Yes, I agree here. The last thing we need is a huge amount of
kinda-sorta-virtual function calls on ARM, MIPS, etc. It may work fine
on x86, but anywhere else, it's really not what you want in a GC.



Nothing prevents the generated function from itself calling other 
generated functions, when things are predictable. That avoids many 
indirect calls, and purely in the library, which is good (it can be 
tuned per application/platform).


Re: Precise GC

2012-04-09 Thread Steven Schveighoffer
On Sat, 07 Apr 2012 21:56:09 -0400, Walter Bright wrote:


Of course, many of us have been thinking about this for a looong time,  
and what is the best way to go about it. The usual technique is for the  
compiler to emit some sort of table for each TypeInfo giving the layout  
of the object, i.e. where the pointers are.


The general problem with these is the table is non-trivial, as it will  
require things like iterated data blocks, etc. It has to be compressed  
to save space, and the gc then has to execute a fair amount of code to  
decode it.


It also requires some significant work on the compiler end, leading of  
course to complexity, rigidity, development bottlenecks, and the usual  
bugs.


An alternative Andrei and I have been talking about is to put in the  
TypeInfo a pointer to a function. That function will contain customized  
code to mark the pointers in an instance of that type. That custom code  
will be generated by a template defined by the library. All the compiler  
has to do is stupidly instantiate the template for the type, and insert  
an address to the generated function.


The compiler need know NOTHING about how the marking works.

Even better, as ctRegex has demonstrated, the custom generated code can  
be very, very fast compared with a runtime table-driven approach. (The  
slow part will be calling the function indirectly.)


And best of all, the design is pushed out of the compiler into the  
library, so various schemes can be tried out without needing compiler  
work.


I think this is an exciting idea, it will enable us to get a precise gc  
by enabling people to work on it in parallel rather than serially  
waiting for me.


I think this is a really good idea.

I would like to go further and propose that there be an arbitrary way to  
add members to the TypeInfo types using templates.  Not sure how it would  
be implemented, but I don't see why this has to be specific to GCs.  Some  
way to signify "hey compiler, please initialize this member with template  
X given the type being compiled".


This could be a huge bridge between compile-time and runtime type  
information.


-Steve
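Steve's TypeInfo idea above can be sketched with a C++ analogue (all names hypothetical, not any real D or C++ API): a per-type runtime record whose members are each filled in by a template instantiated for the type being compiled, bridging compile-time knowledge into a runtime structure.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical runtime type-info record. Each member's value comes
// from "template X given the type being compiled".
struct RtInfo {
    std::size_t size;
    void (*mark)(void*);
};

// Per-type code the library would generate from T's layout.
template <class T>
void markImpl(void*) { /* scan an instance of T here */ }

// One shared record per type; the members are initialized entirely by
// template machinery, with no compiler-specific table emission.
template <class T>
const RtInfo& rtInfo() {
    static const RtInfo info = { sizeof(T), &markImpl<T> };
    return info;
}
```

Any library-defined template could populate additional members the same way, which is what makes the idea more general than GC support alone.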


Re: Precise GC

2012-04-08 Thread Alex Rønne Petersen

On 08-04-2012 11:42, Manu wrote:

On 8 April 2012 11:56, Timon Gehr mailto:timon.g...@gmx.ch>> wrote:

On 04/08/2012 10:45 AM, Timon Gehr wrote:

That actually sounds like a pretty awesome idea.


Make sure that the compiler does not actually rely on the fact that
the template generates a function. The design should include the
possibility of just generating tables. It all should be completely
transparent to the compiler, if that is possible.


This sounds important to me. If it is also possible to do the work with
generated tables, and not calling thousands of indirect functions in
someone's implementation, it would be nice to reserve that possibility.
Indirect function calls in hot loops make me very nervous for non-x86
machines.


Yes, I agree here. The last thing we need is a huge amount of 
kinda-sorta-virtual function calls on ARM, MIPS, etc. It may work fine 
on x86, but anywhere else, it's really not what you want in a GC.


--
- Alex


Re: Precise GC

2012-04-08 Thread Alex Rønne Petersen

On 08-04-2012 12:07, Rainer Schuetze wrote:



On 4/8/2012 11:21 AM, Timon Gehr wrote:

On 04/08/2012 10:45 AM, Timon Gehr wrote:

That actually sounds like a pretty awesome idea.


I understand that the stack will still have to be scanned
conservatively, but how does the scheme deal with closures?


I guess the compiler should generate an (anonymous) struct type
corresponding to the closure data layout. There probably has to be a
template for compiler generated structs or classes anyway.

This new type could also be used as the type of the context pointer, so
a debugger could display the closure variables.



This sounds sensible to me. No reason closure marking can't be precise 
if the compiler just emits the relevant type info (pretty much any other 
compiler with closures does this; see C#, F#, etc).


--
- Alex


Re: Precise GC

2012-04-08 Thread Rainer Schuetze



On 4/8/2012 11:21 AM, Timon Gehr wrote:

On 04/08/2012 10:45 AM, Timon Gehr wrote:

That actually sounds like a pretty awesome idea.


I understand that the stack will still have to be scanned
conservatively, but how does the scheme deal with closures?


I guess the compiler should generate an (anonymous) struct type 
corresponding to the closure data layout. There probably has to be a 
template for compiler generated structs or classes anyway.


This new type could also be used as the type of the context pointer, so 
a debugger could display the closure variables.
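A hypothetical sketch of Rainer's suggestion, rendered in C++ for concreteness (the struct name and layout are invented): the compiler would emit a struct matching the closure's captured variables, giving the context pointer a real type that both the GC and a debugger can use, after which the usual per-type scan machinery applies.

```cpp
#include <cassert>
#include <cstddef>

// Generated for a closure capturing (int* data, int count).
struct Closure_foo {
    int* data;   // pointer slot: the GC must trace this
    int  count;  // plain data: the GC can skip it
};

// With a real type in hand, the same per-type machinery works, e.g. a
// pointer-offset table for Closure_foo listing only the traceable slot.
const std::size_t closureFooPtrOffsets[] = { offsetof(Closure_foo, data) };
```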




Re: Precise GC

2012-04-08 Thread Manu
On 8 April 2012 11:56, Timon Gehr  wrote:

> On 04/08/2012 10:45 AM, Timon Gehr wrote:
>
>> That actually sounds like a pretty awesome idea.
>>
>
> Make sure that the compiler does not actually rely on the fact that the
> template generates a function. The design should include the possibility of
> just generating tables. It all should be completely transparent to the
> compiler, if that is possible.
>

This sounds important to me. If it is also possible to do the work with
generated tables, and not calling thousands of indirect functions in
someone's implementation, it would be nice to reserve that possibility.
Indirect function calls in hot loops make me very nervous for non-x86
machines.


Re: Precise GC

2012-04-08 Thread Walter Bright

On 4/8/2012 2:21 AM, Timon Gehr wrote:

On 04/08/2012 10:45 AM, Timon Gehr wrote:

That actually sounds like a pretty awesome idea.


I understand that the stack will still have to be scanned conservatively, but
how does the scheme deal with closures?


For now, just treat them conservatively.


Re: Precise GC

2012-04-08 Thread Timon Gehr

On 04/08/2012 10:45 AM, Timon Gehr wrote:

That actually sounds like a pretty awesome idea.


I understand that the stack will still have to be scanned 
conservatively, but how does the scheme deal with closures?


Re: Precise GC

2012-04-08 Thread Timon Gehr

On 04/08/2012 10:45 AM, Timon Gehr wrote:

That actually sounds like a pretty awesome idea.


Make sure that the compiler does not actually rely on the fact that the 
template generates a function. The design should include the possibility 
of just generating tables. It all should be completely transparent to 
the compiler, if that is possible.


Re: Precise GC

2012-04-08 Thread Timon Gehr

On 04/08/2012 03:56 AM, Walter Bright wrote:

Of course, many of us have been thinking about this for a looong time,
and what is the best way to go about it. The usual technique is for the
compiler to emit some sort of table for each TypeInfo giving the layout
of the object, i.e. where the pointers are.

The general problem with these is the table is non-trivial, as it will
require things like iterated data blocks, etc. It has to be compressed
to save space, and the gc then has to execute a fair amount of code to
decode it.

It also requires some significant work on the compiler end, leading of
course to complexity, rigidity, development bottlenecks, and the usual
bugs.

An alternative Andrei and I have been talking about is to put in the
TypeInfo a pointer to a function. That function will contain customized
code to mark the pointers in an instance of that type. That custom code
will be generated by a template defined by the library. All the compiler
has to do is stupidly instantiate the template for the type, and insert
an address to the generated function.

The compiler need know NOTHING about how the marking works.

Even better, as ctRegex has demonstrated, the custom generated code can
be very, very fast compared with a runtime table-driven approach. (The
slow part will be calling the function indirectly.)

And best of all, the design is pushed out of the compiler into the
library, so various schemes can be tried out without needing compiler work.

I think this is an exciting idea, it will enable us to get a precise gc
by enabling people to work on it in parallel rather than serially
waiting for me.


That actually sounds like a pretty awesome idea.


Re: Precise GC

2012-04-08 Thread Sean Kelly
On Apr 7, 2012, at 6:56 PM, Walter Bright  wrote:

> Of course, many of us have been thinking about this for a looong time, and 
> what is the best way to go about it. The usual technique is for the compiler 
> to emit some sort of table for each TypeInfo giving the layout of the object, 
> i.e. where the pointers are.
> 
> The general problem with these is the table is non-trivial, as it will 
> require things like iterated data blocks, etc. It has to be compressed to 
> save space, and the gc then has to execute a fair amount of code to decode it.
> 
> It also requires some significant work on the compiler end, leading of course 
> to complexity, rigidity, development bottlenecks, and the usual bugs.
> 
> An alternative Andrei and I have been talking about is to put in the TypeInfo 
> a pointer to a function. That function will contain customized code to mark 
> the pointers in an instance of that type. That custom code will be generated 
> by a template defined by the library. All the compiler has to do is stupidly 
> instantiate the template for the type, and insert an address to the generated 
> function.
> 
> The compiler need know NOTHING about how the marking works.
> 
> Even better, as ctRegex has demonstrated, the custom generated code can be 
> very, very fast compared with a runtime table-driven approach. (The slow part 
> will be calling the function indirectly.)
> 
> And best of all, the design is pushed out of the compiler into the library, 
> so various schemes can be tried out without needing compiler work.
> 
> I think this is an exciting idea, it will enable us to get a precise gc by 
> enabling people to work on it in parallel rather than serially waiting for me.

With __traits and such, I kind of always figured we'd go this way. There's 
simply no reason to have the compiler generate a map. Glad to see it's working 
out. 

Re: Precise GC

2012-04-07 Thread Walter Bright

On 4/7/2012 7:58 PM, Chad J wrote:

Hey, that sounds awesome. I think I geeked out a bit.

Would this make it any easier to reference count types that can be statically
proven to have no cyclical references?



It has nothing to do with reference counting that I can think of.


Re: Precise GC

2012-04-07 Thread Chad J

Hey, that sounds awesome.  I think I geeked out a bit.

Would this make it any easier to reference count types that can be 
statically proven to have no cyclical references?




Re: Precise GC

2012-04-07 Thread Andrei Alexandrescu

On 4/7/12 9:49 PM, Andrei Alexandrescu wrote:

On 4/7/12 8:56 PM, Walter Bright wrote:
[snip]

I think this is an exciting idea, it will enable us to get a precise gc
by enabling people to work on it in parallel rather than serially
waiting for me.


I'm also very excited about this design, and will make time to help with
the library part of the implementation.

Maybe we can get a GSoC project on that. We already have a related
proposal (lock-free GC).


BTW, one exciting thing about both this and the nascent attributes design 
is that they integrate wonderfully with the language's generic and 
generative capabilities.


Andrei



Re: Precise GC

2012-04-07 Thread Froglegs

That sounds cool; perhaps people can have customizable GCs for
specific applications?

Looking forward to D having a precise GC


Re: Precise GC

2012-04-07 Thread Andrei Alexandrescu

On 4/7/12 8:56 PM, Walter Bright wrote:
[snip]

I think this is an exciting idea, it will enable us to get a precise gc
by enabling people to work on it in parallel rather than serially
waiting for me.


I'm also very excited about this design, and will make time to help with 
the library part of the implementation.


Maybe we can get a GSoC project on that. We already have a related 
proposal (lock-free GC).



Andrei


Re: Precise GC

2012-04-07 Thread Andrei Alexandrescu

On 4/7/12 9:59 PM, Walter Bright wrote:

On 4/7/2012 7:58 PM, Chad J wrote:

Hey, that sounds awesome. I think I geeked out a bit.

Would this make it any easier to reference count types that can be
statically
proven to have no cyclical references?



It has nothing to do with reference counting that I can think of.


Nevertheless good food for thought. This is all very cool.

Andrei