Re: Potential GSoC project - GC improvements

2016-03-09 Thread Joakim via Digitalmars-d

On Wednesday, 9 March 2016 at 22:49:28 UTC, Jeremy DeHaan wrote:

Hey all,

I'm trying to think of good project ideas for this years GSoC, 
and one in particular I thought would be a great was working on 
and improving the GC. I'm not sure what the scope of this 
project would be like, but at the moment I am thinking writing 
a generational collector would be a good place to start.


I have more ideas, but I don't have a proposal yet. I just 
wanted some initial feedback on the idea, perhaps some advice 
for scope with the time frame in mind, and hopefully someone on 
the mentor list willing to take this on if it were to be 
accepted.


I'd think working on the GC would make a great GSoC project, as 
it's needed and fairly self-contained while having large impact.  
Perhaps someone could build off of Sociomantic's concurrent GC 
(https://www.sociomantic.com/blog/2013/06/porting-cdgc-to-d2/), 
which I assume has been ported to D2, porting it to Windows or 
whatever else remains to be done and then adding to it.


There were some good ideas in this thread from a couple years ago 
too:


http://forum.dlang.org/thread/ucsamkqvrihfuspnx...@forum.dlang.org


Re: Potential GSoC project - GC improvements

2016-03-09 Thread Adam Wilson via Digitalmars-d

Jeremy DeHaan wrote:

Hey all,

I'm trying to think of good project ideas for this years GSoC, and one
in particular I thought would be a great was working on and improving
the GC. I'm not sure what the scope of this project would be like, but
at the moment I am thinking writing a generational collector would be a
good place to start.

I have more ideas, but I don't have a proposal yet. I just wanted some
initial feedback on the idea, perhaps some advice for scope with the
time frame in mind, and hopefully someone on the mentor list willing to
take this on if it were to be accepted.

 Jeremy


The follow-on projects enabled by a precise GC would be quite useful...

I'd be interested in mentoring if nobody else with more experience is 
interested.


--
// Adam Wilson
// quiet.dlang.dev


Re: Potential GSoC project - GC improvements

2016-03-09 Thread NX via Digitalmars-d

On Wednesday, 9 March 2016 at 22:49:28 UTC, Jeremy DeHaan wrote:

Hey all,

I'm trying to think of good project ideas for this years GSoC, 
and one in particular I thought would be a great was working on 
and improving the GC. I'm not sure what the scope of this 
project would be like, but at the moment I am thinking writing 
a generational collector would be a good place to start.


http://www.infognition.com/blog/2014/the_real_problem_with_gc_in_d.html

Good luck with generational GC...

I think the best possible improvement for GC is making it 
lock-free. Currently, GC lock cause some serious performance 
penalties for multithreaded code when frequent allocations take 
place.


Re: Potential GSoC project - GC improvements

2016-03-10 Thread thedeemon via Digitalmars-d

On Thursday, 10 March 2016 at 05:01:37 UTC, Joakim wrote:
Perhaps someone could build off of Sociomantic's concurrent GC 
(https://www.sociomantic.com/blog/2013/06/porting-cdgc-to-d2/), 
which I assume has been ported to D2, porting it to Windows or 
whatever else remains to be done and then adding to it.


Do you think Windows version of this cdgc is feasible at all?
It relies on fork, something that gets emulated on Windows but 
too slowly, as I understand.




Re: Potential GSoC project - GC improvements

2016-03-10 Thread Joakim via Digitalmars-d

On Thursday, 10 March 2016 at 11:12:47 UTC, thedeemon wrote:

On Thursday, 10 March 2016 at 05:01:37 UTC, Joakim wrote:
Perhaps someone could build off of Sociomantic's concurrent GC 
(https://www.sociomantic.com/blog/2013/06/porting-cdgc-to-d2/), which I assume has been ported to D2, porting it to Windows or whatever else remains to be done and then adding to it.


Do you think Windows version of this cdgc is feasible at all?
It relies on fork, something that gets emulated on Windows but 
too slowly, as I understand.


I don't know enough about GCs and Windows to say.  I'm assuming 
it can be done with some effort, but I don't know how much.


Even if it's too much work, it'd be worth it just for POSIX 
systems, which make up almost 70% of all general-purpose 
computers running apps these days. :)


If it's good enough, it could eventually become the default on 
there, while leaving the legacy GC for Windows.


Re: Potential GSoC project - GC improvements

2016-03-10 Thread Andrei Alexandrescu via Digitalmars-d

On 3/9/16 10:40 PM, NX wrote:

I think the best possible improvement for GC is making it lock-free.
Currently, GC lock cause some serious performance penalties for
multithreaded code when frequent allocations take place.


I agree. A first step would be easy to do with std.allocator's 
thread-local freelists. -- Andrei




Re: Potential GSoC project - GC improvements

2016-03-10 Thread Jeremy DeHaan via Digitalmars-d
On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu 
wrote:

On 3/9/16 10:40 PM, NX wrote:
I think the best possible improvement for GC is making it 
lock-free.

Currently, GC lock cause some serious performance penalties for
multithreaded code when frequent allocations take place.


I agree. A first step would be easy to do with std.allocator's 
thread-local freelists. -- Andrei


I was looking into this, but I am slightly hesitant. Should the 
gc use something in std.experimental? Or should we think that's 
ok?


I also know that there are some people that think we should avoid 
using Phobos in druntime.


Re: Potential GSoC project - GC improvements

2016-03-10 Thread ZombineDev via Digitalmars-d

On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan wrote:
On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu 
wrote:

On 3/9/16 10:40 PM, NX wrote:
I think the best possible improvement for GC is making it 
lock-free.
Currently, GC lock cause some serious performance penalties 
for

multithreaded code when frequent allocations take place.


I agree. A first step would be easy to do with std.allocator's 
thread-local freelists. -- Andrei


I was looking into this, but I am slightly hesitant. Should the 
gc use something in std.experimental? Or should we think that's 
ok?


I also know that there are some people that think we should 
avoid using Phobos in druntime.


There's no problem in using std.experimental.* internally, 
because if the API changes, the one who changes the API will also 
need to change all internal usages, otherwise his pull request 
won't compile.


About using Phobos in Druntime - you can't directly import 
Phobos, since it's files are not present when Druntime is built. 
The solution is to copy the stuff you need to 
https://github.com/D-Programming-Language/druntime/tree/master/src/rt/util or some place like this.


Re: Potential GSoC project - GC improvements

2016-03-10 Thread Daniel Kozak via Digitalmars-d



Dne 10.3.2016 v 18:15 ZombineDev via Digitalmars-d napsal(a):

On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan wrote:

On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu wrote:

On 3/9/16 10:40 PM, NX wrote:

I think the best possible improvement for GC is making it lock-free.
Currently, GC lock cause some serious performance penalties for
multithreaded code when frequent allocations take place.


I agree. A first step would be easy to do with std.allocator's 
thread-local freelists. -- Andrei


I was looking into this, but I am slightly hesitant. Should the gc 
use something in std.experimental? Or should we think that's ok?


I also know that there are some people that think we should avoid 
using Phobos in druntime.


There's no problem in using std.experimental.* internally, because if 
the API changes, the one who changes the API will also need to change 
all internal usages, otherwise his pull request won't compile.


About using Phobos in Druntime - you can't directly import Phobos, 
since it's files are not present when Druntime is built.
But we allready need existing D compiler to build D frontend, so maybe 
we can use this full D compiler for building druntime in the future?


Re: Potential GSoC project - GC improvements

2016-03-10 Thread ZombineDev via Digitalmars-d

On Thursday, 10 March 2016 at 17:46:15 UTC, Daniel Kozak wrote:



Dne 10.3.2016 v 18:15 ZombineDev via Digitalmars-d napsal(a):
On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan 
wrote:
On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei 
Alexandrescu wrote:

On 3/9/16 10:40 PM, NX wrote:
I think the best possible improvement for GC is making it 
lock-free.
Currently, GC lock cause some serious performance penalties 
for

multithreaded code when frequent allocations take place.


I agree. A first step would be easy to do with 
std.allocator's thread-local freelists. -- Andrei


I was looking into this, but I am slightly hesitant. Should 
the gc use something in std.experimental? Or should we think 
that's ok?


I also know that there are some people that think we should 
avoid using Phobos in druntime.


There's no problem in using std.experimental.* internally, 
because if the API changes, the one who changes the API will 
also need to change all internal usages, otherwise his pull 
request won't compile.


About using Phobos in Druntime - you can't directly import 
Phobos, since it's files are not present when Druntime is 
built.
But we allready need existing D compiler to build D frontend, 
so maybe we can use this full D compiler for building druntime 
in the future?


The build process looks something like this currently:
(a bit simplified)
1. Download old dmd.
2. Build new dmd with 1.
3. Build druntime with 2.
4. Build phobos with 2 and link it with 3.
5. Package 2, 3 and 4 into a release zip

You can't use 1 for building 3 (or 4), because 3 (or 4) may 
depend on a new feature only present in 2. Also you can't rely on 
5 for building 3 (or 4) because you'll get a cyclic dependency.




Re: Potential GSoC project - GC improvements

2016-03-10 Thread Daniel Kozak via Digitalmars-d



Dne 10.3.2016 v 19:39 ZombineDev via Digitalmars-d napsal(a):

On Thursday, 10 March 2016 at 17:46:15 UTC, Daniel Kozak wrote:



Dne 10.3.2016 v 18:15 ZombineDev via Digitalmars-d napsal(a):

On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan wrote:

On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu wrote:

On 3/9/16 10:40 PM, NX wrote:

I think the best possible improvement for GC is making it lock-free.
Currently, GC lock cause some serious performance penalties for
multithreaded code when frequent allocations take place.


I agree. A first step would be easy to do with std.allocator's 
thread-local freelists. -- Andrei


I was looking into this, but I am slightly hesitant. Should the gc 
use something in std.experimental? Or should we think that's ok?


I also know that there are some people that think we should avoid 
using Phobos in druntime.


There's no problem in using std.experimental.* internally, because 
if the API changes, the one who changes the API will also need to 
change all internal usages, otherwise his pull request won't compile.


About using Phobos in Druntime - you can't directly import Phobos, 
since it's files are not present when Druntime is built.
But we allready need existing D compiler to build D frontend, so 
maybe we can use this full D compiler for building druntime in the 
future?


The build process looks something like this currently:
(a bit simplified)
1. Download old dmd.
2. Build new dmd with 1.
3. Build druntime with 2.
4. Build phobos with 2 and link it with 3.
5. Package 2, 3 and 4 into a release zip

You can't use 1 for building 3 (or 4), because 3 (or 4) may depend on 
a new feature only present in 2. Also you can't rely on 5 for building 
3 (or 4) because you'll get a cyclic dependency.


I know how it works now :). But if D is stable enought  1 should be good 
enought to do 3 and 4 (yes you can't rely on features and fixes from 2). 
I think we should avoid to use latest features when developing dmd, 
druntime and phobos (maybe we should use only some subset or make some 
rules like new version must be compilable by 3 previous version of dmd)


Re: Potential GSoC project - GC improvements

2016-03-10 Thread ZombineDev via Digitalmars-d

On Thursday, 10 March 2016 at 18:49:28 UTC, Daniel Kozak wrote:



Dne 10.3.2016 v 19:39 ZombineDev via Digitalmars-d napsal(a):

On Thursday, 10 March 2016 at 17:46:15 UTC, Daniel Kozak wrote:



Dne 10.3.2016 v 18:15 ZombineDev via Digitalmars-d napsal(a):
On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan 
wrote:
On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei 
Alexandrescu wrote:

On 3/9/16 10:40 PM, NX wrote:
I think the best possible improvement for GC is making it 
lock-free.
Currently, GC lock cause some serious performance 
penalties for

multithreaded code when frequent allocations take place.


I agree. A first step would be easy to do with 
std.allocator's thread-local freelists. -- Andrei


I was looking into this, but I am slightly hesitant. Should 
the gc use something in std.experimental? Or should we 
think that's ok?


I also know that there are some people that think we should 
avoid using Phobos in druntime.


There's no problem in using std.experimental.* internally, 
because if the API changes, the one who changes the API will 
also need to change all internal usages, otherwise his pull 
request won't compile.


About using Phobos in Druntime - you can't directly import 
Phobos, since it's files are not present when Druntime is 
built.
But we allready need existing D compiler to build D frontend, 
so maybe we can use this full D compiler for building 
druntime in the future?


The build process looks something like this currently:
(a bit simplified)
1. Download old dmd.
2. Build new dmd with 1.
3. Build druntime with 2.
4. Build phobos with 2 and link it with 3.
5. Package 2, 3 and 4 into a release zip

You can't use 1 for building 3 (or 4), because 3 (or 4) may 
depend on a new feature only present in 2. Also you can't rely 
on 5 for building 3 (or 4) because you'll get a cyclic 
dependency.


I know how it works now :). But if D is stable enought  1 
should be good enought to do 3 and 4 (yes you can't rely on 
features and fixes from 2). I think we should avoid to use 
latest features when developing dmd, druntime and phobos (maybe 
we should use only some subset or make some rules like new 
version must be compilable by 3 previous version of dmd)


Well for 4. maybe, but after a couple of years... Most of the 
time Phobos can't be compiled with an older version of DMD not 
because it relies on new features, but because there are bugs in 
the older version of DMD.
As for 3. - druntime is very tightly coupled with the compiler. 
Quite often changes in the compiler rely on changes in druntime - 
e.g. the new DWARF exceptions, C++ and Objective-C support, 
better code coverage and GC profiling, things in object.d, 
built-in AAs and arrays, linking with MSVC 2015 libc, shared 
libraries, TLS, etc.
On the other hand, I think that using Phobos in DMDFE is a good 
idea and is far more likely to happen.


Re: Potential GSoC project - GC improvements

2016-03-11 Thread Jeremy DeHaan via Digitalmars-d

Thank you all for the feedback.

I think I might still need a little more feedback as to what the 
project should actually entail, but here's what it's looking like 
so far:


Implement lock free allocation using std.experimental.allocator's 
freelists (SharedFreeList? It was the only thing in the 
documentation that specifically mentions lock free allocation)


Implement a generational garbage collector

Implement precise garbage collection (possibly piggybacking off 
of Rainer's work)



Does anyone think that might be too much to take on or does it 
sound pretty good? I have other garbage collector things I'd like 
to explore, but I they should probably go in the "if time allows" 
category at this point.


   Jeremy


Re: Potential GSoC project - GC improvements

2016-03-11 Thread Rikki Cattermole via Digitalmars-d

On 12/03/16 4:13 AM, Jeremy DeHaan wrote:

Thank you all for the feedback.

I think I might still need a little more feedback as to what the project
should actually entail, but here's what it's looking like so far:

Implement lock free allocation using std.experimental.allocator's
freelists (SharedFreeList? It was the only thing in the documentation
that specifically mentions lock free allocation)

Implement a generational garbage collector

Implement precise garbage collection (possibly piggybacking off of
Rainer's work)


Does anyone think that might be too much to take on or does it sound
pretty good? I have other garbage collector things I'd like to explore,
but I they should probably go in the "if time allows" category at this
point.

Jeremy


Ugh I would say that is a lot.
So here is what I would do, have two lists of goals.
One where you would happy if you got done the other if you are able to 
do it, you will.


That way your prioritize what you want to get done and not the others.


Re: Potential GSoC project - GC improvements

2016-03-12 Thread Adam Wilson via Digitalmars-d

Jeremy DeHaan wrote:

Thank you all for the feedback.

I think I might still need a little more feedback as to what the project
should actually entail, but here's what it's looking like so far:

Implement lock free allocation using std.experimental.allocator's
freelists (SharedFreeList? It was the only thing in the documentation
that specifically mentions lock free allocation)

Implement a generational garbage collector

Implement precise garbage collection (possibly piggybacking off of
Rainer's work)


Does anyone think that might be too much to take on or does it sound
pretty good? I have other garbage collector things I'd like to explore,
but I they should probably go in the "if time allows" category at this
point.

Jeremy


If I may make a suggestion. The lock free work is unlikely to require 
the entirety of GSoC. And the precise GC is the next most important 
thing on your list and will have the biggest impact on GC performance.


Once the GC is fully precise we can implement a fully compacting GC, 
which improves the usefulness of generational collection. Additionally, 
precision allows a significant amount of work to be done in improving 
the performance of the GC in multi-threaded scenarios. It should be 
quite possible to avoid needing fork() or anything like it altogether. I 
know that the .NET GC doesn't need to use anything like it.


Also, I would strongly recommend getting this book and reading it cover 
to cover before starting:

http://www.amazon.com/gp/product/1420082795/ref=pd_lpo_sbs_dp_ss_1?pf_rd_p=1944687562&pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0471941484&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=0QD9X3E5QATSBCBT6BMM

I think Rainer's Precise GC only dealt with Heap objects. What needs 
work is Register and Stack scanning. Expanding on Rainer's existing 
Precise GC work is the right idea, but Register and Stack scanning is a 
very big project in it's own right.I suspect it will take up the 
remainder of your GSoC time. :)


--
// Adam Wilson
// quiet.dlang.dev


Re: Potential GSoC project - GC improvements

2016-03-12 Thread Chris Wright via Digitalmars-d
On Sat, 12 Mar 2016 00:50:06 -0800, Adam Wilson wrote:
> If I may make a suggestion. The lock free work is unlikely to require
> the entirety of GSoC. And the precise GC is the next most important
> thing on your list and will have the biggest impact on GC performance.
> 
> Once the GC is fully precise we can implement a fully compacting GC,

Unions mean we can't be entirely precise, but we can get pretty close. 
Stack maps and type pointer maps are entirely separate efforts, and type 
pointer maps will be easier to implement, so the first pass will probably 
only include type pointer maps.

Switching to a compacting GC is risky. It's a transparent change if 
you're only working with D code, but if you interface to other languages, 
it breaks things horribly.

> which improves the usefulness of generational collection. Additionally,
> precision allows a significant amount of work to be done in improving
> the performance of the GC in multi-threaded scenarios.

Primarily because less memory needs to be scanned on average, so less 
work needs to be done and GC pauses are shorter. There's nothing specific 
to multithreading there.

> It should be
> quite possible to avoid needing fork() or anything like it altogether.

fork() is used to avoid pausing the program at all during the mark phase.

> I know that the .NET GC doesn't need to use anything like it.

The .NET GC stops all managed threads during a collection.

During a nursery or gen 1 collection, on average it's probably fast 
enough that forking wouldn't give that much benefit. During a full 
collection, if your application is latency-sensitive, a forking strategy 
would be better.

But .NET can't do that because Windows doesn't have a fork() system call, 
and it's expensive to emulate.


Re: Potential GSoC project - GC improvements

2016-03-12 Thread Jeremy DeHaan via Digitalmars-d

On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:
If I may make a suggestion. The lock free work is unlikely to 
require the entirety of GSoC. And the precise GC is the next 
most important thing on your list and will have the biggest 
impact on GC performance.


Rainer has two different precise GC's in pull requests right now 
and both are slower than the current one unless there are false 
pointers.  I would expect anything I come up with to largely act 
the same. The reason I keep pushing for a generational garbage 
collector is because I think it would be where we would see the 
most benefit in terms of general performance.



Once the GC is fully precise we can implement a fully 
compacting GC, which improves the usefulness of generational 
collection. Additionally, precision allows a significant amount 
of work to be done in improving the performance of the GC in 
multi-threaded scenarios. It should be quite possible to avoid 
needing fork() or anything like it altogether. I know that the 
.NET GC doesn't need to use anything like it.


A compacting GC could be built on top of a generational GC 
without much difficulty I would think, if we wanted to go that 
route. The compaction would just happen as part of a collection 
cycle when things are moved into the next generation. I have 
concerns about doing any compaction though, mostly because D can 
have both references and pointers to objects, and I don't know 
how we would want to go about updating pointers. Especially since 
pointers to D objects can exists in C and C++ code.


Another reason I want to work on a generational GC is because 
this can also lead into a concurrent GC without trying to emulate 
fork() on windows. The .Net GC has 3 generations with the last 
one having its collections running concurrently because it is 
unlikely to affect anything else going on. They don't bother 
running the other generations concurrently because their 
collections are really short. We could do something similar.


Perhaps someone more intimate with GC's than I am can speak up, 
but I think that a generational GC would be the best use of time 
in relation to performance gains. Other things can then be 
implemented on top of it later.



Also, I would strongly recommend getting this book and reading 
it cover to cover before starting:

http://www.amazon.com/gp/product/1420082795/ref=pd_lpo_sbs_dp_ss_1?pf_rd_p=1944687562&pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0471941484&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=0QD9X3E5QATSBCBT6BMM


Thank you for the link to the book. I was planning on this one 
http://www.amazon.com/gp/product/0471941484/ , but maybe I will 
buy them both.





Re: Potential GSoC project - GC improvements

2016-03-12 Thread Adam Wilson via Digitalmars-d

Jeremy DeHaan wrote:

On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:

If I may make a suggestion. The lock free work is unlikely to require
the entirety of GSoC. And the precise GC is the next most important
thing on your list and will have the biggest impact on GC performance.


Rainer has two different precise GC's in pull requests right now and
both are slower than the current one unless there are false pointers.  I
would expect anything I come up with to largely act the same. The reason
I keep pushing for a generational garbage collector is because I think
it would be where we would see the most benefit in terms of general
performance.



I would contend that it's not quite that simple. Of course the precise 
GC is slower, it's scanning more stuff. The current conservative GC by 
definition can't figure out large chunks of memory so it won't even 
bother scanning them. What it does to those things it can't scan. And 
what it can't scan it has to pin, which means the memory can't be moved. 
You can't build a fully generational GC until you can move everything. 
They talk about this problem in the books. However, once you have a 
precise GC, a fully moving GC becomes possible and only then do the 
performance benefits of a generational GC become realized. I strongly 
suspect that a partially moving GC will perform worse than the precise 
GC. And I know for a fact that it will suffer enormously from 
fragmentation and Out-of-Memory issues. Precision is the basis from 
which all higher order GC functions flow.





Once the GC is fully precise we can implement a fully compacting GC,
which improves the usefulness of generational collection.
Additionally, precision allows a significant amount of work to be done
in improving the performance of the GC in multi-threaded scenarios. It
should be quite possible to avoid needing fork() or anything like it
altogether. I know that the .NET GC doesn't need to use anything like it.


A compacting GC could be built on top of a generational GC without much
difficulty I would think, if we wanted to go that route. The compaction
would just happen as part of a collection cycle when things are moved
into the next generation. I have concerns about doing any compaction
though, mostly because D can have both references and pointers to
objects, and I don't know how we would want to go about updating
pointers. Especially since pointers to D objects can exists in C and C++
code.



Erm, a generational GC is, in practice, a moving GC as it must move the 
blobs around from the various heaps. The commonly used method is 
Stop-and-Copy.


As for the C/C++ code problem various solutions have been proposed here. 
For one, D knows the difference between the two, so pinning is possible 
there. Or a separate heap space can be used. AFIAK each OS allows you to 
specify multiple heaps, so you could specify an unmanaged heap to go 
along with the managed heaps. IIRC, C++/CLI does something similar on 
Windows. This would be the preferred solution IMO. (Look into HeapCreate 
on Windows for example.)



Another reason I want to work on a generational GC is because this can
also lead into a concurrent GC without trying to emulate fork() on
windows. The .Net GC has 3 generations with the last one having its
collections running concurrently because it is unlikely to affect
anything else going on. They don't bother running the other generations
concurrently because their collections are really short. We could do
something similar.



A concurrent GC without precision is mostly useless because the GC roots 
are primarily derived from the stack which the thread was spawned and 
the stack roots are not scanned in a conservative GC. Additionally, the 
Async/Await pattern is not practical without a precise GC.



Perhaps someone more intimate with GC's than I am can speak up, but I
think that a generational GC would be the best use of time in relation
to performance gains. Other things can then be implemented on top of it
later.



The problem is that performance is the most obvious problem with the GC, 
but not the most important problem, or most useful. Precision enables a 
fully moving, fully generational, fully concurrent, GC. It also enables 
other major features that have nothing to do with GC.





Also, I would strongly recommend getting this book and reading it
cover to cover before starting:
http://www.amazon.com/gp/product/1420082795/ref=pd_lpo_sbs_dp_ss_1?pf_rd_p=1944687562&pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0471941484&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=0QD9X3E5QATSBCBT6BMM



Thank you for the link to the book. I was planning on this one
http://www.amazon.com/gp/product/0471941484/ , but maybe I will buy them
both.




Obviously you can do whatever you want. :) And I'll admit that precision 
isn't sexy, and it won't improve performance right away, but it opens 
the door way to further work that will provide those things. It's hard 
and thankless work, but you'll be bet

Re: Potential GSoC project - GC improvements

2016-03-12 Thread Chris Wright via Digitalmars-d
On Sat, 12 Mar 2016 13:23:35 -0800, Adam Wilson wrote:

To start off, let's talk terminology. You seem to be using nonstandard 
terminology and possibly misunderstanding standard terminology.

A GC scan is the mark phase of a mark/sweep collector (and specifically 
the part where the GC examines roots and allocated memory, if that 
matters). The GC searches for pointers to GC memory. Simple GCs only need 
to record whether there is a pointer to a given allocation, not where 
those pointers are coming from.

A conservative GC is one that will encounter some things that aren't 
pointers but must be scanned as if they were pointers. In D, for 
instance, a union between a pointer and a size_t forces collectors to be 
conservative.

A false pointer is something that a conservative GC must treat as a 
pointer and appears to point to an object allocated by that GC. For 
instance, unions in D allow us to create false pointers.

A precise GC is a GC that never mistakes a non-pointer for a pointer.

A moving GC is a GC that can relocate objects.

A generational GC is one that can perform a collection on part of the 
heap without dealing with the full heap. Those parts tend to be organized 
by how long an object has survived. This does not require a precise or 
moving GC, though it works better with a precise, moving GC.

A write barrier is a way to record which sections of memory have been 
altered since the last GC collection. This lets the GC scan only the 
changed data -- useful to any GC, especially useful to generational ones.

A "partially moving" GC does not exist, as far as I know.

> Jeremy DeHaan wrote:
>> On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:
>>> If I may make a suggestion. The lock free work is unlikely to require
>>> the entirety of GSoC. And the precise GC is the next most important
>>> thing on your list and will have the biggest impact on GC performance.
>>
>> Rainer has two different precise GC's in pull requests right now and
>> both are slower than the current one unless there are false pointers. 
>> I would expect anything I come up with to largely act the same. The
>> reason I keep pushing for a generational garbage collector is because I
>> think it would be where we would see the most benefit in terms of
>> general performance.
>>
>>
> I would contend that it's not quite that simple. Of course the precise
> GC is slower, it's scanning more stuff.

Precise GCs are slower because they must reference some data telling them 
which items on the stack and in allocations represent pointers and which 
do not. This data takes up cache space, and having less cache for memory 
being scanned slows things down. Correlating two pieces of data takes 
extra time, too.

The hope with a precise GC is to avoid false pointers. A false pointer to 
an object that holds references to many other objects can be costly, both 
in GC time and in excess memory usage.

In D, we have a tiny bit of type information surfaced to the GC to reduce 
the incidence of false pointers. We could additionally take advantage of 
the fact that numbers tend to be small and people tend to use 4 to 8 byte 
numbers: we could arrange our address space to avoid ranges where false 
pointers are likely. But that's probably unnecessary.

> The current conservative GC by
> definition can't figure out large chunks of memory so it won't even
> bother scanning them.

It must scan anything that might be a pointer. A precise GC reduces the 
number of things that might be pointers but aren't to zero. Therefore a 
precise GC scans less.

If a conservative GC didn't scan something that might be a pointer, it 
would sometimes free an object that still had live references to it. This 
would result in memory corruption or segmentation faults.

> What it does to those things it can't scan. And
> what it can't scan it has to pin, which means the memory can't be moved.

If a moving GC cannot precisely identify all pointers to an object, it is 
not allowed to move that object. If it can, it is allowed to move that 
object.

A conservative GC can be a moving GC. It must classify things as 
"definitely pointer", "maybe pointer", and "not pointer". It cannot move 
an object with a "maybe pointer" pointing to it, but it can move an 
object with several "definitely pointers" pointing to it.

In D, thanks to unions, we can never create a fully precise collector, 
but we can create a moving collector. Objects pointed to from a union 
can't be moved, but few objects will be pointed to from a union.

> You can't build a fully generational GC until you can move everything.

You can build a more efficient generational GC if you can move objects. 
However, it's not impossible to build a non-moving generational GC. 
Instead of having large blocks of address space for each generation, you 
would have each Pool indicate its generation. Then you have to find the 
Pool for a pointer and check that flag.

That's comparatively slow, but it'd still be faste

Re: Potential GSoC project - GC improvements

2016-03-13 Thread Dmitry Olshansky via Digitalmars-d

On 11-Mar-2016 18:13, Jeremy DeHaan wrote:

Thank you all for the feedback.

I think I might still need a little more feedback as to what the project
should actually entail, but here's what it's looking like so far:

Implement lock free allocation using std.experimental.allocator's
freelists (SharedFreeList? It was the only thing in the documentation
that specifically mentions lock free allocation)

Implement a generational garbage collector

Implement precise garbage collection (possibly piggybacking off of
Rainer's work)




Let's not forget another low-hanging fruit which is parallel marking.
Regardless of the kind of collector we use parallel marking is a must 
have in our multicore world, and should be stright-forward to implement 
in D.



Does anyone think that might be too much to take on or does it sound
pretty good? I have other garbage collector things I'd like to explore,
but I they should probably go in the "if time allows" category at this
point.

Jeremy



--
Dmitry Olshansky


Re: Potential GSoC project - GC improvements

2016-03-13 Thread Adam Wilson via Digitalmars-d

Chris Wright wrote:

On Sat, 12 Mar 2016 13:23:35 -0800, Adam Wilson wrote:

To start off, let's talk terminology. You seem to be using nonstandard
terminology and possibly misunderstanding standard terminology.

A GC scan is the mark phase of a mark/sweep collector (and specifically
the part where the GC examines roots and allocated memory, if that
matters). The GC searches for pointers to GC memory. Simple GCs only need
to record whether there is a pointer to a given allocation, not where
those pointers are coming from.

A conservative GC is one that will encounter some things that aren't
pointers but must be scanned as if they were pointers. In D, for
instance, a union between a pointer and a size_t forces collectors to be
conservative.

A false pointer is something that a conservative GC must treat as a
pointer and appears to point to an object allocated by that GC. For
instance, unions in D allow us to create false pointers.

A precise GC is a GC that never mistakes a non-pointer for a pointer.

A moving GC is a GC that can relocate objects.

A generational GC is one that can perform a collection on part of the
heap without dealing with the full heap. Those parts tend to be organized
by how long an object has survived. This does not require a precise or
moving GC, though it works better with a precise, moving GC.



I didn't intend to confuse, but I was not writing to neophyte's, and I 
used linguistic shortcuts.



A write barrier is a way to record which sections of memory have been
altered since the last GC collection. This lets the GC scan only the
changed data -- useful to any GC, especially useful to generational ones.

A "partially moving" GC does not exist, as far as I know.



Yep, it's a Bad Idea.


Jeremy DeHaan wrote:

On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:

If I may make a suggestion. The lock free work is unlikely to require
the entirety of GSoC. And the precise GC is the next most important
thing on your list and will have the biggest impact on GC performance.


Rainer has two different precise GC's in pull requests right now and
both are slower than the current one unless there are false pointers.
I would expect anything I come up with to largely act the same. The
reason I keep pushing for a generational garbage collector is because I
think it would be where we would see the most benefit in terms of
general performance.



I would contend that it's not quite that simple. Of course the precise
GC is slower, it's scanning more stuff.


Precise GCs are slower because they must reference some data telling them
which items on the stack and in allocations represent pointers and which
do not. This data takes up cache space, and having less cache for memory
being scanned slows things down. Correlating two pieces of data takes
extra time, too.



Yep the data is usually held with the object in a header. Hence the 
"scanning more stuff".



The hope with a precise GC is to avoid false pointers. A false pointer to
an object that holds references to many other objects can be costly, both
in GC time and in excess memory usage.

In D, we have a tiny bit of type information surfaced to the GC to reduce
the incidence of false pointers. We could additionally take advantage of
the fact that numbers tend to be small and people tend to use 4 to 8 byte
numbers: we could arrange our address space to avoid ranges where false
pointers are likely. But that's probably unnecessary.



We are not limited by what we currently have. Most of Rainer's work 
involved surfacing more data to the GC.



The current conservative GC by
definition can't figure out large chunks of memory so it won't even
bother scanning them.


It must scan anything that might be a pointer. A precise GC reduces the
number of things that might be pointers but aren't to zero. Therefore a
precise GC scans less.



Yep, precise GC's aren't totally perf-negative.


If a conservative GC didn't scan something that might be a pointer, it
would sometimes free an object that still had live references to it. This
would result in memory corruption or segmentation faults.


What it does to those things it can't scan. And
what it can't scan it has to pin, which means the memory can't be moved.


If a moving GC cannot precisely identify all pointers to an object, it is
not allowed to move that object. If it can, it is allowed to move that
object.

A conservative GC can be a moving GC. It must classify things as
"definitely pointer", "maybe pointer", and "not pointer". It cannot move
an object with a "maybe pointer" pointing to it, but it can move an
object with several "definitely pointers" pointing to it.

In D, thanks to unions, we can never create a fully precise collector,
but we can create a moving collector. Objects pointed to from a union
can't be moved, but few objects will be pointed to from a union.



Actually, we probably could with modifications to the compiler. A little 
metadata goes a long way. And I know that Walte

Re: Potential GSoC project - GC improvements

2016-03-13 Thread Chris Wright via Digitalmars-d
On Sun, 13 Mar 2016 12:43:37 -0700, Adam Wilson wrote:
>> A "partially moving" GC does not exist, as far as I know.
>>
>>
> Yep, it's a Bad Idea.

It's not a standard term. Google's only seeing about four references to 
the term, none of them authoritative or definitive. Since it's non-
standard, it would behoove you to define it.

I think you mean a GC in which some objects are pinned (due to 
limitations in the GC rather than the user explicitly requesting that 
they be pinned).

Regardless, you ought to define the term and explain why it's a bad idea 
instead of providing a bare assertion. It doesn't make any difference to 
the operation of the GC *why* something's pinned; what matters is how 
much is pinned.

So, for instance, you could point to a conservative moving GC, select one 
circumstance in which it can't move objects, tell us how much memory that 
ends up pinning on average, and then you'll have an argument.

>> In D, we have a tiny bit of type information surfaced to the GC to
>> reduce the incidence of false pointers. We could additionally take
>> advantage of the fact that numbers tend to be small and people tend to
>> use 4 to 8 byte numbers: we could arrange our address space to avoid
>> ranges where false pointers are likely. But that's probably
>> unnecessary.
>>
>>
> We are not limited by what we currently have. Most of Rainer's work
> involved surfacing more data to the GC.

Sure. My point is that we don't *need* a lot of type information in the GC 
to get reasonable advantages. A little type information goes a long way,  
and a little caution in placing objects in the heap can help too.

>> In D, thanks to unions, we can never create a fully precise collector,
>> but we can create a moving collector. Objects pointed to from a union
>> can't be moved, but few objects will be pointed to from a union.
>
> Actually, we probably could with modifications to the compiler.

No, we couldn't. I've been assuming that we can make whatever 
modifications we want to the compiler for free.

The compiler has static type information. This explicitly contains 
ambiguities. When it's based solely on data available at compile time, 
the compiler can resolve the ambiguities -- and you could have written 
the code without using a union.

But in the majority of cases, people use a union because they must. That 
means which values are set depends on runtime data, and the compiler 
can't tell you which fields will be set.

We could provide a runtime function that is invoked when you set a value 
inside a union. That won't work very well. Consider:

  struct A { long a; long b; }
  struct B { double a; void* p; }
  union AB { A a; B b; }

  extern void foo(ref A a);
  
  AB ab;
  ab.b.p = GC.malloc(512);
  foo(ab.a);

Was that pointer overwritten? No clue! The compiler can't tell. When 
compiling the definition of foo(), it didn't have any idea whether the 
struct was a union member. Now we have to intercept *every* indirect 
write to *any* value with a runtime call, even if you never use a union 
in your program --

Oh look, now Python's beating us by an order of magnitude in benchmarks.

What do we get for it? Well, the 1% of programs that use this pattern 
will allow the GC to move an additional 0.1% of their objects.

Or we leave the GC conservative in this one case. We can detect this case 
pretty easily and treat apparent pointers as pinned. Everything will 
work, and the GC will be better at releasing memory than it currently is.


Re: Potential GSoC project - GC improvements

2016-03-13 Thread Adam Wilson via Digitalmars-d

Chris Wright wrote:

On Sun, 13 Mar 2016 12:43:37 -0700, Adam Wilson wrote:

A "partially moving" GC does not exist, as far as I know.



Yep, it's a Bad Idea.


It's not a standard term. Google's only seeing about four references to
the term, none of them authoritative or definitive. Since it's non-
standard, it would behoove you to define it.

I think you mean a GC in which some objects are pinned (due to
limitations in the GC rather than the user explicitly requesting that
they be pinned).

Regardless, you ought to define the term and explain why it's a bad idea
instead of providing a bare assertion. It doesn't make any difference to
the operation of the GC *why* something's pinned; what matters is how
much is pinned.

So, for instance, you could point to a conservative moving GC, select one
circumstance in which it can't move objects, tell us how much memory that
ends up pinning on average, and then you'll have an argument.



Is there an implementation of a conservative moving (compacting) GC out 
there? I'm not aware of one, but there are a lot of GC's out there. 
Boehm isn't.



In D, we have a tiny bit of type information surfaced to the GC to
reduce the incidence of false pointers. We could additionally take
advantage of the fact that numbers tend to be small and people tend to
use 4 to 8 byte numbers: we could arrange our address space to avoid
ranges where false pointers are likely. But that's probably
unnecessary.



We are not limited by what we currently have. Most of Rainer's work
involved surfacing more data to the GC.


Sure. My point is that we don't *need* a lot of type information in the GC
to get reasonable advantages. A little type information goes a long way,
and a little caution in placing objects in the heap can help too.


In D, thanks to unions, we can never create a fully precise collector,
but we can create a moving collector. Objects pointed to from a union
can't be moved, but few objects will be pointed to from a union.


Actually, we probably could with modifications to the compiler.


No, we couldn't. I've been assuming that we can make whatever
modifications we want to the compiler for free.

The compiler has static type information. This explicitly contains
ambiguities. When it's based solely on data available at compile time,
the compiler can resolve the ambiguities -- and you could have written
the code without using a union.

But in the majority of cases, people use a union because they must. That
means which values are set depends on runtime data, and the compiler
can't tell you which fields will be set.

We could provide a runtime function that is invoked when you set a value
inside a union. That won't work very well. Consider:

   struct A { long a; long b; }
   struct B { double a; void* p; }
   union AB { A a; B b; }

   extern void foo(ref A a);

   AB ab;
   ab.b.p = GC.malloc(512);
   foo(ab.a);

Was that pointer overwritten? No clue! The compiler can't tell. When
compiling the definition of foo(), it didn't have any idea whether the
struct was a union member. Now we have to intercept *every* indirect
write to *any* value with a runtime call, even if you never use a union
in your program --

Oh look, now Python's beating us by an order of magnitude in benchmarks.

What do we get for it? Well, the 1% of programs that use this pattern
will allow the GC to move an additional 0.1% of their objects.

Or we leave the GC conservative in this one case. We can detect this case
pretty easily and treat apparent pointers as pinned. Everything will
work, and the GC will be better at releasing memory than it currently is.



Ok, so yes, it needs conservative scanning in this case. Because even in 
C# I can pin things, so obviously it has valid use cases. I'm not 
absolutely against pinning and unions aren't going away, and if the GC 
would have to be go conservative then so be it.  And given that unions 
are a pretty niche use case, I don't think we should drop the idea of 
precision just because it can't be 100% precise, so we end up with a few 
conservatively-scanned edge cases, big deal, the benefits of precision 
to the whole ecosystem would be enormous in the general case.


Is this a debate about precise vs. non-precise GC or are we just 
bikeshedding about terminology and technical details? ;) I moved 
recently and lost track of the GC book for awhile so I've been 
refreshing my terminology. :P


--
// Adam Wilson
// quiet.dlang.dev


Re: Potential GSoC project - GC improvements

2016-03-13 Thread deadalnix via Digitalmars-d

On Sunday, 13 March 2016 at 23:34:44 UTC, Adam Wilson wrote:
Is there an implementation of a conservative moving 
(compacting) GC out there? I'm not aware of one, but there are 
a lot of GC's out there. Boehm isn't.




That is impossible, you need to know what is and isn't a pointer 
to be able to move things around.




Re: Potential GSoC project - GC improvements

2016-03-13 Thread Chris Wright via Digitalmars-d
On Sun, 13 Mar 2016 23:46:51 +, deadalnix wrote:

> On Sunday, 13 March 2016 at 23:34:44 UTC, Adam Wilson wrote:
>> Is there an implementation of a conservative moving (compacting) GC out
>> there? I'm not aware of one, but there are a lot of GC's out there.
>> Boehm isn't.
>>
>>
> That is impossible, you need to know what is and isn't a pointer to be
> able to move things around.

Let's say we do the most precise GC possible for D. We can categorize 
things into:
 * maybe a pointer; it's part of a union that contains a non-pointer 
element
 * definitely a pointer (we know because of stack pointer maps and type 
information, and there aren't any unions mucking it up)
 * definitely not a pointer

We can move anything that has only "definitely a pointer"s pointing at 
it. We can't move anything that has one or more "maybe a pointer"s 
pointing at it.

I keep saying this...


Re: Potential GSoC project - GC improvements

2016-03-13 Thread Chris Wright via Digitalmars-d
On Sun, 13 Mar 2016 16:34:44 -0700, Adam Wilson wrote:
> Is this a debate about precise vs. non-precise GC or are we just
> bikeshedding about terminology and technical details?

You made a large number of assertions about garbage collection and they 
were almost all wrong.

It's not about minutiae -- D can't reasonably have a precise collector, 
and you were saying that nothing else useful can happen before you make 
the GC precise. A non-precise collector can and often should be 
concurrent, and you were saying this was useless.

These are big things. They affect prioritization of major projects. 
Getting them wrong could mislead a GSoC student to the point of not 
accomplishing much when they could have plucked lower-hanging fruit 
successfully.


Re: Potential GSoC project - GC improvements

2016-03-13 Thread Adam Wilson via Digitalmars-d

Chris Wright wrote:

On Sun, 13 Mar 2016 16:34:44 -0700, Adam Wilson wrote:

Is this a debate about precise vs. non-precise GC or are we just
bikeshedding about terminology and technical details?


You made a large number of assertions about garbage collection and they
were almost all wrong.

It's not about minutiae -- D can't reasonably have a precise collector,
and you were saying that nothing else useful can happen before you make
the GC precise. A non-precise collector can and often should be
concurrent, and you were saying this was useless.

These are big things. They affect prioritization of major projects.
Getting them wrong could mislead a GSoC student to the point of not
accomplishing much when they could have plucked lower-hanging fruit
successfully.



Erm. That seems a bit extreme. You are asserting that D can't have a 
precise garbage collector, yet only have presented one case where the 
compiler cannot provide definitive information about the classification 
of pointer. Unions. Ok, so Unions, a relatively low-usage scenario with 
a valid work-around (assume it's a pointer). We can teach the compiler 
and GC how to pin C-style pointers so that's not it. Lastly, Rainer 
seemed to think a precise GC could be done, and he then went and did it 
... so "can't reasonably have a precise collector" is a factually 
incorrect assertion.


My GC lingo may be rusty, and I will admit to using superlatives 
incorrectly myself, but based on the existing evidence my assertions are 
hardly "almost all wrong". I'll clean up my lingo and retool my language 
if you'll tone down the superlatives.


Of course GSoC students need guidance, I've been a GSoC Mentor for DLang 
before. If I didn't think that a precise GC was not only extremely 
important to D, but also within the ability of the student, I would not 
have pushed for it.


"low-hanging fruit" is the reason D's quality is so spotty. We now have 
a chance with GSoC to get some quality time dedicated to a pain point, 
we should utilize that time to work on foundational issues. Then we (the 
community) can flame each other to death about the best algorithms to 
achieve the performance we desire. That is assuming we haven't scared 
him off.


--
// Adam Wilson
// quiet.dlang.dev


Re: Potential GSoC project - GC improvements

2016-03-13 Thread Adam Wilson via Digitalmars-d

deadalnix wrote:

On Sunday, 13 March 2016 at 23:34:44 UTC, Adam Wilson wrote:

Is there an implementation of a conservative moving (compacting) GC
out there? I'm not aware of one, but there are a lot of GC's out
there. Boehm isn't.



That is impossible, you need to know what is and isn't a pointer to be
able to move things around.



It was a rhetorical question. I was just leaving the door open for 
someone to surprise me. ;-)


--
// Adam Wilson
// quiet.dlang.dev


Re: Potential GSoC project - GC improvements

2016-03-13 Thread thedeemon via Digitalmars-d

On Monday, 14 March 2016 at 01:38:50 UTC, Adam Wilson wrote:
Lastly, Rainer seemed to think a precise GC could be done, and 
he then went and did it ... so "can't reasonably have a precise 
collector" is a factually incorrect assertion.


IIRC, Rainer called it "mostly precise", and for a good reason. 
If we're ok with calling a partially conservative GC a precise 
one, then go on. ;) As I understand Rainer's GC works in VisualD 
now and it did solve the problem of leaks he had with the default 
GC, so there is certainly benefit in pursuing preciseness, even 
non-100% one.


I think continuing Rainer's work toward "even more precise" GC, 
and adding important optimizations like allocations without a 
global lock and parallel scanning would be a great project, and a 
realistic one.


As for moving and generational GC, I have big doubts one can do 
it without breaking 90% of old code written in D (passing 
pointers to C libs) and turning the language into yet another C# 
with write barriers everywhere. Now D competes in speed with C++, 
with your proposed changes it will only try to compete with Java 
and C#, i.e. battle with C++ will be lost. On the other hand I 
might be overestimating write-barriers costs, please feel free to 
prove me wrong.


Fun part: if you make a moving GC and try to automate pinning and 
let the compiler do it instead of requiring programmers to 
rewrite most of their D code, you'll probably end up with a 
"partially moving GC" where half of the heap is pinned, some 
objects are moving here and there, and GC rarely releases memory 
back to OS because of those pinned objects.



My GC lingo may be rusty, and I will admit to using 
superlatives incorrectly myself, but based on the existing 
evidence my assertions are hardly "almost all wrong".


They did look so, sorry. Additional explanations helped.


Re: Potential GSoC project - GC improvements

2016-03-13 Thread Adam Wilson via Digitalmars-d

thedeemon wrote:

On Monday, 14 March 2016 at 01:38:50 UTC, Adam Wilson wrote:

Lastly, Rainer seemed to think a precise GC could be done, and he then
went and did it ... so "can't reasonably have a precise collector" is
a factually incorrect assertion.


IIRC, Rainer called it "mostly precise", and for a good reason. If we're
ok with calling a partially conservative GC a precise one, then go on.
;) As I understand Rainer's GC works in VisualD now and it did solve the
problem of leaks he had with the default GC, so there is certainly
benefit in pursuing preciseness, even non-100% one.



He did. And C# supports pinning so at some point all GC's are partially 
conservative. ;-)



I think continuing Rainer's work toward "even more precise" GC, and
adding important optimizations like allocations without a global lock
and parallel scanning would be a great project, and a realistic one.



On this we are in complete agreement.


As for moving and generational GC, I have big doubts one can do it
without breaking 90% of old code written in D (passing pointers to C
libs) and turning the language into yet another C# with write barriers
everywhere. Now D competes in speed with C++, with your proposed changes
it will only try to compete with Java and C#, i.e. battle with C++ will
be lost. On the other hand I might be overestimating write-barriers
costs, please feel free to prove me wrong.



Well, conclusively proving the cost of write-barriers would require 
profiling. But I think we should be able to optimize the use of 
write-barriers to reduce the impact.


Here is the thing, nobody wants to admit it, but we have lost the battle 
with C++, although to be honest, I don't think we ever had a prayer of 
winning it in the first place. They want C++ with the "warts" removed, 
for some personally distinct set of "warts", they're not really 
interested in D for itself.


Maybe ... why don't instead of trying to compete with everybody else, we 
do our own thing, and do it very well. As long as we're just another "me 
too" operation people will treat us like we are. Let's focus on being 
the best D language out there. And D needs a GC, so let's build the best 
damn garbage collector the natively-compiled world has every seen.



Fun part: if you make a moving GC and try to automate pinning and let
the compiler do it instead of requiring programmers to rewrite most of
their D code, you'll probably end up with a "partially moving GC" where
half of the heap is pinned, some objects are moving here and there, and
GC rarely releases memory back to OS because of those pinned objects.



Yea, this is true, although I think getting most of the way there is 
going to be very useful in it's own right. And it would open the 
possibility of experimenting with different algorithms and strategies. 
Right now we're pretty limited in the types of algorithms we can use, 
precision offers us more options and more experiments. Let's get the 
foundation in place, then we can go nuts.





My GC lingo may be rusty, and I will admit to using superlatives
incorrectly myself, but based on the existing evidence my assertions
are hardly "almost all wrong".


They did look so, sorry. Additional explanations helped.


Thank you. :)

--
// Adam Wilson
// quiet.dlang.dev


Re: Potential GSoC project - GC improvements

2016-03-14 Thread Jeremy DeHaan via Digitalmars-d
I haven't had power for a couple of days, but it looks like the 
discussion has gone along pretty ok. After reading everything, I 
think I'm inclined to agree with Adam and the main focus of my 
proposal will be a precise GC (or as precise as we can get). I'll 
definitely need some guidance, but I'll be learning everything I 
can about GC's and precision until the start of the project.


On Monday, 14 March 2016 at 05:28:13 UTC, Adam Wilson wrote:
Maybe ... why don't instead of trying to compete with everybody 
else, we do our own thing, and do it very well. As long as 
we're just another "me too" operation people will treat us like 
we are. Let's focus on being the best D language out there. And 
D needs a GC, so let's build the best damn garbage collector 
the natively-compiled world has every seen.



That is what I hope to work towards. Hopefully next year I can 
work on making a generational GC, or something else depending on 
how much time I could dedicate between when this GSoC is finished 
and the next begins.


Adam, can you reach out to Craig and let him know you're willing 
to mentor this project if it get's accepted? I talked with him a 
few days ago via email and he said that someone would need to 
take it on but he wasn't sure who.






Re: Potential GSoC project - GC improvements

2016-03-14 Thread jmh530 via Digitalmars-d

On Tuesday, 15 March 2016 at 01:34:07 UTC, Jeremy DeHaan wrote:
I haven't had power for a couple of days, but it looks like the 
discussion has gone along pretty ok.


I would characterize it as very interesting, although I know very 
little about how GCs are implemented.


I have a question, for the more knowledgeable...

There was some discussion about the difficulty of handling unions 
and being able to ensure that pointers are pointers in a precise 
GC. Is there less work involved in getting the GC to be precise 
for @safe code? Does the current GC take this into account?


Re: Potential GSoC project - GC improvements

2016-03-14 Thread Adam Wilson via Digitalmars-d

Jeremy DeHaan wrote:

I haven't had power for a couple of days, but it looks like the
discussion has gone along pretty ok. After reading everything, I think
I'm inclined to agree with Adam and the main focus of my proposal will
be a precise GC (or as precise as we can get). I'll definitely need some
guidance, but I'll be learning everything I can about GC's and precision
until the start of the project.



Awesome! It's a complex topic, but you'll get broad exposure to all 
corners of the D ecosystem, it's the kind of experience that you can 
apply to all aspects of software engineering.



On Monday, 14 March 2016 at 05:28:13 UTC, Adam Wilson wrote:

Maybe ... why don't instead of trying to compete with everybody else,
we do our own thing, and do it very well. As long as we're just
another "me too" operation people will treat us like we are. Let's
focus on being the best D language out there. And D needs a GC, so
let's build the best damn garbage collector the natively-compiled
world has every seen.



That is what I hope to work towards. Hopefully next year I can work on
making a generational GC, or something else depending on how much time I
could dedicate between when this GSoC is finished and the next begins.

Adam, can you reach out to Craig and let him know you're willing to
mentor this project if it get's accepted? I talked with him a few days
ago via email and he said that someone would need to take it on but he
wasn't sure who.



Will do. And given the nature of the work you want to do, I think 
(personal opinion follows) your application will sail through the process.


The application window opened today so my first recommendation would be 
to get started on your proposal as soon as possible, if you haven't 
already. GSoC has a basic outline of what you'll need to include here: 
http://write.flossmanuals.net/gsocstudentguide/writing-a-proposal/


I'd like to review the proposal before you submit it.

And welcome aboard the crazy-train known as GSoC, it's a lot of work but 
it'll be buckets of fun! :)


--
// Adam Wilson
// quiet.dlang.dev


Re: Potential GSoC project - GC improvements

2016-03-18 Thread Craig Dillabaugh via Digitalmars-d

On Tuesday, 15 March 2016 at 01:34:07 UTC, Jeremy DeHaan wrote:
I haven't had power for a couple of days, but it looks like the 
discussion has gone along pretty ok. After reading everything, 
I think I'm inclined to agree with Adam and the main focus of 
my proposal will be a precise GC (or as precise as we can get). 
I'll definitely need some guidance, but I'll be learning 
everything I can about GC's and precision until the start of 
the project.


[...]


In case you didn't already know, Adam is now an official mentor 
so you are all set to go.  Good luck!


Re: Potential GSoC project - GC improvements

2016-03-19 Thread Rainer Schuetze via Digitalmars-d



On 18.03.2016 22:04, Jeremy deHaan wrote:

On Friday, 18 March 2016 at 16:41:21 UTC, Rainer Schuetze wrote:



On 15.03.2016 02:34, Jeremy DeHaan wrote:

[...]


Being always way behind reading the forum these days, I'm a little
late and have not read all the messages in this thread thoroughly.
Here are some thoughts:

[...]


Thank you for the feedback. I'm still working on my proposal so nothing
is set in stone just yet. I'm very interested in working on the GC for
this GSoC, so what would you suggest be my main focus? It sounds like
you already have a GC that is more or less what I was planning on
implementing...


Well, there are a number of unfinished parts left:

- as far as I understand the precise GC PR 
https://github.com/D-Programming-Language/druntime/pull/1022 is 
currently stalled because it doesn't yield better overall performance 
than the current GC in most situations. The reasoning is that it should 
be able compensate the additional time during allocation (saving pointer 
information) by improved scanning using that information to skip 
non-pointers. That didn't work out yet, though.


- last time I checked generating RTInfo (which contains the pointer 
info) was a bit unreliable, see 
https://github.com/D-Programming-Language/dmd/pull/2480 and 
https://github.com/D-Programming-Language/dmd/pull/3958.


- I only implemented the DATA and TLS section TypeInfo emission for the 
OMF backend, this needs to be ported to other platforms. Martin Nowak 
recently considered to just emit pointer locations. This would make the 
scanning function a bit simpler, but might need a bit more memory in the 
binary.


Regarding the concurrent GC: I consider my implementation a prototype 
with rough edges and a lot of optimization opportunities. I'm not sure 
whether page protection is good enough to implement a generational GC, 
but it might still be possible to take advantage of the information that 
a page hasn't been written to.
Judging from https://blog.golang.org/go15gc Go 1.5 seems to be using a 
similar GC (concurrent mark-and-sweep), though with proper write 
barriers. I guess there is a lot of stuff that can be used from their 
experience.


> so what would you suggest be my main focus?

Given that 32-bit applications are becoming legacy, and false pointers 
are not a common problem in 64-bit processes (they do happen eventually, 
though) I suspect that concurrency of the GC would make a much larger 
impact on the D language than preciseness. A good target should be to 
reduce stop-the-world-time to something acceptable for interactive 
programs, i.e. well below 50ms.


Re: Potential GSoC project - GC improvements

2016-03-19 Thread Rainer Schuetze via Digitalmars-d



On 15.03.2016 02:34, Jeremy DeHaan wrote:

I haven't had power for a couple of days, but it looks like the
discussion has gone along pretty ok. After reading everything, I think
I'm inclined to agree with Adam and the main focus of my proposal will
be a precise GC (or as precise as we can get). I'll definitely need some
guidance, but I'll be learning everything I can about GC's and precision
until the start of the project.

On Monday, 14 March 2016 at 05:28:13 UTC, Adam Wilson wrote:

Maybe ... why don't instead of trying to compete with everybody else,
we do our own thing, and do it very well. As long as we're just
another "me too" operation people will treat us like we are. Let's
focus on being the best D language out there. And D needs a GC, so
let's build the best damn garbage collector the natively-compiled
world has every seen.



That is what I hope to work towards. Hopefully next year I can work on
making a generational GC, or something else depending on how much time I
could dedicate between when this GSoC is finished and the next begins.

Adam, can you reach out to Craig and let him know you're willing to
mentor this project if it get's accepted? I talked with him a few days
ago via email and he said that someone would need to take it on but he
wasn't sure who.



Being always way behind reading the forum these days, I'm a little late 
and have not read all the messages in this thread thoroughly. Here are 
some thoughts:


The precise GC used by Visual D is not only heap-precise, but also has 
type information to scan the data and TLS segments precisely. This is
done with a patched dmd now based on version 2.066 and is not part of 
the PR. It uses the solution that I presented at DConf 2013: 
http://dconf.org/2013/talks/schuetze.pdf


Creating information for stack and registers seems a lot of work with 
uncertain return: Apart from having to describe your code in very high 
detail, there is usually no guarantee that you can unwind the stack 
properly, especially when calling through C/C++ functions that you have 
no control of.


A moving collector is possible IMO, but it needs write barriers, and the 
idea of adding these explicitly has met quite some resistance, 
especially by Walter. Write barriers are also necessary for incremental 
collections. It changes the way you can interact with C/C++ because it's 
no longer good enough to just keep a pointer used by C visible to the 
GC, you have to pin it explicitly.


You can have very coarse barriers by using the page protection mechanism 
of the hardware, that's what I implemented for Windows, see 
http://rainers.github.io/visuald/druntime/concurrentgc.html. This is 
similar to Sociomantics concurrent GC using fork. It scans in the 
background, but not with multiple threads in parallel.


Unfortunately, this work has rotton almost 2 years now, but I hope to 
get back to it some day...


Re: Potential GSoC project - GC improvements

2016-03-19 Thread Jeremy deHaan via Digitalmars-d

On Friday, 18 March 2016 at 16:41:21 UTC, Rainer Schuetze wrote:



On 15.03.2016 02:34, Jeremy DeHaan wrote:

[...]


Being always way behind reading the forum these days, I'm a 
little late and have not read all the messages in this thread 
thoroughly. Here are some thoughts:


[...]


Thank you for the feedback. I'm still working on my proposal so 
nothing is set in stone just yet. I'm very interested in working 
on the GC for this GSoC, so what would you suggest be my main 
focus? It sounds like you already have a GC that is more or less 
what I was planning on implementing...


Re: Potential GSoC project - GC improvements

2016-03-19 Thread Adam Wilson via Digitalmars-d

Rainer Schuetze wrote:



On 18.03.2016 22:04, Jeremy deHaan wrote:

On Friday, 18 March 2016 at 16:41:21 UTC, Rainer Schuetze wrote:



On 15.03.2016 02:34, Jeremy DeHaan wrote:

[...]


Being always way behind reading the forum these days, I'm a little
late and have not read all the messages in this thread thoroughly.
Here are some thoughts:

[...]


Thank you for the feedback. I'm still working on my proposal so nothing
is set in stone just yet. I'm very interested in working on the GC for
this GSoC, so what would you suggest be my main focus? It sounds like
you already have a GC that is more or less what I was planning on
implementing...


Well, there are a number of unfinished parts left:

- as far as I understand the precise GC PR
https://github.com/D-Programming-Language/druntime/pull/1022 is
currently stalled because it doesn't yield better overall performance
than the current GC in most situations. The reasoning is that it should
be able compensate the additional time during allocation (saving pointer
information) by improved scanning using that information to skip
non-pointers. That didn't work out yet, though.

- last time I checked generating RTInfo (which contains the pointer
info) was a bit unreliable, see
https://github.com/D-Programming-Language/dmd/pull/2480 and
https://github.com/D-Programming-Language/dmd/pull/3958.

- I only implemented the DATA and TLS section TypeInfo emission for the
OMF backend, this needs to be ported to other platforms. Martin Nowak
recently considered to just emit pointer locations. This would make the
scanning function a bit simpler, but might need a bit more memory in the
binary.

Regarding the concurrent GC: I consider my implementation a prototype
with rough edges and a lot of optimization opportunities. I'm not sure
whether page protection is good enough to implement a generational GC,
but it might still be possible to take advantage of the information that
a page hasn't been written to.
Judging from https://blog.golang.org/go15gc Go 1.5 seems to be using a
similar GC (concurrent mark-and-sweep), though with proper write
barriers. I guess there is a lot of stuff that can be used from their
experience.

 > so what would you suggest be my main focus?

Given that 32-bit applications are becoming legacy, and false pointers
are not a common problem in 64-bit processes (they do happen eventually,
though) I suspect that concurrency of the GC would make a much larger
impact on the D language than preciseness. A good target should be to
reduce stop-the-world-time to something acceptable for interactive
programs, i.e. well below 50ms.


I would argue that precision is still important as it enables a 
compacting GC. While technically possible to implement a generational GC 
without compaction, it's generally not considered wise. And even though 
64-bit does not suffer from false pointers as much as 32-bit does it 
still will, especially when used in the big-data, analytics, and 
scientific scenarios that D seems to have made real traction in. While 
false pointers are a problem for a simple command line app, and probably 
even most Vibe.D servers, there is a significant class of work being 
done in D today that would be directly effected by them.


--
// Adam Wilson
// quiet.dlang.dev