Re: GC question

2017-02-08 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Saturday, 4 February 2017 at 15:23:53 UTC, Adam D. Ruppe wrote:

On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:

- Automatic but conservative. Can leak at any time.


All GCs are prone to leak, including precise ones. The point of 
garbage collection is not to prevent leaks, but rather to 
prevent use-after-free bugs.


No, the main point of GC is to prevent leaks in the case where 
you have circular references.


Precise GCs don't leak, by definition. If the object is reachable 
then it isn't a leak.


Now, you might claim that objects that provably won't be touched 
again should be classified as dead and freed, and that failing to 
free them is a bug that exhibits the same behaviour as a leak 
(running out of memory). But it's really nothing like the leaks you 
experience with manual memory management (e.g. circular references 
preventing memory from being released in a reference counting 
management scheme).







Re: GC question

2017-02-05 Thread Cym13 via Digitalmars-d-learn
On Sunday, 5 February 2017 at 04:22:30 UTC, rikki cattermole wrote:

On 05/02/2017 5:02 PM, thedeemon wrote:

snip

It may look so from a distance. But in my experience it's not that bad.
In most software I did in D it did not matter really (it's either 64-bit
or short lived programs) and the control D gives to choose how to deal
with everything makes it all quite manageable, I can decide what to take
from both worlds and hence pick the best, not the worst.


The best of both worlds can be done quite simply.

Instead of a chain of input ranges like:

int[] data = input.filter!"a != 7".map!"a * 2".array;

Use:

int[] data;
data.length = input.length;

size_t i;
foreach(v; input.filter!"a != 7".map!"a * 2") {
data[i] = v;
i++;
}

data.length = i;

Of course this is a dirt simple example, but look at it instead 
for e.g. a csv parser with some complex data structure creation 
+ manipulation.


I have some real world code here[0] that uses it. Not only are 
there fewer allocations and less GC usage, but it also ends up 
being significantly faster!


[0] 
https://gist.github.com/rikkimax/42c3dfa6500155c5e441cbb1437142ea#file-reports-d-L124


Some data to weigh in on that, comparing different memory 
management strategies on that simple case:


#!/usr/bin/env rdmd

import std.conv;
import std.stdio;
import std.array;
import std.range;
import std.algorithm;

auto input = [1, 2, 7, 3, 7, 8, 8, 9, 7, 1, 0];


void naive() {
int[] data = input.filter!(a => a!= 7).map!(a => a*2).array;
assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void maxReallocs() {
int[] data;

size_t i;
foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) {
data ~= v;
}

assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void betterOfTwoWorlds() {
int[] data;
data.length = input.length;

size_t i;
foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) {
data[i] = v;
i++;
}
data.length = i;

assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void explicitNew() {
int[] data = new int[input.length];
scope(exit) delete data;

size_t i;
foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) {
data[i] = v;
i++;
}
data.length = i;

assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void cStyle() @nogc {
import core.stdc.stdlib : malloc, free;

int* data = cast(int*)malloc(input.length * int.sizeof);
scope(exit) free(data);

size_t i;
foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) {
data[i++] = v;
}

debug assert(data[0..i] == [2, 4, 6, 16, 16, 18, 2, 0], data[0..i].to!string);

}

void onTheStack() @nogc {
int[100] data;

size_t i;
foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) {
data[i++] = v;
}

debug assert(data[0..i] == [2, 4, 6, 16, 16, 18, 2, 0], data[0..i].to!string);

}

void main(string[] args) {
import std.datetime;
benchmark!(
naive,
maxReallocs,
betterOfTwoWorlds,
explicitNew,
cStyle,
onTheStack
)(10).each!writeln;
}

/* Results:

Compiled with dmd -profile=gc test.d
====================================

TickDuration(385731143)  // naive,
TickDuration(575673615)  // maxReallocs,
TickDuration(255928562)  // betterOfTwoWorlds,
TickDuration(270497154)  // explicitNew,
TickDuration(97596363)   // cStyle,
TickDuration(96467459)   // onTheStack

GC usage:

bytes allocated, allocations, type, function, file:line
           1760             10 int[] test.explicitNew test.d:43
            440             10 int[] test.betterOfTwoWorlds test.d:30
            320             80 int[] test.maxReallocs test.d:22
            320             10 int[] test.maxReallocs test.d:25
            320             10 int[] test.explicitNew test.d:51
            320             10 int[] test.explicitNew test.d:53
            320             10 int[] test.betterOfTwoWorlds test.d:37
            320             10 int[] test.betterOfTwoWorlds test.d:39
            320             10 std.array.Appender!(int[]).Appender.Data std.array.Appender!(int[]).Appender.this /usr/include/dlang/dmd/std/array.d:2675
            320             10 int[] test.naive test.d:14

Compiled with dmd -O -inline test.d
===================================

TickDuration(159383005)  // naive,
TickDuration(187192137)  // maxReallocs,
TickDuration(94094585)   // betterOfTwoWorlds,
TickDuration(102374657)  // explicitNew,
TickDuration(41801695)   // cStyle,
TickDuration(45613954)   // onTheStack

Compiled with dmd -O -inline -release -boundscheck=off test.d
=============================================================

TickDuration(152151439)  // naive,
TickDuration(140870515)  // maxReallocs,
TickDuration(46740440)   // betterOfTwoWorlds,
TickDuration(59089016)   // explicitNew,
TickDuration(26038060)   // cStyle,
TickDuration(25984371)   // onTheStack

*/
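The harness above uses `benchmark` from `std.datetime`, which in the 2017-era Phobos used here returned `TickDuration` values. Newer Phobos releases provide `benchmark` in `std.datetime.stopwatch`, returning `core.time.Duration` values. A minimal sketch of the same harness under that assumption, reproducing only two of the six strategies; its timings are not comparable to the numbers quoted above:

```d
// Sketch: two of the strategies above, timed with std.datetime.stopwatch.
import std.algorithm : each, filter, map;
import std.array : array;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

auto input = [1, 2, 7, 3, 7, 8, 8, 9, 7, 1, 0];

// GC-allocating version: .array builds a new int[] on every call.
void naive()
{
    int[] data = input.filter!(a => a != 7).map!(a => a * 2).array;
    assert(data == [2, 4, 6, 16, 16, 18, 2, 0]);
}

// Non-allocating version: results go into a fixed-size stack array.
void onTheStack() @nogc
{
    int[100] data;
    size_t i;
    foreach (v; input.filter!(a => a != 7).map!(a => a * 2))
        data[i++] = v;
    assert(i == 8);
}

void main()
{
    // One core.time.Duration per function: total time for 10 runs of each.
    auto results = benchmark!(naive, onTheStack)(10);
    results[].each!writeln;
}
```

Keeping the `@nogc` attribute on the stack version makes the compiler reject any accidental GC allocation in it.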



Re: GC question

2017-02-04 Thread rikki cattermole via Digitalmars-d-learn

On 05/02/2017 5:02 PM, thedeemon wrote:

snip


It may look so from a distance. But in my experience it's not that bad.
In most software I did in D it did not matter really (it's either 64-bit
or short lived programs) and the control D gives to choose how to deal
with everything makes it all quite manageable, I can decide what to take
from both worlds and hence pick the best, not the worst.


The best of both worlds can be done quite simply.

Instead of a chain of input ranges like:

int[] data = input.filter!"a != 7".map!"a * 2".array;

Use:

int[] data;
data.length = input.length;

size_t i;
foreach(v; input.filter!"a != 7".map!"a * 2") {
data[i] = v;
i++;
}

data.length = i;

Of course this is a dirt simple example, but look at it instead for e.g. a 
csv parser with some complex data structure creation + manipulation.


I have some real world code here[0] that uses it. Not only are there fewer 
allocations and less GC usage, but it also ends up being significantly faster!
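The same preallocate-then-trim idea can also be written with `std.algorithm.copy`, which returns the unfilled tail of the target slice. A minimal sketch, assuming the caller supplies the buffer (here a stack array) and that it is at least as long as the input; `filterDoubleInto` is a hypothetical name, not something from the linked gist:

```d
import std.algorithm : copy, filter, map;
import std.stdio : writeln;

/// Hypothetical helper: writes the filtered, doubled values into a
/// caller-supplied buffer and returns the filled prefix. No GC allocation.
int[] filterDoubleInto(const(int)[] input, int[] buf) @nogc
{
    // Every element might survive the filter, so buf must be able to hold all of them.
    assert(buf.length >= input.length);
    // copy writes into buf and returns the part of buf it did not fill.
    int[] rest = input.filter!(a => a != 7).map!(a => a * 2).copy(buf);
    return buf[0 .. buf.length - rest.length];
}

void main()
{
    int[16] storage;   // stack storage instead of a GC array
    auto result = filterDoubleInto([1, 2, 7, 3, 7, 8, 8, 9, 7, 1, 0], storage[]);
    writeln(result);   // [2, 4, 6, 16, 16, 18, 2, 0]
}
```

The returned slice points into the caller's storage, so it is only valid as long as that storage is.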


[0] 
https://gist.github.com/rikkimax/42c3dfa6500155c5e441cbb1437142ea#file-reports-d-L124


Re: GC question

2017-02-04 Thread thedeemon via Digitalmars-d-learn

On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:
- Automatic but conservative. Can leak at any time. You have to 
implement manual management (managed heaps) to avoid leaks. 
Leaks are hard to find as any heap value may be causing it.


By "managed heap" I just meant the GC heap, the one used by the "new" 
operator.
Besides it, other allocators and container libraries are already 
available; you don't need to implement this stuff manually.
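The post doesn't name specific libraries. As one possibility, Phobos itself ships `std.experimental.allocator` (an experimental package whose API may still change) and `std.container.array.Array`, which in current Phobos manages its storage with malloc rather than the GC. A minimal sketch of keeping a large buffer off the GC heap with them; the frame size is an arbitrary example:

```d
import std.container.array : Array;
import std.experimental.allocator : dispose, makeArray;
import std.experimental.allocator.mallocator : Mallocator;
import std.stdio : writeln;

void main()
{
    // Large buffer on the C heap via Mallocator: the GC never scans it,
    // so false pointers cannot pin it and it does not slow down collections.
    ubyte[] frame = Mallocator.instance.makeArray!ubyte(1920 * 1080 * 4);
    scope (exit) Mallocator.instance.dispose(frame);
    frame[] = 0xFF;

    // std.container.array.Array keeps its elements outside the GC heap
    // and frees them when the last copy of the container goes away.
    Array!int ints;
    foreach (i; 0 .. 1000)
        ints.insertBack(i);

    writeln(frame.length, " bytes, ", ints.length, " ints");   // 8294400 bytes, 1000 ints
}
```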



the worst of both worlds.


It may look so from a distance. But in my experience it's not 
that bad. In most software I did in D it did not matter really 
(it's either 64-bit or short lived programs) and the control D 
gives to choose how to deal with everything makes it all quite 
manageable, I can decide what to take from both worlds and hence 
pick the best, not the worst.





Re: GC question

2017-02-04 Thread osa1 via Digitalmars-d-learn
All GCs are prone to leak, including precise ones. The point of 
garbage collection is not to prevent leaks, but rather to 
prevent use-after-free bugs.


Of course I can have leaks in a GC environment, but having 
non-deterministic leaks is another thing, and I'd rather make 
sure to delete my references to let GC do its thing than to pray 
and hope some random number on my stack won't be in the range of 
my heap.


I don't agree that the point is just preventing use-after-free, 
which can be guaranteed statically even in a non-GC language (see 
e.g. Rust).


Re: GC question

2017-02-04 Thread Adam D. Ruppe via Digitalmars-d-learn

On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:

- Automatic but conservative. Can leak at any time.


All GCs are prone to leak, including precise ones. The point of 
garbage collection is not to prevent leaks, but rather to prevent 
use-after-free bugs.


Granted, the D 32 bit GC is more prone to leak than most others 
(including D 64 bit), but this isn't as horrible as you're 
believing, it does its *main* job pretty well, just at the cost 
of higher memory consumption, which we can often afford.


And if you can't, manual management of large arrays tends to be 
relatively simple anyway. For example, my png.d used to leak 
something nasty in 32 bit because it used GC-allocated large 
temporary buffers while decompressing images. But, since they 
were temporary buffers, it was really easy to just `scope(exit) 
free(buffer);` after allocating to let them be freed at the end 
of the function. Then the memory consumption was cut in half.
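A minimal sketch of that pattern: the temporary buffer comes from `malloc`, and `scope(exit) free(buffer);` directly after the allocation releases it on every exit path. `sumScratch` is a hypothetical stand-in for the png.d decompression step, not Adam's actual code:

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;
import std.stdio : writeln;

/// Hypothetical stand-in for a step that needs a large temporary buffer.
size_t sumScratch(size_t scratchSize) @nogc
{
    auto buffer = cast(ubyte*) malloc(scratchSize);
    if (buffer is null)
        onOutOfMemoryError();
    // Runs on every exit from this scope (normal return or exception),
    // so the buffer never outlives this call.
    scope (exit) free(buffer);

    ubyte[] scratch = buffer[0 .. scratchSize];
    foreach (i, ref b; scratch)
        b = cast(ubyte)(i & 0xFF);   // placeholder work on the buffer

    size_t sum;
    foreach (b; scratch)
        sum += b;
    return sum;
}

void main()
{
    writeln(sumScratch(256));   // 32640
}
```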


Re: GC question

2017-02-04 Thread Kagamin via Digitalmars-d-learn

On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:

I'm surprised that D was able to come this far with this.


It's used mostly for server software. Things are moving to 64 
bit, so this will be less of an issue.


Re: GC question

2017-02-04 Thread osa1 via Digitalmars-d-learn

On Saturday, 4 February 2017 at 11:09:21 UTC, thedeemon wrote:

On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:

I'm wondering what are the implications of the fact that current GC is a
Boehm-style conservative GC rather than a precise one, I've never worked
with a conservative GC before. Are there any disallowed memory operations?
Can I break things by not following some unchecked rules etc. ? How often
does it leak? Do I need to be careful with some operations to avoid leaks?


Here's some practical perspective from someone who released a 
32-bit video processing app in D with thousands of users.
When developing with GC in D you need to keep in mind 3 key 
things:


1) The GC will treat some random stack data as possible 
pointers, and some of those false pointers will accidentally 
point to some places in the heap, so for any object in GC heap 
there is a probability that GC will think it's alive (used) 
even when it's not, and this probability is directly 
proportional to the size of your object.


2) Each GC iteration scans the whole GC heap, so the larger 
your managed heap, the slower it gets.


Main consequence of 1 and 2: don't store large objects (images, 
big file chunks etc.) in the GC heap, use other allocators for 
them. Leave GC heap just for the small litter. This way you 
practically don't leak and keep GC pauses short.


3) GC will call destructors (aka finalizers) for the objects 
that have them, and during the GC phase no allocations are 
allowed. Also, since you don't know in which order objects are 
collected, accessing other objects from a destructor is a bad 
idea, those objects might be collected already.


Main consequence of 3: don't do silly things in destructor 
(like throwing exceptions or doing other operations that might 
allocate), try avoiding using the destructors at all, if 
possible. They may be used to ensure you release your 
resources, but don't make it the primary and only way to 
release them, since some objects might leak and their 
destructors won't be called at all.


If you follow these principles, your app will be fine, it's not 
hard really.


Honestly this still sounds horrible. I'd be OK with any of these 
two:


- Precise GC, no manual management, no RAII or destructors etc.
- Manual GC, RAII and destructors, smart pointers.

but this:

- Automatic but conservative. Can leak at any time. You have to 
implement manual management (managed heaps) to avoid leaks. Leaks 
are hard to find as any heap value may be causing it.


is the worst of both worlds. I'm surprised that D was able to 
come this far with this.


Re: GC question

2017-02-04 Thread thedeemon via Digitalmars-d-learn

On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:

I'm wondering what are the implications of the fact that current GC is a
Boehm-style conservative GC rather than a precise one, I've never worked
with a conservative GC before. Are there any disallowed memory operations?
Can I break things by not following some unchecked rules etc. ? How often
does it leak? Do I need to be careful with some operations to avoid leaks?


Here's some practical perspective from someone who released a 
32-bit video processing app in D with thousands of users.
When developing with GC in D you need to keep in mind 3 key 
things:


1) The GC will treat some random stack data as possible pointers, 
and some of those false pointers will accidentally point to some 
places in the heap, so for any object in GC heap there is a 
probability that GC will think it's alive (used) even when it's 
not, and this probability is directly proportional to the size of 
your object.


2) Each GC iteration scans the whole GC heap, so the larger your 
managed heap, the slower it gets.


Main consequence of 1 and 2: don't store large objects (images, 
big file chunks etc.) in the GC heap, use other allocators for 
them. Leave GC heap just for the small litter. This way you 
practically don't leak and keep GC pauses short.


3) GC will call destructors (aka finalizers) for the objects that 
have them, and during the GC phase no allocations are allowed. 
Also, since you don't know in which order objects are collected, 
accessing other objects from a destructor is a bad idea, those 
objects might be collected already.


Main consequence of 3: don't do silly things in destructor (like 
throwing exceptions or doing other operations that might 
allocate), try avoiding using the destructors at all, if 
possible. They may be used to ensure you release your resources, 
but don't make it the primary and only way to release them, since 
some objects might leak and their destructors won't be called at 
all.


If you follow these principles, your app will be fine, it's not 
hard really.
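A minimal sketch of the advice in point 3, assuming a hypothetical `Image` class that owns C-heap memory: the explicit `release()` is the primary way to free it, and the destructor is only a fallback that does nothing beyond calling that non-allocating method on the object's own fields:

```d
import core.stdc.stdlib : free, malloc;
import std.stdio : writeln;

/// Hypothetical class owning a C-heap buffer.
final class Image
{
    private void* pixels;

    this(size_t bytes) { pixels = malloc(bytes); }

    /// Primary way to release: call it explicitly (e.g. via scope(exit)).
    void release() @nogc nothrow
    {
        free(pixels);   // free(null) is a no-op, so calling this twice is fine
        pixels = null;
    }

    bool released() const @nogc nothrow { return pixels is null; }

    /// Fallback only. It must not allocate from the GC, throw, or touch other
    /// GC objects (they may already be finalized), and it may never run at all.
    ~this() @nogc nothrow { release(); }
}

void main()
{
    auto img = new Image(1024);
    scope (exit) img.release();   // deterministic: does not wait for a collection
    writeln(img.released);        // false
}
```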


Re: GC question

2017-02-03 Thread Dsby via Digitalmars-d-learn

On Friday, 3 February 2017 at 11:36:26 UTC, osa1 wrote:

On Friday, 3 February 2017 at 10:49:00 UTC, Kagamin wrote:
Leaks are likely in 32-bit processes and unlikely in 64-bit 
processes. See e.g. 
https://issues.dlang.org/show_bug.cgi?id=15723


This looks pretty bad. I think I'll consider something else 
until D's memory management story gets better. This is sad 
because the language otherwise looks quite good, and I'd love 
to try assertions, contracts, scope guards, macros etc.


You can rely less on the automatic GC: use RC (reference counting) to replace the GC.
https://github.com/huntlabs/SmartRef
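SmartRef is a third-party library and its API isn't shown here. As a stand-in, a minimal sketch of the same idea with Phobos's own `std.typecons.RefCounted`: the payload (a hypothetical `Buffer` owning C-heap memory) is freed deterministically when the last reference goes away, without waiting for a GC collection:

```d
import core.stdc.stdlib : free, malloc;
import std.stdio : writeln;
import std.typecons : RefCounted;

size_t buffersFreed;   // counts payload destructions, for demonstration only

/// Hypothetical payload owning a C-heap array of ints.
/// It is never copied directly here; RefCounted copies only its handle.
struct Buffer
{
    int* ptr;
    size_t len;

    this(size_t n)
    {
        ptr = cast(int*) malloc(n * int.sizeof);
        len = ptr is null ? 0 : n;
    }

    ~this()
    {
        if (ptr !is null)
        {
            free(ptr);
            ptr = null;
            ++buffersFreed;
        }
    }
}

void main()
{
    buffersFreed = 0;
    {
        auto a = RefCounted!Buffer(16);        // count == 1
        auto b = a;                            // copies the handle, count == 2
        writeln(a.refCountedStore.refCount);   // 2
        b.ptr[0] = 42;                         // payload is shared via alias this
        writeln(a.ptr[0]);                     // 42
    }   // last reference gone: Buffer's destructor runs here
    writeln(buffersFreed);                     // 1
}
```

As the earlier posts note, plain reference counting cannot reclaim cycles on its own, so this fits tree-shaped ownership best.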


Re: GC question

2017-02-03 Thread osa1 via Digitalmars-d-learn

On Friday, 3 February 2017 at 10:49:00 UTC, Kagamin wrote:
Leaks are likely in 32-bit processes and unlikely in 64-bit 
processes. See e.g. 
https://issues.dlang.org/show_bug.cgi?id=15723


This looks pretty bad. I think I'll consider something else until 
D's memory management story gets better. This is sad because the 
language otherwise looks quite good, and I'd love to try 
assertions, contracts, scope guards, macros etc.


Re: GC question

2017-02-03 Thread Kagamin via Digitalmars-d-learn

On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:

Are there any disallowed memory operations?


Currently you can't touch the GC from a destructor during a 
collection. Another concern is interoperability with C-allocated 
memory: the GC knows nothing about the C heap.



How often does it leak?


Leaks are likely in 32-bit processes and unlikely in 64-bit 
processes. See e.g. https://issues.dlang.org/show_bug.cgi?id=15723



Do I need to be careful with some operations to avoid leaks?


Leaks happen only due to false pointers. But data allocated in the 
GC heap with the new operator and known to have no pointers (e.g. 
strings) is not scanned. Premature collection happens when the GC 
doesn't see a pointer to the allocated data, i.e. when such a 
pointer is stored in memory the GC doesn't see, like the C heap.
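Both points can be checked with `core.memory.GC`. A minimal sketch, assuming a hypothetical `Node` struct: an array of `char` allocated with `new` carries the `NO_SCAN` attribute while an array of pointers does not, and a GC pointer that is stored only in `calloc`'d memory is registered with `GC.addRange` so the collector can see it:

```d
import core.memory : GC;
import core.stdc.stdlib : calloc, free;
import std.stdio : writeln;

struct Node { int value; }   // hypothetical GC-allocated object

/// True if the GC block containing p is marked NO_SCAN (never scanned for pointers).
bool isNoScan(const void* p)
{
    return (GC.getAttr(GC.addrOf(p)) & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    // Pointer-free data from `new` is not scanned, so its contents can never
    // act as false pointers; an array of pointers must be scanned.
    auto text = new char[64];
    auto ptrs = new Node*[8];
    writeln(isNoScan(text.ptr), " ", isNoScan(ptrs.ptr));   // true false

    // A GC pointer stored only in C-heap memory is invisible to the GC.
    // Registering the range keeps the pointed-to object alive.
    enum slots = 4;
    auto table = cast(Node**) calloc(slots, (Node*).sizeof);
    assert(table !is null);
    GC.addRange(table, slots * (Node*).sizeof);
    scope (exit)
    {
        GC.removeRange(table);
        free(table);
    }

    table[0] = new Node(42);
    GC.collect();              // table[0] is reachable through the registered range
    writeln(table[0].value);   // 42
}
```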



Is a precise GC in the roadmap?


There's an effort on it: 
https://forum.dlang.org/post/hdwwkzqswwtffjehe...@forum.dlang.org


It's fine if I have to do manual memory management, but I don't 
want any leaks.


If you manually deallocate memory, it gets deallocated for sure; it 
shouldn't leak.
Compared to Java, the D GC trades GC performance for code execution 
performance, which can result in better overall performance when 
you don't allocate much and worse performance for 
allocation-heavy code, which the Java GC is optimized for.


Re: GC question

2017-02-01 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Wednesday, 1 February 2017 at 09:50:42 UTC, osa1 wrote:
Thanks for the answer. Could you elaborate on the lacklustre 
part? It's fine if I have to do manual memory management, but I 
don't want any leaks. Ideally I'd have a precise GC + RAII 
style resource management when needed.


Rust, Go, Java, Swift etc have a single memory management scheme 
which is used by libraries and mostly enforced by the compiler.


In C++ you tend to go with unique ownership and occasional shared 
ownership with the ability to have weak pointers and swap out 
objects without updating pointers, and there is an easy transition 
from old ad-hoc ownership to shared (the reference counter is in 
a separate object). It is not enforced by the compiler, but C++ 
is moving towards having dedicated tools for checking correctness.


In D the goal is to have safety enforced by the compiler, but it 
isn't quite there yet and what is on the map for leak free 
resource management seems a simple reference counting mechanism 
(simpler than swift?) with refcount embedded in objects (like 
intrusive_ptr in Boost except the compiler is intended to be 
better at optimizing unnecessary updates of the reference count).




Re: GC question

2017-02-01 Thread osa1 via Digitalmars-d-learn
On Wednesday, 1 February 2017 at 09:40:17 UTC, Ola Fosheim Grøstad wrote:

On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:

I'm wondering what are the implications of the fact that current GC is a
Boehm-style conservative GC rather than a precise one, I've never worked
with a conservative GC before.


The GC isn't competitive with the ones you find in GC languages 
(Java, Go etc). E.g. Go is now aiming for GC pauses in the 
microseconds range.


Resource management in D is rather lacklustre, even C++ does 
better imho. D seems to move towards using thread local 
ref-counting and making GC optional. I guess that would be ok 
on CPUs with few cores, but not really adequate on many-core 
CPUs.


Thanks for the answer. Could you elaborate on the lacklustre 
part? It's fine if I have to do manual memory management, but I 
don't want any leaks. Ideally I'd have a precise GC + RAII style 
resource management when needed.


Re: GC question

2017-02-01 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:

I'm wondering what are the implications of the fact that current GC is a
Boehm-style conservative GC rather than a precise one, I've never worked
with a conservative GC before.


The GC isn't competitive with the ones you find in GC languages 
(Java, Go etc). E.g. Go is now aiming for GC pauses in the 
microseconds range.


Resource management in D is rather lacklustre, even C++ does 
better imho. D seems to move towards using thread local 
ref-counting and making GC optional. I guess that would be ok on 
CPUs with few cores, but not really adequate on many-core CPUs.





GC question

2017-01-31 Thread osa1 via Digitalmars-d-learn

Hi all,

I was looking at D as the next language to use in my hobby projects, but
the "conservative GC" part in the language spec
(http://dlang.org/spec/garbage.html) looks a bit concerning. I'm wondering
what are the implications of the fact that current GC is a Boehm-style
conservative GC rather than a precise one, I've never worked with a
conservative GC before. Are there any disallowed memory operations? Can I
break things by not following some unchecked rules etc. ? How often does it
leak? Do I need to be careful with some operations to avoid leaks? Is a
precise GC in the roadmap? Any kind of comments on the GC would be really
appreciated.

Thanks