Re: O(N) Garbage collection?

2011-02-20 Thread Simen kjaeraas

Nick Sabalausky a@a.a wrote:


Ahh, I was confusing gene with chromosome (and probably still got the
exact number wrong ;) ).


That you did. Humans have 23 chromosome pairs. But now you know!


--
Simen


Re: O(N) Garbage collection?

2011-02-19 Thread dsimcha
BTW, here are the timings if I remove the GC.BlkAttr.NO_SCAN, meaning 
that everything gets scanned.  Said timings aren't drastically 
different.  Something is seriously wrong here.


Collected a 10 megabyte heap in 1 milliseconds.
Collected a 50 megabyte heap in 3 milliseconds.
Collected a 200 megabyte heap in 15 milliseconds.
Collected a 500 megabyte heap in 38 milliseconds.
Collected a 1000 megabyte heap in 77 milliseconds.
Collected a 5000 megabyte heap in 696 milliseconds.
Collected a 10000 megabyte heap in 1394 milliseconds.
Collected a 30000 megabyte heap in 2885 milliseconds.
Collected a 50000 megabyte heap in 4343 milliseconds.
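
For clarity, the only difference between the two sets of runs is the
allocation attribute.  A minimal sketch, with n standing in for the block
size (the variable names are placeholders, not part of the benchmark):

// With the attribute, the collector never scans the block's contents for pointers.
auto p1 = GC.malloc(n, GC.BlkAttr.NO_SCAN);

// Without it, every word in the block is treated as a potential pointer
// and the block is scanned during collection.
auto p2 = GC.malloc(n);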

On 2/19/2011 12:03 AM, dsimcha wrote:

I've been trying out D's new 64-bit compiler and a serious barrier to
using it effectively seems to be abysmal garbage collection performance
with large heaps. It seems like the time for a garbage collection to run
scales linearly with the size of the heap *even if most of the heap is
marked as NO_SCAN*. I'm running a program with a heap size of ~6GB,
almost all of which is strings (DNA sequences), which are not scanned by
the GC. It's spending most of its time in GC, based on pausing it every
once in a while in GDB and seeing what's at the top of the stack.

Here's a test program and the results for a few runs.

import std.stdio, std.datetime, core.memory, std.conv;

void main(string[] args) {
    if(args.length < 2) {
        stderr.writeln("Need size.");
        return;
    }

    immutable mul = to!size_t(args[1]);
    auto ptr = GC.malloc(mul * 1_048_576, GC.BlkAttr.NO_SCAN);

    auto sw = StopWatch(autoStart);
    GC.collect();
    immutable msec = sw.peek.msecs;
    writefln("Collected a %s megabyte heap in %s milliseconds.",
        mul, msec);
}

Outputs for various sizes:

Collected a 10 megabyte heap in 1 milliseconds.
Collected a 50 megabyte heap in 4 milliseconds.
Collected a 200 megabyte heap in 16 milliseconds.
Collected a 500 megabyte heap in 41 milliseconds.
Collected a 1000 megabyte heap in 80 milliseconds.
Collected a 5000 megabyte heap in 397 milliseconds.
Collected a 10000 megabyte heap in 801 milliseconds.
Collected a 30000 megabyte heap in 2454 milliseconds.
Collected a 50000 megabyte heap in 4096 milliseconds.

Note that these tests were run on a server with over 100 GB of physical
RAM, so a shortage of physical memory isn't the problem. Shouldn't GC be
O(1) with respect to the size of the unscanned portion of the heap?




Re: O(N) Garbage collection?

2011-02-19 Thread bearophile
dsimcha:

 BTW, here are the timings if I remove the GC.BlkAttr.NO_SCAN, meaning 
 that everything gets scanned.  Said timings aren't drastically 
 different.  Something is seriously wrong here.

Languages like Clojure, Scala, Boo, etc., start their development on a virtual 
machine that already provides a refined GC, a standard library, a good back-end, 
and so on, all things that require a very large amount of work to build and tune 
well. The D language tries to re-create everything, even refusing the LLVM 
back-end (LDC docet), so there is a lot of painful work left to do just to 
create a decent GC. The current engineering quality of the Java GC may remain 
out of reach of the D language forever...

Bye,
bearophile


Re: O(N) Garbage collection?

2011-02-19 Thread dsimcha

You may be right, but:

1.  Reinventing the wheel has its advantages in that you get to step 
back and reevaluate things and remove all the cruft that built up on the 
existing wheel.


2.  I'm guessing this is a silly bug somewhere rather than a deep design 
flaw, and that it can easily be solved by someone with a good mental 
model of the whole implementation of the D GC (Sean or Walter) by using 
a better data structure.  I've found several things in the GC source 
code that look like linear searches, but I don't understand the code 
well enough to know what to do about them.
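
As a rough sketch of what a better data structure for one of those lookups 
could look like (purely illustrative; Pool, baseAddr and topAddr are made-up 
names, not the actual fields in the GC source), keeping the pools sorted by 
base address and binary-searching them turns the pointer-to-pool lookup into 
O(lg n) instead of a linear scan:

// Illustrative sketch: binary search over pools kept sorted by base address.
struct Pool { void* baseAddr; void* topAddr; }

Pool* findPool(Pool[] pools, void* p)
{
    size_t lo = 0, hi = pools.length;
    while(lo < hi)
    {
        immutable mid = (lo + hi) / 2;
        if(p < pools[mid].baseAddr)       hi = mid;
        else if(p >= pools[mid].topAddr)  lo = mid + 1;
        else return &pools[mid];          // p points into this pool
    }
    return null;                          // not a pointer into the GC heap
}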


On 2/19/2011 10:17 AM, bearophile wrote:

dsimcha:


BTW, here are the timings if I remove the GC.BlkAttr.NO_SCAN, meaning
that everything gets scanned.  Said timings aren't drastically
different.  Something is seriously wrong here.


Languages like Clojure, Scala, Boo, etc., start their development on a virtual 
machine that already provides a refined GC, a standard library, a good back-end, 
and so on, all things that require a very large amount of work to build and tune 
well. The D language tries to re-create everything, even refusing the LLVM 
back-end (LDC docet), so there is a lot of painful work left to do just to 
create a decent GC. The current engineering quality of the Java GC may remain 
out of reach of the D language forever...

Bye,
bearophile




Re: O(N) Garbage collection?

2011-02-19 Thread Ulrik Mikaelsson
Just a thought; I guess the references to the non-GC-scanned strings
are held in GC-scanned memory, right? Does the number of such
references also increase linearly?

I'm not a GC-expert, but if so, wouldn't that pretty much force the GC
to do at least one follow-up of every reference, before realizing it's
pointing to non-GC memory? That COULD explain the linear increase.

That said, I too feel some improvements in the border between GC and
other resource-management methods are needed. The prospect of good
GC/non-GC combinations was what drew me here in the first place. I
would welcome some clear language and runtime support for
non-GC memory, such as frameworks for ref-counting and
tree-allocation, and well-defined semantics of object lifetime both in
GC and non-GC mode. I've mostly sorted out the kinks myself (I
think), but it's proving to be _very_ difficult, mostly undocumented,
and often appears to be an afterthought rather than by design.
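
As a hedged illustration of the kind of runtime support I mean (just a
sketch of one possible shape, not a proposal for an actual API), a minimal
ref-counted buffer whose storage lives entirely outside the GC heap might
look like this:

import core.stdc.stdlib : malloc, free;

// Sketch of a ref-counted, non-GC buffer.  The GC never sees the payload.
struct RcBuffer
{
    private size_t* refs;   // shared reference count
    private ubyte*  data;   // payload, allocated with C malloc
    private size_t  length;

    this(size_t n)
    {
        data   = cast(ubyte*) malloc(n);
        refs   = cast(size_t*) malloc(size_t.sizeof);
        *refs  = 1;
        length = n;
    }

    this(this)              // copy: bump the shared count
    {
        if(refs) ++*refs;
    }

    ~this()                 // last owner frees the storage
    {
        if(refs && --*refs == 0)
        {
            free(data);
            free(refs);
        }
    }
}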

Regards
/ Ulrik

2011/2/19 dsimcha dsim...@yahoo.com:
 I've been trying out D's new 64-bit compiler and a serious barrier to using
 it effectively seems to be abysmal garbage collection performance with large
 heaps. It seems like the time for a garbage collection to run scales
 linearly with the size of the heap *even if most of the heap is marked as
 NO_SCAN*.  I'm running a program with a heap size of ~6GB, almost all of
 which is strings (DNA sequences), which are not scanned by the GC.  It's
 spending most of its time in GC, based on pausing it every once in a while
 in GDB and seeing what's at the top of the stack.

 Here's a test program and the results for a few runs.

 import std.stdio, std.datetime, core.memory, std.conv;

 void main(string[] args) {
    if(args.length < 2) {
        stderr.writeln("Need size.");
        return;
    }

    immutable mul = to!size_t(args[1]);
    auto ptr = GC.malloc(mul * 1_048_576, GC.BlkAttr.NO_SCAN);

    auto sw = StopWatch(autoStart);
    GC.collect();
    immutable msec = sw.peek.msecs;
    writefln("Collected a %s megabyte heap in %s milliseconds.",
        mul, msec);
 }

 Outputs for various sizes:

 Collected a 10 megabyte heap in 1 milliseconds.
 Collected a 50 megabyte heap in 4 milliseconds.
 Collected a 200 megabyte heap in 16 milliseconds.
 Collected a 500 megabyte heap in 41 milliseconds.
 Collected a 1000 megabyte heap in 80 milliseconds.
 Collected a 5000 megabyte heap in 397 milliseconds.
 Collected a 10000 megabyte heap in 801 milliseconds.
 Collected a 30000 megabyte heap in 2454 milliseconds.
 Collected a 50000 megabyte heap in 4096 milliseconds.

 Note that these tests were run on a server with over 100 GB of physical RAM,
 so a shortage of physical memory isn't the problem.  Shouldn't GC be O(1)
 with respect to the size of the unscanned portion of the heap?



Re: O(N) Garbage collection?

2011-02-19 Thread dsimcha

On 2/19/2011 12:50 PM, Ulrik Mikaelsson wrote:

Just a thought; I guess the references to the non-GC-scanned strings
are held in GC-scanned memory, right? Are the number of such
references also increased linearly?


Well, first of all, the benchmark I posted seems to indicate otherwise. 
 Second of all, I was running this program before on yeast DNA and it 
was ridiculously fast.  Then I tried to do the same thing on human DNA 
and it became slow as molasses.  Roughly speaking, w/o getting into the 
biology much, I've got one string for each gene.  Yeast have about 1/3 
as many genes as humans, but the genes are on average about 100 times 
smaller.  Therefore, the difference should be at most a small constant 
factor and in actuality it's a huge constant factor.


Note:  I know I could make the program in question a lot more space 
efficient, and that's what I ended up doing.  It works now.  It's just 
that it was originally written for yeast, where space efficiency is 
obviously not a concern, and I would have liked to just try a one-off 
calculation on the human genome without having to rewrite portions of it.


Re: O(N) Garbage collection?

2011-02-19 Thread retard
Sat, 19 Feb 2011 14:32:27 -0500, dsimcha wrote:

 On 2/19/2011 12:50 PM, Ulrik Mikaelsson wrote:
 Just a thought; I guess the references to the non-GC-scanned strings
 are held in GC-scanned memory, right? Are the number of such references
 also increased linearly?
 
 Well, first of all, the benchmark I posted seems to indicate otherwise.
   Second of all, I was running this program before on yeast DNA and it
 was ridiculously fast.  Then I tried to do the same thing on human DNA
 and it became slow as molasses.  Roughly speaking, w/o getting into the
 biology much, I've got one string for each gene.  Yeast have about 1/3
 as many genes as humans, but the genes are on average about 100 times
 smaller.  Therefore, the difference should be at most a small constant
 factor and in actuality it's a huge constant factor.
 
 Note:  I know I could make the program in question a lot more space
 efficient, and that's what I ended up doing.  It works now.  It's just
 that it was originally written for yeast, where space efficiency is
 obviously not a concern, and I would have liked to just try a one-off
 calculation on the human genome without having to rewrite portions of
 it.

Probably one reason for this behavior is the lack of testing. My desktop 
only has 24 GB of DDR3. I have another machine with 16 GB of DDR2, but I 
don't know how to combine the address spaces via clustering. That would 
also horribly drag down GC performance. Even the JVM is badly tuned for 
larger systems; they might use the Azul Java runtimes there instead.


Re: O(N) Garbage collection?

2011-02-19 Thread Steven Schveighoffer

On Sat, 19 Feb 2011 00:03:27 -0500, dsimcha dsim...@yahoo.com wrote:

I've been trying out D's new 64-bit compiler and a serious barrier to  
using it effectively seems to be abysmal garbage collection performance  
with large heaps. It seems like the time for a garbage collection to run  
scales linearly with the size of the heap *even if most of the heap is  
marked as NO_SCAN*.  I'm running a program with a heap size of ~6GB,  
almost all of which is strings (DNA sequences), which are not scanned by  
the GC.  It's spending most of its time in GC, based on pausing it every  
once in a while in GDB and seeing what's at the top of the stack.


Here's a test program and the results for a few runs.

import std.stdio, std.datetime, core.memory, std.conv;

void main(string[] args) {
 if(args.length < 2) {
 stderr.writeln("Need size.");
 return;
 }

 immutable mul = to!size_t(args[1]);
 auto ptr = GC.malloc(mul * 1_048_576, GC.BlkAttr.NO_SCAN);

 auto sw = StopWatch(autoStart);
 GC.collect();
 immutable msec = sw.peek.msecs;
 writefln("Collected a %s megabyte heap in %s milliseconds.",
 mul, msec);
}

Outputs for various sizes:

Collected a 10 megabyte heap in 1 milliseconds.
Collected a 50 megabyte heap in 4 milliseconds.
Collected a 200 megabyte heap in 16 milliseconds.
Collected a 500 megabyte heap in 41 milliseconds.
Collected a 1000 megabyte heap in 80 milliseconds.
Collected a 5000 megabyte heap in 397 milliseconds.
Collected a 10000 megabyte heap in 801 milliseconds.
Collected a 30000 megabyte heap in 2454 milliseconds.
Collected a 50000 megabyte heap in 4096 milliseconds.

Note that these tests were run on a server with over 100 GB of physical  
RAM, so a shortage of physical memory isn't the problem.  Shouldn't GC  
be O(1) with respect to the size of the unscanned portion of the heap?


Having recently constructed the GC model in my head (and it's rapidly  
deteriorating from there, believe me), here is a stab at what I think could  
be a bottleneck.


The way the GC works is you have this giant loop (in pseudocode):
bool changed = true;
while(changed)
{
    changed = false;

    foreach(memblock in heap)
    {
        if(memblock.marked && memblock.containsPointers)
            foreach(pointer in memblock)
            {
                auto memblock2 = heap.findBlock(pointer);
                if(memblock2 && !memblock2.marked)
                {
                    memblock2.mark();
                    changed = true;
                }
            }
    }
}

So you can see two things.  First, every iteration of the outer while loop  
loops through *all* memory blocks, even those that do not contain pointers.  
This has a non-negligible cost.
Second, it looks like there is the potential for a pass of the while loop to  
mark only one, or at least a very small number, of blocks, so the worst case  
degenerates into O(n^2).  This may not be happening, but it made me a  
little uneasy.


The part of my mental model that has already deteriorated is whether marked  
blocks which have already been scanned are scanned again.  I would guess  
not, but if they are, that would be a really easy thing to fix.


Also note that the findBlock lookup above is a binary search, I think.  So  
you are talking O(lg(n)) per pointer there, not O(1).


Like I said, I may not remember exactly how it works.

-steve


Re: O(N) Garbage collection?

2011-02-19 Thread Nick Sabalausky
retard r...@tard.com.invalid wrote in message 
news:ijp7pa$1d34$1...@digitalmars.com...
 Sat, 19 Feb 2011 14:32:27 -0500, dsimcha wrote:

 On 2/19/2011 12:50 PM, Ulrik Mikaelsson wrote:
 Just a thought; I guess the references to the non-GC-scanned strings
 are held in GC-scanned memory, right? Are the number of such references
 also increased linearly?

 Well, first of all, the benchmark I posted seems to indicate otherwise.
   Second of all, I was running this program before on yeast DNA and it
 was ridiculously fast.  Then I tried to do the same thing on human DNA
 and it became slow as molasses.  Roughly speaking, w/o getting into the
 biology much, I've got one string for each gene.  Yeast have about 1/3
 as many genes as humans, but the genes are on average about 100 times
 smaller.  Therefore, the difference should be at most a small constant
 factor and in actuality it's a huge constant factor.

 Note:  I know I could make the program in question a lot more space
 efficient, and that's what I ended up doing.  It works now.  It's just
 that it was originally written for yeast, where space efficiency is
 obviously not a concern, and I would have liked to just try a one-off
 calculation on the human genome without having to rewrite portions of
 it.

 Probably one reason for this behavior is the lack of testing. My desktop
 only has 24 GB of DDR3. I have another machine with 16 GB of DDR2, but
 don't know how to combine the address spaces via clustering. This would
 also horribly drag down GC performance. Even JVM is badly tuned for
 larger systems, they might use the Azul Java runtimes instead..

*Only* 24GB of DDR3, huh? :)

Makes me feel like a pauper: I recently upgraded from 1GB to 2GB of DDR1 ;) 
(It actually had been 2GB a few years ago, but I cannibalized half of it to 
build my Linux box.)

Out of curiosity, what are you running on that? (Multiple instances of 
Crysis? High-definition voxels?)




Re: O(N) Garbage collection?

2011-02-19 Thread Nick Sabalausky
dsimcha dsim...@yahoo.com wrote in message 
news:ijp61d$1bu1$1...@digitalmars.com...
 On 2/19/2011 12:50 PM, Ulrik Mikaelsson wrote:
 Just a thought; I guess the references to the non-GC-scanned strings
 are held in GC-scanned memory, right? Are the number of such
 references also increased linearly?

 Well, first of all, the benchmark I posted seems to indicate otherwise. 
 Second of all, I was running this program before on yeast DNA and it was 
 ridiculously fast.  Then I tried to do the same thing on human DNA and it 
 became slow as molasses.  Roughly speaking, w/o getting into the biology 
 much, I've got one string for each gene.  Yeast have about 1/3 as many 
 genes as humans, but the genes are on average about 100 times smaller. 
 Therefore, the difference should be at most a small constant factor and in 
 actuality it's a huge constant factor.


Out of curiosity, roughly how many, umm, characters (I forget the technical 
term for each T, G, etc) are in each yeast gene, and how many genes do they 
have? (Humans have, umm, was it 26? My last biology class was ages ago.)

 Note:  I know I could make the program in question a lot more space 
 efficient, and that's what I ended up doing.  It works now.  It's just 
 that it was originally written for yeast, where space efficiency is 
 obviously not a concern, and I would have liked to just try a one-off 
 calculation on the human genome without having to rewrite portions of it.




Re: O(N) Garbage collection?

2011-02-19 Thread dsimcha

On 2/19/2011 10:21 PM, Nick Sabalausky wrote:

Out of curiosity, roughly how many, umm characters (I forget the technical
term for each T, G, etc), are in each yeast gene, and how many genes do they
have? (Humans have, umm, was it 26? My last biology class was ages ago.)



It varies massively, but you can compute the averages yourself.  There 
are about 6,000 yeast genes and about 12 million nucleotides (the 
technical term for characters).  Humans have about 20k to 25k genes, 
and a total of about 3 billion nucleotides, though a lot of this is 
intergenic regions (stuff that isn't genes).
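
A quick back-of-the-envelope check of those figures (the numbers below are
just the rough ones quoted above, not exact counts):

import std.stdio;

void main()
{
    enum yeastGenes = 6_000;
    enum yeastNt    = 12_000_000;       // ~12 million nucleotides
    enum humanNt    = 3_000_000_000L;   // ~3 billion nucleotides

    // Average yeast gene length: roughly 2,000 nucleotides.
    writefln("Average yeast gene: ~%s nt", yeastNt / yeastGenes);

    // Total sequence size ratio, human vs. yeast: roughly 250x.
    writefln("Human/yeast total size: ~%sx", humanNt / yeastNt);
}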


Re: O(N) Garbage collection?

2011-02-19 Thread Nick Sabalausky
dsimcha dsim...@yahoo.com wrote in message 
news:ijq2bl$2opj$1...@digitalmars.com...
 On 2/19/2011 10:21 PM, Nick Sabalausky wrote:
 Out of curiosity, roughly how many, umm characters (I forget the 
 technical
 term for each T, G, etc), are in each yeast gene, and how many genes do 
 they
 have? (Humans have, umm, was it 26? My last biology class was ages ago.)


 It varies massively, but you can compute the averages yourself.  There are 
 about 6,000 yeast genes and about 12 million nucleotides (the technical 
 term for characters).  Humans have about 20k to 25k genes, and a total 
 of about 3 billion nucleotides, though a lot of this is intergenic regions 
 (stuff that isn't genes).

Ahh, I was confusing gene with chromosome (and probably still got the 
exact number wrong ;) ).

Interesting stuff. And that certainly explains the computation-practicality 
difference of yeast vs. human: approx 12MB vs. 3GB, assuming ASCII/UTF-8.




Re: O(N) Garbage collection?

2011-02-19 Thread retard
Sat, 19 Feb 2011 22:15:52 -0500, Nick Sabalausky wrote:

 retard r...@tard.com.invalid wrote in message
 news:ijp7pa$1d34$1...@digitalmars.com...
 Sat, 19 Feb 2011 14:32:27 -0500, dsimcha wrote:

 On 2/19/2011 12:50 PM, Ulrik Mikaelsson wrote:
 Just a thought; I guess the references to the non-GC-scanned strings
 are held in GC-scanned memory, right? Are the number of such
 references also increased linearly?

 Well, first of all, the benchmark I posted seems to indicate
 otherwise.
   Second of all, I was running this program before on yeast DNA and it
 was ridiculously fast.  Then I tried to do the same thing on human DNA
 and it became slow as molasses.  Roughly speaking, w/o getting into
 the biology much, I've got one string for each gene.  Yeast have about
 1/3 as many genes as humans, but the genes are on average about 100
 times smaller.  Therefore, the difference should be at most a small
 constant factor and in actuality it's a huge constant factor.

 Note:  I know I could make the program in question a lot more space
 efficient, and that's what I ended up doing.  It works now.  It's just
 that it was originally written for yeast, where space efficiency is
 obviously not a concern, and I would have liked to just try a one-off
 calculation on the human genome without having to rewrite portions of
 it.

 Probably one reason for this behavior is the lack of testing. My
 desktop only has 24 GB of DDR3. I have another machine with 16 GB of
 DDR2, but don't know how to combine the address spaces via clustering.
 This would also horribly drag down GC performance. Even JVM is badly
 tuned for larger systems, they might use the Azul Java runtimes
 instead..
 
 *Only* 24GB of DDR3, huh? :)
 
 Makes me feel like a pauper: I recently upgraded from 1GB to 2GB of DDR1
 ;) (It actually had been 2GB a few years ago, but I cannibalized half of
 it to build my Linux box.)
 
 Out of curiosity, what are you running on that? (Multiple instances of
 Crysis? High-definition voxels?)

The largest processes are virtual machines, application servers, a web 
server, an IDE environment, and several compiler instances running in 
parallel. The web browser also seems to find a use for a gigabyte or two 
these days. As I recall, the memory was cheaper when I bought it than it is 
now. It's also cheaper than DDR2 or DDR (per gigabyte).