On 2009-06-07 02:23:47 +0200, Lionello Lunesu <l...@lunesu.remove.com> said:
Vladimir Panteleev wrote:
On Sat, 06 Jun 2009 18:12:58 +0300, Jarrett Billingsley
<jarrett.billings...@gmail.com> wrote:
On Sat, Jun 6, 2009 at 8:03 AM, Vladimir
Panteleev<thecybersha...@gmail.com> wrote:
// Works for DMD1/Phobos, DMD1/Tango and DMD2/Phobos
version(Tango) import tango.io.Console;
else import std.stdio;

struct S
{
    ubyte[40_000] data;
}

void main()
{
    S[] a;
    a ~= S();
    // QUESTION: How much memory will this program consume upon reaching this point?
    version(Tango) Cin.copyln();
    else readln();
}
There seems to be something wrong with the newCapacity function that
_d_arrayappendcT calls. From an element size of 20000 (I halved it
just to make the allocation faster) and an array length of 1, it
somehow calculates the new size to be 266686600. Hm. That seems a
bit off.
It seems this line:
long mult = 100 + (1000L * size) / log2plus1(newcap);
is to blame. I don't think large value types were taken into account
here. The resulting multiplier is 1,333,433, which is hilariously
large.
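(For reference, both figures follow directly from that line, assuming log2plus1(c) returns floor(log2(c)) + 1 as its name suggests:

mult   = 100 + (1000 * 20_000) / log2plus1(20_000)
       = 100 + 20_000_000 / 15 = 1_333_433
newext = (20_000 * 1_333_433) / 100 = 266_686_600

i.e. roughly a quarter of a gigabyte reserved for a single 20_000-byte element.)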
Bulls-eye. I think Mr. Dave Fladebo deserves a math lesson or
something, and Mr. Walter Bright should review patches that go into the
language's runtime more carefully. :P
Can we discuss a suitable replacement? I suggested in #d that we look
at how other platforms do it. For example, Java's Vector just doubles
its capacity by default (
http://java.sun.com/javase/6/docs/api/java/util/Vector.html ).
In my own array classes I'm using Python's: size += max(size>>3, 8);
Using this, you won't end up wasting a lot of memory. Preallocating is
always preferred anyway.
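In D, that rule would look something like this (just an illustrative helper, not the runtime's actual interface):

{{{
// Grow by ~12.5% of the current length, but by at least 8 elements.
size_t grownCapacity(size_t current)
{
    size_t extra = current >> 3;   // current / 8
    if (extra < 8)
        extra = 8;
    return current + extra;
}
}}}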
L.
Actually, the original function proposed by Dave Fladebo is not so bad:
it calculates the average storage requirement per bit
(total_size/num_bits) and tries to allocate that much extra space.
The advantage is that growth stays roughly exponential, which means ~O(log(n))
(maybe log(n)*log(log(n))) reallocations when continuously appending,
while wasting less and less space the larger the array gets.
I did propose two "reasonable" possibilities, which I give again here in
a cleaner form:
{{{
const int b = 2;
version(A){
    const a = 1000L;
} else {
    const a = 100L;
}

static int log2plusB(size_t c)
{
    int i = b;
    while(c >>= 1){
        ++i;
    }
    return i;
}

version(A){
    long mult = 100 + a*b / log2plusB(newcap);
} else {
    long mult = 100 + a*b / log2plusB(newlength);
}

// testing shows 1.02 for large arrays is about the point of diminishing return
if (mult < 102)
    mult = 102;
newext = cast(size_t)((newcap * mult) / 100);
newext += size - (newext % size);
}}}
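Plugging the test case from the top of the thread into version A (a=1000, b=2, and newcap = newlength*size = 20_000 for a single 20_000-byte element), if I haven't slipped in the integer arithmetic:

log2plusB(20_000) = 16
mult   = 100 + 2000/16 = 225
newext = 20_000*225/100 = 45_000, rounded up to a multiple of size = 60_000 bytes

i.e. about 60 kB reserved instead of the 266_686_600 bytes the current newCapacity asks for.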
Version A uses the total memory, the other the number of elements.
Increasing a makes the amount of reserved space larger (it is the
maximum amount of extra reserved space in %).
The value of a I chose is different from one version to the other
because min(log2(newcap)) =~ 12, whereas min(log2(newlength)) = 1.
Increasing b makes the decrease flatter, coming closer to treating
all sizes with the same extra space; for small b the decrease for
larger arrays is faster.
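To make the shape concrete, here is roughly what the two versions give for mult (remember newext = newcap*mult/100), again assuming I got the integer arithmetic right:

{{{
version A (a=1000, argument is newcap in bytes):  4 KB -> 242, 1 MB -> 190, 1 GB -> 162
version B (a=100,  argument is newlength):        1 -> 200, 1024 -> 116, 2^20 -> 109
}}}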
In general I think that the functional form is sound, and superior to
either a fixed "double the size" or an "add a constant size" policy.
This approach should also be used in other places that need to
reallocate periodically.
The question is which values of a and b are optimal, and whether it is
better to use the total size or the number of elements.
Now if someone would do some meaningful benchmarks (on real
applications), I would be very happy; otherwise I will just take
one of my guesses...
Fawzi