On 2009-06-07 02:23:47 +0200, Lionello Lunesu <l...@lunesu.remove.com> said:

Vladimir Panteleev wrote:
On Sat, 06 Jun 2009 18:12:58 +0300, Jarrett Billingsley <jarrett.billings...@gmail.com> wrote:

On Sat, Jun 6, 2009 at 8:03 AM, Vladimir
Panteleev<thecybersha...@gmail.com> wrote:
// Works for DMD1/Phobos, DMD1/Tango and DMD2/Phobos
version(Tango) import tango.io.Console;
else           import std.stdio;

struct S
{
       ubyte[40_000] data;
}

void main()
{
       S[] a;
       a ~= S();

       // QUESTION: How much memory will this program consume
       // upon reaching this point?
       version(Tango) Cin.copyln();
       else           readln();
}


There seems to be something wrong with the newCapacity function that
_d_arrayappendcT calls.  From an element size of 20000 (I halved it
just to make the allocation faster) and an array length of 1, it
somehow calculates the new size to be 266686600.  Hm.  That seems a
bit off.

It seems this line:

long mult = 100 + (1000L * size) / log2plus1(newcap);

is to blame.  I don't think large value types were taken into account
here.  The resulting multiplier is 1,333,433, which is hilariously
large.
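
Working the numbers through (assuming log2plus1 returns floor(log2)+1, so
log2plus1(20000) is 15, which is consistent with the figures above):

    newcap = 1 * 20000                  = 20000
    mult   = 100 + (1000 * 20000) / 15  = 1,333,433
    newext = (20000 * 1,333,433) / 100  = 266,686,600

i.e. it asks for roughly 13,334 times the space actually needed.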

Bull's-eye. I think Mr. Dave Fladebo deserves a math lesson or something, and Mr. Walter Bright should review patches that go into the language's runtime more carefully. :P

Can we discuss a suitable replacement? I suggested in #d that we look at how other platforms do it. For example, Java's Vector just doubles its capacity by default ( http://java.sun.com/javase/6/docs/api/java/util/Vector.html ).
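
A minimal sketch of such a doubling policy in D, just to have something concrete to compare against (growCapacity is an illustrative name, not anything from the runtime):

{{{
// Grows a capacity by doubling until the requested size fits.
size_t growCapacity(size_t current, size_t requested)
{
    size_t cap = current ? current : 16;  // arbitrary non-zero starting point
    while (cap < requested)
        cap *= 2;                         // double until the request fits
    return cap;
}
}}}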


In my own array classes I'm using Python's rule: size += max(size>>3, 8);
Using this, you won't end up wasting a lot of memory. Preallocating is always preferred anyway.
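
In D that rule comes out roughly as follows (nextSize is just a name for illustration):

{{{
// Grow by 1/8th of the current size, but by at least 8 elements.
size_t nextSize(size_t size)
{
    size_t step = size >> 3;
    if (step < 8)
        step = 8;
    return size + step;
}
}}}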

L.

Actually the original function proposed by Dave Fladebo is not so bad: it calculates the average storage requirement per bit (total_size/num_bits) and tries to allocate that much extra space. The advantage is that growth stays exponential-like, which means only ~O(log(n)) (maybe O(log(n)*log(log(n)))) reallocations are needed when continuously appending, while it tries to waste less and less space the larger the array gets. I proposed two "reasonable" possibilities, which I give again here in a cleaner form:
{{{
    const int b = 2;
    version(A){
        const a = 1000L;   // version A works on the total size (newcap, in bytes)
    } else {
        const a = 100L;    // version B works on the number of elements (newlength)
    }

    static int log2plusB(size_t c)
    {
        int i = b;
        while(c >>= 1){
            ++i;
        }
        return i;
    }

    version(A){
        long mult = 100 + a*b / log2plusB(newcap);
    } else {
        long mult = 100 + a*b / log2plusB(newlength);
    }

    // testing shows 1.02 for large arrays is about the point of diminishing return
    if (mult < 102)
        mult = 102;
    newext = cast(size_t)((newcap * mult) / 100);
    newext += size - (newext % size);   // round up to a multiple of the element size
}}}
Version A uses the total memory (newcap, in bytes); the other uses the number of elements (newlength).

Increasing a makes the amount of reserved space larger (it is the maximum amount of extra reserved space, in %). The value of a I chose is different between the two versions because min(log2(newcap)) is about 12, whereas min(log2(newlength)) is 1.

Increasing b makes the decrease flatter, moving closer to treating all sizes with the same relative extra space; for small b, the extra space drops off faster for large arrays.
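
To make the effect of these parameters concrete, here is a small standalone snippet (illustrative only, using version A with a = 1000 and b = 2, and Phobos for the printing) that prints the multiplier for a few capacities:

{{{
// Illustrative only: prints the multiplier produced by the version-A formula
// above for a few capacities, to show how the reserved fraction shrinks
// as the array grows.
import std.stdio;

void main()
{
    const long a = 1000L;
    const int  b = 2;

    // same shape as log2plusB above, written as a plain nested function
    int log2plusB(size_t c)
    {
        int i = b;
        while (c >>= 1)
            ++i;
        return i;
    }

    for (size_t newcap = 4_096; newcap <= 16_777_216; newcap *= 16)
    {
        long mult = 100 + a * b / log2plusB(newcap);
        if (mult < 102)
            mult = 102;
        writefln("newcap=%s  mult=%s  (~%s%% extra)", newcap, mult, mult - 100);
    }
}
}}}

With these values the extra space goes from about 142% at 4 KiB down to about 76% at 16 MiB; changing a scales that whole curve, and increasing b flattens it.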

In general I think the functional form is sound, and superior to either a fixed "double the size" or an "add a constant size" policy. This approach should also be used in other places that need to reallocate periodically.

The question is which values of a and b are optimal, and whether it is better to use the total size or the number of elements. Now if someone would do some meaningful benchmarks (on real applications), I would be very happy; otherwise I will just take one of my guesses...

Fawzi
