On 2009-06-07 02:23:47 +0200, Lionello Lunesu <l...@lunesu.remove.com> said:

Vladimir Panteleev wrote:
On Sat, 06 Jun 2009 18:12:58 +0300, Jarrett Billingsley <jarrett.billings...@gmail.com> wrote:

On Sat, Jun 6, 2009 at 8:03 AM, Vladimir
Panteleev<thecybersha...@gmail.com> wrote:
// Works for DMD1/Phobos, DMD1/Tango and DMD2/Phobos
version(Tango) import tango.io.Console;
else           import std.stdio;

struct S
{
       ubyte[40_000] data;
}

void main()
{
       S[] a;
       a ~= S();

       // QUESTION: How much memory will this program consume
       // upon reaching this point?
       version(Tango) Cin.copyln();
       else           readln();
}


There seems to be something wrong with the newCapacity function that
_d_arrayappendcT calls.  From an element size of 20000 (I halved it
just to make the allocation faster) and an array length of 1, it
somehow calculates the new size to be 266686600.  Hm.  That seems a
bit off.

It seems this line:

long mult = 100 + (1000L * size) / log2plus1(newcap);

is to blame.  I don't think large value types were taken into account
here.  The resulting multiplier is 1,333,433, which is hilariously
large.
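
Working the numbers through (assuming log2plus1 returns floor(log2)+1, so
log2plus1(20000) is 15, which is consistent with the figures above):

    newcap = 1 * 20000                  = 20000
    mult   = 100 + (1000 * 20000) / 15  = 1,333,433
    newext = (20000 * 1,333,433) / 100  = 266,686,600

i.e. it asks for roughly 13,334 times the space actually needed.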

Bull's-eye. I think Mr. Dave Fladebo deserves a math lesson or something, and Mr. Walter Bright should review patches that go into the language's runtime more carefully. :P

Can we discuss a suitable replacement? I suggested in #d that we look at how other platforms do it. For example, Java's Vector just doubles its capacity by default ( http://java.sun.com/javase/6/docs/api/java/util/Vector.html ).
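
A minimal sketch of such a doubling policy in D, just to have something concrete to compare against (growCapacity is an illustrative name, not anything from the runtime):

{{{
// Grows a capacity by doubling until the requested size fits.
size_t growCapacity(size_t current, size_t requested)
{
    size_t cap = current ? current : 16;  // arbitrary non-zero starting point
    while (cap < requested)
        cap *= 2;                         // double until the request fits
    return cap;
}
}}}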


In my own array classes I'm using Python's rule: size += max(size>>3, 8);
Using this, you won't end up wasting a lot of memory. Preallocating is always preferred anyway.
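
In D that rule comes out roughly as follows (nextSize is just a name for illustration):

{{{
// Grow by 1/8th of the current size, but by at least 8 elements.
size_t nextSize(size_t size)
{
    size_t step = size >> 3;
    if (step < 8)
        step = 8;
    return size + step;
}
}}}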

L.

Actually the original function proposed by Dave Fladebo is not so bad: it calculates the average storage requirement per bit (total_size/num_bits) and tries to allocate that much extra space. The advantage is that growth stays exponential-like, which means only ~O(log(n)) (maybe O(log(n)*log(log(n)))) reallocations are needed when continuously appending, while it tries to waste less and less space the larger the array gets. I proposed two "reasonable" possibilities, which I give again here in a cleaner form:
{{{
    const int b = 2;
    version(A){
        const a = 1000L;   // version A works on the total size (newcap, in bytes)
    } else {
        const a = 100L;    // version B works on the number of elements (newlength)
    }

    static int log2plusB(size_t c)
    {
        int i = b;
        while(c >>= 1){
            ++i;
        }
        return i;
    }

    version(A){
        long mult = 100 + a*b / log2plusB(newcap);
    } else {
        long mult = 100 + a*b / log2plusB(newlength);
    }

    // testing shows 1.02 for large arrays is about the point of diminishing return
    if (mult < 102)
        mult = 102;
    newext = cast(size_t)((newcap * mult) / 100);
    newext += size - (newext % size);   // round up to a multiple of the element size
}}}
Version A uses the total memory (newcap, in bytes); the other uses the number of elements (newlength).

Increasing a makes the amount of reserved space larger (it is the maximum amount of extra reserved space, in %). The value of a I chose is different between the two versions because min(log2(newcap)) is about 12, whereas min(log2(newlength)) is 1.

Increasing b makes the decrease flatter, moving closer to treating all sizes with the same relative extra space; for small b, the extra space drops off faster for large arrays.
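
To make the effect of these parameters concrete, here is a small standalone snippet (illustrative only, using version A with a = 1000 and b = 2, and Phobos for the printing) that prints the multiplier for a few capacities:

{{{
// Illustrative only: prints the multiplier produced by the version-A formula
// above for a few capacities, to show how the reserved fraction shrinks
// as the array grows.
import std.stdio;

void main()
{
    const long a = 1000L;
    const int  b = 2;

    // same shape as log2plusB above, written as a plain nested function
    int log2plusB(size_t c)
    {
        int i = b;
        while (c >>= 1)
            ++i;
        return i;
    }

    for (size_t newcap = 4_096; newcap <= 16_777_216; newcap *= 16)
    {
        long mult = 100 + a * b / log2plusB(newcap);
        if (mult < 102)
            mult = 102;
        writefln("newcap=%s  mult=%s  (~%s%% extra)", newcap, mult, mult - 100);
    }
}
}}}

With these values the extra space goes from about 142% at 4 KiB down to about 76% at 16 MiB; changing a scales that whole curve, and increasing b flattens it.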

In general I think the functional form is sound, and superior to either a fixed "double the size" or an "add a constant size" policy. This approach should also be used in other places that need to reallocate periodically.

The question is which values of a and b are optimal, and whether it is better to use the total size or the number of elements. Now if someone would do some meaningful benchmarks (on real applications), I would be very happy; otherwise I will just take one of my guesses...

Fawzi
