On 23/01/17 11:15, Markus Laker wrote:

A 2GiB disk file caused tinycat.d to use over 4GiB of memory.


When extending arrays, a common approach is to double the size every time you run out of space. This guarantees an amortized O(1) cost per append.

Unfortunately, this also guarantees that we will never have enough space freed by previous copies to reuse existing memory:

100-byte array

increase

100 bytes free
200-byte array

increase

300 bytes free
400-byte array

etc. The array will always be bigger than the total amount of space we freed: after k doublings the array occupies 2^k * 100 bytes, while only 100 + 200 + ... + 2^(k-1) * 100 = (2^k - 1) * 100 bytes have been freed.
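Here is a minimal sketch, in D, of that bookkeeping (the 100-byte starting size and the doubling policy are taken from the example above; as before, I'm assuming the old block can only be released after its contents have been copied into the new one):

import std.stdio;

void main()
{
    size_t size  = 100;  // live array, in bytes (initial size from the example above)
    size_t freed = 0;    // total bytes released by earlier copies

    foreach (step; 1 .. 7)
    {
        freed += size;   // the old block is released only after the copy
        size  *= 2;      // doubling growth policy
        writefln("step %s: %s bytes free, %s bytes array", step, freed, size);
    }
}

Every line it prints shows the freed total trailing the live array by exactly the original 100 bytes, so the new block never fits into the space we have given back.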

If, instead of increasing the size by 100%, we increase it by a smaller percentage of the previous size, we still get an amortized O(1) append cost (with a somewhat higher constant factor; that's the trade-off), but we can now eventually reuse memory. Let's say we increase by 50% each time:

100 array

increase

100 free
150 array

increase

250 free
225 array

increase

475 free
338 array

increase

813 free
507 array

The next increase will require 761 bytes, but we already have 813 bytes free, so we can allocate the new array inside memory already freed by earlier copies, keeping the heap from growing any further.
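The same sketch, parameterised by the growth factor, finds that point automatically (sizes are rounded half up, which matches the 338, 507 and 761 figures above; in-place growth is still assumed to be unavailable):

import std.stdio;

void main()
{
    enum growthFactor = 1.5;  // 50% enlargement per step
    size_t size  = 100;       // live array, in bytes
    size_t freed = 0;         // total bytes released by earlier copies

    foreach (step; 1 .. 20)
    {
        // size of the next block, rounded half up
        auto newSize = cast(size_t)(size * growthFactor + 0.5);

        if (freed >= newSize)
        {
            writefln("step %s: the new %s-byte block fits into the %s bytes already freed",
                     step, newSize, freed);
            break;
        }

        freed += size;   // old block released after the copy
        size   = newSize;
        writefln("step %s: %s free, %s array", step, freed, size);
    }
}

With growthFactor = 1.5 it reproduces the walkthrough above and stops at step 5, where the 761-byte block fits into the 813 bytes already freed; with 2.0 the condition is never met.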

Of course, if we integrated the allocation with the move, so that the space occupied by the current array could also count towards the new block, we could have reused previously freed memory as early as allocation 3; but that trick, if available, would help the 100% case just as well, so I'm assuming we can't do it.

Of course, if, instead of 50%, we increase by even less (say, 20%), we can reuse previously freed memory even sooner.
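Under the same model, running the sketch above with growthFactor = 1.2 gives block sizes 100, 120, 144, 173, ...; by the time the 173-byte block is needed, 220 bytes (100 + 120) have already been freed, so reuse becomes possible at the third enlargement rather than the fifth.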

I am assuming that this is the problem that manifests itself in this use scenario. I would suggest solving it at the language level, rather than the library level.

Shachar
