One more thing.

It is possible to tweak the growth factor based on the array's current size. For example, grow by 100% for arrays smaller than 1MB, by 50% for arrays between 1MB and 100MB, and by 20% for arrays larger than 100MB. That would eliminate the performance degradation for almost all users.
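Something along these lines, as a sketch only (the helper name nextCapacity and the free-function form are made up for illustration; the real policy would presumably live in the runtime's append code):

    size_t nextCapacity(size_t current)
    {
        enum MB = 1UL << 20;
        if (current < 1 * MB)
            return current * 2;            // +100% below 1MB
        if (current < 100 * MB)
            return current + current / 2;  // +50% between 1MB and 100MB
        return current + current / 5;      // +20% above 100MB
    }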

Shachar

On 23/01/17 12:44, Shachar Shemesh wrote:
On 23/01/17 11:15, Markus Laker wrote:

A 2GiB disk file caused tinycat.d to use over 4GiB of memory.


When extending arrays, a common approach is to double the size every
time you run out of space. This guarantees an amortized O(1) cost of
append.

Unfortunately, this also guarantees that we will never have enough space
freed by previous copies to reuse existing memory:

100 byte array

increase

100 free
200 array

increase

300 free
400 array

etc. The array will always be bigger than the total amount of space we
freed.
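A few lines of D make the pattern obvious (this is just arithmetic mimicking the bookkeeping above, not what the runtime's allocator actually does):

    import std.stdio;

    void main()
    {
        size_t array = 100, freed = 0;
        foreach (i; 0 .. 6)
        {
            writefln("%s free, %s array", freed, array);
            freed += array;  // the old block is released when we reallocate
            array *= 2;      // doubling: the new block exceeds everything freed so far
        }
    }

The freed total is always exactly 100 bytes short of the live array, so no combination of earlier blocks can ever hold the new one.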

If, instead of increasing the size by 100%, we increase it by a smaller
percentage of the previous size, we still maintain the amortized O(1)
cost. The constant is a little higher: with a growth factor g, the total
copying work is roughly g/(g-1) per element, so about 3x for 50% growth
versus 2x for doubling. In exchange, we can now reuse memory. Let's say
we increase by 50% each time:

100 array

increase

100 free
150 array

increase

250 free
225 array

increase

475 free
338 array

increase

813 free
507 array

The next increase will require 761 bytes, but we already have 813 bytes
free, so we can place the new array in memory we freed earlier instead
of growing the heap further.
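The same toy program with a 50% factor (rounding halves up, to match the figures above) shows the crossover:

    import std.stdio;

    void main()
    {
        size_t array = 100, freed = 0;
        foreach (i; 0 .. 6)
        {
            size_t next = (array * 3 + 1) / 2;  // grow by 50%, rounding up
            writefln("%s free, %s array, next allocation needs %s%s",
                     freed, array, next,
                     freed >= next ? "  <-- fits in already-freed space" : "");
            freed += array;
            array = next;
        }
    }

Change the growth line to a 20% factor (say, next = (array * 6 + 4) / 5) and the crossover happens after even fewer steps.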

Of course, if we integrated the allocation with the copy, we could have
reused previously freed memory starting from the third allocation; but
that optimization would apply just as well when increasing by 100%, so
for this comparison I'm assuming we can't do it.

Of course, if instead of 50% we increase by even less (say, 20%), we can
reuse previously freed memory even sooner.

I am assuming that this is the problem that manifests itself in this use
scenario. I would suggest solving it at the language level, rather than
the library level.

Shachar
