On 05.04.2012 20:35, BLM wrote:
On Thursday, 5 April 2012 at 15:30:45 UTC, Vladimir Panteleev wrote:


The GC can't really know which parts of the array you're using. For
example, your only reference to the array might be a pointer, and you
might be traversing the array in either direction, only keeping count
of the remaining bytes until the array boundary.

Consider .dup-ing the slices you're going to need, or using std.mmfile
to map the file into memory - in that case, the OS won't load the
unnecessary parts of the file into memory in the first place.
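
To make the two options above concrete, here is a minimal sketch, not a definitive implementation. The helper names `detach` and `readRange` are made up for illustration; `.dup` and `std.mmfile.MmFile` are the only library pieces used. The point is that the copy returned by `.dup` lives in its own small allocation, so it no longer keeps the big buffer reachable.

```d
import std.mmfile : MmFile;

/// Hypothetical helper: copies bytes [lo, hi) out of a large buffer.
/// The result is a fresh allocation, so once `big` is dropped the GC
/// is free to collect the large block.
ubyte[] detach(const(ubyte)[] big, size_t lo, size_t hi)
{
    return big[lo .. hi].dup;
}

/// Hypothetical helper: maps the file at `path` read-only and copies
/// out only bytes [lo, hi). Pages outside that range are never touched,
/// so the OS has no reason to load them.
ubyte[] readRange(string path, size_t lo, size_t hi)
{
    auto mmf = new MmFile(path);        // read-only mapping by default
    scope (exit) destroy(mmf);          // unmap when leaving the function
    auto view = cast(const(ubyte)[]) mmf[lo .. hi];
    return view.dup;                    // copy, since the mapping goes away
}
```

The cost is one copy per kept range, which is the overhead mentioned in the reply below.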

I had considered using .dup, but I wanted to minimize overhead. I should
probably look into std.mmfile or pull the data out in smaller chunks
that the GC can handle individually.

Another idea is to copy the interesting parts of the original chunk out to a separate storage array. This array will contain your sliced-out data, just packed more tightly. If you have an upper bound on the percentage of useful bytes, then you can size the storage once and get away without extra allocations.

The tricky part is reallocating this storage array, as it will leave slices that point into it dangling: they still refer to the old block and keep the GC from deallocating it. A workaround would be to use pure index-based "slices" (an offset and a length) that work on this block only.
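
A rough sketch of what I mean, under my own assumptions: the names `Span`, `Storage` and `put` are invented here for illustration and are not from any library. A `Span` stores only an offset and a length, so it stays valid even when appending moves the storage array to a new block.

```d
/// Index-based "slice": a position inside Storage.data, not a pointer.
struct Span
{
    size_t offset;
    size_t length;
}

/// Packed storage for the interesting parts of the original chunks.
struct Storage
{
    ubyte[] data;

    /// Copies `part` to the end of the storage and returns where it went.
    Span put(const(ubyte)[] part)
    {
        auto s = Span(data.length, part.length);
        data ~= part;   // may reallocate; existing Spans are unaffected
        return s;
    }

    /// Resolves a Span to a real slice. The result should not be kept
    /// across a call to `put`, since it may point into an old block.
    const(ubyte)[] opIndex(Span s) const
    {
        return data[s.offset .. s.offset + s.length];
    }
}
```

If the upper bound on useful bytes is known, calling `reserve` on `data` up front should avoid the reallocations altogether.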

If the GC can distinguish between pointers and slices, it should
theoretically be able to prune an array that is only referenced by
slices, but I'm not sure how well that would fit into the current GC
system.


--
Dmitry Olshansky
