On Friday, 18 May 2012 at 07:52:57 UTC, Mehrdad wrote:
On Thursday, 17 May 2012 at 14:02:09 UTC, Steven Schveighoffer
wrote:
2. I realized that a buffering input stream of type T is actually
an input range of type T[].
The trouble is, why a slice? Why not an std.array.Array? Why
not some other data source?
(Chicken/egg problem....)
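For what it's worth, Phobos already has one concrete instance of "a buffering input stream of T is an input range of T[]": std.stdio.File.byChunk, whose front is one buffered ubyte[] at a time. A minimal sketch (the file name here is made up for the example):

```d
import std.file : write, remove;
import std.stdio : File;

void main()
{
    // Create a small throwaway file so the example is self-contained.
    write("tmp_chunks.bin", "hello world payload");
    scope(exit) remove("tmp_chunks.bin");

    // byChunk is an input range whose element type is ubyte[]:
    // each front() is one buffer-sized chunk of the stream.
    size_t total;
    foreach (ubyte[] chunk; File("tmp_chunks.bin").byChunk(5))
        total += chunk.length;

    assert(total == "hello world payload".length);
}
```

Note that each chunk is a slice of byChunk's internal buffer, which answers "why a slice" at least for this case: it is the cheapest view Phobos can hand out without copying.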
Another problem I've noticed is the following:
Say you're tokenizing some input range, and it happens to just
be a huge, gigantic string.
It *should* be possible to turn it into tokens with slices
referring to the ORIGINAL string, which is VERY efficient
because it doesn't require *any* heap allocations whatsoever.
(You just tokenize with opApply() as you go, without ever
requiring a heap allocation...)
However, this is *only* possible if you don't use the concept
of an input range!
Since you can't slice an input range, you'd be forced to use
the front() and popFront() properties. But, as soon as you do
that, you're gonna have to store the data somewhere... so your
next-best option is to append it to some new gigantic array
(instead of a bunch of small arrays, which require a lot of
heap allocations), but even then, it's not as efficient as
possible, because there's O(n) extra memory involved -- which
defeats the whole purpose of working on small chunks at a time
with no heap allocations.
(If you're going to do that, after all, you might as well read
the entire thing into a giant string at the beginning, and work
with an array anyway, discarding the whole idea of a range
while doing your tokenization.)
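To make the zero-allocation case concrete, here is a minimal sketch of the opApply() approach described above (a hypothetical whitespace tokenizer, not code from the thread): every token it yields is a slice of the original string, so no heap allocation happens per token.

```d
// Hypothetical tokenizer: yields slices of the original string via
// opApply(), so no per-token heap allocation is needed.
struct SliceTokenizer
{
    string src;

    int opApply(scope int delegate(string token) dg)
    {
        size_t i = 0;
        while (i < src.length)
        {
            while (i < src.length && src[i] == ' ') ++i;   // skip spaces
            immutable start = i;
            while (i < src.length && src[i] != ' ') ++i;   // scan token
            if (start < i)
            {
                // The token is a slice aliasing src -- no copy.
                if (auto r = dg(src[start .. i]))
                    return r;
            }
        }
        return 0;
    }
}

void main()
{
    string text = "lex this without allocating";
    size_t n;
    foreach (tok; SliceTokenizer(text))
    {
        // Each token points into the original string.
        assert(text.ptr <= tok.ptr
            && tok.ptr + tok.length <= text.ptr + text.length);
        ++n;
    }
    assert(n == 4);
}
```

This is exactly what a generic input range interface loses: once all you have is front()/popFront(), the tokenizer cannot hand back views into the source and has to copy.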
Any ideas on how to solve this problem?
Provide slicing if the underlying data source is compatible.
I have the same need in my DCT, and so far I've gone with a custom
implementation (not on GitHub yet), but I plan to reuse std.io as
soon as it is more or less stable and usable.
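The "provide slicing when the source supports it" idea can be sketched with std.range's hasSlicing trait: take the zero-copy path when the range can be sliced, and fall back to copying via front()/popFront() otherwise. (firstToken is a made-up helper for illustration.)

```d
import std.range : ElementType, hasSlicing;

// Hypothetical helper: return the first `len` elements, slicing when
// the range supports it and copying only as a fallback.
auto firstToken(R)(R input, size_t len)
{
    static if (hasSlicing!R)
    {
        return input[0 .. len];          // zero-copy: aliases the source
    }
    else
    {
        // Fallback for pure input ranges: copy elements out one by one.
        ElementType!R[] buf;
        foreach (i; 0 .. len)
        {
            buf ~= input.front;
            input.popFront();
        }
        return buf;
    }
}

void main()
{
    import std.algorithm : filter;

    int[] arr = [1, 2, 3, 4];
    assert(firstToken(arr, 2) == [1, 2]);     // arrays slice: zero-copy path

    auto lazy_ = arr.filter!(x => true);      // filter result has no slicing
    assert(firstToken(lazy_, 2) == [1, 2]);   // copying fallback path
}
```

The nice property is that the decision is made at compile time, so callers who hand in a sliceable source pay nothing for the generic fallback.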