On Friday, 18 May 2012 at 07:52:57 UTC, Mehrdad wrote:
On Thursday, 17 May 2012 at 14:02:09 UTC, Steven Schveighoffer wrote:
2. I realized that a buffering input stream of type T is actually an input range of type T[].

The trouble is, why a slice? Why not an std.array.Array? Why not some other data source?
(Chicken/egg problem....)
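
A minimal sketch of what such a range might look like (hypothetical names, not actual std.io code): the buffer gets refilled on popFront(), and front() exposes whatever window of data is currently sitting in the internal buffer -- which is why the element type naturally ends up being a slice T[].

import std.stdio : File;

// Hypothetical sketch, not std.io's real API: a buffered reader over a
// File, exposed as an input range whose element type is T[].
struct BufferedStream(T)
{
    File source;
    T[] buffer;
    T[] window;   // the chunk currently exposed as front

    this(File source, size_t bufSize = 4096)
    {
        this.source = source;
        buffer = new T[bufSize];
        refill();
    }

    private void refill() { window = source.rawRead(buffer); }

    @property bool empty() const { return window.length == 0; }
    @property T[] front() { return window; }   // a slice, not a single T
    void popFront() { refill(); }
}

BufferedStream!char(File("data.txt")) would then hand you the file 4096 chars at a time, one buffer window per front().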

Another problem I've noticed is the following:


Say you're tokenizing some input range, and it happens to just be one gigantic string.

It *should* be possible to turn it into tokens with slices referring to the ORIGINAL string, which is VERY efficient because it doesn't require *any* heap allocations whatsoever. (You just tokenize with opApply() as you go, without ever requiring a heap allocation...)

However, this is *only* possible if you don't use the concept of an input range!

Since you can't slice an input range, you're forced to go through front() and popFront(). But as soon as you do that, you have to store the data somewhere, so your next-best option is to append it all to one new gigantic array (rather than a bunch of small arrays, which would mean lots of heap allocations). Even then it's not as efficient as it could be, because you're paying O(n) extra memory -- which defeats the whole purpose of working on small chunks at a time with no heap allocations. (If you're going to do that, after all, you might as well read the entire thing into one giant string at the beginning and work with an array anyway, discarding the whole idea of a range while doing your tokenization.)
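
For concreteness, here's roughly the zero-allocation case I mean -- just a minimal whitespace splitter, not real lexer code: every token handed to the sink is a slice of the original string, so nothing is copied and nothing touches the heap.

import std.ascii : isWhite;

// Every token passed to sink is a slice of `input`; no copies, no heap.
void tokenize(string input, scope void delegate(string token) sink)
{
    size_t start = 0;
    foreach (i, c; input)
    {
        if (isWhite(c))
        {
            if (i > start)
                sink(input[start .. i]);   // slice of the original string
            start = i + 1;
        }
    }
    if (start < input.length)
        sink(input[start .. $]);           // trailing token
}

With a generic input range, none of those input[start .. i] slices are available, which is exactly the problem.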


Any ideas on how to solve this problem?

Provide slicing if the underlying data source is compatible.
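
Something along these lines -- my own sketch, not actual DCT or std.io code: dispatch on the source's capabilities at compile time, slice when the source supports it, and fall back to copying through front/popFront when it doesn't.

import std.range : ElementType, empty, front, popFront, hasSlicing,
    isRandomAccessRange;
import std.traits : isSomeString;

// Grab everything up to the next space. On sliceable sources the token is
// a view into the original data; on plain input ranges it has to be copied.
auto takeWord(R)(ref R input)
{
    static if (isSomeString!R || (isRandomAccessRange!R && hasSlicing!R))
    {
        size_t i = 0;
        while (i < input.length && input[i] != ' ')
            ++i;
        auto token = input[0 .. i];                    // slice, no allocation
        input = input[(i < input.length ? i + 1 : i) .. input.length];
        return token;
    }
    else
    {
        ElementType!(R)[] token;                       // fallback: allocates
        while (!input.empty && input.front != ' ')
        {
            token ~= input.front;
            input.popFront();
        }
        if (!input.empty)
            input.popFront();                          // skip the separator
        return token;
    }
}

That keeps the string case allocation-free while still accepting arbitrary input ranges.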

I have the same need in my DCT, and so far I've gone with a custom implementation (not on GitHub yet), but I plan to reuse std.io as soon as it's more or less stable and usable.
