30-Dec-2013 02:45, Vladimir Panteleev wrote:
On Sunday, 29 December 2013 at 22:02:57 UTC, Dmitry Olshansky wrote:
[snip]

Hmm, just yesterday I was rewriting a parser to use a buffer instead of
loading the whole file in memory, so this is quite timely for me.

Questions:

1. What happens when the distance between the pinned and current
position exceeds the size of the buffer (sliding window)? Is the buffer
size increased, or is the stream rewound if possible and the range
returned by the slice does seeking?

The window is expected to grow. The exact implementation may play any dirty tricks it sees fit, as long as it can provide a slice over the pinned area. In short: it has to maintain the illusion that the window has grown. I would be against a seeking range and would most likely opt for memory-mapped files instead, but it all depends on the exact numbers.
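
To make the "window grows while something is pinned" behaviour concrete, here is a minimal sketch. It is not the actual buffer implementation: the Window struct, its fields and the in-memory `source` standing in for a real stream are all assumptions made for illustration.

```d
import std.algorithm.comparison : min;

// Hypothetical window over a stream; `source` stands in for real I/O.
struct Window
{
    const(ubyte)[] source;      // pretend stream
    size_t loaded;              // bytes of source read so far
    ubyte[] buf;                // the window itself
    size_t base;                // stream offset of buf[0]
    size_t pos;                 // current stream offset
    size_t pinned = size_t.max; // pinned offset; size_t.max means "no pin"
    size_t chunk = 4;           // bytes fetched per refill

    // Drop what lies behind both the pin and the current position, then
    // append the next chunk. With a pin far behind, buf simply grows.
    bool refill()
    {
        immutable keepFrom = min(pinned, pos);
        buf = buf[keepFrom - base .. $];
        base = keepFrom;
        immutable n = min(chunk, source.length - loaded);
        buf ~= source[loaded .. loaded + n];
        loaded += n;
        return n != 0;
    }

    bool empty() { return pos == base + buf.length && !refill(); }
    ubyte front() { return buf[pos - base]; } // call only when !empty
    void popFront() { ++pos; }
    void pin() { pinned = pos; }
    void unpin() { pinned = size_t.max; }
    // Slice over the pinned area, from the pin up to the current position.
    const(ubyte)[] slice() { return buf[pinned - base .. pos - base]; }
}

void main()
{
    Window w;
    w.source = cast(const(ubyte)[]) "key=a-rather-long-value;";
    w.pin();
    while (!w.empty && w.front != ';')
        w.popFront();
    assert(w.slice() == cast(const(ubyte)[]) "key=a-rather-long-value");
    assert(w.buf.length > w.chunk); // the window outgrew a single chunk
}
```

Without a pin the same refill() discards everything already consumed, so the window never exceeds one chunk in this sketch.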


2. I don't understand the rationale behind the current semantics of
lookahead/lookbehind. If you want to e.g. peek ahead/behind to find the
first whitespace char, you don't know how many chars to request.

If you want to 'find', just use front/popFront, no?
Or do you specifically want to do array-wise operations?

Wouldn't it be better to make these functions return the ENTIRE
available buffer in O(1)?

Indeed, now I think that 2 overloads would be better:
auto lookahead(size_t n); //exactly n bytes, re-buffering as needed
auto lookahead(); // all that is available in the window, no re-buffering

Similar for lookbehind.
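
A rough sketch of how the two overloads could work together for the "find the first whitespace" case, following the "try the cheap one, fall back to the expensive one" pattern. Everything except the two lookahead signatures is an assumption: the Buf struct, its fields, the chunk size and the tokenLength helper are hypothetical, and an in-memory array stands in for the stream.

```d
import std.algorithm.comparison : min;

// Hypothetical buffer; buf[0] is the current position.
struct Buf
{
    const(ubyte)[] source; // pretend stream
    size_t loaded;         // bytes of source already copied into buf
    ubyte[] buf;           // the window
    size_t chunk = 4;      // bytes fetched per re-buffering step

    // All that is available in the window: O(1), no re-buffering.
    const(ubyte)[] lookahead() { return buf; }

    // Exactly n bytes, re-buffering as needed (fewer only at end of stream).
    const(ubyte)[] lookahead(size_t n)
    {
        while (buf.length < n && loaded < source.length)
        {
            immutable k = min(chunk, source.length - loaded);
            buf ~= source[loaded .. loaded + k];
            loaded += k;
        }
        return buf[0 .. min(n, buf.length)];
    }
}

// Distance from the current position to the first space,
// or to the end of the stream if there is none.
size_t tokenLength(ref Buf b)
{
    auto w = b.lookahead(); // cheap: whatever is already buffered
    size_t scanned = 0;
    for (;;)
    {
        for (; scanned < w.length; ++scanned)
            if (w[scanned] == ' ')
                return scanned;
        auto more = b.lookahead(w.length + b.chunk);  // expensive path
        if (more.length == w.length)
            return w.length; // stream ended without a space
        w = more;
    }
}

void main()
{
    auto b = Buf(cast(const(ubyte)[]) "identifier rest");
    assert(b.lookahead().length == 0);  // nothing buffered yet
    assert(tokenLength(b) == 10);       // "identifier"
    assert(b.lookahead().length >= 10); // the window now covers the token
}
```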

I guess I see the point when applied to regular expressions, where the
user explicitly specifies how many characters to look ahead/behind.

Actually the user doesn't: our lookahead/lookbehind is variable-length. One thing I would have to drop is unbounded lookbehind, though that is not critical.

However, I think in most use cases the amount is not known beforehand
(without imposing arbitrary limitations on users like "Thou shalt not
have variable identifiers longer than 32 characters"), so the pattern
would be "try a cheap lookahead/behind, and if that fails, do an
expensive one".

I would say that in the case where you need arbitrary-length lookahead:
m = mark, seek + popFront x N, seek(m) should fit the bill.
Or, as regex does at the moment, mark once and seek back to some position relative to that mark. In one word: backtracking.
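
The mark / popFront x N / seek(m) pattern could look roughly like this. It is only a sketch: the Input struct is a hypothetical stand-in over an in-memory string, and in a real buffer mark would presumably also pin the window so that the data stays available for the seek back.

```d
import std.ascii : isAlphaNum;

// Hypothetical input with mark/seek; a plain string stands in for the buffer.
struct Input
{
    string data;
    size_t pos;

    bool empty() const { return pos == data.length; }
    char front() const { return data[pos]; }
    void popFront() { ++pos; }

    size_t mark() const { return pos; } // remember the current position
    void seek(size_t m) { pos = m; }    // go back to a remembered position
}

// Arbitrary-length lookahead: is the identifier at the current
// position followed by '('? The position is always restored.
bool looksLikeCall(ref Input inp)
{
    immutable m = inp.mark();
    scope (exit) inp.seek(m); // rewind however we leave the function
    while (!inp.empty && (isAlphaNum(inp.front) || inp.front == '_'))
        inp.popFront();
    return !inp.empty && inp.front == '(';
}

void main()
{
    auto a = Input("parse_item2(x, y)");
    assert(looksLikeCall(a) && a.pos == 0);

    auto b = Input("parse_item2 = 5");
    assert(!looksLikeCall(b) && b.pos == 0);
}
```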

An example of where fixed lookahead rocks:
https://github.com/blackwhale/datapicked/blob/master/dpick/buffer/buffer.d#L421


3. I think ideally the final design would use something like what
std.allocator does with "unbounded" and "chooseAtRuntime" - some uses
might not need lookahead or lookbehind or other features at all, so
having a way to disable the relevant code would benefit those cases.

It makes sense to make lookahead and lookbehind optional.
As for the code: at the moment it doesn't add much and builds on what is already there. Though I suspect some other implementations would be able to "cut corners" more efficiently.
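
One way to make lookbehind optional at compile time, sketched with a plain bool template parameter rather than the sentinel values std.allocator uses. The Scanner struct and its members are hypothetical; the point is only that `static if` removes both the bookkeeping field and the method when the feature is switched off.

```d
// Hypothetical scanner; a string stands in for the buffered stream.
struct Scanner(bool withLookbehind)
{
    string data;
    size_t pos;

    bool empty() const { return pos == data.length; }
    char front() const { return data[pos]; }

    static if (withLookbehind)
    {
        size_t kept; // how many consumed chars are still retained
        void popFront() { ++pos; ++kept; }
        string lookbehind() const { return data[pos - kept .. pos]; }
    }
    else
    {
        // No bookkeeping at all when lookbehind is disabled.
        void popFront() { ++pos; }
    }
}

// The disabled variant has neither the method nor the extra field.
static assert(!__traits(hasMember, Scanner!false, "lookbehind"));
static assert(Scanner!false.sizeof < Scanner!true.sizeof);

void main()
{
    auto s = Scanner!true("abc");
    s.popFront();
    s.popFront();
    assert(s.lookbehind() == "ab" && s.front == 'c');

    auto t = Scanner!false("abc");
    t.popFront();
    assert(t.front == 'b');
}
```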

--
Dmitry Olshansky
