Re: D2 byChunk

Matthias Walter Sat, 11 Dec 2010 09:24:11 -0800


On 12/11/2010 01:00 AM, Christopher Nicholson-Sauls wrote:
> On 12/10/10 22:36, Matthias Walter wrote:
>> On 12/10/2010 09:57 PM, Matthias Walter wrote:
>>> Hi all,
>>>
>>> I am currently working on a parser for some file format. I wanted to
>>> use the std.stdio.ByChunk range to read from a file and extract tokens
>>> from the chunks. Obviously, the current chunk can end before a token
>>> is complete, in which case I can ask the range for the next chunk. To
>>> keep the already-read part, I need to dup at least the unprocessed
>>> part of the older chunk and concatenate it in front of the next one,
>>> or else write code that behaves as if they were concatenated. This
>>> looks like a stupid approach to me.
>>>
>>> Here is a small example:
>>>
>>> file contents: "Hello world"
>>> chunks: "Hello w" "orld"
>>>
>>> First I read the token "Hello" from the first chunk and maybe skip the
>>> whitespace. Then I have the "w" (which I need to move out of the
>>> buffer, because ByChunk will overwrite it) and get "orld".
>>>
>>> My idea was to have a ByChunk-related Object, which the user can tell
>>> how much of the buffer he/she actually used, such that it can move this
>>> data to the beginning of the buffer and append the next chunk. This
>>> wouldn't need further allocations and give the user contiguous data
>>> he/she can work with.
>> I coded something that works like this:
>>
>> foreach (ref ubyte[] data; byBuffer(file, 12))
>> {
>>   writefln("[%s]", cast(string) data);
>>   data = data[$-2 .. $];
>> }
>>
>> The 2nd line in the loop tells ByBuffer that we didn't process the last
>> two chars and would like to get them again along with newly read data.
>> And as long as we do process something, the internal buffer does not get
>> reallocated.
>>
>> It works and respects the formal requirements of ranges. Whether it
>> respects the intended semantics is open to discussion. Any comments on
>> whether the above makes sense, or whether it is an evil exploit of the
>> provided syntax sugar?
> I don't think it's a bad approach, but I have a suggestion.
>
> It leaves a lot of room for abuse or misuse if you require the user code
> to modify the data[] array in order to send this "protect some
> characters" message.  I think it would be better to provide an explicit
> function/method that means precisely that.  Maybe return a transparent
> struct wrapping a view to the buffer's data, that further provides a
> function for doing precisely this.
>
> foreach( data; byBuffer( file, 12 )) {
>   // do things with data, decide we need to keep 2 chars
>   data.save( 2 );
> }
>
> Or something like it.  With regard to this, you may want to allow the
> internal buffer to grow (if you aren't already) as needed.  Imagine what
> would otherwise happen if you needed to 'save' the entire current buffer.
>
> -- Chris N-S
Thank you! This is a really good idea. So I basically wrap the
buffer array and implement it such that the default behavior (without
explicitly doing anything) is like the ByChunk mechanism.
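
A minimal sketch of such a wrapper, under stated assumptions: the names
SlidingBuffer and keep are hypothetical and not part of Phobos, and the
method is called keep rather than save because forward ranges already
use save for something else. It moves the kept tail to the front with
memmove, refills the rest with File.rawRead, and grows the buffer only
when the whole window was kept. An explicit for loop is used so that
keep() acts on the same range object the loop advances.

```d
import core.stdc.string : memmove;
import std.stdio : File;

// Hypothetical sketch, not a Phobos type.
struct SlidingBuffer
{
    private File file;
    private ubyte[] buffer;
    private size_t filled; // valid bytes in the current window
    private size_t kept;   // tail bytes to present again next time

    this(File file, size_t chunkSize)
    {
        assert(chunkSize > 0);
        this.file = file;
        buffer = new ubyte[chunkSize];
        refill();
    }

    // Request that the last n bytes of the window be shown again.
    // Without a call to keep(), this behaves like plain byChunk.
    void keep(size_t n)
    {
        assert(n <= filled);
        kept = n;
    }

    @property bool empty() const { return filled == 0; }
    @property ubyte[] front() { return buffer[0 .. filled]; }
    void popFront() { refill(); }

    private void refill()
    {
        // The regions may overlap, so use memmove, not a slice copy.
        if (kept > 0)
            memmove(buffer.ptr, buffer.ptr + (filled - kept), kept);
        // Grow only if the entire window was kept.
        if (kept == buffer.length)
            buffer.length = buffer.length * 2;
        auto got = file.rawRead(buffer[kept .. $]);
        // Assumption: at end of file a kept tail is not shown again,
        // since the caller has already seen it.
        filled = got.length == 0 ? 0 : kept + got.length;
        kept = 0;
    }
}

unittest
{
    static import std.file;
    enum path = "sliding_buffer_demo.txt"; // hypothetical scratch file
    std.file.write(path, "Hello world");
    scope (exit) std.file.remove(path);

    string[] windows;
    for (auto r = SlidingBuffer(File(path, "rb"), 7); !r.empty; r.popFront())
    {
        windows ~= (cast(const(char)[]) r.front).idup;
        if (r.front.length == 7)
            r.keep(1); // the trailing "w" starts an unfinished token
    }
    assert(windows == ["Hello w", "world"]);
}
```

The window contents must be copied (idup above) if they are needed
after popFront, because the buffer is reused just as with ByChunk.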


Matthias
