31-Dec-2013 22:46, Joseph Cassman writes:
On Tuesday, 31 December 2013 at 09:04:58 UTC, Dmitry Olshansky wrote:
31-Dec-2013 05:53, Joseph Cassman writes:
On Sunday, 29 December 2013 at 22:02:57 UTC, Dmitry Olshansky wrote:

I'm thinking there might be a way to bridge the new range type with
ForwardRange but not directly as defined at the moment.

A possibility I consider is to separate a Buffer object (not a range),
and let it be shared among views - light-weight buffer-ranges. Then if
we imagine that these light-weight buffer-ranges are working as marks
(i.e. they pin down the buffer) in the current proposal then they
could be forward ranges.

I've created a fork where I've implemented just that.
As a bonus I also tweaked stream primitives so it now works with pipes or whatever input stdin happens to be.

Links stay the same:
Docs: http://blackwhale.github.io/datapicked/dpick.buffer.traits.html
Code: https://github.com/blackwhale/datapicked/tree/fwd-buffer-range/dpick/buffer

The description has been largely simplified and the number of primitives reduced.

1. A buffer range is a forward range. It has reference semantics.
2. A copy produced by _save_ is an independent view of the underlying buffer (or window).
3. No bytes can be discarded that are seen in some existing view. Thus each reference pins its position in the buffer.
4. Three new primitives are:
   Range slice(BufferRange r);
Returns a slice of a window between the current range position and r. It must be a random access range.

   ptrdiff_t tell(BufferRange r);
Returns the difference between the positions of the current range and r in the window. Note that unlike slice(r).length this can be either positive or negative.

   bool seek(ptrdiff_t ofs);
Resets the buffer state to an offset from the current position. The return value indicates success of the operation. It may fail if there is not enough data or, if ofs is negative, if that portion of data was already discarded.

5. Lookahead and lookbehind are extra primitives that were left intact for the moment. Where applicable a range may provide lookahead:

Range lookahead(); //as much as available in the window
Range lookahead(size_t n); // either n exactly or nothing if not

And lookbehind:

Range lookbehind(); //as much as available in the window
Range lookbehind(size_t n); //either n exactly or nothing if not

These should probably be tested as separate traits.
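
To make the list above concrete, here is a minimal sketch of the primitives over a plain in-memory array. It is not the dpick code: the name ArrayBufferRange is made up, the struct has value semantics instead of the reference semantics described in point 1, nothing is ever discarded so pinning is trivial, and the sign convention of tell (current position minus r) is my assumption.

```d
import std.range.primitives : isForwardRange;

// Hypothetical in-memory stand-in for a buffer range (not the dpick type).
struct ArrayBufferRange
{
    private const(ubyte)[] window; // the whole window, shared by all views
    private size_t pos;            // this view's position in the window

    this(const(ubyte)[] data) { window = data; }

    // Forward range primitives.
    @property bool empty() const { return pos >= window.length; }
    @property ubyte front() const { return window[pos]; }
    void popFront() { ++pos; }
    @property ArrayBufferRange save() { return this; } // independent view

    // Slice of the window between this view and r, whichever comes first.
    const(ubyte)[] slice(ArrayBufferRange r) const
    {
        return pos <= r.pos ? window[pos .. r.pos] : window[r.pos .. pos];
    }

    // Signed distance from r to this view (assumed convention: this - r).
    ptrdiff_t tell(ArrayBufferRange r) const
    {
        return cast(ptrdiff_t) pos - cast(ptrdiff_t) r.pos;
    }

    // Move by ofs relative to the current position; false if out of the window.
    bool seek(ptrdiff_t ofs)
    {
        immutable target = cast(ptrdiff_t) pos + ofs;
        if (target < 0 || target > cast(ptrdiff_t) window.length)
            return false;
        pos = cast(size_t) target;
        return true;
    }

    // Everything still available in the window.
    const(ubyte)[] lookahead() const { return window[pos .. $]; }

    // Exactly n bytes, or nothing (null) if fewer are available.
    const(ubyte)[] lookahead(size_t n) const
    {
        return n <= window.length - pos ? window[pos .. pos + n] : null;
    }
}

static assert(isForwardRange!ArrayBufferRange);

unittest
{
    auto r = ArrayBufferRange(cast(const(ubyte)[]) "key=value");
    auto mark = r.save;                 // remembers the start of the token
    while (!r.empty && r.front != '=')
        r.popFront();
    assert(cast(const(char)[]) mark.slice(r) == "key");
    assert(r.tell(mark) == 3 && mark.tell(r) == -3);
    assert(r.seek(-3) && r.front == 'k');
    assert(!r.seek(-1));                // before the start of the window
    assert(r.lookahead(100) is null);
}
```

A real buffer range would refill the window from the input source and would refuse to discard bytes still visible through a saved view such as mark; the sketch only shows how the calls fit together from a parser's point of view.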

input-source <--> buffer range <--> parser/consumer

Meaning that if we can mix and match parsers with buffer ranges, and
buffer ranges with input sources, we will have grown something powerful indeed.

Being able to wrap an already-in-use range object with the buffer
interface as you do in the sample code
(https://github.com/blackwhale/datapicked/blob/master/dgrep.d) is good
for composability. Also allows for existing functionality in
std.algorithm to be reused as-is.

It was more about wrapping an array but it's got to integrate well with what we have. I could imagine a use case for buffering an input range.
Then I think a buffer range of anything other than bytes would be in order.
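
As a rough illustration of the point about reusing std.algorithm as-is: anything that satisfies the forward range primitives can be handed to the existing algorithms unchanged. The ByteView struct and the lineLength helper below are hypothetical stand-ins over an array, not the wrapper from dgrep.d.

```d
import std.algorithm.comparison : equal;
import std.algorithm.searching : countUntil, find;
import std.range.primitives : isForwardRange;

// Hypothetical minimal forward range over bytes.
struct ByteView
{
    const(ubyte)[] data;

    @property bool empty() const { return data.length == 0; }
    @property ubyte front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property ByteView save() { return this; }
}

static assert(isForwardRange!ByteView);

// Number of bytes before the first newline, or -1 if there is none.
ptrdiff_t lineLength(ByteView view)
{
    return view.save.countUntil(cast(ubyte) '\n');
}

unittest
{
    auto view = ByteView(cast(const(ubyte)[]) "first\nsecond");
    assert(lineLength(view) == 5);

    // find advances a copy up to the newline; the original view is untouched.
    auto rest = view.find(cast(ubyte) '\n');
    rest.popFront();
    assert(rest.equal(cast(const(ubyte)[]) "second"));
}
```

Generic algorithms only see front, popFront, empty and save here, so they cannot take advantage of the contiguous window; that is where slice and lookahead would come in for code written against the buffer interface.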

I think the new range type could also be added directly to some new, or
perhaps retrofitted into existing, code to add the new functionality
without sacrificing performance. In that way the internal payload
already used to get the data (say by the input range) could be reused
without having to allocate new memory to support the buffer API.

As one idea of using a buffer range from the start, a function template
by(T) (where T is ubyte, char, wchar, or dchar) could be added to
std.stdio.

IMHO C run-time I/O has no use in D. The amount of work spent on special-casing the non-locking primitives of each C run-time, repeating legacy mistakes (like text mode, codepages and locales) and stumbling on portability problems (getc is a macro we can't have) would have been better spent elsewhere - designing our own I/O framework.

I've put together something pretty simple and fast for a buffer range directly on native I/O:
https://github.com/blackwhale/datapicked/blob/fwd-buffer-range/dpick/buffer/stream.d

It needs somewhat better error messages than a naked enforce, and a few tweaks to memory management. It already runs circles around the existing std.stdio.

It would return a buffer range object providing more
functionality than byChunk or byLine while adding access to the entire
stream of data in a file in a contiguous and yet efficient manner.

Drop 'efficient' if we talk interfacing with C run-time. Otherwise, yes, absolutely.

Seems
to help with the issues faced in processing file data mentioned in
previous comments in this thread.


--
Dmitry Olshansky
