Re: stream interfaces - with ranges

Artur Skawina Fri, 18 May 2012 08:05:48 -0700

On 05/18/12 15:51, kenji hara wrote:
> 2012/5/18 Artur Skawina <art.08...@gmail.com>:
>> On 05/18/12 06:19, kenji hara wrote:
>>> I think range interface is not useful for *efficient* IO. The expected
>>> IO interface will be more *abstract* than range primitives.
>>>
>>> ---
>>> If you use range I/F to read bytes from device, we will always do
>>> blocking IO - even if the device is socket. It is not efficient.
>>>
>>> auto sock = new TcpSocketDevice();
>>> if (!sock.empty) { auto e = sock.front; }
>>>   // In empty primitive, we *must* wait until the socket gets one or more
>>> bytes or is really disconnected.
>>
>> No. 'empty' has to return true only _after_ seeing EOF.
>>
>> Something like 'available' can return the number of elements known
>> to be fetchable w/o blocking. [1]
>>
>>>   // If not, what exactly returns sock.front?
>>
>> EWOULDBLOCK :^)
>>
>> But, yes, it needs to block, as there's no generic way to return
>> EAGAIN/EWOULDBLOCK. This is where the primitive returning a slice
>> comes in - that one /can/ return an empty slice.
>> So '(!r.empty && r.fronts.length==0)' is the equivalent of EAGAIN.
>> (and note i'm oversimplifying -- 'fronts' can return something that
>> /acts/ as a slice; which is what i'm in fact doing)
> 
> OK. If reading bytes from underlying device failed, your 'fronts' can
> return empty slice. I understood.
> But, it is still *not efficient*. The returned slice will specify a
> buffer controlled by the underlying device. If you want to gather bytes
> into one chunk, you must copy bytes from the returned slice to your chunk.
> We should reduce memory copying as much as possible.


Depends if your input range supports zero-copy or not. IOW you avoid
the copy iff the range can somehow write the data directly to the caller
provided buffer. This can be true eg for file reads, where you can tell
the read(2) syscall to write into the user buffer. But what if you need to
buffer the stream? An intermediate buffer can become necessary anyway.
But, as i said before, i agree that a caller-provided-buffer-interface
is useful.

   E[] fronts();
   void fronts(ref E[]);

And one can be implemented in terms of the other, ie:

  E[] fronts() { E[] els; fronts(els); return els; }
  void fronts(ref E[] e) { e[] = fronts()[]; }

depending on which is more efficient. A range can provide

  enum bool HasBuffer = 0 || 1;

so that the user can pick the more suited alternative.
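
A minimal sketch of a range offering both forms, to make the above concrete. Only 'empty', 'fronts' and 'HasBuffer' come from this thread; 'MemSource', 'feed' and 'close' are names made up for the illustration, and the in-memory buffer merely stands in for a real device:

```d
import std.algorithm : min;

// Hypothetical in-memory source, not code from any existing library.
struct MemSource(E) {
    enum bool HasBuffer = true;   // owns a buffer, so the slice form is the cheap one

    private const(E)[] data;      // elements received but not yet handed out
    private bool eof;             // set once the producer is done

    // Producer side (illustration only).
    void feed(const(E)[] more) { data ~= more; }
    void close() { eof = true; }

    // True only after EOF was seen and everything has been consumed.
    @property bool empty() const { return eof && data.length == 0; }

    // Slice form: hands out the internal buffer without copying.
    // An empty slice while !empty plays the role of EAGAIN.
    const(E)[] fronts() {
        auto r = data;
        data = null;
        return r;
    }

    // Caller-provided-buffer form: copies what is available into 'buf'
    // and shrinks 'buf' to the number of elements actually copied.
    void fronts(ref E[] buf) {
        auto n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        buf = buf[0 .. n];
    }
}
```

A consumer would check 'HasBuffer' at compile time and call the slice form when it is true, the buffer form otherwise.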

> And, 'put' primitive in output range concept doesn't support non-blocking
> write.
> 'put' should consume *all* of given data and write it  to underlying
> device, then it would block.

True, a write-as-much-as-possible-but-not-more primitive is needed.

   size_t puts(E[], size_t atleast=size_t.max);

or something like that. (Doing it this way allows for explicit
non-blocking 'puts', ie '(written=puts(els, 0))==0' means EAGAIN.)
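
A rough sketch of what such a 'puts' could look like. 'BoundedSink', 'pending', 'consumed' and 'waitForRoom' are hypothetical names for this illustration only; the fixed capacity stands in for a device that accepts a limited amount of data at a time, and the "blocking" is simulated by pretending the device drained its buffer:

```d
import std.algorithm : min;

// Hypothetical sink; assumes capacity > 0 whenever atleast > 0,
// otherwise the loop below could never make progress.
struct BoundedSink(E) {
    size_t capacity;        // how much the "device" accepts at once
    const(E)[] pending;     // accepted but not yet consumed by the device
    size_t consumed;        // total elements the device has taken so far

    // Writes as much as currently fits and returns the element count.
    // Does not return before 'atleast' elements (capped at els.length)
    // were written; atleast == 0 therefore never waits, and a result
    // of 0 in that case is the equivalent of EAGAIN.
    size_t puts(const(E)[] els, size_t atleast = size_t.max) {
        atleast = min(atleast, els.length);
        size_t written = 0;
        for (;;) {
            auto n = min(els.length - written, capacity - pending.length);
            pending ~= els[written .. written + n];
            written += n;
            if (written >= atleast)
                return written;
            waitForRoom();  // a real sink would block here
        }
    }

    // Stand-in for blocking: pretend the device consumed everything.
    void waitForRoom() {
        consumed += pending.length;
        pending = null;
    }
}
```

With the default 'atleast' this behaves like a plain 'put' (everything gets written), while 'puts(els, 0)' is the explicit non-blocking variant.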

> Therefore, whole of range concept doesn't cover non-blocking I/O.

See above.

>>>   // Then using range interface for socket reading enforces blocking
>>> IO. It is *really* inefficient.
>>
>>> I think IO primitives must be distinct from range ones for the reasons
>>> mentioned above...
>>>
>>> I'm designing experimental IO primitives:
>>> https://github.com/9rnsr/dio
>>>
>>> I call the input stream "source", and call output stream "sink".
>>> "source" has a 'pull' primitive, and sink has 'push' primitive, and
>>> they can avoid blocking.
>>> If you want to construct input range interface from "source", you
>>> should use 'ranged' helper function in io.core module. 'ranged'
>>> returns a wrapper object, and in its front method, It reads bytes from
>>> "source", and if the read bytes are not sufficient, blocks the input.
>>>
>>> In other words, range is not almighty. We should think distinct
>>> primitives for the IO.
>>
>> Well, your 'pull' and 'push' are just different names for my 'fronts'
>> and 'puts' (modulo the data transfer interface, which can be done both
>> ways using a set of overloads, hence it doesn't matter).
>>
>> I don't see any reason to invent yet another abstraction, when ranges
>> can be made to work with some improvements.
> 
> For efficiency and removing bottlenecks.
> Even today, I/O is the slowest operation in the entire program.
> Providing good primitives for I/O has enough value.
> 
> I have designed the 'pull' and 'push' primitives with two concepts:
> 1. Reduce copying memories as far as possible.
> 2. Control buffer memory under programmer side, not device side.

Do you have a contained microbenchmark? It would be easy to compare
both approaches... If you do i'll write one using my scheme - so
far i only did this for inter-thread communication, there's no file
based backend.

>> Ranges are just a convention; not a perfect one, but having /one/, not
>> two or thirteen, is valuable. If you think ranges are flawed the
>> discussion should be about ripping out every trace of them from the
>> language and libraries and replacing them with something better. If
>> you think that would be bad - well, having tens of different incompatible
>> abstractions isn't good either. (and, yes, you can provide glue so that
>> they can interact, but that does not scale well)
> 
> Range concept is a good abstraction if the underlying container controls
> ownership. But, in I/O we want to *move* ownership of bytes. Range is
> not designed efficiently for the purpose, IMO.
> 
>> Hmm, how are 'flush()' and 'commit()' supposed to work? Is data lost
>> if you omit one or both of them?
> 
> In my io library, BufferedSink requires three primitives, flush,
> commit, and writable.

But what happens if neither flush nor commit is called?

>> [1] Reminds me:
>>
>>   struct S(T) {
>>      shared T a;
>>      @property size_t available()() { return a; }
>>   }
>>
>> The compiler infers length as 'pure', which, depending on the
                       ^^^^^^
s/length/available/

>> definition of 'shared' is wrong. ('shared' /shouldn't/ imply 'volatile',
>> but, as it is now, it does - so omitting a call to 'available' would
>> be wrong)

artur
