----- Original Message ----
> From: Andrei Alexandrescu <[email protected]>
>
> On 04/08/2010 01:23 PM, Steve Schveighoffer wrote:
>>> The network socket is not a range, it's a File, and File does have
>>> primitives such as rawWrite and rawRead, which we can add to and
>>> improve.
>>>
>>> File offers ranges, but you're not required to use them.
>>
>> That's not what I read from Walter's comment...  He indicated that
>> something like an e.g. zip library should take a range as input.
>> This implies that all streams are shoehorned into range form.
>
> If the zip library works with ranges, we can use it for transparently
> handling in-memory zip manipulation and also zip file manipulation.

Yes, from a library perspective, everything as a range works well.  The problem 
is whether the range interface lends itself well to things that need streams, 
like zip.  Basically, you didn't answer the 'if zip can use ranges' part.  
That's the part I'm more concerned about.

>>> Makes sense. I'm just a bit worried about stdio's poor buffering
>>> interface. It only offers setvbuf(), which is quite opaque.
>>
>> The only reason to use FILE* as the underlying implementation is to
>> be compatible with C's (f)printf.  It makes sense that you only need
>> that compatibility for printing to a standard handle.  I think we can
>> probably come up with an abstraction layer that uses FILE* only when
>> dealing with standard handles.
>
> It's more than printf. There are several I/O routines in stdio, and all
> use FILE* for both input and output. If a D application mixes calls to
> C APIs that do I/O with stdin, stdout, and stderr, we need to take a
> stance on what should happen.

But I'm saying that the times where we need to intermingle with C are only for 
the standard handles.  It seems that's what you're saying also, but you worded 
it in a way that makes it sound like you disagree with me...  Confused.
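To make the intermingling question concrete, here is a minimal sketch in C (hypothetical code, not from the thread): every stdio routine funnels through the same FILE* buffer, so mixed C calls on one handle stay ordered with each other.  The open question is what happens when a D layer buffers the same handle on its own.

```c
#include <stdio.h>

/* All of C's I/O routines share the FILE*'s one buffer, so mixed calls
   on the same handle stay ordered.  setvbuf() is the entire buffering
   interface: you can hand it a buffer and a mode, but you can never ask
   how much of it is currently filled (the "opaque" complaint above). */
void mixed_writes(FILE *out)
{
    static char buf[BUFSIZ];
    setvbuf(out, buf, _IOFBF, sizeof buf);  /* must precede other I/O */

    fprintf(out, "via fprintf\n");
    fputs("via fputs\n", out);
    fwrite("via fwrite\n", 1, 11, out);

    fflush(out);  /* one flush drains all three: a single shared buffer */
}
```

A D wrapper that kept its own buffer on top of the same handle would not get that ordering guarantee for free, which is presumably why a stance is needed for stdin/stdout/stderr.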

> I don't think that accurately represents what's going on. rawRead does
> need a fair amount of paraphernalia to work. For example:
>
> // Consume input using rawRead
> auto buffer = new ubyte[1024];
> size_t read;
> while ((read = input.rawRead(buffer).length) > 0) {
>     auto usable = buffer[0 .. read];
>     ... use usable ...
> }
>
> Not that elegant. Compare and contrast with:
>
> // Consume input using a range
> foreach (buffer; input.byChunk(1024)) {
>     ... use buffer ...
> }
>
> // Consume input straight from a range
> input.bufsize = 1024;
> foreach (buffer; input) {
>     ... use buffer ...
> }

Yes, if your application processes 1024 bytes at a time, it is easier to use a 
range.  That's not the application I'm referring to.  The application I'm 
talking about is one where you need to read a different number of bytes per 
read, such as a variable-length packet.  This is not an uncommon situation.

Let's look at that version with your range:

while(!input.empty())
{
   input.bufsize = numtoread;
   input.popFront();
   auto data = input.front();

   // process data.
}

and with File's rawRead:

ubyte[MAXSIZE] buf;
ubyte[] data;
while((data = input.rawRead(buf[0..numtoread])).length)
{
   // process data.
}

And look, we can use the stack for buffering!  Plus, we don't have to worry 
about whether the data buffer will be overwritten: we control what buffer the 
input object uses, so we can manage it less defensively.

Also, let's not forget that you can easily bolt an input range interface on top 
of a file interface (as evidenced by byChunk), but you can't do the opposite.  
For example, reading a packet at a time from a network/file stream given a 
length can easily be implemented with a range on top of a File struct, but not 
easily with a range on top of a range.
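For illustration, here is roughly what that packet-at-a-time read looks like directly on a C stream (a hypothetical sketch; the read_packet name and the 2-byte length prefix are made up for the example, and File wraps a C stream underneath anyway):

```c
#include <stdio.h>

/* Hypothetical wire format: a 2-byte big-endian length prefix followed
   by that many payload bytes.  Two reads per packet, each sized by the
   caller: exactly the "read N bytes" pattern a stream supports directly. */
size_t read_packet(FILE *in, unsigned char *buf, size_t cap)
{
    unsigned char hdr[2];
    if (fread(hdr, 1, 2, in) != 2)
        return 0;                              /* EOF or short header */
    size_t len = ((size_t)hdr[0] << 8) | hdr[1];
    if (len > cap || fread(buf, 1, len, in) != len)
        return 0;                              /* oversized or truncated */
    return len;
}
```

Wrapping this in a range that yields one packet per front() is trivial; going the other way, recovering exact-length reads from a range of fixed chunks, is where the awkwardness shows up.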

>> // read N bytes
>>
>> source.bufsize = N;
>> auto data = source.front();
>> source.popFront();
>
> I think it's more often to want to consume stuff in a stream manner, as
> opposed to attempting to read some isolated bits. Ranges are optimized
> for the former.

So essentially, the idea is to double-buffer the data: once inside the range 
(to support the front/popFront regime) and once in your application, so you 
can build up enough "chunks" to read the data correctly?  I don't see how this 
moves us towards high performance.  One litmus test: if whatever we come up 
with uses more than one buffer, it is not good enough.
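To show where the second buffer sneaks in, here is a sketch in C (hypothetical names, fread standing in for the range's chunk delivery) of reassembling a variable-length packet from fixed-size chunks:

```c
#include <stdio.h>
#include <string.h>

enum { CHUNK = 4 };  /* fixed chunk size, stands in for the range's bufsize */

/* Reassemble `want` payload bytes from fixed-size chunks.  The chunk
   buffer (the range's) and the packet buffer (the application's) are two
   separate buffers, and every byte is copied between them.  Bytes left
   over past `want` would need still more bookkeeping; this sketch simply
   drops them. */
size_t gather(FILE *in, unsigned char *packet, size_t want)
{
    unsigned char chunk[CHUNK];                 /* buffer #1 */
    size_t have = 0;
    while (have < want) {
        size_t n = fread(chunk, 1, CHUNK, in);
        if (n == 0)
            break;                              /* EOF mid-packet */
        size_t take = (want - have < n) ? want - have : n;
        memcpy(packet + have, chunk, take);     /* copy into buffer #2 */
        have += take;
    }
    return have;
}
```

Compare with rawRead into the packet buffer directly: one buffer, no copy, no leftover bookkeeping.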

> We need to figure out all this stuff together, but so far I'm not at
> all convinced that seekable ranges are awkward.

I may not have explained myself well.  I don't have a big problem with seekable 
ranges for certain applications; I just don't think they are the primitive that 
should be used for all applications.

-Steve
_______________________________________________
phobos mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/phobos