Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Steven Schveighoffer
On Tue, 15 May 2012 19:43:05 -0400, Sean Kelly s...@invisibleduck.org  
wrote:


One thing I'd like in a buffered input API is a way to perform  
transactional reads such that if the full read can't be performed, the  
read state remains unchanged. The best you can do with most APIs is to  
check for a desired length, but what if I don't want to read until a
full line is available, and I don't know the exact length?  Typically,  
you end up having to double buffer, which stinks.


My new design supports this.  I have a function called readUntil:

https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832

Essentially, it reads into its buffer until the condition is satisfied.   
Therefore, you are not double buffering.  The return value is a slice of  
the buffer.


There is a way to opt-out of reading any data if you determine you cannot  
do a full read.  Just return 0 from the delegate.
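
A minimal sketch of the idea, not the std.io code linked above: SketchBuffer, its in-memory source and pieceSize are made up for illustration, and using size_t.max to mean "need more data" is an assumed convention. It shows a line-oriented read that returns a slice of the buffer and leaves the buffered bytes alone when no full line is available.

```d
import std.algorithm.comparison : min;
import std.string : representation;

/// Hypothetical buffered reader; `source` stands in for a file or socket
/// and is delivered in small pieces to imitate partial reads.
struct SketchBuffer
{
    const(ubyte)[] source;
    ubyte[] buffer;        // read but not yet consumed
    size_t pieceSize = 4;  // bytes delivered per simulated read

    /// Calls process(data, start) as data arrives. `start` is the offset of
    /// the first byte the delegate has not seen yet. The delegate returns
    /// the number of bytes to consume (0 consumes nothing), or size_t.max
    /// to ask for more data (an assumed convention).
    const(ubyte)[] readUntil(
        scope size_t delegate(const(ubyte)[] data, size_t start) process)
    {
        size_t start = 0;
        for (;;)
        {
            immutable n = min(pieceSize, source.length);
            buffer ~= source[0 .. n];
            source = source[n .. $];

            immutable consumed = process(buffer, start);
            if (consumed != size_t.max)
            {
                auto result = buffer[0 .. consumed];
                buffer = buffer[consumed .. $];
                return result;
            }
            if (n == 0)
                return null; // source exhausted; buffered bytes stay put
            start = buffer.length;
        }
    }
}

void main()
{
    auto input = SketchBuffer("ab\ncdefgh\nxy".representation);

    // Consume through the first newline, scanning only unseen bytes.
    size_t line(const(ubyte)[] data, size_t start)
    {
        foreach (i; start .. data.length)
            if (data[i] == '\n')
                return i + 1;
        return size_t.max;
    }

    assert(input.readUntil(&line) == "ab\n".representation);
    assert(input.readUntil(&line) == "cdefgh\n".representation);
    assert(input.readUntil(&line) is null);       // no full line left
    assert(input.buffer == "xy".representation);  // nothing was lost
}
```

The delegate only ever scans bytes it has not seen, which is what the 'start' parameter is for, and the caller never copies the line into a second buffer.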


-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Steven Schveighoffer
On Mon, 14 May 2012 22:56:08 -0400, Walter Bright  
newshou...@digitalmars.com wrote:



On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
I keep trying to avoid talking about this, because I'm writing a replacement
library for std.stream, and I don't want to step on any toes while it's still
not accepted.

But I have to say, ranges are *not* a good interface for generic data providers.
They are *very* good for structured data providers.

In other words, a stream of bytes, not a good range (who wants to get one byte
at a time?). A stream of UTF text broken into lines, a very good range.

I have no problem with getting rid of std.stream. I've never actually used it.

Still, we absolutely need a non-range based low-level streaming interface to
data. If nothing else, we need something we can build ranges upon, and I think
my replacement does a very good job of that.


I'll say in advance without seeing your design that it'll be a tough  
sell if it is not range based.


I've been doing some range based work on the side. I'm convinced there  
is enormous potential there, despite numerous shortcomings with them I  
ran across in Phobos. Those shortcomings can be fixed, they are not  
fatal.


The ability to do things like:

  void main() {
    stdin.byChunk(1024).
      map!(a => a.idup). // one of those shortcomings
      joiner().
      stripComments().
      copy(stdout.lockingTextWriter());
  }


I think we may have a misunderstanding.  My design is not range-based, but  
supports ranges, and actually makes them very easy to implement.


byChunk is a perfect example of a good range: it defines specific
criteria for determining an element of data, appropriate for specific
situations.


But it must be built on top of something that allows reading arbitrary  
amounts of data.  At the lowest level, this is the OS file  
descriptor/HANDLE.


To be efficient, it should be based on a buffering stream.  That buffering  
stream *does not* need to be a range, and I don't think shoehorning such a  
construct into a range interface makes any sense.


To make this clear, I can say that any range File supports, my design will  
support *as a range*.


To make it even clearer, the current std.stdio.File structure, which you  
have shown to kick ass with ranges, is *NOT* range-based by my  
definition.


I should note, the output range idiom is directly supported, because the  
output range definition exactly maps to an output stream definition.
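
As a small illustration of that mapping (a sketch only; MemorySink is a made-up type, not part of the proposed library): any type with a put method that accepts a slice of bytes already satisfies isOutputRange, so a stream's write call can double as the range primitive.

```d
import std.range.primitives : isOutputRange, put;
import std.string : representation;

/// Hypothetical output stream that just collects bytes in memory.
struct MemorySink
{
    ubyte[] data;

    /// The one member an output range of byte slices needs.
    void put(const(ubyte)[] bytes) { data ~= bytes; }
}

static assert(isOutputRange!(MemorySink, const(ubyte)[]));

void main()
{
    MemorySink sink;
    put(sink, "hello, ".representation); // free function from std.range
    put(sink, "world".representation);
    assert(sink.data == "hello, world".representation);
}
```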


-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Walter Bright

On 5/15/2012 4:43 PM, Sean Kelly wrote:

One thing I'd like in a buffered input API is a way to perform transactional
reads such that if the full read can't be performed, the read state remains
unchanged. The best you can do with most APIs is to check for a desired
length, but what if I don't want to read until a full line is available, and
I don't know the exact length?  Typically, you end up having to double
buffer, which stinks.


std.stdio.byLine()



Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Walter Bright

On 5/15/2012 3:34 PM, Nathan M. Swan wrote:

I do agree for e.g. with binary data some data can't be read with ranges (when
you need to read small chunks of varying size),


I don't see why that should be true.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Christophe Travert
Steven Schveighoffer , dans le message (digitalmars.D:167548), a
 My new design supports this.  I have a function called readUntil:
 
 https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832
 
 Essentially, it reads into its buffer until the condition is satisfied.   
 Therefore, you are not double buffering.  The return value is a slice of  
 the buffer.
 
 There is a way to opt-out of reading any data if you determine you cannot  
 do a full read.  Just return 0 from the delegate.

Maybe I already said this some time ago, but I am not very comfortable 
with this design. The process delegate has to maintain an internal 
state, if you want to avoid reading everything again. It will be 
difficult to implement those process delegates. Do you have an example 
of moderately complicated reading process to show us it is not too 
complicated?

To avoid this issue, the design could be reversed: A method that wants 
to read a certain number of characters could take a delegate from 
the stream, which provides additional bytes of data.

Example:
// create a T by reading from stream. returns true if the T was 
// successfully created, and false otherwise.
bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out T t);

The stream delegate returns a buffer of data to read from when called 
with consumed==0. It must return additional data when called 
repeatedly. When it is called with consumed != 0, the corresponding 
number of consumed bytes can be discarded from the buffer.

This stream delegate (it should have a better name) should not be more 
difficult to implement than readUntil, but makes it easier to use for 
the client. Did I miss some important information?
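
To make the proposal concrete, here is one possible reading of it as a sketch. WindowSource, its step field and the length-prefixed record format are assumptions made up for the example, and T is fixed to string. The reader keeps asking for a bigger window until a whole record is visible, and only then reports how much it consumed.

```d
import std.algorithm.comparison : min;
import std.string : representation;

/// Hypothetical stream side: exposes a growing window over `source`.
struct WindowSource
{
    const(ubyte)[] source;
    size_t visible;   // bytes currently exposed to the reader
    size_t step = 3;  // growth per request, imitating partial reads

    const(ubyte)[] next(size_t consumed)
    {
        if (consumed != 0)
        {
            source = source[consumed .. $]; // discard what the reader used
            visible -= consumed;
        }
        else
        {
            visible = min(visible + step, source.length); // offer more
        }
        return source[0 .. visible];
    }
}

/// Reads one record: a length byte followed by that many bytes.
bool readFrom(scope const(ubyte)[] delegate(size_t consumed) stream,
              out string t)
{
    auto data = stream(0);
    for (;;)
    {
        if (data.length >= 1)
        {
            immutable size_t total = 1 + data[0];
            if (data.length >= total)
            {
                t = cast(string) data[1 .. total].idup;
                stream(total);
                return true;
            }
        }
        auto more = stream(0);
        if (more.length == data.length)
            return false; // no progress: give up without consuming
        data = more;
    }
}

void main()
{
    auto src = WindowSource("\x03abc\x02hi\x05xy".representation);
    string s;
    assert(readFrom(&src.next, s) && s == "abc");
    assert(readFrom(&src.next, s) && s == "hi");
    assert(!readFrom(&src.next, s));    // third record is incomplete
    assert(src.source.length == 3);     // its bytes were not consumed
}
```

In this reading the parsing state lives in ordinary local variables of readFrom rather than in a delegate's context.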

-- 
Christophe


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Steven Schveighoffer
On Wed, 16 May 2012 09:50:12 -0400, Walter Bright  
newshou...@digitalmars.com wrote:



On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
I do agree for e.g. with binary data some data can't be read with  
ranges (when

you need to read small chunks of varying size),


I don't see why that should be true.


How do you tell front and popFront how many bytes to read?

-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Sean Kelly
On May 16, 2012, at 6:52 AM, Walter Bright newshou...@digitalmars.com wrote:

 On 5/15/2012 4:43 PM, Sean Kelly wrote:
 One thing I'd like in a buffered input API is a way to perform transactional
 reads such that if the full read can't be performed, the read state remains
 unchanged. The best you can do with most APIs is to check for a desired
 length, but what if I don't want to read until a full line is available, and
 I don't know the exact length?  Typically, you end up having to double
 buffer, which stinks.
 
 std.stdio.byLine()

That was just an example. What if I want to do a formatted read and I'm reading 
from a file that someone else is writing to?  I don't want to block or get a 
partial result and an EOF that needs to be reset. 

Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Steven Schveighoffer
On Wed, 16 May 2012 10:03:42 -0400, Christophe Travert  
trav...@phare.normalesup.org wrote:



Steven Schveighoffer , dans le message (digitalmars.D:167548), a

My new design supports this.  I have a function called readUntil:

https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832

Essentially, it reads into its buffer until the condition is satisfied.
Therefore, you are not double buffering.  The return value is a slice of
the buffer.

There is a way to opt-out of reading any data if you determine you  
cannot

do a full read.  Just return 0 from the delegate.


Maybe I already said this some time ago, but I am not very comfortable
with this design. The process delegate has to maintain an internal
state, if you want to avoid reading everything again. It will be
difficult to implement those process delegates.


The delegate is given which portion has already been processed, that is  
the 'start' parameter.  If you can use this information, it's highly  
useful.


If you need more context, yes, you have to store it elsewhere, but you do  
have a delegate which contains a context pointer.  In a few places (take a  
look at TextStream's readln  
https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L2149) I use  
inner functions that have access to the function call's frame pointer in  
order to configure or store data.



Do you have an example
of moderately complicated reading process to show us it is not too
complicated?


The most complicated I have so far is reading UTF data as a range of dchar:

https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L2209

Note that I hand-inlined all the decoding because using std.utf or the  
runtime was too slow, so although it looks huge, it's pretty basic stuff,  
and can largely be ignored for the terms of this discussion.  The  
interesting part is how it specifies what to consume and what not to.


I realize it's a different way of thinking about how to do I/O, but it
gives more control to the buffer, so it can reason about how best to
buffer things.  I look at it as a way of the buffered stream saying "I'll
read some data, you tell me when you see something interesting, and I'll
give you a slice to it."  The alternative is to double-buffer your data.
Each call to read can invalidate the previously buffered data.  But
readUntil guarantees the data is contiguous and consumed all at once, no
need to double-buffer.




To avoid this issue, the design could be reversed: A method that wants
to read a certain number of characters could take a delegate from
the stream, which provides additional bytes of data.

Example:
// create a T by reading from stream. returns true if the T was
// successfully created, and false otherwise.
bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out T t);

The stream delegate returns a buffer of data to read from when called
with consumed==0. It must return additional data when called
repeatedly. When it is called with consumed != 0, the corresponding
number of consumed bytes can be discarded from the buffer.


I can see use cases for both your method and mine.

I think I can implement your idea in terms of mine.  I might just do  
that.  The only thing missing is, you need a way to specify to the  
delegate that it needs more data.  Probably using size_t.max as an  
argument.


In fact, I need a peek function anyways, your function will provide that  
ability as well.


-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Robert Clipsham

On 16/05/2012 15:38, Steven Schveighoffer wrote:

On Wed, 16 May 2012 09:50:12 -0400, Walter Bright
newshou...@digitalmars.com wrote:


On 5/15/2012 3:34 PM, Nathan M. Swan wrote:

I do agree for e.g. with binary data some data can't be read with
ranges (when
you need to read small chunks of varying size),


I don't see why that should be true.


How do you tell front and popFront how many bytes to read?

-Steve


A bit ugly but:

// Default to 4 byte chunks
auto range = myStream.byChunks(4);
foreach (chunk; range) {
   // Set the next chunk is 3 bytes
   // Chunk after is 4 bytes
   range.nextChunkSize = 3;

   // Next chunk is always 5 bytes
   range.chunkSize = 5;
}


--
Robert
http://octarineparrot.com/


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Steven Schveighoffer
On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham  
rob...@octarineparrot.com wrote:



On 16/05/2012 15:38, Steven Schveighoffer wrote:

On Wed, 16 May 2012 09:50:12 -0400, Walter Bright
newshou...@digitalmars.com wrote:


On 5/15/2012 3:34 PM, Nathan M. Swan wrote:

I do agree for e.g. with binary data some data can't be read with
ranges (when
you need to read small chunks of varying size),


I don't see why that should be true.


How do you tell front and popFront how many bytes to read?

-Steve


A bit ugly but:

// Default to 4 byte chunks
auto range = myStream.byChunks(4);
foreach (chunk; range) {
// Set the next chunk is 3 bytes
// Chunk after is 4 bytes
range.nextChunkSize = 3;

// Next chunk is always 5 bytes
range.chunkSize = 5;
}


Yeah, I've seen this before.  It's not convincing.

-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Dmitry Olshansky

On 16.05.2012 19:32, Steven Schveighoffer wrote:

On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham
rob...@octarineparrot.com wrote:


On 16/05/2012 15:38, Steven Schveighoffer wrote:

On Wed, 16 May 2012 09:50:12 -0400, Walter Bright
newshou...@digitalmars.com wrote:


On 5/15/2012 3:34 PM, Nathan M. Swan wrote:

I do agree for e.g. with binary data some data can't be read with
ranges (when
you need to read small chunks of varying size),


I don't see why that should be true.


How do you tell front and popFront how many bytes to read?

-Steve


A bit ugly but:

// Default to 4 byte chunks
auto range = myStream.byChunks(4);
foreach (chunk; range) {
// Set the next chunk is 3 bytes
// Chunk after is 4 bytes
range.nextChunkSize = 3;

// Next chunk is always 5 bytes
range.chunkSize = 5;
}


Yeah, I've seen this before. It's not convincing.



Yes, it's obvious that files do *not* generally follow range-of-items 
semantics. I mean not even a range of various items.
In the case of binary data it's most of the time a header followed by 
various data. Or a hierarchical structure. Or a table of links + raw data.

Or whatever. I've yet to see a standard way to deal with binary formats :)


--
Dmitry Olshansky


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Steven Schveighoffer
On Wed, 16 May 2012 11:48:32 -0400, Dmitry Olshansky  
dmitry.o...@gmail.com wrote:



On 16.05.2012 19:32, Steven Schveighoffer wrote:

On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham
rob...@octarineparrot.com wrote:

A bit ugly but:

// Default to 4 byte chunks
auto range = myStream.byChunks(4);
foreach (chunk; range) {
// Set the next chunk is 3 bytes
// Chunk after is 4 bytes
range.nextChunkSize = 3;

// Next chunk is always 5 bytes
range.chunkSize = 5;
}


Yeah, I've seen this before. It's not convincing.



Yes, it's obvious that files do *not* generally follow range-of-items
semantics. I mean not even a range of various items.
In the case of binary data it's most of the time a header followed by
various data. Or a hierarchical structure. Or a table of links + raw data.

Or whatever. I've yet to see a standard way to deal with binary formats :)


The best solution would be a range that's specific to your format.  My  
solution intends to support that.


But that's only if your format fits within the range of elements model.

Good old-fashioned "read X bytes" needs to be supported, and insisting you
do this range style is just plain wrong IMO.


-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Walter Bright

On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:

On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com
wrote:


On 5/15/2012 3:34 PM, Nathan M. Swan wrote:

I do agree for e.g. with binary data some data can't be read with ranges (when
you need to read small chunks of varying size),


I don't see why that should be true.


How do you tell front and popFront how many bytes to read?


std.byLine() does it.

In general, you can read n bytes by calling empty, front, and popFront n times.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Walter Bright

On 5/16/2012 7:49 AM, Sean Kelly wrote:

On May 16, 2012, at 6:52 AM, Walter Bright newshou...@digitalmars.com
wrote:


On 5/15/2012 4:43 PM, Sean Kelly wrote:

One thing I'd like in a buffered input API is a way to perform
transactional reads such that if the full read can't be performed, the
read state remains unchanged. The best you can do with most APIs is to
check for a desired length, but what if I don't want to read until a
full line is available, and I don't know the exact length?  Typically,
you end up having to double buffer, which stinks.


std.stdio.byLine()


That was just an example. What if I want to do a formatted read and I'm
reading from a file that someone else is writing to?  I don't want to block
or get a partial result and an EOF that needs to be reset.


Then you'll need an input range that can be reset - a ForwardRange.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Stewart Gordon

On 16/05/2012 16:59, Walter Bright wrote:

On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:

On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com
wrote:


On 5/15/2012 3:34 PM, Nathan M. Swan wrote:

I do agree for e.g. with binary data some data can't be read with ranges (when
you need to read small chunks of varying size),


I don't see why that should be true.


How do you tell front and popFront how many bytes to read?


std.byLine() does it.


And is what you want to do with a text file in many cases.


In general, you can read n bytes by calling empty, front, and popFront n times.


Why would anybody want to read a large binary file _one byte at a time_?

Stewart.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread H. S. Teoh
On Wed, May 16, 2012 at 05:41:49PM +0100, Stewart Gordon wrote:
 On 16/05/2012 16:59, Walter Bright wrote:
 On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright 
 newshou...@digitalmars.com
 wrote:
 
 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with
 ranges (when you need to read small chunks of varying size),
 
 I don't see why that should be true.
 
 How do you tell front and popFront how many bytes to read?
 
 std.byLine() does it.
 
 And is what you want to do with a text file in many cases.
 
 In general, you can read n bytes by calling empty, front, and
 popFront n times.
 
 Why would anybody want to read a large binary file _one byte at a
 time_?
[...]

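import std.range;

// Takes the first n elements in one slicing operation
// instead of n calls to front/popFront.
auto readNBytes(R)(R range, size_t n)
if (isInputRange!R && hasSlicing!R)
{
    return range[0 .. n];
}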


T

-- 
MAS = Mana Ada Sistem?


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Steven Schveighoffer
On Wed, 16 May 2012 11:59:37 -0400, Walter Bright  
newshou...@digitalmars.com wrote:



On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
On Wed, 16 May 2012 09:50:12 -0400, Walter Bright  
newshou...@digitalmars.com

wrote:


On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
I do agree for e.g. with binary data some data can't be read with  
ranges (when

you need to read small chunks of varying size),


I don't see why that should be true.


How do you tell front and popFront how many bytes to read?


std.byLine() does it.


Have you looked at how std.byLine works?  It certainly does not use a  
range interface as a source.


In general, you can read n bytes by calling empty, front, and popFront n  
times.


I hope you are not serious!  This will make D *the worst performing* i/o  
language.


This should be evidence enough:

steves@steves-laptop:~$ time dd if=/dev/zero of=/dev/null bs=1 count=1000000
1000000+0 records in
1000000+0 records out
1000000 bytes (1.0 MB) copied, 0.74052 s, 1.4 MB/s

real    0m0.744s
user    0m0.176s
sys     0m0.564s
steves@steves-laptop:~$ time dd if=/dev/zero of=/dev/null bs=1000 count=1000
1000+0 records in
1000+0 records out
1000000 bytes (1.0 MB) copied, 0.00194096 s, 515 MB/s

real    0m0.006s
user    0m0.000s
sys     0m0.004s

-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Walter Bright

On 5/16/2012 9:41 AM, Stewart Gordon wrote:

On 16/05/2012 16:59, Walter Bright wrote:

On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:

On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com
wrote:


On 5/15/2012 3:34 PM, Nathan M. Swan wrote:

I do agree for e.g. with binary data some data can't be read with ranges (when
you need to read small chunks of varying size),


I don't see why that should be true.


How do you tell front and popFront how many bytes to read?


std.byLine() does it.


And is what you want to do with a text file in many cases.


In general, you can read n bytes by calling empty, front, and popFront n times.


Why would anybody want to read a large binary file _one byte at a time_?


You can have that range read from byChunk(). It's really the same thing that C's 
stdio does.




Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Walter Bright

On 5/16/2012 10:18 AM, Steven Schveighoffer wrote:

On Wed, 16 May 2012 11:59:37 -0400, Walter Bright newshou...@digitalmars.com
wrote:


On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:

On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com
wrote:


On 5/15/2012 3:34 PM, Nathan M. Swan wrote:

I do agree for e.g. with binary data some data can't be read with ranges (when
you need to read small chunks of varying size),


I don't see why that should be true.


How do you tell front and popFront how many bytes to read?


std.byLine() does it.


Have you looked at how std.byLine works? It certainly does not use a range
interface as a source.


It presents a range interface, though. Not a streaming one.




In general, you can read n bytes by calling empty, front, and popFront n times.


I hope you are not serious! This will make D *the worst performing* i/o 
language.


You can read arbitrary numbers of bytes by tacking a range on after byChunk().
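
One way to read this suggestion, as a sketch: flatten the chunks with joiner and pull n elements off the front. The in-memory chunks array below stands in for what byChunk would deliver, and readN is a made-up helper. Note that it still advances one element at a time, which is the cost being argued about in this thread.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : joiner;

/// Pulls up to n bytes from a flattened chunk range, one element at a time.
ubyte[] readN(R)(ref R flat, size_t n)
{
    ubyte[] result;
    for (; n > 0 && !flat.empty; --n)
    {
        result ~= flat.front;
        flat.popFront();
    }
    return result;
}

void main()
{
    // Stand-in for the chunks a byChunk-style range would deliver.
    ubyte[][] chunks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]];
    auto flat = chunks.joiner;

    assert(readN(flat, 2).equal([1, 2]));
    assert(readN(flat, 5).equal([3, 4, 5, 6, 7])); // spans three chunks
    assert(readN(flat, 10).equal([8, 9]));         // source ran dry
}
```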


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Steven Schveighoffer
On Wed, 16 May 2012 13:21:37 -0400, Walter Bright  
newshou...@digitalmars.com wrote:



On 5/16/2012 9:41 AM, Stewart Gordon wrote:

On 16/05/2012 16:59, Walter Bright wrote:

On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
On Wed, 16 May 2012 09:50:12 -0400, Walter Bright  
newshou...@digitalmars.com

wrote:


On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
I do agree for e.g. with binary data some data can't be read with  
ranges (when

you need to read small chunks of varying size),


I don't see why that should be true.


How do you tell front and popFront how many bytes to read?


std.byLine() does it.


And is what you want to do with a text file in many cases.

In general, you can read n bytes by calling empty, front, and popFront  
n times.


Why would anybody want to read a large binary file _one byte at a time_?


You can have that range read from byChunk(). It's really the same thing  
that C's stdio does.


This is very wrong.  byChunk doesn't cut it.  The number of bytes to  
consume from the stream can depend on any number of factors, including the  
actual data in the stream.  For instance, I challenge you to write an  
efficient (meaning no extra buffering) byLine using byChunk as a base.
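
To illustrate why the challenge is hard, here is a naive sketch of lines built on chunks (linesFromChunks is made up for the example and takes the chunks as an in-memory array). Because a line can straddle a chunk boundary, the tail of each chunk has to be copied into a carry buffer, which is exactly the extra buffering in question.

```d
import std.string : indexOf;

/// Splits chunked text into lines. `carry` holds the tail of a line that
/// continues in the next chunk: this is the second buffer.
string[] linesFromChunks(R)(R chunks)
{
    string[] lines;
    char[] carry;
    foreach (chunk; chunks)
    {
        auto rest = chunk;
        for (auto i = rest.indexOf('\n'); i >= 0; i = rest.indexOf('\n'))
        {
            carry ~= rest[0 .. i];   // copy: a real chunk buffer gets reused
            lines ~= carry.idup;
            carry.length = 0;
            rest = rest[i + 1 .. $];
        }
        carry ~= rest;               // partial line, kept for later
    }
    if (carry.length)
        lines ~= carry.idup;
    return lines;
}

void main()
{
    auto lines = linesFromChunks(["ab\ncd", "ef\ngh"]);
    assert(lines == ["ab", "cdef", "gh"]);
}
```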


-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Steven Schveighoffer
On Wed, 16 May 2012 13:23:07 -0400, Walter Bright  
newshou...@digitalmars.com wrote:



On 5/16/2012 10:18 AM, Steven Schveighoffer wrote:
On Wed, 16 May 2012 11:59:37 -0400, Walter Bright  
newshou...@digitalmars.com

wrote:


On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
On Wed, 16 May 2012 09:50:12 -0400, Walter Bright  
newshou...@digitalmars.com

wrote:


On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
I do agree for e.g. with binary data some data can't be read with  
ranges (when

you need to read small chunks of varying size),


I don't see why that should be true.


How do you tell front and popFront how many bytes to read?


std.byLine() does it.


Have you looked at how std.byLine works? It certainly does not use a  
range

interface as a source.


It presents a range interface, though. Not a streaming one.


But that is *the point*!  The code deciding how much data to read (i.e.  
the entity I referenced above that 'tells front and popFront how many  
bytes to read') is *not* using a range interface.  In other words, ranges  
aren't enough.


Ranges can be built on top of streaming interfaces.  But there is *still*  
a need for a comprehensive streaming toolkit.  And C's streaming toolkit  
is not as good as a native D toolkit can be.




In general, you can read n bytes by calling empty, front, and popFront  
n times.


I hope you are not serious! This will make D *the worst performing* i/o  
language.


You can read arbitrary numbers of bytes by tacking a range on after  
byChunk().


No, this doesn't work in most cases.  See my other post.  You can't get  
everything you want out of just byChunk and byLine.


what about byMySpecificPacketProtocol?

-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Andrei Alexandrescu

On 5/16/12 12:34 PM, Steven Schveighoffer wrote:

In other words, ranges aren't enough.


This is copiously clear to me, but the way I like to think about it is 
by extending the notion of range (with notions such as e.g. 
BufferedRange, LookaheadRange, and such) instead of developing an 
abstraction independent from ranges and then working on stitching that 
with ranges.


Andrei


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Adam D. Ruppe

tbh, I've found byChunk to be less than worthless
in my experience; it's a liability because I still
have to wrap it somehow to read real world files.

Consider reading a series of strings in the format
<length><data>,[...].

I'd like it to be this simple (neglecting priming the loop):

string[] s;
while(!file.eof) {
ubyte length = file.read!ubyte;
s ~= file.read!string(length);
}


The C fgetc/fread interface can do this reasonably
well.

string[] s;
while(!feof(fp)) {
   ubyte length = fgetc(fp);
   char[] buffer;
   buffer.length = length;
   fread(buffer.ptr, 1, length, fp);
   s ~= assumeUnique(buffer);
}


But, doing it with byChunk is an exercise in pain
that I don't even feel like writing here.




Another problem is consider a network interface. You
want to handle the packets as they come in.

byChunk doesn't work at all because it blocks until it
gets the chunk of the requested size.

foreach(chunk; socket.byChunk(1024))


suppose you get a packet of length 1000 and you have
to answer it. That will block forever.

So, if you use byChunk as the underlying thing to fill
your buffer... you don't get anywhere.


I think a better input primitive is byPacket(max_size).
This works more like the read primitive on the operating
system.

Moreover, I want it to buffer, and control how much is consumed.


auto packetSource = socket.byPacket(1024);
foreach(packet; packetSource) {
   // as soon as some data comes in we can get the length
   if(packet.length < 2) continue;
   auto length = packet.peek!(ushort); // neglect endian for now
   if(packet.length < length + 2) continue; // wait for more data

   packet.consume(2);
   handle(packet.consume(length));
}
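
A self-contained sketch of the peek/consume idea, with the socket left out: PacketBuffer, feed and nextMessage are hypothetical names, and the length prefix is assumed to be 2 bytes, big-endian. Data is appended as it arrives, and nothing is consumed until a whole message is present, even when the length prefix itself straddles two reads.

```d
import std.string : representation;

/// Hypothetical accumulation buffer for length-prefixed messages.
struct PacketBuffer
{
    private ubyte[] data;

    /// Appends whatever the socket delivered, however little.
    void feed(const(ubyte)[] incoming) { data ~= incoming; }

    /// Extracts one message (2-byte big-endian length, then payload).
    /// Returns false and consumes nothing until the message is complete.
    bool nextMessage(out const(ubyte)[] payload)
    {
        if (data.length < 2)
            return false;                      // length prefix incomplete
        immutable size_t len = (data[0] << 8) | data[1];
        if (data.length < len + 2)
            return false;                      // wait for more data
        payload = data[2 .. 2 + len];
        data = data[2 + len .. $];
        return true;
    }
}

void main()
{
    PacketBuffer buf;
    const(ubyte)[] msg;

    buf.feed("\x00\x03a".representation);
    assert(!buf.nextMessage(msg));             // payload still arriving

    buf.feed("bc\x00".representation);         // ends mid length prefix
    assert(buf.nextMessage(msg) && msg == "abc".representation);
    assert(!buf.nextMessage(msg));

    buf.feed("\x01z".representation);
    assert(buf.nextMessage(msg) && msg == "z".representation);
}
```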



In addition to the byChunk blocking problem...
what if the length straddles the edge?



byChunk is just a huge hassle to work with for every file
format I've tried so far. byLine is a little better
(some file formats are defined as being line based)
but still a bit of a pain for anything that can spill
into two lines.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Steven Schveighoffer
On Wed, 16 May 2012 13:48:49 -0400, Andrei Alexandrescu  
seewebsiteforem...@erdani.org wrote:



On 5/16/12 12:34 PM, Steven Schveighoffer wrote:

In other words, ranges aren't enough.


This is copiously clear to me, but the way I like to think about it is  
by extending the notion of range (with notions such as e.g.  
BufferedRange, LookaheadRange, and such) instead of developing an  
abstraction independent from ranges and then working on stitching that  
with ranges.


What I think we would end up with is a streaming API with range primitives  
tacked on.


- empty is clunky, but possible to implement.  However, it may become  
invalid (think of reading a file that is being appended to by another  
process).
- popFront and front do not have any clear definition of what they refer  
to.  The only valid thing I can think of is bytes, and then nobody will  
use them.


That's hardly saying it's range based.  I refuse to believe that people  
will be thrilled by having to 'pre-configure' each front and popFront call  
in order to get work done.  If you want to try and convince me, I'm  
willing to listen, but so far I haven't seen anything that looks at all  
appetizing.


-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Adam D. Ruppe
On Wednesday, 16 May 2012 at 17:48:52 UTC, Andrei Alexandrescu 
wrote:
This is copiously clear to me, but the way I like to think 
about it is by extending the notion of range (with notions such 
as e.g. BufferedRange, LookaheadRange, and such)


I tried this in cgi.d somewhat recently. It ended up
only vaguely looking like a range.

/**
   A slight difference from regular ranges is you can give it the maximum
   number of bytes to consume.

   IMPORTANT NOTE: the default is to consume nothing, so if you don't call
   consume() yourself and use a regular foreach, it will infinitely loop!
*/
   void popFront(size_t maxBytesToConsume = 0 /*size_t.max*/,
                 size_t minBytesToSettleFor = 0) {}



I called that a slight difference in the comment, but it is
actually a pretty major difference. In practice, it is nothing
like a regular range.

If I defaulted to size_t.max, you could foreach() it, but then
you don't really get to take advantage of the buffer, since it
is cleared out entirely for each iteration.

If it defaults to 0, you can put it in a foreach... but you
have to manually say how much of it is consumed, which no other
range does, meaning it won't work with std.algorithm or anything.


It sorta looks like a range, but isn't actually one at all.




I'm sure something better is possible, but I don't think the range
abstraction is a good fit for this use case.

Of course, providing optional ranges (like how file gives byChunk,
byLine, etc.) is probably a good idea.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread H. S. Teoh
On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote:
 On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
 In other words, ranges aren't enough.
 
 This is copiously clear to me, but the way I like to think about it
 is by extending the notion of range (with notions such as e.g.
 BufferedRange, LookaheadRange, and such) instead of developing an
 abstraction independent from ranges and then working on stitching
 that with ranges.
[...]

One direction that _could_ be helpful, perhaps, is to extend the concept
of range to include, let's tentatively call it, a ChunkedRange.
Basically a ChunkedRange implements the usual InputRange operations
(empty, front, popFront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.

These are all tentative names, of course. But the idea is that you can
keep N elements of the range in view at a time, without having to
individually read them out and save them in a second buffer, and you can
advance this view once you're done with the current data and want to
move on.

Existing range operations like popFrontN, take, takeExactly, drop, etc.,
can be extended to take advantage of the extra functionality of
ChunkedRanges. (Perhaps popFrontN can even be merged with popN, since
they amount to the same thing.)

Using a ChunkedRange allows you to write functions that parse a
particular range and return a range of chunks (say, a deserializer that
returns a range of objects given a range of bytes).

Thinking on it a bit further, perhaps we can call this a WindowedRange,
since it somewhat resembles the sliding window protocol where you keep a
window of sequential packet ids in an active buffer, and remove them
from the buffer as they get ack'ed (consumed by popN). The buffer thus
acts like a window into the next n elements in the range, which can be
slid forward as data is consumed.
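
A minimal sketch of the windowed idea, under stated assumptions: the name Windowed and the member layout are hypothetical, the parameters use size_t rather than int, and the window is a plain GC array that grows by appending, which a real implementation would likely replace with a reusable buffer.

```d
import std.algorithm.comparison : min;
import std.range.primitives;

/// Hypothetical sketch of the proposed range; all names are tentative.
struct Windowed(R) if (isInputRange!R)
{
    alias E = ElementType!R;

    private R source;   // underlying element-at-a-time range
    private E[] window; // elements fetched from source but not yet popped

    this(R source) { this.source = source; }

    // Pull elements from the source until the window holds n of them
    // (or the source runs dry).
    private void fill(size_t n)
    {
        while (window.length < n && !source.empty)
        {
            window ~= source.front;
            source.popFront();
        }
    }

    /// True if at least n elements are still available.
    bool hasAtLeast(size_t n) { fill(n); return window.length >= n; }

    /// Slice of up to n front elements; repeated calls see the same data.
    const(E)[] frontN(size_t n)
    {
        fill(n);
        return window[0 .. min(n, window.length)];
    }

    /// Discard up to n elements from the front of the window.
    void popN(size_t n)
    {
        fill(n);
        window = window[min(n, window.length) .. $];
    }

    // Ordinary input range primitives, expressed via the window.
    @property bool empty() { fill(1); return window.length == 0; }
    @property E front() { fill(1); return window[0]; }
    void popFront() { popN(1); }
}

unittest
{
    auto w = Windowed!(int[])([1, 2, 3, 4, 5]);
    assert(w.frontN(3) == [1, 2, 3]);
    assert(w.frontN(3) == [1, 2, 3]); // repeated call, same window
    w.popN(2);
    assert(w.frontN(2) == [3, 4]);
    assert(w.hasAtLeast(3) && !w.hasAtLeast(4));
}
```

Because front and popFront are defined in terms of the window, the type still passes as an ordinary input range for code that knows nothing about frontN.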


T

-- 
Having a smoking section in a restaurant is like having a peeing section
in a swimming pool. -- Edward Burr 


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Steven Schveighoffer
On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh hst...@quickfur.ath.cx  
wrote:



On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote:

On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
In other words, ranges aren't enough.

This is copiously clear to me, but the way I like to think about it
is by extending the notion of range (with notions such as e.g.
BufferedRange, LookaheadRange, and such) instead of developing an
abstraction independent from ranges and then working on stitching
that with ranges.

[...]

One direction that _could_ be helpful, perhaps, is to extend the concept
of range to include, let's tentatively call it, a ChunkedRange.
Basically a ChunkedRange implements the usual InputRange operations
(empty, front, popFront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.



On such ranges, what would popFront and front do?  I'm assuming since  
frontN and popN are referring to how many elements, and since the most  
logical definition for elements is bytes, that front gets the next byte,  
and popFront discards the next byte.  This seems useless to me.


I still don't get the need to add this to ranges.  The streaming API  
works fine on its own.


But there is an omission with your proposed API regardless -- reading data  
is a mutating event.  It destructively mutates the underlying data stream  
so that you cannot get the data again.  This means you must double-buffer  
data in order to support frontN and popN that are not necessary with a  
simple read API.


For example:

auto buf = new ubyte[100];
stream.read(buf);

does not need to first buffer the data inside the stream and then copy it  
to buf, it can read it from the OS *directly* into buf.
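
To make the caller-managed-buffer point concrete, here is a small sketch. SketchStream is a hypothetical stand-in, not the API from the proposed std.io: its `pending` slice plays the role of data the OS has not delivered yet, and the stream itself keeps no buffer.

```d
/// Hypothetical stand-in for an OS-backed stream.
struct SketchStream
{
    private const(ubyte)[] pending; // stands in for data held by the OS

    /// Fills as much of buf as possible and returns the number of bytes
    /// written. Nothing is staged inside the stream first.
    size_t read(ubyte[] buf)
    {
        immutable n = buf.length < pending.length ? buf.length : pending.length;
        buf[0 .. n] = pending[0 .. n];
        pending = pending[n .. $]; // destructive: these bytes are gone
        return n;
    }
}

unittest
{
    ubyte[] src = [10, 20, 30];
    auto stream = SketchStream(src);
    auto buf = new ubyte[100];

    immutable got = stream.read(buf);
    assert(got == 3 && buf[0 .. got] == src);
    assert(stream.read(buf) == 0); // the data cannot be read a second time
}
```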


-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Artur Skawina
On 05/16/12 21:38, H. S. Teoh wrote:
 On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote:
 On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
 In other words, ranges aren't enough.

 This is copiously clear to me, but the way I like to think about it
 is by extending the notion of range (with notions such as e.g.
 BufferedRange, LookaheadRange, and such) instead of developing an
 abstraction independent from ranges and then working on stitching
 that with ranges.
 [...]
 
 One direction that _could_ be helpful, perhaps, is to extend the concept
 of range to include, let's tentatively call it, a ChunkedRange.
 Basically a ChunkedRange implements the usual InputRange operations
 (empty, front, popFront) but adds the following new primitives:
 
 - bool hasAtLeast(R)(R range, int n) - true if underlying range has at
   least n elements left;
 
 - E[] frontN(R)(R range, int n) - returns a slice containing the front n
   elements from the range: this will buffer the next n elements from the
   range if they aren't already; repeated calls will just return the
   buffer;
 
 - void popN(R)(R range, int n) - discards the first n elements from the
   buffer, thus causing the next call to frontN() to fetch more data if
   necessary.
 
 These are all tentative names, of course. But the idea is that you can
 keep N elements of the range in view at a time, without having to
 individually read them out and save them in a second buffer, and you can
 advance this view once you're done with the current data and want to
 move on.
 
 Existing range operations like popFrontN, take, takeExactly, drop, etc.,
 can be extended to take advantage of the extra functionality of
 ChunkedRanges. (Perhaps popFrontN can even be merged with popN, since
 they amount to the same thing.)
 
 Using a ChunkedRange allows you to write functions that parse a
 particular range and return a range of chunks (say, a deserializer that
 returns a range of objects given a range of bytes).
 
 Thinking on it a bit further, perhaps we can call this a WindowedRange,
 since it somewhat resembles the sliding window protocol where you keep a
 window of sequential packet ids in an active buffer, and remove them
 from the buffer as they get ack'ed (consumed by popN). The buffer thus
 acts like a window into the next n elements in the range, which can be
 slid forward as data is consumed.

Right now, everybody reinvents this, with a slightly different interface...
It's really obvious, needed and just has to be standardized.

A few notes:

hasAtLeast is redundant, as it can be better expressed via .length; what would
be the point of wrapping 'r.length >= n'? An '.available' property would be
useful to find out, e.g., how much can be consumed w/o blocking, but that one
should return a size_t too.

'E[] frontN' should have a version that returns all available elements; i 
called it '@property E[] fronts()' here. It's more efficient that way and
doesn't rely on the compiler to inline and optimize the limit checks away.

PopN -- well, its signature here is 'void popFronts(size_t n)', other than
that, there's no difference.

Similar things are necessary for output ranges. Here, what i needed was:

   void put(ref E el)
   void puts(E[] els)
   @property size_t free() // Not the most intuitive name w/o context;
   // returns the number of E's that can be 'put()'
   // w/o blocking.

Note that all of this doesn't address the consume-variable-sized-chunks issue.
But that can now be efficiently handled by another layer on top.
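
A rough, single-threaded sketch of what such an interface could look like. BatchQueue is a hypothetical name and the fixed linear buffer is an assumption made for brevity; there is no locking or blocking here, so it only illustrates the shape of the primitives, not the inter-thread use.

```d
import std.range.primitives : isInputRange;

/// Hypothetical fixed-capacity queue with batch primitives next to the
/// usual range ones.
struct BatchQueue(E)
{
    private E[] store;   // fixed backing storage
    private size_t head; // index of the first unread element
    private size_t tail; // index one past the last written element

    this(size_t capacity) { store = new E[capacity]; }

    // --- consumer side ---
    @property bool empty() const { return head == tail; }
    @property E front() { return store[head]; }
    void popFront() { popFronts(1); }

    /// All currently readable elements as one slice (no copy).
    @property E[] fronts() { return store[head .. tail]; }
    /// Number of elements readable without blocking.
    @property size_t available() const { return tail - head; }
    /// Discard the first n readable elements.
    void popFronts(size_t n)
    {
        assert(n <= available);
        head += n;
        if (head == tail) head = tail = 0; // rewind once drained
    }

    // --- producer side ---
    /// Number of elements that can be put without blocking.
    @property size_t free() const { return store.length - tail; }
    void put(ref E el)
    {
        assert(free >= 1);
        store[tail++] = el;
    }
    void puts(E[] els)
    {
        assert(els.length <= free);
        store[tail .. tail + els.length] = els[];
        tail += els.length;
    }
}

unittest
{
    static assert(isInputRange!(BatchQueue!int));

    auto q = BatchQueue!int(8);
    q.puts([1, 2, 3, 4]);
    assert(q.available == 4 && q.free == 4);
    assert(q.fronts == [1, 2, 3, 4]); // one call instead of four front/popFront pairs
    q.popFronts(3);
    assert(q.front == 4);             // still usable as a plain input range
    q.popFront();
    assert(q.empty && q.free == 8);
}
```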


On 05/16/12 22:15, Steven Schveighoffer wrote:
 I still don't get the need to add this to ranges.  The streaming API works 
 fine on its own.

This is not an argument against a streaming API (at least not for me), but
for efficient ranges. With the API above I can shift tens of gigabytes of
data per second between threads. And still use the 'std' range API and
everything that works with it...

 But there is an omission with your proposed API regardless -- reading data is 
 a mutating event.  It destructively mutates the underlying data stream so 
 that you cannot get the data again.  This means you must double-buffer data 
 in order to support frontN and popN that are not necessary with a simple read 
 API.
 
 For example:
 
 auto buf = new ubyte[100];
 stream.read(buf);
 
 does not need to first buffer the data inside the stream and then copy it to 
 buf, it can read it from the OS *directly* into buf.

Sometimes having the buffer managed by 'stream' and 'read()' returning a slice
into it works (this is what 'fronts' above does). Reusing a caller managed
buffer can be useful in other cases, yes. 

artur


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread jerro
One direction that _could_ be helpful, perhaps, is to extend the concept
of range to include, let's tentatively call it, a ChunkedRange.
Basically a ChunkedRange implements the usual InputRange operations
(empty, front, popFront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;


I think it would be better to have a function that would return
the number of elements left.

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.


I like the idea of frontN and popN. But is there any reason why
a type that defines those (let's call it a stream) should also
be a range? I would prefer to have a type that just defines those
two functions, a function that returns the number of available
elements and a function that tells whether we are at the end of
the stream. If you need a range of elements with a blocking popFront,
it's easy to build one on top of it. You can write a function
that takes any stream and returns a range of elements. I think
that's better than having to write front, popFront, and empty
for every stream.
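
A sketch of that layering, with hypothetical names throughout (ArrayStream, atEnd, byElement): the stream type defines only the four stream primitives, and a single generic adapter supplies front, popFront and empty for any such type.

```d
import std.algorithm.comparison : equal, min;
import std.range.primitives : isInputRange;

/// Hypothetical in-memory stream with only the four primitives.
struct ArrayStream
{
    private const(ubyte)[] data;

    @property size_t available() const { return data.length; }
    @property bool atEnd() const { return data.length == 0; }
    const(ubyte)[] frontN(size_t n) { return data[0 .. min(n, data.length)]; }
    void popN(size_t n) { data = data[min(n, data.length) .. $]; }
}

/// Generic adapter: works for any type with frontN, popN and atEnd.
struct ByElement(S)
{
    S stream;

    @property bool empty() { return stream.atEnd; }
    @property auto front() { return stream.frontN(1)[0]; }
    void popFront() { stream.popN(1); }
}

/// Convenience function so the stream type can be inferred.
auto byElement(S)(S stream) { return ByElement!S(stream); }

unittest
{
    ubyte[] bytes = [7, 8, 9];
    auto r = byElement(ArrayStream(bytes));
    static assert(isInputRange!(typeof(r)));
    assert(r.equal([7, 8, 9]));
}
```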


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Steven Schveighoffer
On Wed, 16 May 2012 16:30:43 -0400, H. S. Teoh hst...@quickfur.ath.cx  
wrote:



On Wed, May 16, 2012 at 04:15:22PM -0400, Steven Schveighoffer wrote:

On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh
hst...@quickfur.ath.cx wrote:

[...]
One direction that _could_ be helpful, perhaps, is to extend the concept
of range to include, let's tentatively call it, a ChunkedRange.
Basically a ChunkedRange implements the usual InputRange operations
(empty, front, popFront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.


On such ranges, what would popFront and front do?  I'm assuming since
frontN and popN are referring to how many elements, and since the most
logical definition for elements is bytes, that front gets the next
byte, and popFront discards the next byte.  This seems useless to me.


How so? It's still useful for implementing readByte, for example.


readByte is covered by frontN(1).  Why the need for front()?

Let me answer that question for you -- so it can be treated as a normal  
range.  But nobody will want to do that.


i.e. copy to appender will read one byte at a time into the array.


I still don't get the need to add this to ranges.  The streaming API
works fine on its own.

But there is an omission with your proposed API regardless --
reading data is a mutating event.  It destructively mutates the
underlying data stream so that you cannot get the data again.  This
means you must double-buffer data in order to support frontN and
popN that are not necessary with a simple read API.

For example:

auto buf = new ubyte[100];
stream.read(buf);

does not need to first buffer the data inside the stream and then
copy it to buf, it can read it from the OS *directly* into buf.

[...]

The idea is that by asking for N elements at a time instead of calling
front/popFront N times, the underlying implementation can optimize the
request by creating a buffer of size N and have the OS read exactly N
bytes directly into that buffer.

// Reads 1,000,000 bytes into newly allocated buffer and returns
// buffer.
auto buf = stream.frontN(1_000_000);


OK, so stream is providing data via return value and allocation.


// Since 1,000,000 bytes is already read into the buffer, this
// simply returns a slice of the same buffer:
auto buf2 = stream.frontN(1_000_000);


Is buf2 mutable?  If so, this is no good, buf could have mutated this  
data.  But this can be fixed by making the return value of frontN be  
const(ubyte)[].



assert(buf is buf2);

// This consumes the buffer:
stream.popN(1_000_000);


What does consume mean, discard?  Obviously not reuse, due to line  
below...



// This will read another 1,000,000 bytes into a new buffer
auto buf3 = stream.frontN(1_000_000);


OK, you definitely lost me here, this will not fly.  The whole point of  
buffering is to avoid having to reallocate on every read.  If you have to  
allocate every read, buffering is going to have a negative impact on  
performance!
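
For contrast, a sketch of a frontN/popN pair that never allocates after construction. ReusingBuffer and the readMore delegate are hypothetical; the delegate stands in for whatever actually fetches bytes. The price of reuse is visible in the test: a slice handed out earlier is overwritten by the next refill.

```d
import core.stdc.string : memmove;

/// Hypothetical buffered reader: one allocation, reused for every window.
struct ReusingBuffer
{
    private size_t delegate(ubyte[]) readMore; // fills a slice, returns bytes written (0 at end)
    private ubyte[] storage;                    // allocated once
    private size_t start, end;                  // valid data is storage[start .. end]

    this(size_t delegate(ubyte[]) readMore, size_t capacity)
    {
        this.readMore = readMore;
        storage = new ubyte[capacity];
    }

    /// Up to n buffered bytes; n must not exceed the capacity.
    const(ubyte)[] frontN(size_t n)
    {
        assert(n <= storage.length);
        if (end - start < n)
        {
            // Slide the unread tail to the front of the same allocation,
            // then top it up from the source.
            memmove(storage.ptr, storage.ptr + start, end - start);
            end -= start;
            start = 0;
            while (end < n)
            {
                immutable got = readMore(storage[end .. $]);
                if (got == 0) break;
                end += got;
            }
        }
        immutable len = end - start < n ? end - start : n;
        return storage[start .. start + len];
    }

    /// Discard up to n buffered bytes.
    void popN(size_t n)
    {
        immutable len = end - start < n ? end - start : n;
        start += len;
    }
}

unittest
{
    ubyte next = 0;
    // Stand-in source: hands out at most 4 bytes per call, counting upwards.
    size_t source(ubyte[] dst)
    {
        immutable n = dst.length < 4 ? dst.length : 4;
        foreach (ref b; dst[0 .. n]) b = next++;
        return n;
    }

    auto r = ReusingBuffer(&source, 16);
    auto a = r.frontN(6);
    assert(a.length == 6 && a[0] == 0 && a[5] == 5);
    r.popN(6);
    auto b = r.frontN(6);
    assert(b.length == 6 && b[0] == 6 && b[5] == 11);
    assert(b.ptr is a.ptr); // same allocation; the old contents of a are gone
}
```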


-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Steven Schveighoffer
On Wed, 16 May 2012 16:38:54 -0400, Artur Skawina art.08...@gmail.com  
wrote:



On 05/16/12 22:15, Steven Schveighoffer wrote:
I still don't get the need to add this to ranges.  The streaming API  
works fine on its own.


This is not an argument against a streaming API (at least not for me), but
for efficient ranges. With the API above I can shift tens of gigabytes of
data per second between threads. And still use the 'std' range API and
everything that works with it...


But you never would want to.  Don't get me wrong, the primitives here  
could work for a streaming API (I haven't implemented it that way, but it  
could be made to work).  But the idea that it must *also* be a std.range  
input range makes zero sense.


To me, this is as obvious as not supporting linklist[index];  Sure, it can  
be done, but who would ever use it?


-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Andrei Alexandrescu

On 5/16/12 1:00 PM, Steven Schveighoffer wrote:

What I think we would end up with is a streaming API with range
primitives tacked on.

- empty is clunky, but possible to implement. However, it may become
invalid (think of reading a file that is being appended to by another
process).
- popFront and front do not have any clear definition of what they refer
to. The only valid thing I can think of is bytes, and then nobody will
use them.

That's hardly saying it's range based. I refuse to believe that people
will be thrilled by having to 'pre-configure' each front and popFront
call in order to get work done. If you want to try and convince me, I'm
willing to listen, but so far I haven't seen anything that looks at all
appetizing.


Where the two meet is in the notion of buffered streams. Ranges are by 
default buffered, i.e. user code can call front() several times without 
an intervening popFront() and get the same thing. So a range has by 
definition a buffer of at least one element.


That makes the range interface unsuitable for strictly UNbuffered 
streams. On the other hand, a range could no problem offer OPTIONAL 
unbuffered reads (the existence of a buffer does not preclude 
availability of unbuffered transfers).


So to tie it all nicely I think we need:

1. A STRICTLY UNBUFFERED streaming abstraction

2. A notion of range that supports unbuffered transfers.
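
As an illustration of the one-element buffer: a destructive "give me the next value" source cannot answer front() twice on its own, so an adapter has to cache. Cached and the delegate signature below are assumptions made for this sketch only.

```d
/// Hypothetical adapter: wraps an unbuffered source in a one-element cache
/// so that front() is repeatable.
struct Cached(T)
{
    private bool delegate(out T) next; // destructive read; false at end of input
    private T slot;                    // the one-element buffer
    private bool filled, done;

    this(bool delegate(out T) next) { this.next = next; }

    // Fetch one element from the source if the slot is empty.
    private void prime()
    {
        if (!filled && !done)
        {
            if (next(slot)) filled = true;
            else done = true;
        }
    }

    @property bool empty() { prime(); return done; }
    @property T front() { prime(); return slot; }
    void popFront() { prime(); filled = false; }
}

unittest
{
    int counter = 0;
    auto r = Cached!int((out int v) {
        if (counter >= 3) return false;
        v = counter++;
        return true;
    });

    assert(r.front == 0 && r.front == 0); // same element until popFront
    r.popFront();
    assert(r.front == 1);
    assert(counter == 2);                 // the source was read exactly twice
}
```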


Andrei


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread H. S. Teoh
On Wed, May 16, 2012 at 04:52:09PM -0400, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 16:30:43 -0400, H. S. Teoh
 hst...@quickfur.ath.cx wrote:
 
 On Wed, May 16, 2012 at 04:15:22PM -0400, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh
 hst...@quickfur.ath.cx wrote:
 [...]
 One direction that _could_ be helpful, perhaps, is to extend the
 concept of range to include, let's tentatively call it, a
 ChunkedRange.  Basically a ChunkedRange implements the usual
 InputRange operations (empty, front, popFront) but adds the
 following new primitives:
 
 - bool hasAtLeast(R)(R range, int n) - true if underlying range has
   at least n elements left;
 
 - E[] frontN(R)(R range, int n) - returns a slice containing the
   front n elements from the range: this will buffer the next n
   elements from the range if they aren't already; repeated calls
   will just return the buffer;
 
 - void popN(R)(R range, int n) - discards the first n elements from
   the buffer, thus causing the next call to frontN() to fetch more
   data if necessary.
 
 
 On such ranges, what would popFront and front do?  I'm assuming
 since frontN and popN are referring to how many elements, and since
 the most logical definition for elements is bytes, that front gets
 the next byte, and popFront discards the next byte.  This seems
 useless to me.
 
 How so? It's still useful for implementing readByte, for example.
 
 readByte is covered by frontN(1).  Why the need for front()?
 
 Let me answer that question for you -- so it can be treated as a
 normal range.  But nobody will want to do that.
 
 i.e. copy to appender will read one byte at a time into the array.

If this new type of range is recognized by std.range, then the relevant
algorithms can be made to recognize the existence of frontN and make
good use of it, instead of iterating front N times. Then front() can
still be used by stuff that really only wants a single byte at a time.


[...]
 The idea is that by asking for N elements at a time instead of
 calling front/popFront N times, the underlying implementation can
 optimize the request by creating a buffer of size N and have the OS
 read exactly N bytes directly into that buffer.
 
  // Reads 1,000,000 bytes into newly allocated buffer and returns
  // buffer.
  auto buf = stream.frontN(1_000_000);
 
 OK, so stream is providing data via return value and allocation.
 
  // Since 1,000,000 bytes is already read into the buffer, this
  // simply returns a slice of the same buffer:
  auto buf2 = stream.frontN(1_000_000);
 
 Is buf2 mutable?  If so, this is no good, buf could have mutated
 this data.  But this can be fixed by making the return value of
 frontN be const(ubyte)[].
 
  assert(buf is buf2);
 
  // This consumes the buffer:
  stream.popN(1_000_000);
 
 What does consume mean, discard?  Obviously not reuse, due to
 line below...

Yes, discard. That's what popFront does right now for a single element.


  // This will read another 1,000,000 bytes into a new buffer
  auto buf3 = stream.frontN(1_000_000);
 
 OK, you definitely lost me here, this will not fly.  The whole point
 of buffering is to avoid having to reallocate on every read.  If you
 have to allocate every read, buffering is going to have a negative
 impact on performance!
[...]

I thought the whole point of buffering is to avoid excessive roundtrips
to disk I/O.

Though you do have a point that allocating on every read is a bad idea.


T

-- 
Why is it that all of the instruments seeking intelligent life in the universe 
are pointed away from Earth? -- Michael Beibl


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Artur Skawina
On 05/16/12 22:58, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 16:38:54 -0400, Artur Skawina art.08...@gmail.com wrote:
 
 On 05/16/12 22:15, Steven Schveighoffer wrote:
 I still don't get the need to add this to ranges.  The streaming API 
 works fine on its own.

 This is not an argument against a streaming API (at least not for me), but
 for efficient ranges. With the API above I can shift tens of gigabytes of
 data per second between threads. And still use the 'std' range API and
 everything that works with it...
 
 But you never would want to.  Don't get me wrong, the primitives here could 
 work for a streaming API (I haven't implemented it that way, but it could be 
 made to work).  But the idea that it must *also* be a std.range input range 
 makes zero sense.

Well, I do want to. For example, I can pass the produced data to *any* range
consumer; it may not be as efficient as mine, but it will still work reasonably
(I just did a quick test and the difference seems to be about 10G/s less for a
plain front+popFront consumer).

The goal here is: if we could agree on a standard interface then *any* producer
and consumer, including the ones in the std lib, could take advantage of this
(optional) feature. It's not so much about function call overhead as /syscall/
and /locking/ costs. Retrieving or writing 100 elements with only one
lock-unlock sequence makes a large difference.

 To me, this is as obvious as not supporting linklist[index];  Sure, it can be 
 done, but who would ever use it?

This is not even related.

Your 'read(ref ubyte[])' approach can actually mean that one more copy of
the data is required. Think writer-range_or_stream-reader -- unless the
reader is already waiting with an empty buffer, the stream has to copy the
data to an internal buffer, which then has to be copied again when a reader
comes around. The 'slice[] = fronts' solution avoids the second copy.
Like I said, depending on the circumstances, sometimes you want one scheme,
sometimes the other. (TBH, right now i can't think of a case where i'd prefer
a non-range based approach; having the same i/f is just so convenient. But 
I'm sure there's one ;) )

artur


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Timon Gehr

On 05/16/2012 11:08 PM, Andrei Alexandrescu wrote:

On 5/16/12 1:00 PM, Steven Schveighoffer wrote:

What I think we would end up with is a streaming API with range
primitives tacked on.

- empty is clunky, but possible to implement. However, it may become
invalid (think of reading a file that is being appended to by another
process).
- popFront and front do not have any clear definition of what they refer
to. The only valid thing I can think of is bytes, and then nobody will
use them.

That's hardly saying it's range based. I refuse to believe that people
will be thrilled by having to 'pre-configure' each front and popFront
call in order to get work done. If you want to try and convince me, I'm
willing to listen, but so far I haven't seen anything that looks at all
appetizing.


Where the two meet is in the notion of buffered streams. Ranges are by
default buffered, i.e. user code can call front() several times without
an intervening popFront() and get the same thing.  So a range has by
definition a buffer of at least one element.



I don't think this necessarily holds. 'front' might be computed on the 
fly, as it is done for std.algorithm.map.



That makes the range interface unsuitable for strictly UNbuffered
streams. On the other hand, a range could no problem offer OPTIONAL
unbuffered reads (the existence of a buffer does not preclude
availability of unbuffered transfers).

So to tie it all nicely I think we need:

1. A STRICTLY UNBUFFERED streaming abstraction

2. A notion of range that supports unbuffered transfers.


Andrei




Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Andrei Alexandrescu

On 5/16/12 4:40 PM, Timon Gehr wrote:

On 05/16/2012 11:08 PM, Andrei Alexandrescu wrote:

On 5/16/12 1:00 PM, Steven Schveighoffer wrote:

What I think we would end up with is a streaming API with range
primitives tacked on.

- empty is clunky, but possible to implement. However, it may become
invalid (think of reading a file that is being appended to by another
process).
- popFront and front do not have any clear definition of what they refer
to. The only valid thing I can think of is bytes, and then nobody will
use them.

That's hardly saying it's range based. I refuse to believe that people
will be thrilled by having to 'pre-configure' each front and popFront
call in order to get work done. If you want to try and convince me, I'm
willing to listen, but so far I haven't seen anything that looks at all
appetizing.


Where the two meet is in the notion of buffered streams. Ranges are by
default buffered, i.e. user code can call front() several times without
an intervening popFront() and get the same thing. So a range has by
definition a buffer of at least one element.



I don't think this necessarily holds. 'front' might be computed on the
fly, as it is done for std.algorithm.map.


It used to be buffered in fact but that was too much trouble. The fair 
thing to say here is that map relies on the implicit buffering of its input.
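
A small check of the on-the-fly behaviour, as a sketch against current Phobos (the helper name is made up): the mapping function runs once per front() call, and only the array underneath keeps the element in place.

```d
import std.algorithm.iteration : map;

/// Counts how often the mapping function runs for two front() calls.
int mapCallsForTwoFronts()
{
    int calls = 0;
    auto doubled = [1, 2, 3].map!((int x) { ++calls; return x * 2; });

    assert(doubled.front == 2);
    assert(doubled.front == 2); // same value both times
    return calls;               // but the function ran for each call
}

unittest
{
    assert(mapCallsForTwoFronts() == 2);
}
```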


Andrei




Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Stewart Gordon

On 16/05/2012 18:21, Walter Bright wrote:
snip

You can have that range read from byChunk(). It's really the same thing that 
C's stdio does.


And what if I want it to work on ranges that don't have a byChunk method?

Stewart.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-16 Thread Stewart Gordon

On 16/05/2012 17:48, H. S. Teoh wrote:

On Wed, May 16, 2012 at 05:41:49PM +0100, Stewart Gordon wrote:

snip

Why would anybody want to read a large binary file _one byte at a
time_?

[...]

import std.range;
byte[] readNBytes(R)(R range, size_t n)
if (isInputRange!R && hasSlicing!R)
{
    return range[0 .. n];
}


What if I want it to work on ranges that don't have slicing?
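
One possible fallback, sketched with a hypothetical readN: without slicing, the elements have to be copied out one at a time, which is exactly the cost the slicing version avoids.

```d
import std.algorithm.iteration : filter;
import std.range : iota;
import std.range.primitives;

/// Copies up to n elements out of any input range, advancing it.
auto readN(R)(ref R range, size_t n)
    if (isInputRange!R)
{
    ElementType!R[] result;
    result.reserve(n);
    for (; n > 0 && !range.empty; --n)
    {
        result ~= range.front;
        range.popFront();
    }
    return result;
}

unittest
{
    // filter's result offers no slicing, so the slicing version cannot take it.
    auto evens = iota(10).filter!(x => x % 2 == 0);
    static assert(!hasSlicing!(typeof(evens)));

    assert(readN(evens, 3) == [0, 2, 4]);
    assert(evens.front == 6); // the range was advanced past what was read
}
```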

Stewart.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-15 Thread Walter Bright

On 5/14/2012 9:54 PM, H. S. Teoh wrote:

On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote:

On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:

While we're at it, do we want to keep std.outbuffer?


Since it's not range based, probably not.


Why not just fold this into std.io?


It's not I/O.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-15 Thread Dmitry Olshansky

On 15.05.2012 8:54, H. S. Teoh wrote:

On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote:

On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:

While we're at it, do we want to keep std.outbuffer?


Since it's not range based, probably not.


Why not just fold this into std.io? I'm surprised that this is a
separate module, actually. It should either be folded into std.io, or
developed to be more generic (i.e., have range-based API, have more
features like auto-flushing past a certain size, etc.).




It's std.array Appender. The only difference is text vs binary output form.


--
Dmitry Olshansky


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-15 Thread Lars T. Kyllingstad

On Tuesday, 15 May 2012 at 02:56:20 UTC, Walter Bright wrote:

On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
I keep trying to avoid talking about this, because I'm writing a
replacement library for std.stream, and I don't want to step on any toes
while it's still not accepted.

But I have to say, ranges are *not* a good interface for generic data
providers. They are *very* good for structured data providers.

In other words, a stream of bytes, not a good range (who wants to get
one byte at a time?). A stream of UTF text broken into lines, a very
good range.


[...]


I'll say in advance without seeing your design that it'll be a 
tough sell if it is not range based.


I've been doing some range based work on the side. I'm 
convinced there is enormous potential there, despite numerous 
shortcomings with them I ran across in Phobos. Those 
shortcomings can be fixed, they are not fatal.


[...]


I have to say, I'm with Steve on this one.  While I do believe
ranges will have a very important role to play in D's future I/O
paradigm, I also think there needs to be a layer beneath the
ranges that more directly maps to OS primitives.  And as D is a
systems programming language, that layer needs to be publicly
available.  (Note that this is how std.stdio works now, more or
less.)

-Lars


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-15 Thread Lars T. Kyllingstad
On Tuesday, 15 May 2012 at 15:22:03 UTC, Lars T. Kyllingstad 
wrote:

On Tuesday, 15 May 2012 at 02:56:20 UTC, Walter Bright wrote:

On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
I keep trying to avoid talking about this, because I'm writing a
replacement library for std.stream, and I don't want to step on any toes
while it's still not accepted.

But I have to say, ranges are *not* a good interface for generic data
providers. They are *very* good for structured data providers.

In other words, a stream of bytes, not a good range (who wants to get
one byte at a time?). A stream of UTF text broken into lines, a very
good range.


[...]


I'll say in advance without seeing your design that it'll be a 
tough sell if it is not range based.


I've been doing some range based work on the side. I'm 
convinced there is enormous potential there, despite numerous 
shortcomings with them I ran across in Phobos. Those 
shortcomings can be fixed, they are not fatal.


[...]


I have to say, I'm with Steve on this one.  While I do believe
ranges will have a very important role to play in D's future I/O
paradigm, I also think there needs to be a layer beneath the
ranges that more directly maps to OS primitives.  And as D is a
systems programming language, that layer needs to be publicly
available.  (Note that this is how std.stdio works now, more or
less.)


Also, I wouldn't mind std.*stream getting deprecated.  
Personally, I've never used those modules -- not even once.  As a 
first step their documentation could be removed from dlang.org, 
so new users aren't tempted to start using them.  No 
functionality is better than poor functionality, IMO.


-Lars



Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-15 Thread Jonas Drewsen

On Sunday, 13 May 2012 at 22:26:17 UTC, Walter Bright wrote:

On 5/13/2012 3:16 PM, Nathan M. Swan wrote:
Trying to make it read lazily is even harder, as all std.utf functions
work on arrays, not ranges. I think this should change.


Yes, std.utf should be upgraded to present range interfaces.


+1 on that.

I really needed it when doing the std.net.curl stuff and would be
happy to move it to a more generic handling in std.utf.




Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-15 Thread Nathan M. Swan
On Monday, 14 May 2012 at 15:02:11 UTC, Steven Schveighoffer 
wrote:
In other words, a stream of bytes, not a good range (who wants 
to get one byte at a time?).  A stream of UTF text broken into 
lines, a very good range.


There are several cases where one would want one byte at a
time, e.g. as an input to another range that produces the UTF
text as an output.


I do agree that, e.g. with binary data, some data can't be read with
ranges (when you need to read small chunks of varying size), but
that doesn't mean most things shouldn't be range-based.


NMS


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-15 Thread Sean Kelly
On May 15, 2012, at 3:34 PM, Nathan M. Swan nathanms...@gmail.com wrote:

 On Monday, 14 May 2012 at 15:02:11 UTC, Steven Schveighoffer wrote:
 In other words, a stream of bytes, not a good range (who wants to get one 
 byte at a time?).  A stream of UTF text broken into lines, a very good range.
 
 There are several cases where one would want one byte at a time, e.g. as an
 input to another range that produces the UTF text as an output.
 
 I do agree that, e.g. with binary data, some data can't be read with ranges
 (when you need to read small chunks of varying size), but that doesn't mean
 most things shouldn't be range-based.

You really want both, depending on the situation. I don't see what's weird 
about this. C++ iostreams have input and output iterators built on top as well, 
for much the same reason. The annoying part is that once you've moved to a 
range interface it's hard to go back. Like say I want a ZipRange on top of a 
FileRange.  But now I want to read structs as binary blobs from that 
uncompressed output. 

One thing I'd like in a buffered input API is a way to perform transactional 
reads such that if the full read can't be performed, the read state remains 
unchanged. The best you can do with most APIs is to check for a desired length, 
but what if I don't want to read until a full line is available, and I don't 
know the exact length?  Typically, you end up having to double buffer, which 
stinks. 

Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-15 Thread H. S. Teoh
On Tue, May 15, 2012 at 04:43:05PM -0700, Sean Kelly wrote:
[...]
 One thing I'd like in a buffered input API is a way to perform
 transactional reads such that if the full read can't be performed, the
 read state remains unchanged. The best you can do with most APIs is to
 check for a desired length, but what if I don't want to read until a
 full line is available, and I don't know the exact length?  Typically,
 you end up having to double buffer, which stinks. 

This would be very nice to have, but how would you go about implementing
such a thing, though? Wouldn't you need OS-level support for it?
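
One possible answer, as a rough sketch rather than a description of any existing Phobos API: the transaction can live entirely in the user-space buffer, so no OS support is required. The reader keeps everything it has received plus a consume position, and only moves that position once a complete line is present. The names LineBuffer, feed and tryReadLine are hypothetical.

```d
import std.string : indexOf;

/// Hypothetical buffered reader with "all or nothing" line reads.
struct LineBuffer
{
    private char[] buf; // everything received so far
    private size_t pos; // start of the data not yet consumed

    /// Append newly arrived data, e.g. the result of a socket read.
    void feed(const(char)[] chunk)
    {
        buf ~= chunk;
    }

    /// If a complete line is buffered, store it (without the '\n') in `line`,
    /// consume it and return true. Otherwise return false and leave the
    /// read state untouched.
    bool tryReadLine(out const(char)[] line)
    {
        auto rest = buf[pos .. $];
        immutable nl = rest.indexOf('\n');
        if (nl < 0)
            return false;     // incomplete line: nothing is consumed
        line = rest[0 .. nl]; // a slice of the buffer, no second copy
        pos += nl + 1;
        return true;
    }

    /// Number of bytes consumed so far.
    size_t consumed() const
    {
        return pos;
    }
}

void main()
{
    LineBuffer lb;
    const(char)[] line;

    lb.feed("hel");
    assert(!lb.tryReadLine(line)); // no newline yet
    assert(lb.consumed == 0);      // state unchanged

    lb.feed("lo\nwor");
    assert(lb.tryReadLine(line));
    assert(line == "hello");
}
```

The sketch never discards consumed data, so a real implementation would also need to compact or recycle the buffer.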


T

-- 
Let's eat some disquits while we format the biskettes.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-14 Thread Andrej Mitrovic
On 5/13/12, Kiith-Sa 4...@theanswer.com wrote:
 My D:YAML library (YAML parser) depends on std.stream

Also ae.xml depends on it.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-14 Thread Stewart Gordon

From the other thread

On 13/05/2012 21:58, Walter Bright wrote:

On 5/13/2012 1:48 PM, Stewart Gordon wrote:

On 13/05/2012 20:42, Walter Bright wrote:
snip

I'd like to see std.stream dumped.  I don't see any reason for it
to exist that std.stdio doesn't do (or should do).


So std.stdio.File is the replacement for the std.stream stuff?


Not exactly.  Ranges are the replacement.  std.stdio.File is merely
a range that deals with files.


I don't see any of the required range methods in it.

Moreover, I'm a bit confused about the means of retrieving multiple elements at once with 
the range API, such as a set number of bytes from a file.  We have popFrontN, which 
advances the range but doesn't return the data from it.  We have take and takeExactly, 
which seem to be the way to get a set number of elements from the range, but I'm confused 
about when/whether using these advances the original range.
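
A small sketch of the difference for the simplest case, a slice. It only demonstrates value-type ranges; a range with reference semantics (for example one backed by a file) may well be advanced when the result of take is consumed, which is exactly the ambiguity described above.

```d
import std.array : array;
import std.range : popFrontN, take;

void main()
{
    int[] data = [1, 2, 3, 4, 5];

    // take yields the first n elements. A slice is passed by value,
    // so `data` itself is not advanced.
    auto firstTwo = data.take(2).array;
    assert(firstTwo == [1, 2]);
    assert(data == [1, 2, 3, 4, 5]);

    // popFrontN advances the range in place and returns only the
    // number of elements it skipped, not the elements themselves.
    auto skipped = data.popFrontN(2);
    assert(skipped == 2);
    assert(data == [3, 4, 5]);
}
```

So for a slice, reading a fixed number of elements and moving past them takes two steps: take, then popFrontN.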


If I'm writing a library to read a binary file format, I want to allow the data to come 
from a file, a socket or a memory image.  The stream API makes this straightforward.  But 
it seems some work is needed before std.stdio and the range API are up to it.


Stewart.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-14 Thread Steven Schveighoffer
On Sun, 13 May 2012 17:38:23 -0400, Walter Bright  
newshou...@digitalmars.com wrote:


This discussion started in the thread Getting the const-correctness of  
Object sorted once and for all, but it deserved its own thread.


These modules suffer from the following problems:

1. poor documentation, dearth of examples & rationale

2. toHash(), toString(), etc., all need to be const pure nothrow, but  
it's turning out to be problematic for doing it for these classes


3. overlapping functionality with std.stdio

4. they should present a range interface, not a streaming one


I keep trying to avoid talking about this, because I'm writing a  
replacement library for std.stream, and I don't want to step on any toes  
while it's still not accepted.


But I have to say, ranges are *not* a good interface for generic data  
providers.  They are *very* good for structured data providers.


In other words, a stream of bytes, not a good range (who wants to get one  
byte at a time?).  A stream of UTF text broken into lines, a very good  
range.


I have no problem with getting rid of std.stream.  I've never actually  
used it.  Still, we absolutely need a non-range based low-level streaming  
interface to data.  If nothing else, we need something we can build ranges  
upon, and I think my replacement does a very good job of that.


-Steve


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-14 Thread Alex Rønne Petersen

On 13-05-2012 23:38, Walter Bright wrote:

This discussion started in the thread Getting the const-correctness of
Object sorted once and for all, but it deserved its own thread.

These modules suffer from the following problems:

1. poor documentation, dearth of examples & rationale

2. toHash(), toString(), etc., all need to be const pure nothrow, but
it's turning out to be problematic for doing it for these classes

3. overlapping functionality with std.stdio

4. they should present a range interface, not a streaming one


While we're at it, do we want to keep std.outbuffer?

--
- Alex


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-14 Thread Walter Bright

On 5/13/2012 10:22 PM, Oleg Kuporosov wrote:

Unfortunately, std.stdio under Windows can't handle UTF-16 (wchar) based file
names and text I/O, which are natural there. The root of the issue seems to be
both the underlying DMC C stdio (something wrong with the w* based functions?)
and std.format, which provides only UTF-8 strings. Deprecating makes sense for
the reasons given, but only after std.stdio supports UTF-16 names/streams or a
good replacement (Steven's std.io?) is ready. Currently std.[c]stream is the
only way to work with UTF-16 filesystems in Phobos. The alternative is switching
to Tango, which appears to support it too (but I don't have experience with it).



Why not just convert the UTF16 strings to UTF8 ones? They have the same 
information.
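
A minimal sketch of that conversion using std.utf, assuming the file name is already available as a wstring; the helper name wideToUtf8 is made up for illustration. It shows only that the round trip is lossless, not how std.stdio treats the resulting name on Windows.

```d
import std.utf : toUTF16, toUTF8;

/// Hypothetical helper: convert a UTF-16 string to UTF-8.
string wideToUtf8(wstring wide)
{
    return toUTF8(wide);
}

void main()
{
    wstring wideName = "na\u00efve-\u0444\u0430\u0439\u043b.txt"w;
    string utf8Name = wideToUtf8(wideName);

    // Same code points, different encoding.
    assert(utf8Name == "na\u00efve-\u0444\u0430\u0439\u043b.txt");
    // Converting back gives the original UTF-16 string.
    assert(toUTF16(utf8Name) == wideName);
}
```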


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-14 Thread Walter Bright

On 5/14/2012 4:43 AM, Stewart Gordon wrote:

If I'm writing a library to read a binary file format, I want to allow the data
to come from a file, a socket or a memory image. The stream API makes this
straightforward. But it seems some work is needed before std.stdio and the range
API are up to it.


I agree. But that's where the effort needs to be made, not in carrying stream 
forward.




Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-14 Thread Walter Bright

On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:

While we're at it, do we want to keep std.outbuffer?


Since it's not range based, probably not.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-14 Thread Walter Bright

On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:

I keep trying to avoid talking about this, because I'm writing a replacement
library for std.stream, and I don't want to step on any toes while it's still
not accepted.

But I have to say, ranges are *not* a good interface for generic data providers.
They are *very* good for structured data providers.

In other words, a stream of bytes, not a good range (who wants to get one byte
at a time?). A stream of UTF text broken into lines, a very good range.

I have no problem with getting rid of std.stream. I've never actually used it.
Still, we absolutely need a non-range based low-level streaming interface to
data. If nothing else, we need something we can build ranges upon, and I think
my replacement does a very good job of that.


I'll say in advance without seeing your design that it'll be a tough sell if it 
is not range based.


I've been doing some range based work on the side. I'm convinced there is 
enormous potential there, despite numerous shortcomings with them I ran across 
in Phobos. Those shortcomings can be fixed, they are not fatal.


The ability to do things like:

 void main() {
     stdin.byChunk(1024).
         map!(a => a.idup). // one of those shortcomings
         joiner().
         stripComments().
         copy(stdout.lockingTextWriter());
 }

is just kick ass.
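
The snippet above leaves stripComments undefined, and it is not a Phobos function as far as this thread shows. As a self-contained variant that can be compiled and checked, the sketch below works on in-memory chunks and uses a made-up stand-in, nonCommentLines, which simply drops lines starting with "//". Unlike the pipeline above it joins the chunks eagerly, so it illustrates the composition style, not the laziness.

```d
import std.algorithm : filter, startsWith;
import std.array : array, join;
import std.string : lineSplitter, stripLeft;

/// Hypothetical stand-in for stripComments: drops every line whose first
/// non-blank characters are "//".
string[] nonCommentLines(string[] textChunks)
{
    // A chunk boundary can fall in the middle of a line,
    // so glue the chunks together before splitting into lines.
    string text = textChunks.join;
    return text.lineSplitter
        .filter!(line => !line.stripLeft.startsWith("//"))
        .array;
}

void main()
{
    auto chunks = ["int x; // no", "t a full-line comment\n// dropped\nint y;\n"];
    assert(nonCommentLines(chunks) ==
           ["int x; // not a full-line comment", "int y;"]);
}
```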


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-14 Thread H. S. Teoh
On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote:
 On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
 While we're at it, do we want to keep std.outbuffer?
 
 Since it's not range based, probably not.

Why not just fold this into std.io? I'm surprised that this is a
separate module, actually. It should either be folded into std.io, or
developed to be more generic (i.e., have range-based API, have more
features like auto-flushing past a certain size, etc.).


T

-- 
Prosperity breeds contempt, and poverty breeds consent. -- Suck.com


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-13 Thread Alex Rønne Petersen

On 13-05-2012 23:38, Walter Bright wrote:

This discussion started in the thread Getting the const-correctness of
Object sorted once and for all, but it deserved its own thread.

These modules suffer from the following problems:

1. poor documentation, dearth of examples & rationale

2. toHash(), toString(), etc., all need to be const pure nothrow, but
it's turning out to be problematic for doing it for these classes

3. overlapping functionality with std.stdio

4. they should present a range interface, not a streaming one


I'm all for it. I haven't used any of them, ever, and probably never 
will. Their APIs aren't particularly appealing, honestly.


--
- Alex


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-13 Thread Nathan M. Swan

On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:

4. they should present a range interface, not a streaming one


I was just about to make a post suggesting that! You could easily 
integrate std.io with std.algorithm to do some pretty cool things.


NMS


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-13 Thread Jonathan M Davis
On Sunday, May 13, 2012 14:38:23 Walter Bright wrote:
 This discussion started in the thread Getting the const-correctness of
 Object sorted once and for all, but it deserved its own thread.
 
 These modules suffer from the following problems:
 
 1. poor documentation, dearth of examples & rationale
 
 2. toHash(), toString(), etc., all need to be const pure nothrow, but it's
 turning out to be problematic for doing it for these classes
 
 3. overlapping functionality with std.stdio
 
 4. they should present a range interface, not a streaming one

I think that it's been a foregone conclusion for some time that they were 
going to go. We just haven't done it, because we don't have replacements for 
them yet. IIRC Steven's std.stdio rewrite at least partially covers that, but 
he hasn't been able to finish it yet.

- Jonathan M Davis


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-13 Thread Kiith-Sa

On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:
This discussion started in the thread Getting the 
const-correctness of Object sorted once and for all, but it 
deserved its own thread.


These modules suffer from the following problems:

1. poor documentation, dearth of examples & rationale

2. toHash(), toString(), etc., all need to be const pure 
nothrow, but it's turning out to be problematic for doing it 
for these classes


3. overlapping functionality with std.stdio

4. they should present a range interface, not a streaming one



My D:YAML library (YAML parser) depends on std.stream
(e.g. for cross-endian compatibility and loading from memory),
and I've been waiting for a replacement since the first release.

I support removing std.stream, but it needs a replacement with
equivalent functionality.

Actually, I've postponed a 1.0 release _until_ std.stream is 
replaced.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-13 Thread Walter Bright

On 5/13/2012 3:16 PM, Nathan M. Swan wrote:

Trying to make it read lazily is even harder, as all std.utf functions work on
arrays, not ranges. I think this should change.


Yes, std.utf should be upgraded to present range interfaces.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-13 Thread H. S. Teoh
On Sun, May 13, 2012 at 02:38:23PM -0700, Walter Bright wrote:
 This discussion started in the thread Getting the const-correctness
 of Object sorted once and for all, but it deserved its own thread.
 
 These modules suffer from the following problems:
 
 1. poor documentation, dearth of examples & rationale
 
 2. toHash(), toString(), etc., all need to be const pure nothrow,
 but it's turning out to be problematic for doing it for these
 classes
 
 3. overlapping functionality with std.stdio
 
 4. they should present a range interface, not a streaming one

I agree with all of the above.

The only problem is, where's the replacement? We need std.io in usable
shape before we can feasibly carry out any of the above. It would make D
look utterly ridiculous if all of the above were deprecated with no
usable replacement.


T

-- 
If lightning were to ever strike an orchestra, it'd always hit the conductor 
first.


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-13 Thread Robert Clipsham

On 13/05/2012 22:38, Walter Bright wrote:

This discussion started in the thread Getting the const-correctness of
Object sorted once and for all, but it deserved its own thread.

These modules suffer from the following problems:

1. poor documentation, dearth of examples & rationale

2. toHash(), toString(), etc., all need to be const pure nothrow, but
it's turning out to be problematic for doing it for these classes

3. overlapping functionality with std.stdio

4. they should present a range interface, not a streaming one


I make use of std.stream quite a lot... It's horrible, it has to go.

I'm not too bothered if replacements aren't available straight away, as 
it doesn't take much to drop 10 lines of replacement in for the 
functionality I use from it until the actual replacement appears.


--
Robert
http://octarineparrot.com/


Re: deprecating std.stream, std.cstream, std.socketstream

2012-05-13 Thread Oleg Kuporosov

On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:


3. overlapping functionality with std.stdio


Unfortunately, std.stdio under Windows can't handle UTF-16 
(wchar) based file names and text I/O, which are natural there. 
The root of the issue seems to be both the underlying DMC C 
stdio (something wrong with the w* based functions?) and 
std.format, which provides only UTF-8 strings. Deprecating makes 
sense for the reasons given, but only after std.stdio supports 
UTF-16 names/streams or a good replacement (Steven's std.io?) is 
ready. Currently std.[c]stream is the only way to work with 
UTF-16 filesystems in Phobos. The alternative is switching to 
Tango, which appears to support it too (but I don't have 
experience with it).