Re: deprecating std.stream, std.cstream, std.socketstream
On Tue, 15 May 2012 19:43:05 -0400, Sean Kelly s...@invisibleduck.org wrote:

One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length? Typically, you end up having to double buffer, which stinks.

My new design supports this. I have a function called readUntil: https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832

Essentially, it reads into its buffer until the condition is satisfied. Therefore, you are not double buffering. The return value is a slice of the buffer. There is a way to opt out of reading any data if you determine you cannot do a full read: just return 0 from the delegate.

-Steve
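To make the readUntil idea concrete, here is a minimal sketch of a single-buffer reader driven by a delegate. It is not the code from the linked branch: the names BufferedInput, needMore, chunkSize and readLine are hypothetical, the in-memory `source` slice stands in for an OS-level stream, and using size_t.max to mean "need more data" is an assumption. Only the "return 0 to consume nothing" rule comes from the post above.

```d
// Hypothetical sketch of a readUntil-style buffered reader.
// Assumption: the delegate returns size_t.max to ask for more data.
enum size_t needMore = size_t.max;

struct BufferedInput
{
    const(ubyte)[] source; // stands in for the underlying OS stream
    ubyte[] buffer;        // the one and only buffer
    size_t chunkSize = 4;  // deliberately tiny so several refills happen

    // Fills the buffer until `process` reports how many bytes to consume.
    // `start` is the offset of the data the delegate has not seen yet.
    // Returns a slice of the consumed bytes, or null at end of input
    // (in which case nothing is consumed).
    const(ubyte)[] readUntil(
        scope size_t delegate(const(ubyte)[] data, size_t start) process)
    {
        size_t start = 0;
        for (;;)
        {
            if (buffer.length > start)
            {
                immutable n = process(buffer, start);
                if (n != needMore)
                {
                    // n == 0 is the opt-out: the buffer is left untouched.
                    auto result = buffer[0 .. n];
                    buffer = buffer[n .. $];
                    return result;
                }
                start = buffer.length;
            }
            if (source.length == 0)
                return null; // end of input, read state unchanged
            immutable take = source.length < chunkSize ? source.length : chunkSize;
            buffer ~= source[0 .. take];
            source = source[take .. $];
        }
    }
}

// A line reader built on top: consume up to and including '\n'.
const(ubyte)[] readLine(ref BufferedInput input)
{
    return input.readUntil((const(ubyte)[] data, size_t start) {
        foreach (i; start .. data.length)
            if (data[i] == '\n')
                return i + 1;
        return needMore;
    });
}
```

The delegate only scans from `start`, so already-examined bytes are not rescanned, and an incomplete trailing line stays in the buffer instead of being returned as a partial result.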
Re: deprecating std.stream, std.cstream, std.socketstream
On Mon, 14 May 2012 22:56:08 -0400, Walter Bright newshou...@digitalmars.com wrote:

On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:

I keep trying to avoid talking about this, because I'm writing a replacement library for std.stream, and I don't want to step on any toes while it's still not accepted. But I have to say, ranges are *not* a good interface for generic data providers. They are *very* good for structured data providers. In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range. I have no problem with getting rid of std.stream. I've never actually used it. Still, we absolutely need a non-range based low-level streaming interface to data. If nothing else, we need something we can build ranges upon, and I think my replacement does a very good job of that.

I'll say in advance without seeing your design that it'll be a tough sell if it is not range based. I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal. The ability to do things like:

    void main() {
        stdin.byChunk(1024).
            map!(a => a.idup). // one of those shortcomings
            joiner().
            stripComments().
            copy(stdout.lockingTextWriter());
    }

I think we may have a misunderstanding. My design is not range-based, but supports ranges, and actually makes them very easy to implement. byChunk is a perfect example of a good range: it defines a specific criterion for determining an element of data, appropriate for specific situations. But it must be built on top of something that allows reading arbitrary amounts of data. At the lowest level, this is the OS file descriptor/HANDLE. To be efficient, it should be based on a buffering stream. That buffering stream *does not* need to be a range, and I don't think shoehorning such a construct into a range interface makes any sense.
To make this clear, I can say that any range File supports, my design will support *as a range*. To make it even clearer, the current std.stdio.File structure, which you have shown to kick ass with ranges, is *NOT* range-based by my definition. I should note, the output range idiom is directly supported, because the output range definition exactly maps to an output stream definition. -Steve
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/15/2012 4:43 PM, Sean Kelly wrote:

One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length? Typically, you end up having to double buffer, which stinks.

std.stdio.byLine()
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/15/2012 3:34 PM, Nathan M. Swan wrote: I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size), I don't see why that should be true.
Re: deprecating std.stream, std.cstream, std.socketstream
Steven Schveighoffer, in message (digitalmars.D:167548), wrote:

My new design supports this. I have a function called readUntil: https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832 Essentially, it reads into its buffer until the condition is satisfied. Therefore, you are not double buffering. The return value is a slice of the buffer. There is a way to opt-out of reading any data if you determine you cannot do a full read. Just return 0 from the delegate.

Maybe I already said this some time ago, but I am not very comfortable with this design. The process delegate has to maintain an internal state if you want to avoid reading everything again. It will be difficult to implement those process delegates. Do you have an example of a moderately complicated reading process to show us it is not too complicated?

To avoid this issue, the design could be reversed: a method that would like to read a certain number of characters could take a delegate from the stream, which provides additional bytes of data. Example:

    // create a T by reading from stream. returns true if the T was
    // successfully created, and false otherwise.
    bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out T t);

The stream delegate returns a buffer of data to read from when called with consumed==0. It must return additional data when called repeatedly. When it is called with consumed != 0, the corresponding number of consumed bytes can be discarded from the buffer.

This stream delegate (it should have a better name) should not be more difficult to implement than readUntil, but makes it easier to use by the client. Did I miss some important information?

-- Christophe
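The reversed design can be sketched as follows. This is an illustration under assumptions, not code from either poster: MemoryStream, next and step are hypothetical names, T is fixed to a little-endian uint, and end of input is detected by the buffer no longer growing, which is a convention invented for this sketch.

```d
// Hypothetical stream whose `next` method plays the role of the
// "stream delegate": next(0) returns the buffer extended by more data,
// next(n) with n != 0 discards n consumed bytes.
struct MemoryStream
{
    const(ubyte)[] source; // stands in for the underlying device
    ubyte[] buffer;
    size_t step = 2;       // how much new data each next(0) call adds

    const(ubyte)[] next(size_t consumed)
    {
        if (consumed != 0)
        {
            buffer = buffer[consumed .. $];
            return buffer;
        }
        immutable take = source.length < step ? source.length : step;
        buffer ~= source[0 .. take];
        source = source[take .. $];
        return buffer;
    }
}

// Creates a uint by reading 4 little-endian bytes from the stream.
// Returns false, consuming nothing, if the stream ends too early.
bool readFrom(scope const(ubyte)[] delegate(size_t consumed) stream,
              out uint value)
{
    const(ubyte)[] data = stream(0);
    while (data.length < 4)
    {
        immutable before = data.length;
        data = stream(0);
        if (data.length == before)
            return false; // no more data: leave the buffer as it is
    }
    value = cast(uint) data[0]
          | (cast(uint) data[1] << 8)
          | (cast(uint) data[2] << 16)
          | (cast(uint) data[3] << 24);
    stream(4); // tell the stream that 4 bytes were consumed
    return true;
}
```

Here the parsing state lives in ordinary local variables of readFrom rather than in a delegate's context, which is the ease-of-use argument made above. The cost is that the reader needs some way to distinguish "give me more" from "show me what you have", which this sketch does not separate.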
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/15/2012 3:34 PM, Nathan M. Swan wrote: I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size), I don't see why that should be true. How do you tell front and popFront how many bytes to read? -Steve
Re: deprecating std.stream, std.cstream, std.socketstream
On May 16, 2012, at 6:52 AM, Walter Bright newshou...@digitalmars.com wrote:

On 5/15/2012 4:43 PM, Sean Kelly wrote:

One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length? Typically, you end up having to double buffer, which stinks.

std.stdio.byLine()

That was just an example. What if I want to do a formatted read and I'm reading from a file that someone else is writing to? I don't want to block or get a partial result and an EOF that needs to be reset.
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, 16 May 2012 10:03:42 -0400, Christophe Travert trav...@phare.normalesup.org wrote: Steven Schveighoffer , dans le message (digitalmars.D:167548), a My new design supports this. I have a function called readUntil: https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832 Essentially, it reads into its buffer until the condition is satisfied. Therefore, you are not double buffering. The return value is a slice of the buffer. There is a way to opt-out of reading any data if you determine you cannot do a full read. Just return 0 from the delegate. Maybe I already told this some time ago, but I am not very comfortable with this design. The process delegate has to maintain an internal state, if you want to avoid reading everything again. It will be difficult to implement those process delegates. The delegate is given which portion has already been processed, that is the 'start' parameter. If you can use this information, it's highly useful. If you need more context, yes, you have to store it elsewhere, but you do have a delegate which contains a context pointer. In a few places (take a look at TextStream's readln https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L2149) I use inner functions that have access to the function call's frame pointer in order to configure or store data. Do you have an example of moderately complicated reading process to show us it is not too complicated? The most complicated I have so far is reading UTF data as a range of dchar: https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L2209 Note that I hand-inlined all the decoding because using std.utf or the runtime was too slow, so although it looks huge, it's pretty basic stuff, and can largely be ignored for the terms of this discussion. The interesting part is how it specifies what to consume and what not to. I realize it's a different way of thinking about how to do I/O, but it gives more control to the buffer, so it can reason about how best to buffer things. 
I look at it as a way of the buffered stream saying: I'll read some data, you tell me when you see something interesting, and I'll give you a slice to it. The alternative is to double-buffer your data. Each call to read can invalidate the previously buffered data. But readUntil guarantees the data is contiguous and consumed all at once, no need to double-buffer.

To avoid this issue, the design could be reversed: a method that would like to read a certain number of characters could take a delegate from the stream, which provides additional bytes of data. Example:

    // create a T by reading from stream. returns true if the T was
    // successfully created, and false otherwise.
    bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out T t);

The stream delegate returns a buffer of data to read from when called with consumed==0. It must return additional data when called repeatedly. When it is called with consumed != 0, the corresponding number of consumed bytes can be discarded from the buffer.

I can see use cases for both your method and mine. I think I can implement your idea in terms of mine. I might just do that. The only thing missing is, you need a way to specify to the delegate that it needs more data. Probably using size_t.max as an argument. In fact, I need a peek function anyways, your function will provide that ability as well.

-Steve
Re: deprecating std.stream, std.cstream, std.socketstream
On 16/05/2012 15:38, Steven Schveighoffer wrote: On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/15/2012 3:34 PM, Nathan M. Swan wrote: I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size), I don't see why that should be true. How do you tell front and popFront how many bytes to read? -Steve A bit ugly but: // Default to 4 byte chunks auto range = myStream.byChunks(4); foreach (chunk; range) { // Set the next chunk is 3 bytes // Chunk after is 4 bytes range.nextChunkSize = 3; // Next chunk is always 5 bytes range.chunkSize = 5; } -- Robert http://octarineparrot.com/
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham rob...@octarineparrot.com wrote: On 16/05/2012 15:38, Steven Schveighoffer wrote: On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/15/2012 3:34 PM, Nathan M. Swan wrote: I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size), I don't see why that should be true. How do you tell front and popFront how many bytes to read? -Steve A bit ugly but: // Default to 4 byte chunks auto range = myStream.byChunks(4); foreach (chunk; range) { // Set the next chunk is 3 bytes // Chunk after is 4 bytes range.nextChunkSize = 3; // Next chunk is always 5 bytes range.chunkSize = 5; } Yeah, I've seen this before. It's not convincing. -Steve
Re: deprecating std.stream, std.cstream, std.socketstream
On 16.05.2012 19:32, Steven Schveighoffer wrote: On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham rob...@octarineparrot.com wrote: On 16/05/2012 15:38, Steven Schveighoffer wrote: On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/15/2012 3:34 PM, Nathan M. Swan wrote: I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size), I don't see why that should be true. How do you tell front and popFront how many bytes to read? -Steve A bit ugly but: // Default to 4 byte chunks auto range = myStream.byChunks(4); foreach (chunk; range) { // Set the next chunk is 3 bytes // Chunk after is 4 bytes range.nextChunkSize = 3; // Next chunk is always 5 bytes range.chunkSize = 5; } Yeah, I've seen this before. It's not convincing. Yes, It's obvious that files do *not* generally follow range of items semantic. I mean not even range of various items. In case of binary data it's most of the time header followed by various data. Or hierarchical structure. Or table of links + raw data. Or whatever. I've yet to see standard way to deal with binary formats :) -- Dmitry Olshansky
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, 16 May 2012 11:48:32 -0400, Dmitry Olshansky dmitry.o...@gmail.com wrote: On 16.05.2012 19:32, Steven Schveighoffer wrote: On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham rob...@octarineparrot.com wrote: A bit ugly but: // Default to 4 byte chunks auto range = myStream.byChunks(4); foreach (chunk; range) { // Set the next chunk is 3 bytes // Chunk after is 4 bytes range.nextChunkSize = 3; // Next chunk is always 5 bytes range.chunkSize = 5; } Yeah, I've seen this before. It's not convincing. Yes, It's obvious that files do *not* generally follow range of items semantic. I mean not even range of various items. In case of binary data it's most of the time header followed by various data. Or hierarchical structure. Or table of links + raw data. Or whatever. I've yet to see standard way to deal with binary formats :) The best solution would be a range that's specific to your format. My solution intends to support that. But that's only if your format fits within the range of elements model. Good old fashioned read X bytes needs to be supported, and insisting you do this range style is just plain wrong IMO. -Steve
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/16/2012 7:38 AM, Steven Schveighoffer wrote: On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/15/2012 3:34 PM, Nathan M. Swan wrote: I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size), I don't see why that should be true. How do you tell front and popFront how many bytes to read? std.byLine() does it. In general, you can read n bytes by calling empty, front, and popFront n times.
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/16/2012 7:49 AM, Sean Kelly wrote:

On May 16, 2012, at 6:52 AM, Walter Bright newshou...@digitalmars.com wrote:

On 5/15/2012 4:43 PM, Sean Kelly wrote:

One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length? Typically, you end up having to double buffer, which stinks.

std.stdio.byLine()

That was just an example. What if I want to do a formatted read and I'm reading from a file that someone else is writing to? I don't want to block or get a partial result and an EOF that needs to be reset.

Then you'll need an input range that can be reset - a ForwardRange.
Re: deprecating std.stream, std.cstream, std.socketstream
On 16/05/2012 16:59, Walter Bright wrote: On 5/16/2012 7:38 AM, Steven Schveighoffer wrote: On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/15/2012 3:34 PM, Nathan M. Swan wrote: I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size), I don't see why that should be true. How do you tell front and popFront how many bytes to read? std.byLine() does it. And is what you want to do with a text file in many cases. In general, you can read n bytes by calling empty, front, and popFront n times. Why would anybody want to read a large binary file _one byte at a time_? Stewart.
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, May 16, 2012 at 05:41:49PM +0100, Stewart Gordon wrote:

On 16/05/2012 16:59, Walter Bright wrote:

On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:

On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com wrote:

On 5/15/2012 3:34 PM, Nathan M. Swan wrote:

I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

And is what you want to do with a text file in many cases.

In general, you can read n bytes by calling empty, front, and popFront n times.

Why would anybody want to read a large binary file _one byte at a time_?

[...]

    import std.range;

    // Returns the first n elements as a slice of the range itself.
    auto readNBytes(R)(R range, size_t n)
        if (isInputRange!R && hasSlicing!R)
    {
        return range[0 .. n];
    }

T -- MAS = Mana Ada Sistem?
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, 16 May 2012 11:59:37 -0400, Walter Bright newshou...@digitalmars.com wrote:

On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:

On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com wrote:

On 5/15/2012 3:34 PM, Nathan M. Swan wrote:

I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

Have you looked at how std.byLine works? It certainly does not use a range interface as a source.

In general, you can read n bytes by calling empty, front, and popFront n times.

I hope you are not serious! This will make D *the worst performing* i/o language. This should be evidence enough:

    steves@steves-laptop:~$ time dd if=/dev/zero of=/dev/null bs=1 count=1000000
    1000000+0 records in
    1000000+0 records out
    1000000 bytes (1.0 MB) copied, 0.74052 s, 1.4 MB/s

    real 0m0.744s
    user 0m0.176s
    sys  0m0.564s

    steves@steves-laptop:~$ time dd if=/dev/zero of=/dev/null bs=1000 count=1000
    1000+0 records in
    1000+0 records out
    1000000 bytes (1.0 MB) copied, 0.00194096 s, 515 MB/s

    real 0m0.006s
    user 0m0.000s
    sys  0m0.004s

-Steve
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/16/2012 9:41 AM, Stewart Gordon wrote: On 16/05/2012 16:59, Walter Bright wrote: On 5/16/2012 7:38 AM, Steven Schveighoffer wrote: On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/15/2012 3:34 PM, Nathan M. Swan wrote: I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size), I don't see why that should be true. How do you tell front and popFront how many bytes to read? std.byLine() does it. And is what you want to do with a text file in many cases. In general, you can read n bytes by calling empty, front, and popFront n times. Why would anybody want to read a large binary file _one byte at a time_? You can have that range read from byChunk(). It's really the same thing that C's stdio does.
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/16/2012 10:18 AM, Steven Schveighoffer wrote: On Wed, 16 May 2012 11:59:37 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/16/2012 7:38 AM, Steven Schveighoffer wrote: On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/15/2012 3:34 PM, Nathan M. Swan wrote: I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size), I don't see why that should be true. How do you tell front and popFront how many bytes to read? std.byLine() does it. Have you looked at how std.byLine works? It certainly does not use a range interface as a source. It presents a range interface, though. Not a streaming one. In general, you can read n bytes by calling empty, front, and popFront n times. I hope you are not serious! This will make D *the worst performing* i/o language. You can read arbitrary numbers of bytes by tacking a range on after byChunk().
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, 16 May 2012 13:21:37 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/16/2012 9:41 AM, Stewart Gordon wrote: On 16/05/2012 16:59, Walter Bright wrote: On 5/16/2012 7:38 AM, Steven Schveighoffer wrote: On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/15/2012 3:34 PM, Nathan M. Swan wrote: I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size), I don't see why that should be true. How do you tell front and popFront how many bytes to read? std.byLine() does it. And is what you want to do with a text file in many cases. In general, you can read n bytes by calling empty, front, and popFront n times. Why would anybody want to read a large binary file _one byte at a time_? You can have that range read from byChunk(). It's really the same thing that C's stdio does. This is very wrong. byChunk doesn't cut it. The number of bytes to consume from the stream can depend on any number of factors, including the actual data in the stream. For instance, I challenge you to write an efficient (meaning no extra buffering) byLine using byChunk as a base. -Steve
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, 16 May 2012 13:23:07 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/16/2012 10:18 AM, Steven Schveighoffer wrote: On Wed, 16 May 2012 11:59:37 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/16/2012 7:38 AM, Steven Schveighoffer wrote: On Wed, 16 May 2012 09:50:12 -0400, Walter Bright newshou...@digitalmars.com wrote: On 5/15/2012 3:34 PM, Nathan M. Swan wrote: I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size), I don't see why that should be true. How do you tell front and popFront how many bytes to read? std.byLine() does it. Have you looked at how std.byLine works? It certainly does not use a range interface as a source. It presents a range interface, though. Not a streaming one. But that is *the point*! The code deciding how much data to read (i.e. the entity I referenced above that 'tells front and popFront how many bytes to read') is *not* using a range interface. In other words, ranges aren't enough. Ranges can be built on top of streaming interfaces. But there is *still* a need for a comprehensive streaming toolkit. And C's streaming toolkit is not as good as a native D toolkit can be. In general, you can read n bytes by calling empty, front, and popFront n times. I hope you are not serious! This will make D *the worst performing* i/o language. You can read arbitrary numbers of bytes by tacking a range on after byChunk(). No, this doesn't work in most cases. See my other post. You can't get everything you want out of just byChunk and byLine. what about byMySpecificPacketProtocol? -Steve
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/16/12 12:34 PM, Steven Schveighoffer wrote: In other words, ranges aren't enough. This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges. Andrei
Re: deprecating std.stream, std.cstream, std.socketstream
tbh, I've found byChunk to be less than worthless in my experience; it's a liability because I still have to wrap it somehow to read real-world files.

Consider reading a series of strings in the format <length><data>, [...]. I'd like it to be this simple (neglecting priming the loop):

    string[] s;
    while(!file.eof) {
        ubyte length = file.read!ubyte;
        s ~= file.read!string(length);
    }

The C fgetc/fread interface can do this reasonably well.

    string[] s;
    while(!feof(fp)) {
        ubyte length = cast(ubyte) fgetc(fp);
        char[] buffer;
        buffer.length = length;
        fread(buffer.ptr, 1, length, fp);
        s ~= assumeUnique(buffer);
    }

But doing it with byChunk is an exercise in pain that I don't even feel like writing here.

Another problem: consider a network interface. You want to handle the packets as they come in. byChunk doesn't work at all because it blocks until it gets a chunk of the requested size.

    foreach(chunk; socket.byChunk(1024))

Suppose you get a packet of length 1000 and you have to answer it. That will block forever. So, if you use byChunk as the underlying thing to fill your buffer... you don't get anywhere.

I think a better input primitive is byPacket(max_size). This works more like the read primitive on the operating system. Moreover, I want it to buffer, and to control how much is consumed.

    auto packetSource = socket.byPacket(1024);
    foreach(packet; packetSource) {
        // as soon as some data comes in we can get the length
        if(packet.length < 2) continue;
        auto length = packet.peek!(ushort); // neglect endian for now
        if(packet.length < length + 2) continue; // wait for more data
        packet.consume(2);
        handle(packet.consume(length));
    }

In addition to the byChunk blocking problem... what if the length straddles the edge? byChunk is just a huge hassle to work with for every file format I've tried so far. byLine is a little better (some file formats are defined as being line based) but still a bit of a pain for anything that can spill into two lines.
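The straddling problem can be shown with a small sketch that parses length-prefixed strings from data arriving in arbitrary pieces. Everything here is hypothetical illustration: parseRecords and feed are invented names, the length prefix is assumed to be a single byte (as in the fgetc example), and an array of slices stands in for packets coming off a socket.

```d
// Parses as many complete <length><data> records as `data` contains.
// Appends each record to `output` and returns the number of bytes used.
// An incomplete trailing record is left alone so the caller can retry
// once more data has arrived.
size_t parseRecords(const(ubyte)[] data, ref string[] output)
{
    size_t pos = 0;
    while (data.length - pos >= 1)
    {
        immutable size_t len = data[pos];
        if (data.length - pos < 1 + len)
            break; // record straddles the end of what we have so far
        output ~= cast(string) data[pos + 1 .. pos + 1 + len].idup;
        pos += 1 + len;
    }
    return pos;
}

// Feeds packets of arbitrary size through the parser, keeping only the
// unconsumed tail between packets.
string[] feed(const(ubyte)[][] packets)
{
    string[] result;
    ubyte[] pending;
    foreach (packet; packets)
    {
        pending ~= packet;
        immutable used = parseRecords(pending, result);
        pending = pending[used .. $];
    }
    return result;
}
```

The `pending` array is exactly the second buffer the thread complains about: when the producer hands out fixed chunks and cannot be told how much was consumed, the consumer has to keep its own copy of the leftover bytes.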
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, 16 May 2012 13:48:49 -0400, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 5/16/12 12:34 PM, Steven Schveighoffer wrote: In other words, ranges aren't enough. This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges. What I think we would end up with is a streaming API with range primitives tacked on. - empty is clunky, but possible to implement. However, it may become invalid (think of reading a file that is being appended to by another process). - popFront and front do not have any clear definition of what they refer to. The only valid thing I can think of is bytes, and then nobody will use them. That's hardly saying it's range based. I refuse to believe that people will be thrilled by having to 'pre-configure' each front and popFront call in order to get work done. If you want to try and convince me, I'm willing to listen, but so far I haven't seen anything that looks at all appetizing. -Steve
Re: deprecating std.stream, std.cstream, std.socketstream
On Wednesday, 16 May 2012 at 17:48:52 UTC, Andrei Alexandrescu wrote:

This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such)

I tried this in cgi.d somewhat recently. It ended up only vaguely looking like a range.

    /**
       A slight difference from regular ranges is you can give it the
       maximum number of bytes to consume.

       IMPORTANT NOTE: the default is to consume nothing, so if you
       don't call consume() yourself and use a regular foreach, it
       will infinitely loop!
    */
    void popFront(size_t maxBytesToConsume = 0 /*size_t.max*/, size_t minBytesToSettleFor = 0) {}

I called that a slight difference in the comment, but it is actually a pretty major difference. In practice, it is nothing like a regular range.

If I defaulted to size_t.max, you could foreach() it, but then you don't really get to take advantage of the buffer, since it is cleared out entirely for each iteration. If it defaults to 0, you can put it in a foreach... but you have to manually say how much of it is consumed, which no other range does, meaning it won't work with std.algorithm or anything. It sorta looks like a range, but isn't actually one at all.

I'm sure something better is possible, but I don't think the range abstraction is a good fit for this use case. Of course, providing optional ranges (like how File gives byChunk, byLine, etc.) is probably a good idea.
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote:

On 5/16/12 12:34 PM, Steven Schveighoffer wrote:

In other words, ranges aren't enough.

This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.

[...]

One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popFront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at least n elements left;
- E[] frontN(R)(R range, int n) - returns a slice containing the front n elements from the range: this will buffer the next n elements from the range if they aren't already; repeated calls will just return the buffer;
- void popN(R)(R range, int n) - discards the first n elements from the buffer, thus causing the next call to frontN() to fetch more data if necessary.

These are all tentative names, of course. But the idea is that you can keep N elements of the range in view at a time, without having to individually read them out and save them in a second buffer, and you can advance this view once you're done with the current data and want to move on.

Existing range operations like popFrontN, take, takeExactly, drop, etc., can be extended to take advantage of the extra functionality of ChunkedRanges. (Perhaps popFrontN can even be merged with popN, since they amount to the same thing.)

Using a ChunkedRange allows you to write functions that parse a particular range and return a range of chunks (say, a deserializer that returns a range of objects given a range of bytes).
Thinking on it a bit further, perhaps we can call this a WindowedRange, since it somewhat resembles the sliding window protocol where you keep a window of sequential packet ids in an active buffer, and remove them from the buffer as they get ack'ed (consumed by popN). The buffer thus acts like a window into the next n elements in the range, which can be slid forward as data is consumed. T -- Having a smoking section in a restaurant is like having a peeing section in a swimming pool. -- Edward Burr
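The three proposed primitives can be sketched as a thin wrapper over any input range. This is only one possible reading of the proposal: the struct name Windowed and the helper fill are hypothetical, the primitives are written as member functions taking size_t rather than free functions taking int, and frontN returns fewer than n elements when the source runs out, which the proposal does not specify.

```d
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;

// Hypothetical wrapper giving a sliding window over an input range.
struct Windowed(R) if (isInputRange!R)
{
    alias E = ElementType!R;

    R source;   // the wrapped range
    E[] window; // elements fetched from `source` but not yet popped

    // Pulls elements from the source until the window holds n of them
    // or the source is exhausted.
    private void fill(size_t n)
    {
        while (window.length < n && !source.empty)
        {
            window ~= source.front;
            source.popFront();
        }
    }

    // True if at least n elements are still available.
    bool hasAtLeast(size_t n)
    {
        fill(n);
        return window.length >= n;
    }

    // Slice of the front n elements (fewer if the source ran out).
    // Repeated calls return the same buffered data.
    E[] frontN(size_t n)
    {
        fill(n);
        return window[0 .. n < window.length ? n : window.length];
    }

    // Discards the first n elements of the window.
    void popN(size_t n)
    {
        fill(n);
        window = window[(n < window.length ? n : window.length) .. $];
    }

    // The usual InputRange primitives, expressed via the windowed ones.
    @property bool empty() { return !hasAtLeast(1); }
    @property E front() { return frontN(1)[0]; }
    void popFront() { popN(1); }
}
```

Note that this wrapper copies every element into `window`, so when the source is itself buffered it adds exactly the kind of second buffer discussed elsewhere in the thread. Avoiding that would require the source to expose its own buffer.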
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh hst...@quickfur.ath.cx wrote: On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote: On 5/16/12 12:34 PM, Steven Schveighoffer wrote: In other words, ranges aren't enough. This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges. [...] One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives: - bool hasAtLeast(R)(R range, int n) - true if underlying range has at least n elements left; - E[] frontN(R)(R range, int n) - returns a slice containing the front n elements from the range: this will buffer the next n elements from the range if they aren't already; repeated calls will just return the buffer; - void popN(R)(R range, int n) - discards the first n elements from the buffer, thus causing the next call to frontN() to fetch more data if necessary. On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me. I still don't get the need to add this to ranges. The streaming API works fine on its own. But there is an omission with your proposed API regardless -- reading data is a mutating event. It destructively mutates the underlying data stream so that you cannot get the data again. This means you must double-buffer data in order to support frontN and popN that are not necessary with a simple read API. 
For example: auto buf = new ubyte[100]; stream.read(buf); does not need to first buffer the data inside the stream and then copy it to buf, it can read it from the OS *directly* into buf. -Steve
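For reference, the caller-owned-buffer call shape can be tried today with std.stdio.File.rawRead, which fills the supplied array and returns the slice of it that was actually read. This is only an illustration of the call shape, not the proposed stream API: rawRead goes through C stdio, which may do its own buffering underneath, and the temporary file here is just a stand-in data source.

```d
import std.stdio : File;

void main()
{
    // Write a few bytes to a temporary file so there is something to read.
    auto f = File.tmpfile();
    ubyte[] payload = [10, 20, 30, 40];
    f.rawWrite(payload);
    f.rewind();

    // The caller owns the buffer; the result is a slice of that same buffer.
    auto buf = new ubyte[100];
    auto got = f.rawRead(buf);
    assert(got.length == 4);
    assert(got.ptr is buf.ptr);
    assert(got == payload);
}
```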
Re: deprecating std.stream, std.cstream, std.socketstream
On 05/16/12 21:38, H. S. Teoh wrote: On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote: On 5/16/12 12:34 PM, Steven Schveighoffer wrote: In other words, ranges aren't enough. This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges. [...] One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives: - bool hasAtLeast(R)(R range, int n) - true if underlying range has at least n elements left; - E[] frontN(R)(R range, int n) - returns a slice containing the front n elements from the range: this will buffer the next n elements from the range if they aren't already; repeated calls will just return the buffer; - void popN(R)(R range, int n) - discards the first n elements from the buffer, thus causing the next call to frontN() to fetch more data if necessary. These are all tentative names, of course. But the idea is that you can keep N elements of the range in view at a time, without having to individually read them out and save them in a second buffer, and you can advance this view once you're done with the current data and want to move on. Existing range operations like popFrontN, take, takeExactly, drop, etc., can be extended to take advantage of the extra functionality of ChunkedRanges. (Perhaps popFrontN can even be merged with popN, since they amount to the same thing.) Using a ChunkedRange allows you to write functions that parse a particular range and return a range of chunks (say, a deserializer that returns a range of objects given a range of bytes). 
Thinking on it a bit further, perhaps we can call this a WindowedRange, since it somewhat resembles the sliding window protocol where you keep a window of sequential packet ids in an active buffer, and remove them from the buffer as they get ack'ed (consumed by popN). The buffer thus acts like a window into the next n elements in the range, which can be slid forward as data is consumed. Right now, everybody reinvents this, with a slightly different interface... It's really obvious, needed and just has to be standardized. A few notes: hasAtLeast is redundant as it can be better expressed as .length; what would be the point of wrapping 'r.length=n'? An '.available' property would be useful to find eg out how much can be consumed w/o blocking, but that one should return a size_t too. 'E[] frontN' should have a version that returns all available elements; i called it '@property E[] fronts()' here. It's more efficient that way and doesn't rely on the compiler to inline and optimize the limit checks away. PopN -- well, its signature here is 'void popFronts(size_t n)', other than that, there's no difference. Similar things are necessary for output ranges. Here, what i needed was: void put(ref E el) void puts(E[] els) @property size_t free() // Not the most intuitive name w/o context; // returns the number of E's that can be 'put()' // w/o blocking. Note that all of this doesn't address the consume-variable-sized-chunks issue. But that can now be efficiently handled by another layer on top. On 05/16/12 22:15, Steven Schveighoffer wrote: I still don't get the need to add this to ranges. The streaming API works fine on its own. This is not an argument against a streaming API (at least not for me), but for efficient ranges. With the API above I can shift tens of gigabytes of data per second between threads. And still use the 'std' range API and everything that works with it... But there is an omission with your proposed API regardless -- reading data is a mutating event. 
It destructively mutates the underlying data stream so that you cannot get the data again. This means you must double-buffer data in order to support frontN and popN that are not necessary with a simple read API. For example: auto buf = new ubyte[100]; stream.read(buf); does not need to first buffer the data inside the stream and then copy it to buf, it can read it from the OS *directly* into buf. Sometimes having the buffer managed by 'stream' and 'read()' returning a slice into it works (this is what 'fronts' above does). Reusing a caller managed buffer can be useful in other cases, yes. artur
Re: deprecating std.stream, std.cstream, std.socketstream
One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives: - bool hasAtLeast(R)(R range, int n) - true if underlying range has at least n elements left; I think it would be better to have a function that would return the number of elements left. - E[] frontN(R)(R range, int n) - returns a slice containing the front n elements from the range: this will buffer the next n elements from the range if they aren't already; repeated calls will just return the buffer; - void popN(R)(R range, int n) - discards the first n elements from the buffer, thus causing the next call to frontN() to fetch more data if necessary. I like the idea of frontN and popN. But is there any reason why a type that defines those (let's call it a stream) should also be a range? I would prefer to have a type that just defines those two functions, a function that returns the number of available elements, and a function that tells whether we are at the end of the stream. If you need a range of elements with a blocking popFront, it's easy to build one on top of it. You can write a function that takes any stream and returns a range of elements. I think that's better than having to write front, popFront, and empty for every stream.
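A minimal sketch of that layering in D, assuming a hypothetical stream concept with frontN, popN, available and atEnd. MemoryStream, ElementRange and byElement are names invented for this illustration; nothing here is an existing Phobos API. The adapter is written once and turns any such stream into an ordinary input range.

```d
import std.algorithm : equal, min;
import std.range.primitives : isInputRange;

/// Hypothetical stream over an in-memory array: only the stream primitives, no range interface.
struct MemoryStream(E)
{
    private const(E)[] data;

    size_t available() const { return data.length; }
    bool atEnd() const { return data.length == 0; }
    const(E)[] frontN(size_t n) const { return data[0 .. min(n, data.length)]; }
    void popN(size_t n) { data = data[min(n, data.length) .. $]; }
}

/// Generic adapter: one element at a time, built purely on the stream primitives.
struct ElementRange(S)
{
    private S stream;

    @property bool empty() { return stream.atEnd; }
    @property auto front() { return stream.frontN(1)[0]; }
    void popFront() { stream.popN(1); }
}

/// Wrap any stream with the four primitives above as an input range.
auto byElement(S)(S stream) { return ElementRange!S(stream); }

// The adapter satisfies the standard input range check.
static assert(isInputRange!(ElementRange!(MemoryStream!int)));

void main()
{
    auto r = MemoryStream!int([10, 20, 30]).byElement;
    assert(r.equal([10, 20, 30]));
}
```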
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, 16 May 2012 16:30:43 -0400, H. S. Teoh hst...@quickfur.ath.cx wrote: On Wed, May 16, 2012 at 04:15:22PM -0400, Steven Schveighoffer wrote: On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh hst...@quickfur.ath.cx wrote: [...] One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives: - bool hasAtLeast(R)(R range, int n) - true if underlying range has at least n elements left; - E[] frontN(R)(R range, int n) - returns a slice containing the front n elements from the range: this will buffer the next n elements from the range if they aren't already; repeated calls will just return the buffer; - void popN(R)(R range, int n) - discards the first n elements from the buffer, thus causing the next call to frontN() to fetch more data if necessary. On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me. How so? It's still useful for implementing readByte, for example. readByte is covered by frontN(1). Why the need for front()? Let me answer that question for you -- so it can be treated as a normal range. But nobody will want to do that. i.e. copy to appender will read one byte at a time into the array. I still don't get the need to add this to ranges. The streaming API works fine on its own. But there is an omission with your proposed API regardless -- reading data is a mutating event. It destructively mutates the underlying data stream so that you cannot get the data again. This means you must double-buffer data in order to support frontN and popN that are not necessary with a simple read API. 
For example: auto buf = new ubyte[100]; stream.read(buf); does not need to first buffer the data inside the stream and then copy it to buf, it can read it from the OS *directly* into buf. [...] The idea is that by asking for N elements at a time instead of calling front/popFront N times, the underlying implementation can optimize the request by creating a buffer of size N and have the OS read exactly N bytes directly into that buffer. // Reads 1,000,000 bytes into newly allocated buffer and returns // buffer. auto buf = stream.frontN(1_000_000); OK, so stream is providing data via return value and allocation. // Since 1,000,000 bytes is already read into the buffer, this // simply returns a slice of the same buffer: auto buf2 = stream.frontN(1_000_000); Is buf2 mutable? If so, this is no good, buf could have mutated this data. But this can be fixed by making the return value of frontN be const(ubyte)[]. assert(buf is buf2); // This consumes the buffer: stream.popN(1_000_000); What does consume mean, discard? Obviously not reuse, due to line below... // This will read another 1,000,000 bytes into a new buffer auto buf3 = stream.frontN(1_000_000); OK, you definitely lost me here, this will not fly. The whole point of buffering is to avoid having to reallocate on every read. If you have to allocate every read, buffering is going to have a negative impact on performance! -Steve
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, 16 May 2012 16:38:54 -0400, Artur Skawina art.08...@gmail.com wrote: On 05/16/12 22:15, Steven Schveighoffer wrote: I still don't get the need to add this to ranges. The streaming API works fine on its own. This is not an argument against a streaming API (at least not for me), but for efficient ranges. With the API above I can shift tens of gigabytes of data per second between threads. And still use the 'std' range API and everything that works with it... But you never would want to. Don't get me wrong, the primitives here could work for a streaming API (I haven't implemented it that way, but it could be made to work). But the idea that it must *also* be a std.range input range makes zero sense. To me, this is as obvious as not supporting linklist[index]; Sure, it can be done, but who would ever use it? -Steve
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/16/12 1:00 PM, Steven Schveighoffer wrote: What I think we would end up with is a streaming API with range primitives tacked on. - empty is clunky, but possible to implement. However, it may become invalid (think of reading a file that is being appended to by another process). - popFront and front do not have any clear definition of what they refer to. The only valid thing I can think of is bytes, and then nobody will use them. That's hardly saying it's range based. I refuse to believe that people will be thrilled by having to 'pre-configure' each front and popFront call in order to get work done. If you want to try and convince me, I'm willing to listen, but so far I haven't seen anything that looks at all appetizing. Where the two meet is in the notion of buffered streams. Ranges are by default buffered, i.e. user code can call front() several times without an intervening popFront() and get the same thing. So a range has by definition a buffer of at least one element. That makes the range interface unsuitable for strictly UNbuffered streams. On the other hand, a range could no problem offer OPTIONAL unbuffered reads (the existence of a buffer does not preclude availability of unbuffered transfers). So to tie it all nicely I think we need: 1. A STRICTLY UNBUFFERED streaming abstraction 2. A notion of range that supports unbuffered transfers. Andrei
Re: deprecating std.stream, std.cstream, std.socketstream
On Wed, May 16, 2012 at 04:52:09PM -0400, Steven Schveighoffer wrote: On Wed, 16 May 2012 16:30:43 -0400, H. S. Teoh hst...@quickfur.ath.cx wrote: On Wed, May 16, 2012 at 04:15:22PM -0400, Steven Schveighoffer wrote: On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh hst...@quickfur.ath.cx wrote: [...] One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives: - bool hasAtLeast(R)(R range, int n) - true if underlying range has at least n elements left; - E[] frontN(R)(R range, int n) - returns a slice containing the front n elements from the range: this will buffer the next n elements from the range if they aren't already; repeated calls will just return the buffer; - void popN(R)(R range, int n) - discards the first n elements from the buffer, thus causing the next call to frontN() to fetch more data if necessary. On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me. How so? It's still useful for implementing readByte, for example. readByte is covered by frontN(1). Why the need for front()? Let me answer that question for you -- so it can be treated as a normal range. But nobody will want to do that. i.e. copy to appender will read one byte at a time into the array. If this new type of range is recognized by std.range, then the relevant algorithms can be made to recognize the existence of frontN and make good use of it, instead of iterating front N times. Then front() can still be used by stuff that really only wants a single byte at a time. [...] 
The idea is that by asking for N elements at a time instead of calling front/popFront N times, the underlying implementation can optimize the request by creating a buffer of size N and have the OS read exactly N bytes directly into that buffer. // Reads 1,000,000 bytes into newly allocated buffer and returns // buffer. auto buf = stream.frontN(1_000_000); OK, so stream is providing data via return value and allocation. // Since 1,000,000 bytes is already read into the buffer, this // simply returns a slice of the same buffer: auto buf2 = stream.frontN(1_000_000); Is buf2 mutable? If so, this is no good, buf could have mutated this data. But this can be fixed by making the return value of frontN be const(ubyte)[]. assert(buf is buf2); // This consumes the buffer: stream.popN(1_000_000); What does consume mean, discard? Obviously not reuse, due to line below... Yes, discard. That's what popFront does right now for a single element. // This will read another 1,000,000 bytes into a new buffer auto buf3 = stream.frontN(1_000_000); OK, you definitely lost me here, this will not fly. The whole point of buffering is to avoid having to reallocate on every read. If you have to allocate every read, buffering is going to have a negative impact on performance! [...] I thought the whole point of buffering is to avoid excessive roundtrips to disk I/O. Though you do have a point that allocating on every read is a bad idea. T -- Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth? -- Michael Beibl
Re: deprecating std.stream, std.cstream, std.socketstream
On 05/16/12 22:58, Steven Schveighoffer wrote: On Wed, 16 May 2012 16:38:54 -0400, Artur Skawina art.08...@gmail.com wrote: On 05/16/12 22:15, Steven Schveighoffer wrote: I still don't get the need to add this to ranges. The streaming API works fine on its own. This is not an argument against a streaming API (at least not for me), but for efficient ranges. With the API above I can shift tens of gigabytes of data per second between threads. And still use the 'std' range API and everything that works with it... But you never would want to. Don't get me wrong, the primitives here could work for a streaming API (I haven't implemented it that way, but it could be made to work). But the idea that it must *also* be a std.range input range makes zero sense. Well, I do want to. For example, I can pass the produced data to *any* range consumer, it may be as efficient as mine, but will still work reasonably (I just did a quick test and the difference seems to be about 10G/s less for plain front+popFront consumer). The goal here is: if we could agree on a standard interface then *any* producer and consumer, including the ones in the std lib could take advantage of this (optional) feature. It's not so much about function call overhead as /syscall/ and /locking/ costs. Retrieving or writing 100 elements with only one lock-unlock sequence makes a large difference. To me, this is as obvious as not supporting linklist[index]; Sure, it can be done, but who would ever use it? This is not even related. Your 'read(ref ubyte[])' approach can actually mean that one more copy of the data is required. Think writer-range_or_stream-reader -- unless the reader is already waiting with an empty buffer, the stream has to copy the data to an internal buffer, which then has to be copied again when a reader comes around. The 'slice[] = fronts' solution avoids the second copy. Like I said, depending on the circumstances, sometimes you want one scheme, sometimes the other. 
(TBH, right now i can't think of a case where i'd prefer a non-range based approach; having the same i/f is just so convenient. But I'm sure there's one ;) ) artur
Re: deprecating std.stream, std.cstream, std.socketstream
On 05/16/2012 11:08 PM, Andrei Alexandrescu wrote: On 5/16/12 1:00 PM, Steven Schveighoffer wrote: What I think we would end up with is a streaming API with range primitives tacked on. - empty is clunky, but possible to implement. However, it may become invalid (think of reading a file that is being appended to by another process). - popFront and front do not have any clear definition of what they refer to. The only valid thing I can think of is bytes, and then nobody will use them. That's hardly saying it's range based. I refuse to believe that people will be thrilled by having to 'pre-configure' each front and popFront call in order to get work done. If you want to try and convince me, I'm willing to listen, but so far I haven't seen anything that looks at all appetizing. Where the two meet is in the notion of buffered streams. Ranges are by default buffered, i.e. user code can call front() several times without an intervening popFront() and get the same thing. So a range has by definition a buffer of at least one element. I don't think this necessarily holds. 'front' might be computed on the fly, as it is done for std.algorithm.map. That makes the range interface unsuitable for strictly UNbuffered streams. On the other hand, a range could no problem offer OPTIONAL unbuffered reads (the existence of a buffer does not preclude availability of unbuffered transfers). So to tie it all nicely I think we need: 1. A STRICTLY UNBUFFERED streaming abstraction 2. A notion of range that supports unbuffered transfers. Andrei
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/16/12 4:40 PM, Timon Gehr wrote: On 05/16/2012 11:08 PM, Andrei Alexandrescu wrote: On 5/16/12 1:00 PM, Steven Schveighoffer wrote: What I think we would end up with is a streaming API with range primitives tacked on. - empty is clunky, but possible to implement. However, it may become invalid (think of reading a file that is being appended to by another process). - popFront and front do not have any clear definition of what they refer to. The only valid thing I can think of is bytes, and then nobody will use them. That's hardly saying it's range based. I refuse to believe that people will be thrilled by having to 'pre-configure' each front and popFront call in order to get work done. If you want to try and convince me, I'm willing to listen, but so far I haven't seen anything that looks at all appetizing. Where the two meet is in the notion of buffered streams. Ranges are by default buffered, i.e. user code can call front() several times without an intervening popFront() and get the same thing. So a range has by definition a buffer of at least one element. I don't think this necessarily holds. 'front' might be computed on the fly, as it is done for std.algorithm.map. It used to be buffered in fact but that was too much trouble. The fair thing to say here is that map relies on the implicit buffering of its input. Andrei
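The on-the-fly behaviour of map can be observed directly. A small sketch, assuming a counter captured by the mapping lambda (the helper name callsForTwoFrontReads is made up for this example): reading front twice without an intervening popFront runs the function twice, so map itself holds no buffered element.

```d
import std.algorithm : map;

// Counts how often the mapping function runs for two reads of front.
int callsForTwoFrontReads()
{
    int calls = 0;
    auto doubled = [1, 2, 3].map!((int x) { ++calls; return x * 2; });
    assert(doubled.front == 2); // first read computes the value
    assert(doubled.front == 2); // second read computes it again
    return calls;
}

void main()
{
    assert(callsForTwoFrontReads() == 2);
}
```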
Re: deprecating std.stream, std.cstream, std.socketstream
On 16/05/2012 18:21, Walter Bright wrote: snip You can have that range read from byChunk(). It's really the same thing that C's stdio does. And what if I want it to work on ranges that don't have a byChunk method? Stewart.
Re: deprecating std.stream, std.cstream, std.socketstream
On 16/05/2012 17:48, H. S. Teoh wrote: On Wed, May 16, 2012 at 05:41:49PM +0100, Stewart Gordon wrote: snip Why would anybody want to read a large binary file _one byte at a time_? [...] import std.range; byte[] readNBytes(R)(R range, size_t n) if (isInputRange!R && hasSlicing!R) { return range[0..n]; } What if I want it to work on ranges that don't have slicing? Stewart.
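One possible answer, as a sketch: a version that only requires an input range and copies element by element. The name readN is made up for this example, and the result type assumes the element type can be appended to a mutable array. The range is taken by ref, so the caller's range is advanced past the elements that were read; the per-element copy is the price of not having slicing.

```d
import std.algorithm : filter;
import std.range : ElementType, iota, isInputRange;
import std.range.primitives : empty, front, popFront;

/// Reads up to n elements from any input range, advancing the caller's range.
ElementType!R[] readN(R)(ref R range, size_t n)
if (isInputRange!R)
{
    ElementType!R[] result;
    result.reserve(n);
    for (; n > 0 && !range.empty; --n)
    {
        result ~= range.front;
        range.popFront();
    }
    return result;
}

void main()
{
    // A filtered range has no slicing, so range[0 .. n] is not available.
    auto evens = iota(10).filter!(x => x % 2 == 0);
    assert(readN(evens, 3) == [0, 2, 4]);
    assert(evens.front == 6); // the original range has moved on
}
```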
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/14/2012 9:54 PM, H. S. Teoh wrote: On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote: On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote: While we're at it, do we want to keep std.outbuffer? Since it's not range based, probably not. Why not just fold this into std.io? It's not I/O.
Re: deprecating std.stream, std.cstream, std.socketstream
On 15.05.2012 8:54, H. S. Teoh wrote: On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote: On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote: While we're at it, do we want to keep std.outbuffer? Since it's not range based, probably not. Why not just fold this into std.io? I'm surprised that this is a separate module, actually. It should either be folded into std.io, or developed to be more generic (i.e., have range-based API, have more features like auto-flushing past a certain size, etc.). It's std.array Appender. The only difference is text vs binary output form. -- Dmitry Olshansky
Re: deprecating std.stream, std.cstream, std.socketstream
On Tuesday, 15 May 2012 at 02:56:20 UTC, Walter Bright wrote: On 5/14/2012 8:02 AM, Steven Schveighoffer wrote: I keep trying to avoid talking about this, because I'm writing a replacement library for std.stream, and I don't want to step on any toes while it's still not accepted. But I have to say, ranges are *not* a good interface for generic data providers. They are *very* good for structured data providers. In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range. [...] I'll say in advance without seeing your design that it'll be a tough sell if it is not range based. I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal. [...] I have to say, I'm with Steve on this one. While I do believe ranges will have a very important role to play in D's future I/O paradigm, I also think there needs to be a layer beneath the ranges that more directly maps to OS primitives. And as D is a systems programming language, that layer needs to be publicly available. (Note that this is how std.stdio works now, more or less.) -Lars
Re: deprecating std.stream, std.cstream, std.socketstream
On Tuesday, 15 May 2012 at 15:22:03 UTC, Lars T. Kyllingstad wrote: On Tuesday, 15 May 2012 at 02:56:20 UTC, Walter Bright wrote: On 5/14/2012 8:02 AM, Steven Schveighoffer wrote: I keep trying to avoid talking about this, because I'm writing a replacement library for std.stream, and I don't want to step on any toes while it's still not accepted. But I have to say, ranges are *not* a good interface for generic data providers. They are *very* good for structured data providers. In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range. [...] I'll say in advance without seeing your design that it'll be a tough sell if it is not range based. I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal. [...] I have to say, I'm with Steve on this one. While I do believe ranges will have a very important role to play in D's future I/O paradigm, I also think there needs to be a layer beneath the ranges that more directly maps to OS primitives. And as D is a systems programming language, that layer needs to be publicly available. (Note that this is how std.stdio works now, more or less.) Also, I wouldn't mind std.*stream getting deprecated. Personally, I've never used those modules -- not even once. As a first step their documentation could be removed from dlang.org, so new users aren't tempted to start using them. No functionality is better than poor functionality, IMO. -Lars
Re: deprecating std.stream, std.cstream, std.socketstream
On Sunday, 13 May 2012 at 22:26:17 UTC, Walter Bright wrote: On 5/13/2012 3:16 PM, Nathan M. Swan wrote: Trying to make it read lazily is even harder, as all std.utf functions work on arrays, not ranges. I think this should change. Yes, std.utf should be upgraded to present range interfaces. +1 on that. I really needed it when doing the std.net.curl stuff and would be happy to move it to a more generic handling in std.utf.
Re: deprecating std.stream, std.cstream, std.socketstream
On Monday, 14 May 2012 at 15:02:11 UTC, Steven Schveighoffer wrote: In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range. There are several cases where one would want one byte at the time; e.g. as an input to another range that produces the utf text as an output. I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size), but that doesn't mean most things shouldn't be ranged-based. NMS
Re: deprecating std.stream, std.cstream, std.socketstream
On May 15, 2012, at 3:34 PM, Nathan M. Swan nathanms...@gmail.com wrote: On Monday, 14 May 2012 at 15:02:11 UTC, Steven Schveighoffer wrote: In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range. There are several cases where one would want one byte at a time; e.g. as an input to another range that produces the utf text as an output. I do agree that, e.g., with binary data, some data can't be read with ranges (when you need to read small chunks of varying size), but that doesn't mean most things shouldn't be range-based. You really want both, depending on the situation. I don't see what's weird about this. C++ iostreams have input and output iterators built on top as well, for much the same reason. The annoying part is that once you've moved to a range interface it's hard to go back. Like say I want a ZipRange on top of a FileRange. But now I want to read structs as binary blobs from that uncompressed output. One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length? Typically, you end up having to double buffer, which stinks.
Re: deprecating std.stream, std.cstream, std.socketstream
On Tue, May 15, 2012 at 04:43:05PM -0700, Sean Kelly wrote: [...] One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length? Typically, you end up having to double buffer, which stinks. This would be very nice to have, but how would you go about implementing such a thing? Wouldn't you need OS-level support for it? T -- Let's eat some disquits while we format the biskettes.
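A minimal sketch of how a transactional read could be done purely in user space, in the spirit of the readUntil approach mentioned earlier in the thread but not taken from that code. LineBuffer, feed and tryReadLine are hypothetical names; the assumption is that whatever arrives from the OS is appended to one buffer, and a read either returns a complete line or leaves the buffer untouched.

```d
import std.string : representation;

/// Hypothetical buffer that hands out complete lines only.
struct LineBuffer
{
    private ubyte[] pending; // bytes received but not yet consumed

    /// Append newly arrived bytes (for example, the result of one socket read).
    void feed(const(ubyte)[] chunk) { pending ~= chunk; }

    /// Returns the next complete line without its '\n', or null if no full
    /// line is buffered yet. When null is returned, no state has changed.
    const(ubyte)[] tryReadLine()
    {
        foreach (i, b; pending)
        {
            if (b == '\n')
            {
                auto line = pending[0 .. i];
                pending = pending[i + 1 .. $]; // commit: consume line and newline
                return line;
            }
        }
        return null; // incomplete: everything stays buffered
    }
}

void main()
{
    LineBuffer lb;
    lb.feed("hel".representation);
    assert(lb.tryReadLine() is null);            // no newline yet, nothing consumed
    lb.feed("lo\nwor".representation);
    assert(lb.tryReadLine() == "hello".representation);
    assert(lb.tryReadLine() is null);            // "wor" waits for its newline
}
```

The returned line is a slice of the single internal buffer, so no second buffer is involved; no OS support is assumed beyond ordinary reads.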
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/13/12, Kiith-Sa 4...@theanswer.com wrote: My D:YAML library (YAML parser) depends on std.stream Also ae.xml depends on it.
Re: deprecating std.stream, std.cstream, std.socketstream
From the other thread On 13/05/2012 21:58, Walter Bright wrote: On 5/13/2012 1:48 PM, Stewart Gordon wrote: On 13/05/2012 20:42, Walter Bright wrote: snip I'd like to see std.stream dumped. I don't see any reason for it to exist that std.stdio doesn't do (or should do). So std.stdio.File is the replacement for the std.stream stuff? Not exactly. Ranges are the replacement. std.stdio.File is merely a range that deals with files. I don't see any of the required range methods in it. Moreover, I'm a bit confused about the means of retrieving multiple elements at once with the range API, such as a set number of bytes from a file. We have popFrontN, which advances the range but doesn't return the data from it. We have take and takeExactly, which seem to be the way to get a set number of elements from the range, but I'm confused about when/whether using these advances the original range. If I'm writing a library to read a binary file format, I want to allow the data to come from a file, a socket or a memory image. The stream API makes this straightforward. But it seems some work is needed before std.stdio and the range API are up to it. Stewart.
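On the take/popFrontN question, a small sketch of what happens with a value-type range such as an array slice: take works on a copy and leaves the original where it was, while popFrontN takes the range by ref and advances it. Ranges with reference semantics (class-based ranges, for instance) behave differently with take, which is a likely source of the confusion described above.

```d
import std.algorithm : equal;
import std.range : popFrontN, take;

void main()
{
    int[] data = [1, 2, 3, 4, 5];

    // take returns a view of the first two elements; data itself is not advanced.
    auto firstTwo = data.take(2);
    assert(firstTwo.equal([1, 2]));
    assert(data.length == 5);

    // popFrontN advances the range in place but does not return the elements.
    data.popFrontN(2);
    assert(data == [3, 4, 5]);
}
```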
Re: deprecating std.stream, std.cstream, std.socketstream
On Sun, 13 May 2012 17:38:23 -0400, Walter Bright newshou...@digitalmars.com wrote: This discussion started in the thread "Getting the const-correctness of Object sorted once and for all", but it deserved its own thread. These modules suffer from the following problems: 1. poor documentation, dearth of examples and rationale 2. toHash(), toString(), etc., all need to be const pure nothrow, but it's turning out to be problematic for doing it for these classes 3. overlapping functionality with std.stdio 4. they should present a range interface, not a streaming one I keep trying to avoid talking about this, because I'm writing a replacement library for std.stream, and I don't want to step on any toes while it's still not accepted. But I have to say, ranges are *not* a good interface for generic data providers. They are *very* good for structured data providers. In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range. I have no problem with getting rid of std.stream. I've never actually used it. Still, we absolutely need a non-range based low-level streaming interface to data. If nothing else, we need something we can build ranges upon, and I think my replacement does a very good job of that. -Steve
Re: deprecating std.stream, std.cstream, std.socketstream
On 13-05-2012 23:38, Walter Bright wrote: This discussion started in the thread "Getting the const-correctness of Object sorted once and for all", but it deserved its own thread. These modules suffer from the following problems: 1. poor documentation, dearth of examples and rationale 2. toHash(), toString(), etc., all need to be const pure nothrow, but it's turning out to be problematic for doing it for these classes 3. overlapping functionality with std.stdio 4. they should present a range interface, not a streaming one While we're at it, do we want to keep std.outbuffer? -- - Alex
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/13/2012 10:22 PM, Oleg Kuporosov wrote:
Unfortunately, std.stdio under Windows can't handle UTF-16 (wchar)-based file names and text I/O, which are natural there. The root of the issue appears to lie in both the underlying DMC C stdio (something wrong with the w*-based functions?) and std.format, which provides only UTF-8 strings. It makes sense to deprecate these modules for the reasons given, but only after std.stdio supports UTF-16 names and text I/O or a good replacement (Steven's std.io?) is ready. Currently std.[c]stream is the only way to work with UTF-16 file systems in Phobos. Or switch to Tango, which appears to support it too (but I don't have experience with it).

Why not just convert the UTF16 strings to UTF8 ones? They have the same information.
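For readers following the UTF-16 point, the conversion Walter mentions can be sketched with std.utf as below. This is only an illustration of the transcoding step: the file name is made up, and the sketch says nothing about whether std.stdio on Windows then opens such a file correctly, which is the problem Oleg describes.

```d
import std.utf : toUTF8, toUTF16;

void main()
{
    // Hypothetical UTF-16 file name, as a Windows API might hand it over.
    wstring wideName = "r\u00E9sum\u00E9.txt"w;

    // Transcode to UTF-8; both encodings cover the same code points.
    string narrowName = toUTF8(wideName);

    // Round-trip back to UTF-16 to show nothing was lost.
    assert(toUTF16(narrowName) == wideName);
}
```

The round trip holds for any valid UTF-16 input; invalid surrogate sequences are a separate matter that this sketch does not cover.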
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/14/2012 4:43 AM, Stewart Gordon wrote:
If I'm writing a library to read a binary file format, I want to allow the data to come from a file, a socket or a memory image. The stream API makes this straightforward. But it seems some work is needed before std.stdio and the range API are up to it.

I agree. But that's where the effort needs to be made, not in carrying stream forward.
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
While we're at it, do we want to keep std.outbuffer?

Since it's not range based, probably not.
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
I keep trying to avoid talking about this, because I'm writing a replacement library for std.stream, and I don't want to step on any toes while it's still not accepted. But I have to say, ranges are *not* a good interface for generic data providers. They are *very* good for structured data providers. In other words, a stream of bytes is not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines is a very good range. I have no problem with getting rid of std.stream. I've never actually used it. Still, we absolutely need a non-range based low-level streaming interface to data. If nothing else, we need something we can build ranges upon, and I think my replacement does a very good job of that.

I'll say in advance without seeing your design that it'll be a tough sell if it is not range based.

I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal.

The ability to do things like:

    void main() {
        stdin.byChunk(1024).
            map!(a => a.idup). // one of those shortcomings
            joiner().
            stripComments().
            copy(stdout.lockingTextWriter());
    }

is just kick ass.
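The pipeline style quoted above can be tried out on an in-memory string. The sketch below is not Walter's stripComments, which is not shown in the thread; stripLineComments is a hypothetical stand-in that only drops whole lines beginning with "//", and it materializes the result with join rather than staying lazy.

```d
import std.algorithm.iteration : filter;
import std.algorithm.searching : startsWith;
import std.array : join;
import std.string : lineSplitter, stripLeft;

// Hypothetical stand-in for stripComments: removes lines whose first
// non-blank characters are "//". Trailing and block comments are kept.
string stripLineComments(string source)
{
    return source
        .lineSplitter                                        // lazy range of lines
        .filter!(line => !line.stripLeft.startsWith("//"))   // drop comment lines
        .join("\n");                                         // eager: builds one string
}

void main()
{
    auto src = "int a;\n  // note\nint b;";
    assert(stripLineComments(src) == "int a;\nint b;");
}
```

Each stage consumes the range produced by the previous one, which is the composability being praised; only the final join allocates.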
Re: deprecating std.stream, std.cstream, std.socketstream
On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote:
On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
While we're at it, do we want to keep std.outbuffer?
Since it's not range based, probably not.

Why not just fold this into std.io? I'm surprised that this is a separate module, actually. It should either be folded into std.io, or developed to be more generic (i.e., have a range-based API, have more features like auto-flushing past a certain size, etc.).

T
-- Prosperity breeds contempt, and poverty breeds consent. -- Suck.com
Re: deprecating std.stream, std.cstream, std.socketstream
On 13-05-2012 23:38, Walter Bright wrote:
This discussion started in the thread "Getting the const-correctness of Object sorted once and for all", but it deserved its own thread. These modules suffer from the following problems:
1. poor documentation, dearth of examples and rationale
2. toHash(), toString(), etc., all need to be const pure nothrow, but it's turning out to be problematic for doing it for these classes
3. overlapping functionality with std.stdio
4. they should present a range interface, not a streaming one

I'm all for it. I haven't used any of them, ever, and probably never will. Their APIs aren't particularly appealing, honestly.

--
- Alex
Re: deprecating std.stream, std.cstream, std.socketstream
On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:
4. they should present a range interface, not a streaming one

I was just about to make a post suggesting that! You could easily integrate std.io with std.algorithm to do some pretty cool things.

NMS
Re: deprecating std.stream, std.cstream, std.socketstream
On Sunday, May 13, 2012 14:38:23 Walter Bright wrote:
This discussion started in the thread "Getting the const-correctness of Object sorted once and for all", but it deserved its own thread. These modules suffer from the following problems:
1. poor documentation, dearth of examples and rationale
2. toHash(), toString(), etc., all need to be const pure nothrow, but it's turning out to be problematic for doing it for these classes
3. overlapping functionality with std.stdio
4. they should present a range interface, not a streaming one

I think that it's been a foregone conclusion for some time that they were going to go. We just haven't done it, because we don't have replacements for them yet. IIRC Steven's std.stdio rewrite at least partially covers that, but he hasn't been able to finish it yet.

- Jonathan M Davis
Re: deprecating std.stream, std.cstream, std.socketstream
On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:
This discussion started in the thread "Getting the const-correctness of Object sorted once and for all", but it deserved its own thread. These modules suffer from the following problems:
1. poor documentation, dearth of examples and rationale
2. toHash(), toString(), etc., all need to be const pure nothrow, but it's turning out to be problematic for doing it for these classes
3. overlapping functionality with std.stdio
4. they should present a range interface, not a streaming one

My D:YAML library (YAML parser) depends on std.stream (e.g. for cross-endian compatibility and loading from memory), and I've been waiting for a replacement since the first release. I support removing std.stream, but it needs a replacement with equivalent functionality. Actually, I've postponed a 1.0 release _until_ std.stream is replaced.
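As a rough illustration of the two needs mentioned here (endian-aware decoding and loading from memory), std.bitmanip.read works on any input range of ubyte, including a plain array. The byte layout below is invented for the example and has nothing to do with D:YAML's actual code or with std.stream's API.

```d
import std.bitmanip : read;
import std.system : Endian;

void main()
{
    // Made-up in-memory image: a big-endian uint, then a little-endian ushort.
    ubyte[] image = [0x00, 0x00, 0x01, 0x02, 0x34, 0x12];

    // read decodes a value and pops the consumed bytes off the front.
    uint first = image.read!(uint, Endian.bigEndian);
    ushort second = image.read!(ushort, Endian.littleEndian);

    assert(first == 0x0102);
    assert(second == 0x1234);
    assert(image.length == 0); // all six bytes consumed
}
```

Because read only requires an input range of bytes, the same decoding code could in principle be fed from a file or socket wrapped as such a range, which is the kind of source independence the stream API was being used for.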
Re: deprecating std.stream, std.cstream, std.socketstream
On 5/13/2012 3:16 PM, Nathan M. Swan wrote:
Trying to make it read lazily is even harder, as all std.utf functions work on arrays, not ranges. I think this should change.

Yes, std.utf should be upgraded to present range interfaces.
Re: deprecating std.stream, std.cstream, std.socketstream
On Sun, May 13, 2012 at 02:38:23PM -0700, Walter Bright wrote:
This discussion started in the thread "Getting the const-correctness of Object sorted once and for all", but it deserved its own thread. These modules suffer from the following problems:
1. poor documentation, dearth of examples and rationale
2. toHash(), toString(), etc., all need to be const pure nothrow, but it's turning out to be problematic for doing it for these classes
3. overlapping functionality with std.stdio
4. they should present a range interface, not a streaming one

I agree with all of the above. The only problem is, where's the replacement? We need std.io in usable shape before we can feasibly carry out any of the above. It would make D look utterly ridiculous if all of the above were deprecated with no usable replacement.

T
-- If lightning were to ever strike an orchestra, it'd always hit the conductor first.
Re: deprecating std.stream, std.cstream, std.socketstream
On 13/05/2012 22:38, Walter Bright wrote:
This discussion started in the thread "Getting the const-correctness of Object sorted once and for all", but it deserved its own thread. These modules suffer from the following problems:
1. poor documentation, dearth of examples and rationale
2. toHash(), toString(), etc., all need to be const pure nothrow, but it's turning out to be problematic for doing it for these classes
3. overlapping functionality with std.stdio
4. they should present a range interface, not a streaming one

I make use of std.stream quite a lot... It's horrible, it has to go. I'm not too bothered if replacements aren't available straight away, as it doesn't take much to drop 10 lines of replacement in for the functionality I use from it until the actual replacement appears.

--
Robert
http://octarineparrot.com/
Re: deprecating std.stream, std.cstream, std.socketstream
On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:
3. overlapping functionality with std.stdio

Unfortunately, std.stdio under Windows can't handle UTF-16 (wchar)-based file names and text I/O, which are natural there. The root of the issue appears to lie in both the underlying DMC C stdio (something wrong with the w*-based functions?) and std.format, which provides only UTF-8 strings. It makes sense to deprecate these modules for the reasons given, but only after std.stdio supports UTF-16 names and text I/O or a good replacement (Steven's std.io?) is ready. Currently std.[c]stream is the only way to work with UTF-16 file systems in Phobos. Or switch to Tango, which appears to support it too (but I don't have experience with it).