Re: protocol for using InputRanges
On Sat, 29 Mar 2014 17:02:30 -0400, Marc Schütz schue...@gmx.net wrote: On Saturday, 29 March 2014 at 01:36:46 UTC, Steven Schveighoffer wrote: On Fri, 28 Mar 2014 07:47:22 -0400, Marc Schütz schue...@gmx.net wrote: On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote: If you initialized the next element in both constructor and popFront, then you'd get rid of both these checks. ... but of course lose laziness. In this case, laziness is not critical. Decoding the element is an O(1) operation, and when looping through, you will decode it anyway. When processing a 20 element string, the cost of checking to see if you have decoded on every empty or front call may override the front-loaded cost of decoding the first element on construction. It's sure to add to the cost if you are processing all 20 elements, since you decode them all anyway. On other ranges, it's more important when the first element costs a lot to fetch. HOWEVER, it's not critically important to delay that unless you are not going to process that element. For example, if you are foreach'ing over all the elements, the delay doesn't matter. I'd rather the higher level code decide whether to delay or not, depending on the situation. Requiring a protocol change for such detailed knowledge seems unbalanced. I was more thinking of the fact that you need to read something on construction, rather than on consumption, and this reading might be noticeable. There was the example of `stdin.byLine().filter(...)` (or something similar, don't remember exactly), which reads from stdin on construction. This changes the behaviour of the program, because the read operation will (probably) block. Blocking operations may have a noticeable impact, but they may not. It depends on what you do between construction and processing. For example, if you have: foreach(l; stdin.byLine) Lazily fetching the first line makes no operational difference whatsoever, even if it blocks, because you're immediately going to process it. But if it *is* lazily fetched, you are paying some possibly small, but nonzero, cost for that laziness that you didn't need to pay. This is why I said it's not critically important to delay unless you are not going to process the first element. Because there is no primitive to prime the range, we must do it on a property fetch, or on construction. But after that, popFront is a perfectly reasonable place to get the next element. All this fuss is over the *first* element in the range. I think providing a mechanism to decide whether you want it now or later is reasonable. For example a lazy range wrapper. Note, however, the case Andrei was arguing about was one of the string decoders. When you are blocking for input, hell, even if you aren't blocking, but need to call system calls to get it, the performance cost of checking a boolean every call is negligible. But when you are decoding 1-4 bytes of data in memory, checking a boolean becomes significant. -Steve
Re: protocol for using InputRanges
On Fri, 28 Mar 2014 22:23:29 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/28/2014 3:42 AM, Steven Schveighoffer wrote: I'm also curious what a generic read() should do when the stream is empty. Returns 0 elements read. Meaning it must also write through a pointer if it did read something. How is that faster than a 1 element buffer? I don't understand this point/question. -Steve
Re: protocol for using InputRanges
On Fri, 28 Mar 2014 00:58:35 -0400, Walter Bright newshou...@digitalmars.com wrote: Following the protocol empty-front-popFront always works. Where the differences come in is when people skip calling empty. -- when it is known empty will return false through logical deduction. -Steve
Re: protocol for using InputRanges
On Monday, 31 March 2014 at 11:40:11 UTC, Steven Schveighoffer wrote: Blocking operations may have a noticeable impact, but they may not. It depends on what you do between construction and processing. For example, if you have: foreach(l; stdin.byLine) Lazily fetching the first line makes no operational difference whatsoever, even if it blocks, because you're immediately going to process it. But if it *is* lazily fetched, you are paying some possibly small, but nonzero, cost for that laziness that you didn't need to pay. This is why I said it's not critically important to delay unless you are not going to process the first element. Because there is no primitive to prime the range, we must do it on a property fetch, or on construction. But after that, popFront is a perfectly reasonable place to get the next element. All this fuss is over the *first* element in the range. I think providing a mechanism to decide whether you want it now or later is reasonable. For example a lazy range wrapper. I've found the example I was talking about: http://forum.dlang.org/thread/mailman.323.1393458346.6445.digitalmar...@puremagic.com I misremembered, filter wasn't even involved. But similar situations may arise from using filter or another range that eagerly fetches the first element, even if its source doesn't. Note, however, the case Andrei was arguing about was one of the string decoders. When you are blocking for input, hell, even if you aren't blocking, but need to call system calls to get it, the performance cost of checking a boolean every call is negligible. But when you are decoding 1-4 bytes of data in memory, checking a boolean becomes significant. That's what I meant by a tight loop. My hypothesis is that in such cases the optimizers of at least GDC and LDC are good enough to remove the check for the boolean completely, and that at least LDC can then remove the boolean itself from the range structure.
Re: protocol for using InputRanges
On Mon, 31 Mar 2014 13:43:05 -0400, Marc Schütz schue...@gmx.net wrote: I've found the example I was talking about: http://forum.dlang.org/thread/mailman.323.1393458346.6445.digitalmar...@puremagic.com I misremembered, filter wasn't even involved. But similar situations may arise from using filter or another range that eagerly fetches the first element, even if its source doesn't. Sure, but fetching lazily adds cost. It's possible to add it as an option when needed, but it's not easy to reclaim the cost if it's the default implementation. That's what I meant by a tight loop. My hypothesis is that in such cases the optimizers of at least GDC and LDC are good enough to remove the check for the boolean completely, and that at least LDC can then remove the boolean itself from the range structure. I suspect that this is an optimization that will break down when not done in the right way. I'd rather be able to choose, do I need lazy evaluation or not? The cases where lazy evaluation is needed are not common. -Steve
Re: protocol for using InputRanges
On Saturday, 29 March 2014 at 09:06:45 UTC, monarch_dodra wrote: On Saturday, 29 March 2014 at 00:14:40 UTC, Andrei Alexandrescu wrote: It increasingly seems to me we need to do this. -- Andrei Question: Should this new restriction only be applied to *Input*Ranges? If so, then it might make more sense. Ranges that also satisfy the *Forward* interface (which has save), or specifically, the *Output* interface (that doesn't even have empty) should not have this obligation. In particular, output ranges *write*, so usually can't even buffer data. *Input*-only Ranges are the ones that are usually implemented over streams, and that are most subject to the test empty issue too, so they are the ones that need this restriction. Could this maybe be a good solution for everyone? Did this conversation die the instant I actually had something smart to say? @andralex, @WalterBright ? Thoughts?
Re: protocol for using InputRanges
On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote: On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote: On 3/27/2014 12:21 PM, Rainer Schuetze wrote: This loop is intuitive. Not being allowed to call empty or front multiple times or not at all is unintuitive. They should not be named as if they are properties then. I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor. Disclaimer: I'm a C++ programmer just lurking here, I've never actually used D. I find it very counter-intuitive that 'empty' is required before front or popFront. Since 'pump priming' in the constructor isn't wanted either, i'd suggest the following protocol: while (popFront()) { front; } popFront is then required to return !empty. 'empty' as a separate property getter can stay but is not required for the protocol. This way, it's clear that the work to fetch the next element is always done in popFront. Generally I find dependecies between functions problematic that require a specific call sequence. If they can be removed, they should. With my proposed solution, there's still one minor dependency, namely that front is not valid before the first popFront. This could be solved by again combining the two, as proposed by someone else in this thread. Tobi First sorry for my english. I agree. All work is done in bool popFront() (well, this name may no longer be the most appropriate). In this function the first / next item is obtained and caches. Returns true if any element, false otherwise. front return de cached element as now, as many times as desired and without side effects. empty function is no longer necessary, but it might be useful to keep changing the return type for example to int: 1 - Sure there is more elements 0 - Sure there is NO more elements -1 - I don't known. Try popFront to know if more element Of course this function as front, without side effects
Re: protocol for using InputRanges
On Saturday, 29 March 2014 at 00:14:40 UTC, Andrei Alexandrescu wrote: It increasingly seems to me we need to do this. -- Andrei What I find funny here is that walter is saying we must enforce that empty always be called in the range - for performance. Yet about a year ago, he made a proposal for a SentinelInputRange, where you could do away with empty altogether because tl,dr: PERFORMANCE!. http://forum.dlang.org/thread/kgmatj$1v8b$1...@digitalmars.com If you need to deviate from the rules for extra performance, then fine. Just document that it is a lower level range, with useage limitations. However, making a global design change because *1* type of range needs it, for performance, I'm not fine with.
Re: protocol for using InputRanges
29-Mar-2014 06:47, Andrei Alexandrescu пишет: On 3/28/14, 11:40 AM, Dmitry Olshansky wrote: 28-Mar-2014 22:29, Walter Bright пишет: On 3/28/2014 10:11 AM, Dmitry Olshansky wrote: WAT? The overhead is in issuing system calls, you'd want to do as little of them as possible. Reading byte by byte is an exemplar of idiocy in I/O code. That's why we have things like byLine(). Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW. byLine doesn't use getc. -- Andrei Indeed not every version. Linux looks almost OK. There is still this stuff in empty: https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1604 And looking through the code I see that Win64, OSX and FreeBSD versions still use getc. Win32 hacks right into DMC run-time, and on linux getdelim is used. The point about ranges only ever making sense over buffered I/O still stands. Preferably not C runtime. -- Dmitry Olshansky
Re: protocol for using InputRanges
29-Mar-2014 02:05, Walter Bright пишет: On 3/28/2014 11:40 AM, Dmitry Olshansky wrote: Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW. How about a PR to fix it? Sorry, I'm not inclined to do any work on hacking arbitrary C runtime libraries. Too much work on reverse engineering and making sure it stays in sync. We need new I/O package and hopefully Steven has something brewing. -- Dmitry Olshansky
Re: protocol for using InputRanges
On Saturday, 29 March 2014 at 00:14:40 UTC, Andrei Alexandrescu wrote: It increasingly seems to me we need to do this. -- Andrei Question: Should this new restriction only be applied to *Input*Ranges? If so, then it might make more sense. Ranges that also satisfy the *Forward* interface (which has save), or specifically, the *Output* interface (that doesn't even have empty) should not have this obligation. In particular, output ranges *write*, so usually can't even buffer data. *Input*-only Ranges are the ones that are usually implemented over streams, and that are most subject to the test empty issue too, so they are the ones that need this restriction. Could this maybe be a good solution for everyone?
Re: protocol for using InputRanges
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote: It's become clear to me that we've underspecified what an InputRange is. As someone who has missed this discussion, can I just say this: can we please document the result somewhere, and possibly even announce it clearly so that people know that something has changed?
Re: protocol for using InputRanges
On Saturday, 29 March 2014 at 01:36:46 UTC, Steven Schveighoffer wrote: On Fri, 28 Mar 2014 07:47:22 -0400, Marc Schütz schue...@gmx.net wrote: On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote: If you initialized the next element in both constructor and popFront, then you'd get rid of both these checks. ... but of course lose laziness. In this case, laziness is not critical. Decoding the element is an O(1) operation, and when looping through, you will decode it anyway. When processing a 20 element string, the cost of checking to see if you have decoded on every empty or front call may override the front-loaded cost of decoding the first element on construction. It's sure to add to the cost if you are processing all 20 elements, since you decode them all anyway. On other ranges, it's more important when the first element costs a lot to fetch. HOWEVER, it's not critically important to delay that unless you are not going to process that element. For example, if you are foreach'ing over all the elements, the delay doesn't matter. I'd rather the higher level code decide whether to delay or not, depending on the situation. Requiring a protocol change for such detailed knowledge seems unbalanced. I was more thinking of the fact that you need to read something on construction, rather than on consumption, and this reading might be noticeable. There was the example of `stdin.byLine().filter(...)` (or something similar, don't remember exactly), which reads from stdin on construction. This changes the behaviour of the program, because the read operation will (probably) block. I'd suggest to make it a requirement for ranges and algorithms _not_ to start consuming the underlying data until one of empty/front/popFront is called, even if that has a negative effect on performance. That's why I was asking for performance numbers, to see whether there even is an effect. If there isn't, that's just another argument for adding that requirement. This is then, IMO, a very acceptable additional burden to place on the writers of ranges. I agree, however, that it's not a good idea to change the range protocol, i.e. what _users_ of ranges have to abide by. That would be a breaking change, and it would be an especially bad one because there I see no way to detect that a user failed to call `empty` in an iteration if they knew that there are more elements available.
Re: protocol for using InputRanges
Am Fri, 28 Mar 2014 19:23:29 -0700 schrieb Walter Bright newshou...@digitalmars.com: On 3/28/2014 3:42 AM, Steven Schveighoffer wrote: I'm also curious what a generic read() should do when the stream is empty. Returns 0 elements read. I guess we all have a clear concept of streams in our mind from all kinds of programming languages. They typically operate on bytes, have an EOF flag and offer read/write methods that you pass a byte pointer and a length into. The result is the count of bytes read or written. Optionally they have an available property and handle any combination of the following: o all basic data types of the programming language o POD structs o formatted strings o bitwise operations o seeking after calculating offsets They are used for I/O on heterogeneous data like binary file formats or network protocols. Ranges on the other hand work on sequences of items of the same type, which is a small subset of what streams are supposed to support. While there should be a connection between both worlds, one cannot replace the other. There will always be a separate raw stream type in Phobos. Meaning it must also write through a pointer if it did read something. How is that faster than a 1 element buffer? You can write your stream in such a way that you map a memory area. E.g. you call stream.waitForSoManyBytes(123); and then ubyte[] arr = stream.mapMemory(123); where arr is then a slice into the stream's internal buffer. (This also requires a mechanism to release the buffer after use, so the stream can reuse it: stream.doneWithSoManyBytes(123);) -- Marco
Re: protocol for using InputRanges
Luís Marques wrote in message news:exmfmqftpykoaxdgl...@forum.dlang.org... - Is it allowed for an InputRange to become empty and then not empty again? For a finite RandomAccessRange to increase in length? E.g.: Not by itself - if a range's empty has returned true, calling empty repeatedly should continue to return true. If you change it externally (ie not by using the range interface) then it can become non-empty. eg int[] arr; assert(arr.empty); arr ~= 3; // no longer empty, but we used a method outside the range interface to change it. I don't think a random number generator or some hardware device producing more data would be a good reason to change empty - range.empty is not asking 'is there more data ready', it's asking 'are there items left to iterate over'. - Is it allowed to not call front? E.g.: Yes.
Re: protocol for using InputRanges
On Friday, 28 March 2014 at 02:14:40 UTC, H. S. Teoh wrote: I'm with Walter on this one. Generic code should NOT assume anything about a range it was given, and therefore it should call .empty before calling .front or .popFront. If you know that a particular range doesn't require calling .empty beforehand, then by definition your code is no longer generic, and you just have to live with the consequences of that. Nothing stops you, for example, from exposing a .get function or whatever else, *in addition* to the standard range API. Then code that knows how to deal with .get will use it, and your range remains usable with other generic code. It's not that *you* know a particular range doesn't need empty called, it's that the algorithm you are using has already previously validated there are elements in it. For example, the splitter algorithm will first save a copy of its range, and then walk it, searching for the splitting elements. It then realizes it has walked N elements. From there, it takes the original range, and packs it into a takeExactly(N) of the original range. Iterating that takeExactly(N) is faster than a raw take, *because* takeExactly was already promised that the range holds N elements, and as such, *doesn't* check for empty. Ditto for findSplit. And again, there are functions, such as copy, then simply *require* that a certain range have at least a certain amount of elements. Why check for it, if not providing it is a violation of its interface. On Thursday, 27 March 2014 at 23:52:46 UTC, Walter Bright wrote: I know that you want to get rid of empty. But getting rid of empty means that front may fail. This is why there is an empty, and any generic code MUST respect that. Front only fails *if* the range is empty. Not if you fail to call empty. Generic code respects that. What you can do is, in your range: enum empty = true; That's not the same. That's making the assumption your range will *never* be empty, which is a whole other concept (infinite ranges).
Re: protocol for using InputRanges
Am Thu, 27 Mar 2014 17:20:25 -0700 schrieb Walter Bright newshou...@digitalmars.com: On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote: On 3/27/14, 2:24 PM, Walter Bright wrote: The range protocol is designed to work with streams. It's designed to work with containers. I know we talked about streams when we designed it. It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams. It's not a giant fail, we just need to adjust the notion. Are you suggesting that ranges needn't support streams? Note also that I suggested a way Steven could create an adapter with the behavior he desired, yet still adhere to protocol. No notion adjustments required. Ranges have equivalents in other languages: iterators in c++, IEnumerator in c#, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D.
Re: protocol for using InputRanges
On Friday, 28 March 2014 at 00:41:42 UTC, Walter Bright wrote: On 3/27/2014 3:23 PM, QAston wrote: The protocol is not intuitive. I find empty-front-popFront as perfectly intuitive. I don't find the counter proposals, which come with baggage like constructors that may fail, and front() that may fail in unspecified ways, or throwing entire paradigms out the window, as intuitive. But I concede that other people think differently. Not everyone thinks the same. But consider this: floating point math is not intuitive. There has never been a shortage of proposals to make fp intuitive, but they've all failed because they are impractical. Sometimes ya gotta go with what works. I _strongly_ agree with Walter: people learning D in my groups have no problems with the empty-front-popFront sequence. Please don't complicate or change the notion of range: you can find an adjustment that don't break code, but for sure that will break the mindset of people. For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront. -- Paolo
Re: protocol for using InputRanges
On Thursday, 27 March 2014 at 21:54:00 UTC, Andrei Alexandrescu wrote: On 3/27/14, 1:58 PM, Walter Bright wrote: On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote: Yah, agreed. -- Andrei Completely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY. That's a good point too. Andrei Maybe new kind of range can help? Which has bool popFront() instead of bool empty() + void popFront().
Re: protocol for using InputRanges
On Fri, 28 Mar 2014 08:59:34 -, Paolo Invernizzi paolo.invernizzi@no.address wrote: For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront. This is actually not true. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Re: protocol for using InputRanges
On 3/28/2014 1:32 AM, Johannes Pfau wrote: Ranges have equivalents in other languages: iterators in c++, IEnumerator in c#, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D. Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string? empty-front-popFront works with streams and non-streams. Why break this?
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/27/2014 2:31 PM, Steven Schveighoffer wrote: Adding range primitives on top of a stream does not make sense. Are ready to implement a parallel universe of stream based algorithms to go alongside all the range based ones and be ready to constantly justify that to everyone? Not necessary. You just need to implement one range on top of a buffered stream, and then it works with all other algorithms that accept input ranges. I'm also curious what a generic read() should do when the stream is empty. Returns 0 elements read. -Steve
Re: protocol for using InputRanges
On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote: On Thursday, 27 March 2014 at 15:45:14 UTC, Andrei Alexandrescu wrote: On 3/27/14, 8:44 AM, Steven Schveighoffer wrote: On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 3/27/14, 5:35 AM, Steven Schveighoffer wrote: BTW, I think there has been recent talk about not focusing on dust when we are laying bricks. This would have my vote as useless dust. This does not solve the problem of streams-as-ranges, because streams don't make good ranges. It doesn't really solve any problems as far as I can tell. I think byXchar are important and are not streams. -- Andrei What's broken about those? Speed. I call shenanigans. This is the code: @property bool empty() { if (nLeft) return false; if (r.empty) return true; If you initialized the next element in both constructor and popFront, then you'd get rid of both these checks. ... but of course lose laziness. Furthermore, popFront() has a guaranteed 1 call per element. Weareas empty and front may be called several times in a row for a single element. If you want efficiency, stop being member-wise lazy, and put some eagerness in your code. Does a flag isInitialized even have an effect on performance where it matters? It should only be relevant in tight loops, and there I expect a moderately well-working optimizer to inline calls to `empty` and `front`. It can then see that it has just set isInitialized to true/false, and remove the checks completely, effectively generating code as if the value were only fetched/computed in `empty`. That only leaves the larger struct size as a potential problem. However I've seen LDC being able to decompose a struct into separate variables, so I guess it can remove the unnecessary member in our case. In other situations, where you already have more complex code, an additional conditional branch will not have as much weight anyway. Does somehow have numbers for realistic programs, with GDC/LDC/DMD? I suspect this entire discussion is moot, because we can easily have non-eagerly initialized ranges without sacrificing performance in most situations.
Re: protocol for using InputRanges
On Friday, 28 March 2014 at 04:58:28 UTC, Walter Bright wrote: On 3/27/2014 3:21 AM, Chris wrote: I agree. I've been using ranges for a while now and have tried different implementations based on advice given on this forum and depending on each case. After reading this thread I looked at some of my ranges and I have to say I still have no clue as to what should and should _not_ be done (regardless of whether you _can_ do it). For a while I thought that it's my lack of understanding, but this thread shows that everyone has a different view of ranges. Guidelines with use cases would be great. I remember I mentioned this in another thread already. The thing is that experimenting without proper guidelines leaves your code in an inconsistent state where you have two or more ranges doing technically the same thing but each with a different logic. Following the protocol empty-front-popFront always works. Where the differences come in is when people skip calling empty. Not only there. The whole issue of front and popFront has not been made clear yet (unless it's my lack of understanding). What should and should not be done there, even if you stick to the basic protocol (!empty front popFront). I like the concept of ranges and use them a lot, because they help me to create independent components. But sometimes I think they are just glorified for-loops in the sense that we know more or less what the input is, and part of the reason why we have this thread is that we want everything to be specialized and generic at the same time. E.g. @monarch_dodra wrote: It's not that *you* know a particular range doesn't need empty called, it's that the algorithm you are using has already previously validated there are elements in it. For example, the splitter algorithm will first save a copy of its range, and then walk it, searching for the splitting elements. It then realizes it has walked N elements. But then it's no longer generic. If your range depends on another algorithm (that you assume or know to have checked something), then it's no longer generic. In many cases generic means an overhead of checks, and is anathema to performance tuning. I think it's the issue of generic vs. specialized / optimized that started this thread. And this is a tough one to sort out.
Re: protocol for using InputRanges
I think something has gone wrong in this discussion somewhere. It is like this. Do not assume .empty is const, this is unreasonable. Do not assume .front is const, this is unreasonable. .popFront has the only high-level observable difference in removing something from the front of an InputRange. I say it is unreasonable to expect .empty or .front to be const because a range which is strictly an InputRange (not a ForwardRange...) is likely to be doing some kind of work where stepping through its sequence of data is destructive. As soon as the thing is read in the implementation of the InputRange, you can't read that thing again. You've cached it or it is gone. Thus .empty and .front must be non-const and lazily evaluate caching the next value. I do not see it as a major issue to check .empty or call .front many times, even with this caching behaviour. Commonly this is as simple as checking whether or not you still have a value to provide inside the InputRange. We use assert(!empty) etc. because honestly I cannot imagine another way to make this safe. If you really, really hate this double-checking in .empty or .front, you could try to change InputRanges so they behave like Java Iterators. I believe this is a really bad idea. You have to start thinking about returning nullable types. You have to push the job of caching values on to users of the InputRanges, as in each and every algorithm. It's just a really bad choice, and with the kind of boxing you'll need to do for nullable types, you'll surely end off worse as far as performance goes anyway. Really, the range protocol is as Walter specified at the start of this thread. The documentation should reflect how .empty or .front may result in caching the next value. I really don't think this is a major performance problem worth worrying about, though. If it's really, really worth it for certain standard algorithms, perhaps something can be done in private range methods or similar to offer some kind of speed boost. ('package' privacy for std.* is feasible.) I don't think there is a lot to be gained from this in general, though. Streams are different, and may be more appropriate for some functions for performance reasons. I still prefer ranges for code where performance is less critical, because if you aren't very worried about performance, the range API is a lot nicer. I'm not dead set on maximum performance if it makes life a lot harder in my code, personally. I view these kinds of improvements as premature optimisations if I'm writing something new.
Re: protocol for using InputRanges
On Fri, 28 Mar 2014 14:15:10 -, Chris wend...@tcd.ie wrote: Earlier Walter wrote: I don't like being in the position of when I need high performance code, I have to implement my own ranges algorithms, or telling customers they need to do so. I don't think there is a one size fits all. What if customers ask for maximum security? In any language, if I want high performance, I have to be prepared to walk on thin ice. If I want things to be safe and / or generic, I have to accept additonal checks (= perfomance penalties). I don't think that a language can solve the fundamental problems concerning programming / mathematical logic with all the contradictory demands involved. It can give us the tools to cope with those problems, but not solve them out of the box. You can build safety on top of performance. You cannot do the opposite. Meaning, one could wrap an unsafe/fast range with a safe/slower one. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Re: protocol for using InputRanges
On Friday, 28 March 2014 at 15:49:06 UTC, Regan Heath wrote: On Fri, 28 Mar 2014 14:15:10 -, Chris wend...@tcd.ie wrote: Earlier Walter wrote: I don't like being in the position of when I need high performance code, I have to implement my own ranges algorithms, or telling customers they need to do so. I don't think there is a one size fits all. What if customers ask for maximum security? In any language, if I want high performance, I have to be prepared to walk on thin ice. If I want things to be safe and / or generic, I have to accept additonal checks (= perfomance penalties). I don't think that a language can solve the fundamental problems concerning programming / mathematical logic with all the contradictory demands involved. It can give us the tools to cope with those problems, but not solve them out of the box. You can build safety on top of performance. You cannot do the opposite. Meaning, one could wrap an unsafe/fast range with a safe/slower one. R But should unsafe+fast be the default or rather an option for cases when you really need it?
Re: protocol for using InputRanges
On Fri, 28 Mar 2014 16:30:36 -, John Stahara john.stahara+dl...@gmail.com wrote: On Fri, 28 Mar 2014 16:23:11 +, Paolo Invernizzi wrote: On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote: On Fri, 28 Mar 2014 08:59:34 -, Paolo Invernizzi paolo.invernizzi@no.address wrote: For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront. This is actually not true. R What I'm meaning, it's that we don't care: we are always respecting the sequence empty front pop, and everybody here find it natural. To clarify for Mr. Invernizzi: the we to which he refers is the group of people he works with, and /not/ the members of this newsgroup. --jjs Thanks, that was confusing me :) R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Re: protocol for using InputRanges
28-Mar-2014 13:55, Walter Bright пишет: On 3/28/2014 1:32 AM, Johannes Pfau wrote: Ranges have equivalents in other languages: iterators in c++, IEnumerator in c#, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D. Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string? Certainly NOT a socket. There is no escaping the fact that there are specifics to unbuffered direct streams. What you mention only makes sense with buffering either implicit or (I'd prefer) explicit. empty-front-popFront works with streams and non-streams. Why break this? Pipe dream. -- Dmitry Olshansky
Re: protocol for using InputRanges
On Fri, 28 Mar 2014 16:04:29 -, Chris wend...@tcd.ie wrote: On Friday, 28 March 2014 at 15:49:06 UTC, Regan Heath wrote: On Fri, 28 Mar 2014 14:15:10 -, Chris wend...@tcd.ie wrote: Earlier Walter wrote: I don't like being in the position of when I need high performance code, I have to implement my own ranges algorithms, or telling customers they need to do so. I don't think there is a one size fits all. What if customers ask for maximum security? In any language, if I want high performance, I have to be prepared to walk on thin ice. If I want things to be safe and / or generic, I have to accept additonal checks (= perfomance penalties). I don't think that a language can solve the fundamental problems concerning programming / mathematical logic with all the contradictory demands involved. It can give us the tools to cope with those problems, but not solve them out of the box. You can build safety on top of performance. You cannot do the opposite. Meaning, one could wrap an unsafe/fast range with a safe/slower one. R But should unsafe+fast be the default or rather an option for cases when you really need it? Pass. My point was only that it needs to exist. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Re: protocol for using InputRanges
Am Fri, 28 Mar 2014 02:55:45 -0700 schrieb Walter Bright newshou...@digitalmars.com: On 3/28/2014 1:32 AM, Johannes Pfau wrote: Ranges have equivalents in other languages: iterators in c++, IEnumerator in c#, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D. Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string? Sorry, but no. It sure sounds nice first, but I can't really imagine a use case where I would want any generic algorithms to work directly on a socket. Having a look on the cheat sheet of std.algorithm 99% of these algorithms do not make sense to operate on sockets, those that do (count, until) would be bad in performance terms when operating directly byte per byte. You at least want buffered reads when doing IO operations. If you then extend range interface to give access to an internal buffer you just reinvented streams. empty-front-popFront works with streams and non-streams. Why break this? It 'works' with streams but it's way too slow. You don't want to read byte-per-byte. Of course you can always implement ranges on top of streams. Usually these will not provide byte-per-byte access but efficient higher level abstractions (byLine, byChunk, decodeText). The point is you can implement ranges on streams easily, but you can't use ranges as the generic primitive for raw data. What's the element type of a data range? ubyte - performance sucks ubyte[n], ubyte[] now you have a range of ranges, most algorithms wont work as expected (find, count, ...). (the call empty/don't call empty discussion is completely unrelated to this, btw. You can implement ranges on streams either way, but again, using ranges for raw data streams is not a good idea.)
Re: protocol for using InputRanges
On Fri, Mar 28, 2014 at 04:30:36PM +, John Stahara wrote: On Fri, 28 Mar 2014 16:23:11 +, Paolo Invernizzi wrote: On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote: On Fri, 28 Mar 2014 08:59:34 -, Paolo Invernizzi paolo.invernizzi@no.address wrote: For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront. This is actually not true. R What I'm meaning, it's that we don't care: we are always respecting the sequence empty front pop, and everybody here find it natural. To clarify for Mr. Invernizzi: the we to which he refers is the group of people he works with, and /not/ the members of this newsgroup. [...] FWIW I find it perfectly natural as well. (But I know not everyone agrees. :P) T -- The two rules of success: 1. Don't tell everything you know. -- YHL
Re: protocol for using InputRanges
28-Mar-2014 04:20, Walter Bright пишет: On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote: On 3/27/14, 2:24 PM, Walter Bright wrote: The range protocol is designed to work with streams. It's designed to work with containers. I know we talked about streams when we designed it. If streams are like hot raw sockets then certainly it doesn't make sense. Ranges work nice when there are some slots down below, that is a buffer. It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams. It's not a giant fail, we just need to adjust the notion. Are you suggesting that ranges needn't support streams? Yes they don't support streams. They are an abstraction somewhere on top of buffering. This is where they fit nicely. -- Dmitry Olshansky
Re: protocol for using InputRanges
On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote: On Fri, 28 Mar 2014 08:59:34 -, Paolo Invernizzi paolo.invernizzi@no.address wrote: For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront. This is actually not true. R What I'm meaning, it's that we don't care: we are always respecting the sequence empty front pop, and everybody here find it natural. -- Paolo
Re: protocol for using InputRanges
On Friday, 28 March 2014 at 16:59:05 UTC, Johannes Pfau wrote: It 'works' with streams but it's way too slow. You don't want to read byte-per-byte. Of course you can always implement ranges on top of streams. Usually these will not provide byte-per-byte access but efficient higher level abstractions (byLine, byChunk, decodeText). The point is you can implement ranges on streams easily, but you can't use ranges as the generic primitive for raw data. What's the element type of a data range? ubyte - performance sucks ubyte[n], ubyte[] now you have a range of ranges, most algorithms wont work as expected (find, count, ...). (the call empty/don't call empty discussion is completely unrelated to this, btw. You can implement ranges on streams either way, but again, using ranges for raw data streams is not a good idea.) I think a key is to offer something with gives you chunks at a time right at the top, and the use .joiner on that. I read files this way currently. auto fileByteRange = File(something).byChunk(chunkSize).joiner; I believe this to be a very good way to get good performance without losing the functionality of std.algorithm.
Re: protocol for using InputRanges
On Fri, 28 Mar 2014 16:23:11 +, Paolo Invernizzi wrote: On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote: On Fri, 28 Mar 2014 08:59:34 -, Paolo Invernizzi paolo.invernizzi@no.address wrote: For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront. This is actually not true. R What I'm meaning, it's that we don't care: we are always respecting the sequence empty front pop, and everybody here find it natural. To clarify for Mr. Invernizzi: the we to which he refers is the group of people he works with, and /not/ the members of this newsgroup. --jjs
Re: protocol for using InputRanges
28-Mar-2014 21:07, Walter Bright пишет: On 3/28/2014 9:48 AM, Dmitry Olshansky wrote: 28-Mar-2014 13:55, Walter Bright пишет: On 3/28/2014 1:32 AM, Johannes Pfau wrote: Ranges have equivalents in other languages: iterators in c++, IEnumerator in c#, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D. Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string? Certainly NOT a socket. There is no escaping the fact that there are specifics to unbuffered direct streams. What you mention only makes sense with buffering either implicit or (I'd prefer) explicit. Yes, it does require a one element buffer. But seriously, does a one character buffer from a socket have a measurable impact on reading from a network? WAT? The overhead is in issuing system calls, you'd want to do as little of them as possible. Reading byte by byte is an exemplar of idiocy in I/O code. I'm an efficiency wonk as much or more than anyone, and this appears to me to be a false savings. Oh crap. This is very wrong. Do you often work with I/O and networking? -- Dmitry Olshansky
Re: protocol for using InputRanges
On 3/28/2014 9:48 AM, Dmitry Olshansky wrote: 28-Mar-2014 13:55, Walter Bright пишет: On 3/28/2014 1:32 AM, Johannes Pfau wrote: Ranges have equivalents in other languages: iterators in c++, IEnumerator in c#, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D. Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string? Certainly NOT a socket. There is no escaping the fact that there are specifics to unbuffered direct streams. What you mention only makes sense with buffering either implicit or (I'd prefer) explicit. Yes, it does require a one element buffer. But seriously, does a one character buffer from a socket have a measurable impact on reading from a network? I'm an efficiency wonk as much or more than anyone, and this appears to me to be a false savings.
Re: protocol for using InputRanges
Am Fri, 28 Mar 2014 17:22:26 + schrieb w0rp devw...@gmail.com: I think a key is to offer something with gives you chunks at a time right at the top, and the use .joiner on that. I read files this way currently. auto fileByteRange = File(something).byChunk(chunkSize).joiner; byChunk is implemented on top of the file rawRead API though, and that's a stream API ;-) As said before implementing ranges on top of streams is fine, but if you want ranges to replace streams as the lowest level interface you'll either suffer from performance issues or you'll have to extend the range interface and effectively make it a stream interface. (For example byChunk doesn't offer a way to provide a buffer. I'd expect a low level API to offer this, but it'll complicate range API a lot. File.rawRead on the other hand provides exactly that and you can implement byChunk on top of rawRead easily. The other way round is not as easy). BTW: If this code performs well of course depends what you with that fileByteRange range. For example if you only read the complete file into a memory buffer joiner would reduce performance significantly. I believe this to be a very good way to get good performance without losing the functionality of std.algorithm. Yes, that's exactly how ranges/streams should interface, there's no real problem for users. stream.getSomeRange().rangeAPICalls
Re: protocol for using InputRanges
Earlier Walter wrote: I don't like being in the position of when I need high performance code, I have to implement my own ranges algorithms, or telling customers they need to do so. I don't think there is a one size fits all. What if customers ask for maximum security? In any language, if I want high performance, I have to be prepared to walk on thin ice. If I want things to be safe and / or generic, I have to accept additonal checks (= perfomance penalties). I don't think that a language can solve the fundamental problems concerning programming / mathematical logic with all the contradictory demands involved. It can give us the tools to cope with those problems, but not solve them out of the box.
Re: protocol for using InputRanges
On 3/28/2014 10:11 AM, Dmitry Olshansky wrote: WAT? The overhead is in issuing system calls, you'd want to do as little of them as possible. Reading byte by byte is an exemplar of idiocy in I/O code. That's why we have things like byLine().
Re: protocol for using InputRanges
On Friday, 28 March 2014 at 08:34:08 UTC, Johannes Pfau wrote: Am Thu, 27 Mar 2014 17:20:25 -0700 schrieb Walter Bright newshou...@digitalmars.com: On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote: On 3/27/14, 2:24 PM, Walter Bright wrote: The range protocol is designed to work with streams. It's designed to work with containers. I know we talked about streams when we designed it. It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams. It's not a giant fail, we just need to adjust the notion. Are you suggesting that ranges needn't support streams? Note also that I suggested a way Steven could create an adapter with the behavior he desired, yet still adhere to protocol. No notion adjustments required. Ranges have equivalents in other languages: iterators in c++, IEnumerator in c#, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D. There are stream iterators in C++: http://www.cplusplus.com/reference/iterator/istream_iterator/
Re: protocol for using InputRanges
28-Mar-2014 22:29, Walter Bright пишет: On 3/28/2014 10:11 AM, Dmitry Olshansky wrote: WAT? The overhead is in issuing system calls, you'd want to do as little of them as possible. Reading byte by byte is an exemplar of idiocy in I/O code. That's why we have things like byLine(). Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW. -- Dmitry Olshansky
Re: protocol for using InputRanges
On Friday, 28 March 2014 at 16:30:36 UTC, John Stahara wrote: On Fri, 28 Mar 2014 16:23:11 +, Paolo Invernizzi wrote: On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote: On Fri, 28 Mar 2014 08:59:34 -, Paolo Invernizzi paolo.invernizzi@no.address wrote: For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront. This is actually not true. R What I'm meaning, it's that we don't care: we are always respecting the sequence empty front pop, and everybody here find it natural. To clarify for Mr. Invernizzi: the we to which he refers is the group of people he works with, and /not/ the members of this newsgroup. --jjs Thank you John, that's exact: I'm talking about my colleagues working with D. -- Paolo
Re: protocol for using InputRanges
On 3/28/2014 11:40 AM, Dmitry Olshansky wrote: Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW. How about a PR to fix it?
Re: protocol for using InputRanges
On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote: On 3/27/2014 12:21 PM, Rainer Schuetze wrote: This loop is intuitive. Not being allowed to call empty or front multiple times or not at all is unintuitive. They should not be named as if they are properties then. I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor. Disclaimer: I'm a C++ programmer just lurking here, I've never actually used D. I find it very counter-intuitive that 'empty' is required before front or popFront. Since 'pump priming' in the constructor isn't wanted either, i'd suggest the following protocol: while (popFront()) { front; } popFront is then required to return !empty. 'empty' as a separate property getter can stay but is not required for the protocol. This way, it's clear that the work to fetch the next element is always done in popFront. Generally I find dependecies between functions problematic that require a specific call sequence. If they can be removed, they should. With my proposed solution, there's still one minor dependency, namely that front is not valid before the first popFront. This could be solved by again combining the two, as proposed by someone else in this thread. Tobi
Re: protocol for using InputRanges
On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote: On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote: On 3/27/2014 12:21 PM, Rainer Schuetze wrote: This loop is intuitive. Not being allowed to call empty or front multiple times or not at all is unintuitive. They should not be named as if they are properties then. I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor. Disclaimer: I'm a C++ programmer just lurking here, I've never actually used D. I find it very counter-intuitive that 'empty' is required before front or popFront. Since 'pump priming' in the constructor isn't wanted either, i'd suggest the following protocol: while (popFront()) { front; } In c++ if you had a list or deque you would obviously did if(empty()) before calling front() too, and the before a call to pop_front(). Container is kind of range and in c++ it looks exactly the same: while (!cont.empty()) { auto var = cont.front(); cont.pop_front(); // consume } 3 operations are needed.
Re: protocol for using InputRanges
On 3/28/14, 3:42 AM, Steven Schveighoffer wrote: On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/27/2014 2:31 PM, Steven Schveighoffer wrote: Adding range primitives on top of a stream does not make sense. Are ready to implement a parallel universe of stream based algorithms to go alongside all the range based ones and be ready to constantly justify that to everyone? Not necessary. You just need to implement one range on top of a buffered stream, and then it works with all other algorithms that accept input ranges. I'm also curious what a generic read() should do when the stream is empty. Returns 0 elements read. It increasingly seems to me we need to do this. -- Andrei
Re: protocol for using InputRanges
On Friday, 28 March 2014 at 22:05:33 UTC, Walter Bright wrote: On 3/28/2014 11:40 AM, Dmitry Olshansky wrote: Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW. How about a PR to fix it? Yes. I hold the opinion that there is not a whole lot of reason why something like byChunk can't be fast like we might desire. If it's down to getting chunks of data the right way, whatever that is, and the problem is an IO bound problem, then I don't see why we can't submit pull requests to offer optimisations on it. I don't think we need to go down this the way it does IO isn't fast enough so we need to replace it route. Just optimise the existing thing more.
Re: protocol for using InputRanges
On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote: On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote: On 3/27/2014 12:21 PM, Rainer Schuetze wrote: This loop is intuitive. Not being allowed to call empty or front multiple times or not at all is unintuitive. They should not be named as if they are properties then. I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor. Disclaimer: I'm a C++ programmer just lurking here, I've never actually used D. I find it very counter-intuitive that 'empty' is required before front or popFront. Since 'pump priming' in the constructor isn't wanted either, i'd suggest the following protocol: while (popFront()) { front; } popFront is then required to return !empty. 'empty' as a separate property getter can stay but is not required for the protocol. This way, it's clear that the work to fetch the next element is always done in popFront. Generally I find dependecies between functions problematic that require a specific call sequence. If they can be removed, they should. With my proposed solution, there's still one minor dependency, namely that front is not valid before the first popFront. This could be solved by again combining the two, as proposed by someone else in this thread. Tobi Well, it might seem kind of weird at first that an InputRange has these concepts in three distinct methods, but they really are three separate operations. After I have spent enough time working with Java Iterators, I have found it much nicer to be able to get the current value or see if I'm at the end or not without having to worry about caching the current value myself. Also as I've said before it gives you some opportunity to make trying to fetch something out of an empty range less error prone, because you don't have to check for a null value, etc.
Re: protocol for using InputRanges
On Fri, 28 Mar 2014 20:14:39 -0400, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 3/28/14, 3:42 AM, Steven Schveighoffer wrote: On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/27/2014 2:31 PM, Steven Schveighoffer wrote: Adding range primitives on top of a stream does not make sense. Are ready to implement a parallel universe of stream based algorithms to go alongside all the range based ones and be ready to constantly justify that to everyone? Not necessary. You just need to implement one range on top of a buffered stream, and then it works with all other algorithms that accept input ranges. I'm also curious what a generic read() should do when the stream is empty. Returns 0 elements read. It increasingly seems to me we need to do this. -- Andrei It is in the works. I need to finish it. I've been incorporating Dmitry's I/O buffer into my design. -Steve
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 19:52:52 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/27/2014 2:46 PM, Steven Schveighoffer wrote: What I am protesting is the idea that you *know* input exists, you must still call empty. Front is enough in that case. You can't use that on generic code, because generic code cannot assume that front will not fail. There is a difference here that I want to establish. First, I completely agree that for generic code, for when an unknown range is passed in, empty is required to be called to verify that front is valid. Second, while I agree that empty should be required to verify front is valid on unknown ranges, it should not be required, when it is known that it's not empty. What you are proposing is that we take advantage of the requirement for empty on unknown ranges to require it on known ones. I disagree, the code looks awkward and incorrect. When I know something is empty, calling empty again doesn't give me information. instead empty now becomes is it empty, and if it needs priming, prime it. I'd rather see another primitive, since empty does not convey that message. A counter proposal -- What if we create a global method in std.range called prime. prime(r) will call r.prime if it exists, otherwise calls r.empty, and ignores the result. Code that wants to work with primable ranges, can call that function, and it's harmless for normal ranges, but will allow primable ranges to fetch the first element. I think penalizing all range-using code to force a non-functional empty call would 1. artificially invalidate lots of currently valid code and 2. look completely awkward in cases where it's not normally necessary. -Steve
Re: protocol for using InputRanges
On Fri, 28 Mar 2014 07:47:22 -0400, Marc Schütz schue...@gmx.net wrote: On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote: I call shenanigans. This is the code: @property bool empty() { if (nLeft) return false; if (r.empty) return true; If you initialized the next element in both constructor and popFront, then you'd get rid of both these checks. ... but of course lose laziness. In this case, laziness is not critical. Decoding the element is an O(1) operation, and when looping through, you will decode it anyway. When processing a 20 element string, the cost of checking to see if you have decoded on every empty or front call may override the front-loaded cost of decoding the first element on construction. It's sure to add to the cost if you are processing all 20 elements, since you decode them all anyway. On other ranges, it's more important when the first element costs a lot to fetch. HOWEVER, it's not critically important to delay that unless you are not going to process that element. For example, if you are foreach'ing over all the elements, the delay doesn't matter. I'd rather the higher level code decide whether to delay or not, depending on the situation. Requiring a protocol change for such detailed knowledge seems unbalanced. -Steve
Re: protocol for using InputRanges
On 3/28/2014 3:42 AM, Steven Schveighoffer wrote: I'm also curious what a generic read() should do when the stream is empty. Returns 0 elements read. Meaning it must also write through a pointer if it did read something. How is that faster than a 1 element buffer?
Re: protocol for using InputRanges
On 3/28/14, 11:40 AM, Dmitry Olshansky wrote: 28-Mar-2014 22:29, Walter Bright пишет: On 3/28/2014 10:11 AM, Dmitry Olshansky wrote: WAT? The overhead is in issuing system calls, you'd want to do as little of them as possible. Reading byte by byte is an exemplar of idiocy in I/O code. That's why we have things like byLine(). Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW. byLine doesn't use getc. -- Andrei
Re: protocol for using InputRanges
On Wednesday, March 26, 2014 10:36:08 Andrei Alexandrescu wrote: On 3/26/14, 8:37 AM, Steven Schveighoffer wrote: On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath re...@netmail.co.nz wrote: On Wed, 26 Mar 2014 12:30:53 -, Steven Schveighoffer schvei...@yahoo.com wrote: Gah, I didn't cut out the right rules. I meant the two rules that empty must be called before others. Those are not necessary. I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc. Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement. I had the same thinking as you, why pay for an extra check for all 3 calls? But there was already evidence that people were avoiding empty. I think requiring users to call empty before front on input ranges is a concession we should make. I don't know. It's not the end of the world if we require it (at least with a range that doesn't have length), since you're almost always going to be forced to call empty anyway, because ranges are generally used in generic code. However, it seems really off to me for there to be any real work going on in empty, and I'd be leery of any range which actually required empty to be called before calling front (though I definitely think that calling front on an empty range should not be considered okay and would expect that to generally be undefined behavior). Regardless, if the range has length, then I'd think that quite a few of the algorithms which actually used length wouldn't actually need to call empty, making calling empty unnecessary overhead. But for the most part, I think that a lot of the weird cases go away when you end up with length and random-access ranges and the like, since that usually means that the range isn't generative but likely has an array underneath (though map certainly has its quirks thanks to the fact that front could technically allocate a new object on every call, and it can be random-access). - Jonathan M Davis
Re: protocol for using InputRanges
On 27.03.2014 03:48, Walter Bright wrote: On 3/26/2014 7:19 PM, Steven Schveighoffer wrote: if(!r.empty) { auto r2 = map!(x = x * 2)(r); do { auto x = r2.front; ... } while(!r2.empty); } Should we be required to re-verify that r2 is not empty before using it? It clearly is not, and would be an artificial requirement (one that map likely would not enforce!). This sounds so much like a convention that simply won't be followed, and the result will be perfectly valid code. The idea is that there are three functions: empty, front, and popFront. The functionality of the range will be distributed between these 3 functions. Different things being ranged over will naturally split their operations into the 3 functions in different ways. If the 3 functions can be called in any order, then extra logic has to be added to them to signal to each other. This slows things down. To write generic code calling ranges, it's necessary to stop assuming what is happening under the hood of empty, front, and popFront, and stick to the protocol. IIUC what you are proposing would be covered better by removing front and return the value by popFront. Instead of forcing strong, breaking and sometimes unintuitive semantics we could also improve dmd's optimizer. This caching range example: /// T getc(T)(); struct irange(T) { bool _cached; T _cache; this(T[] arr) { _cached = false; } bool empty() { if(_cached) return false; _cache = getc!T(); return (_cache 0); } T front() { empty(); return _cache; } void popFront() { _cached = false; } } int foo(irange!int rg) { int sum = 0; while(!rg.empty) { sum += rg.front; rg.popFront; } return sum; } /// generates this code with dmd2.065 -O -inline -release -noboundscheck -m64: _D7irange23fooFS7irange213__T6irangeTiZ6irangeZi: : pushrbp 0001: mov rbp,rsp 0004: pushrax 0005: pushrsi 0006: mov qword ptr [rbp+10h],rcx 000A: xor esi,esi 000C: lea rcx,[rbp+10h] 0010: sub rsp,20h 0014: call_D7irange213__T6irangeTiZ6irange5emptyMFZb 0019: add rsp,20h 001D: xor al,1 001F: je 0050 0021: lea rcx,[rbp+10h] 0025: sub rsp,20h 0029: call_D7irange213__T6irangeTiZ6irange5emptyMFZb 002E: add rsp,20h 0032: mov eax,dword ptr [rbp+14h] 0035: add esi,eax 0037: mov byte ptr [rbp+10h],0 003B: lea rcx,[rbp+10h] 003F: sub rsp,20h 0043: call_D7irange213__T6irangeTiZ6irange5emptyMFZb 0048: add rsp,20h 004C: xor al,1 004E: jne 0021 0050: mov eax,esi 0052: pop rsi 0053: lea rsp,[rbp] 0057: pop rbp 0058: ret _D7irange213__T6irangeTiZ6irange5emptyMFZb: : pushrbp 0001: mov rbp,rsp 0004: mov qword ptr [rbp+10h],rcx 0008: cmp byte ptr [rcx],0 000B: je 0011 000D: xor eax,eax 000F: pop rbp 0010: ret 0011: sub rsp,20h 0015: call_D7irange211__T4getcTiZ4getcFZi 001A: add rsp,20h 001E: mov rcx,qword ptr [rbp+10h] 0022: mov dword ptr [rcx+4],eax 0025: shr eax,1Fh 0028: mov byte ptr [rcx],al 002A: xor al,1 002C: pop rbp 002D: ret and this with gdc (-O2 on godbolt.org): int example.foo(example.irange!(int).irange): mov rax, rdi pushrbx xor ebx, ebx shr rax, 32 testdil, dil je .L5 .L2: add ebx, eax .L5: callint example.getc!(int).getc() testeax, eax js .L2 mov eax, ebx pop rbx ret All traces of the caching but the initial state check have been removed, no extra calls.
Re: protocol for using InputRanges
On 27.03.2014 07:53, Rainer Schuetze wrote: bool empty() { if(_cached) return false; _cache = getc!T(); return (_cache 0); } Ouch, pasted before fixing: bool empty() { if(_cached) return false; _cache = getc!T(); _cached = _cache = 0; // this line added return !_cached; } The asm is with an unintended _cached = _cache 0;, but that's the same, just accepting negative input instead of non-negative.
Re: protocol for using InputRanges
On Wednesday, 26 March 2014 at 18:04:44 UTC, Ola Fosheim Grøstad wrote: On Wednesday, 26 March 2014 at 17:36:08 UTC, Andrei Alexandrescu wrote: I think requiring users to call empty before front on input ranges is a concession we should make. Then the name should change to ready. It makes sense to require the user to check that the range is ready, but not to check that it is not empty. This will also make more sense for async implementations that will block if not ready. IMO the whole interface needs rethinking if you want to gracefully support async data streams where you need to distinguish between: ready vs empty, front vs firstavailable. Both quick-sort, merge-sort, filter and map work well with async data streams. Why not? It is how iterators work in most OO languages. -- Paulo
Re: protocol for using InputRanges
On 3/26/2014 11:53 PM, Rainer Schuetze wrote: IIUC what you are proposing would be covered better by removing front and return the value by popFront. Different things being ranged over make different decisions as to what goes into each function. Instead of forcing strong, breaking and sometimes unintuitive semantics I don't understand what is so unintuitive about: while (!empty) { ... front ...; popFront(); } What I do find unintuitive and unattractive is all those flags in the range implementation. we could also improve dmd's optimizer. Yes, that's true. The thing about optimizers, though, is they work on patterns. If your code doesn't quite match the right pattern, it won't tickle the optimization you might be expecting. The pattern for recognizing the ROL and ROR patterns is one such.
Re: protocol for using InputRanges
On 3/26/2014 11:53 PM, Rainer Schuetze wrote: This caching range example: /// T getc(T)(); struct irange(T) { bool _cached; T _cache; this(T[] arr) { _cached = false; } bool empty() { if(_cached) return false; _cache = getc!T(); return (_cache 0); What happens if empty is called twice in a row? Two characters get read! That isn't right. } T front() { empty(); return _cache; } What happens if empty returns true? EOF? I don't think that's intuitive. You could have front throw, but that prevents anyone from building nothrow ranges. void popFront() { _cached = false; } }
Re: protocol for using InputRanges
On 3/26/2014 11:48 PM, Jonathan M Davis wrote: However, it seems really off to me for there to be any real work going on in empty, Many things, like some ttys, can not be tested for being empty without reading it, and reading of a tty is destructive. and I'd be leery of any range which actually required empty to be called before calling front (though I definitely think that calling front on an empty range should not be considered okay and would expect that to generally be undefined behavior). That's exactly why empty should be called before front - because front is expected to succeed.
Re: protocol for using InputRanges
On Sunday, 23 March 2014 at 01:07:27 UTC, Rikki Cattermole wrote: On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote: It's become clear to me that we've underspecified what an InputRange is. The normal way to use it is: while (!r.empty) { auto e = r.front; ... do something with e ... r.popFront(); } no argument there. But there are two issues: 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first? If this is true, extra logic will need to be added to r.front in many cases. 2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases. It would be nice to have ranges full stop described. And how to use/make them in D. I have yet to learn them because of no official documentation on them. On this topic we also need to work on common patterns and creating documentation on e.g. the site or wiki for it. Saw that we do have a bit in the wiki under tutorials. Maybe if I get some time I'll work on that. I agree. I've been using ranges for a while now and have tried different implementations based on advice given on this forum and depending on each case. After reading this thread I looked at some of my ranges and I have to say I still have no clue as to what should and should _not_ be done (regardless of whether you _can_ do it). For a while I thought that it's my lack of understanding, but this thread shows that everyone has a different view of ranges. Guidelines with use cases would be great. I remember I mentioned this in another thread already. The thing is that experimenting without proper guidelines leaves your code in an inconsistent state where you have two or more ranges doing technically the same thing but each with a different logic.
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 02:44:13 -, Daniel Murphy yebbliesnos...@gmail.com wrote: Regan Heath wrote in message news:op.xdb9a9v354x...@puck.auriga.bhead.co.uk... What guarantees range2 is longer than range1? The isArray case checks explicitly, but the generic one doesn't. Is it a property of being an output range that it will expand as required, or.. Some ranges will give you their length... Sure. And generally you could use it. copy() doesn't, and I was talking specifically about that example. :) R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Re: protocol for using InputRanges
On Thursday, 27 March 2014 at 04:17:16 UTC, Walter Bright wrote: On 3/26/2014 7:55 PM, Steven Schveighoffer wrote: OK, but it's logical to assume you *can* avoid a call to empty if you know what's going on under the hood, no? Then at that point, you have lost the requirement -- people will avoid calling empty because they can get away with it, and then altering the under-the-hood requirements cause code breakage later. Case in point, the pull request I referenced, the author originally tried to just use empty to lazily initialize filter, but it failed due to existing code in phobos that did not call empty on filtered data before processing. He had to instrument all 3 calls. As with *any* API, if you look under the hood and make assumptions about the behavior based on a particular implementation, assumptions that are not part of the API, the risk of breakage inevitably follows. If you've identified Phobos code that uses ranges but does not follow the protocol, the Phobos code is broken - please file a bugzilla issue on it. I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): Calling r.front is allowed only if calling r.empty has, or would have, returned false. (And the same for `popFront()`.) That is, the documentation more or less explicitly states that you don't actually need to call `empty` if you know it returned `true`. [1] http://dlang.org/phobos/std_range.html
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 02:19:13 -, Steven Schveighoffer schvei...@yahoo.com wrote: if(!r.empty) { auto r2 = map!(x = x * 2)(r); do { auto x = r2.front; ... } while(!r2.empty); } if(r.empty) return; auto r2 = map!(x = x * 2)(r); while(!r2.empty) { auto x = r2.front; ... r2.popFront(); //bug fix for your version which I noticed because I followed the pattern :D } ahh.. much better. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 10:49:42 -, Marc Schütz schue...@gmx.net wrote: On Thursday, 27 March 2014 at 04:17:16 UTC, Walter Bright wrote: On 3/26/2014 7:55 PM, Steven Schveighoffer wrote: OK, but it's logical to assume you *can* avoid a call to empty if you know what's going on under the hood, no? Then at that point, you have lost the requirement -- people will avoid calling empty because they can get away with it, and then altering the under-the-hood requirements cause code breakage later. Case in point, the pull request I referenced, the author originally tried to just use empty to lazily initialize filter, but it failed due to existing code in phobos that did not call empty on filtered data before processing. He had to instrument all 3 calls. As with *any* API, if you look under the hood and make assumptions about the behavior based on a particular implementation, assumptions that are not part of the API, the risk of breakage inevitably follows. If you've identified Phobos code that uses ranges but does not follow the protocol, the Phobos code is broken - please file a bugzilla issue on it. I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): Calling r.front is allowed only if calling r.empty has, or would have, returned false. (And the same for `popFront()`.) That is, the documentation more or less explicitly states that you don't actually need to call `empty` if you know it returned `true`. [1] http://dlang.org/phobos/std_range.html That's because up until now we've made no attempt to set this in stone, and as such many interpretations have surfaced. This documentation would, of course, change to match the final decision made. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Re: protocol for using InputRanges
On Thursday, 27 March 2014 at 08:13:28 UTC, Paulo Pinto wrote: Why not? It is how iterators work in most OO languages. Why not what? A query for empty() should not have any side effects. In what library does that happen? Tests should in general not have side effects unless the name suggest so. Anyway, I wish D was more clear about use scenarios. If D is going to be efficient in the server space then it should also make low latency programming easier, meaning supporting async collections out-of-the-box.
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 00:17:21 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/26/2014 7:55 PM, Steven Schveighoffer wrote: OK, but it's logical to assume you *can* avoid a call to empty if you know what's going on under the hood, no? Then at that point, you have lost the requirement -- people will avoid calling empty because they can get away with it, and then altering the under-the-hood requirements cause code breakage later. Case in point, the pull request I referenced, the author originally tried to just use empty to lazily initialize filter, but it failed due to existing code in phobos that did not call empty on filtered data before processing. He had to instrument all 3 calls. As with *any* API, if you look under the hood and make assumptions about the behavior based on a particular implementation, assumptions that are not part of the API, the risk of breakage inevitably follows. Like range.save. It's required, but frequently omitted, without any consequences. I'm not saying requiring it would be an invalid decision, I'm saying requiring it would be a futile gesture -- the requirement would be ignored. What happens when one of our clients code breaks severely because we make a change in phobos that assumes empty is always called first on a newly-created range? If you've identified Phobos code that uses ranges but does not follow the protocol, the Phobos code is broken - please file a bugzilla issue on it. I think we should work on making a protocol that does not require awkward calls, rather than alienating developers who don't follow the awkward protocol. BTW, I think there has been recent talk about not focusing on dust when we are laying bricks. This would have my vote as useless dust. This does not solve the problem of streams-as-ranges, because streams don't make good ranges. It doesn't really solve any problems as far as I can tell. -Steve
Re: protocol for using InputRanges
On Thursday, 27 March 2014 at 12:35:04 UTC, Steven Schveighoffer wrote: What happens when one of our clients code breaks severely because we make a change in phobos that assumes empty is always called first on a newly-created range? You probably can run-time test this in Debug builds, but… I think we should work on making a protocol that does not require awkward calls, rather than alienating developers who don't follow the awkward protocol. This is it. There is way too much awkwardness in D already. And essentially, the closer the design is to mainstream the more annoying it is when you diverge and are inconsistent. Code should behave as expected without any need for memorizing. Having to differentiate between dialects is so much harder than differentiating between orthogonal languages.
Re: protocol for using InputRanges
On 3/27/14, 3:49 AM, Marc Schütz schue...@gmx.net wrote: I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): Calling r.front is allowed only if calling r.empty has, or would have, returned false. Probably we need to amend that. For efficient ranges, front() and popFront() should only be guaranteed to work if either empty() or length() were evaluated first, and they returned false or nonzero, respectively. Andrei
Re: protocol for using InputRanges
On 3/27/14, 5:35 AM, Steven Schveighoffer wrote: BTW, I think there has been recent talk about not focusing on dust when we are laying bricks. This would have my vote as useless dust. This does not solve the problem of streams-as-ranges, because streams don't make good ranges. It doesn't really solve any problems as far as I can tell. I think byXchar are important and are not streams. -- Andrei
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 11:19:15 -0400, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 3/27/14, 3:49 AM, Marc Schütz schue...@gmx.net wrote: I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): Calling r.front is allowed only if calling r.empty has, or would have, returned false. Probably we need to amend that. For efficient ranges, front() and popFront() should only be guaranteed to work if either empty() or length() were evaluated first, and they returned false or nonzero, respectively. This is a misleading statement. An array is probably the most efficient of ranges, which does not require this. We should be more specific here. For certain types of *nonempty* ranges, front is only guaranteed to work if empty/length was called before front is called. I would not necessarily lump popFront into that requirement automatically, popFront may be able to cope with not having cached the first element without losing efficiency. Questions to answer: 1. What types of ranges does this rule apply to? 2. What is the cost of not requiring this in terms of efficiency and ease of implementation. 3. Is there a better mechanism to use than misusing empty? should we introduce another property/call? At the moment, the only candidates I can think of are lazy caching ranges. How many of those exist? -Steve
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 3/27/14, 5:35 AM, Steven Schveighoffer wrote: BTW, I think there has been recent talk about not focusing on dust when we are laying bricks. This would have my vote as useless dust. This does not solve the problem of streams-as-ranges, because streams don't make good ranges. It doesn't really solve any problems as far as I can tell. I think byXchar are important and are not streams. -- Andrei What's broken about those? -Steve
Re: protocol for using InputRanges
On 3/27/14, 8:44 AM, Steven Schveighoffer wrote: At the moment, the only candidates I can think of are lazy caching ranges. How many of those exist? filter comes to mind. -- Andrei
Re: protocol for using InputRanges
On 3/27/14, 8:44 AM, Steven Schveighoffer wrote: On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 3/27/14, 5:35 AM, Steven Schveighoffer wrote: BTW, I think there has been recent talk about not focusing on dust when we are laying bricks. This would have my vote as useless dust. This does not solve the problem of streams-as-ranges, because streams don't make good ranges. It doesn't really solve any problems as far as I can tell. I think byXchar are important and are not streams. -- Andrei What's broken about those? Speed.
Re: protocol for using InputRanges
On Thursday, 27 March 2014 at 15:19:37 UTC, Andrei Alexandrescu wrote: On 3/27/14, 3:49 AM, Marc Schütz schue...@gmx.net wrote: I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): Calling r.front is allowed only if calling r.empty has, or would have, returned false. Probably we need to amend that. For efficient ranges, front() and popFront() should only be guaranteed to work if either empty() or length() were evaluated first, and they returned false or nonzero, respectively. I just want to point out that I think allowing empty to have an *observable* side effect will have cataclysmic repercussion in validating a release build, what with all our assert(!empty); calls and all.
Re: protocol for using InputRanges
On Thursday, 27 March 2014 at 15:45:14 UTC, Andrei Alexandrescu wrote: On 3/27/14, 8:44 AM, Steven Schveighoffer wrote: On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 3/27/14, 5:35 AM, Steven Schveighoffer wrote: BTW, I think there has been recent talk about not focusing on dust when we are laying bricks. This would have my vote as useless dust. This does not solve the problem of streams-as-ranges, because streams don't make good ranges. It doesn't really solve any problems as far as I can tell. I think byXchar are important and are not streams. -- Andrei What's broken about those? Speed. I call shenanigans. This is the code: @property bool empty() { if (nLeft) return false; if (r.empty) return true; If you initialized the next element in both constructor and popFront, then you'd get rid of both these checks. Furthermore, popFront() has a guaranteed 1 call per element. Weareas empty and front may be called several times in a row for a single element. If you want efficiency, stop being member-wise lazy, and put some eagerness in your code. filter comes to mind. -- Andrei You *just* turned down a pull that did exactly that, for exactly the same reasons. Have byXChar function like filter: construction and popFront do the work, and front/empty is 0(1), and const, and strongly pure to boot. I *know* the compiler likes that in terms of speed.
Re: protocol for using InputRanges
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote: It's become clear to me that we've underspecified what an InputRange is. Some thoughts: - A while ago I realized I did not know how some of those underspecified details should be implemented (I think my initial doubt was if the range was expected/required to enforce(!empty)). Assuming it was just my ignorance (and incomplete or hard to find documentation), I asked in the IRC channel and, IIRC, the answer I got there was that ranges are an abstract concept, so such minutia were supposed to be implementation details. From this forum thread and that IRC feedback, it seems that, indeed, people have been operating under an illusion of a shared universal assumption. I'm very happy to see this addressed. - (Lots of text deleted because I was no longer sure about it ...) - Is it allowed for an InputRange to become empty and then not empty again? For a finite RandomAccessRange to increase in length? E.g.: // Example 1: (with outside control flow) if(someQueue.empty) addStuffToQueue(); assert(!someQueue.empty); // Example 2: (with only range-related control flow) if(!someQueue.empty) { auto n = someQueue.length; someQueue.popFront(); assert(someQueue.length == n-1); // can this fail? } - Is it allowed to not call front? E.g.: void dropAll(Range)(Range range) { while(!empty) range.popFront(); }
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 12:02:16 -0400, monarch_dodra monarchdo...@gmail.com wrote: On Thursday, 27 March 2014 at 15:19:37 UTC, Andrei Alexandrescu wrote: On 3/27/14, 3:49 AM, Marc Schütz schue...@gmx.net wrote: I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): Calling r.front is allowed only if calling r.empty has, or would have, returned false. Probably we need to amend that. For efficient ranges, front() and popFront() should only be guaranteed to work if either empty() or length() were evaluated first, and they returned false or nonzero, respectively. I just want to point out that I think allowing empty to have an *observable* side effect will have cataclysmic repercussion in validating a release build, what with all our assert(!empty); calls and all. This is an excellent point. assert(!empty) is everywhere. -Steve
Re: protocol for using InputRanges
On 3/27/14, 9:12 AM, monarch_dodra wrote: filter comes to mind. -- Andrei You *just* turned down a pull that did exactly that, for exactly the same reasons. I was on the verge, and we need to discuss this more. A constructor that does arbitrary work for a lazy range doesn't sit well. -- Andrei
Re: protocol for using InputRanges
On Thursday, 27 March 2014 at 19:03:26 UTC, Andrei Alexandrescu wrote: On 3/27/14, 9:12 AM, monarch_dodra wrote: filter comes to mind. -- Andrei You *just* turned down a pull that did exactly that, for exactly the same reasons. I was on the verge, and we need to discuss this more. A constructor that does arbitrary work for a lazy range doesn't sit well. -- Andrei I still think there's ambiguity in the word lazy. I think this is the distinction most make: Not lazy: // auto r = getSomeRange(); auto arr = someRange.array(); foreach( ref e ; arr) doSomething; remove!fun(arr); return arr; // Lazy: // return getSomeRange() .map!doSomething() .filter!fun() .array(); // What you are talking about (IMO) is better described as eager vs not eager. A constructor does some work for a lazy range that does eager processing of its elements: I think this is a better description, and I think is acceptable.
Re: protocol for using InputRanges
On 27.03.2014 09:42, Walter Bright wrote: On 3/26/2014 11:53 PM, Rainer Schuetze wrote: IIUC what you are proposing would be covered better by removing front and return the value by popFront. Different things being ranged over make different decisions as to what goes into each function. Instead of forcing strong, breaking and sometimes unintuitive semantics I don't understand what is so unintuitive about: while (!empty) { ... front ...; popFront(); } This loop is intuitive. Not being allowed to call empty or front multiple times or not at all is unintuitive. They should not be named as if they are properties then. What I do find unintuitive and unattractive is all those flags in the range implementation. we could also improve dmd's optimizer. Yes, that's true. The thing about optimizers, though, is they work on patterns. If your code doesn't quite match the right pattern, it won't tickle the optimization you might be expecting. The pattern for recognizing the ROL and ROR patterns is one such. I guess gcc doesn't use patterns here, but value propagation and notices that all the checks are unnecessary.
Re: protocol for using InputRanges
On 27.03.2014 10:06, Walter Bright wrote: On 3/26/2014 11:53 PM, Rainer Schuetze wrote: This caching range example: /// T getc(T)(); struct irange(T) { bool _cached; T _cache; this(T[] arr) { _cached = false; } bool empty() { if(_cached) return false; _cache = getc!T(); return (_cache 0); What happens if empty is called twice in a row? Two characters get read! That isn't right. Good catch. Somehow I pasted the wrong version, but posted the corrections a few minutes later. } T front() { empty(); return _cache; } What happens if empty returns true? EOF? I don't think that's intuitive. You could have front throw, but that prevents anyone from building nothrow ranges. The caller is told to guarantee that empty must not return true. I just didn't wanted to repeat the stuff in empty. GDC removed it anyway...
Re: protocol for using InputRanges
On 3/27/14, 11:53 AM, Steven Schveighoffer wrote: On Thu, 27 Mar 2014 12:02:16 -0400, monarch_dodra monarchdo...@gmail.com wrote: On Thursday, 27 March 2014 at 15:19:37 UTC, Andrei Alexandrescu wrote: On 3/27/14, 3:49 AM, Marc Schütz schue...@gmx.net wrote: I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): Calling r.front is allowed only if calling r.empty has, or would have, returned false. Probably we need to amend that. For efficient ranges, front() and popFront() should only be guaranteed to work if either empty() or length() were evaluated first, and they returned false or nonzero, respectively. I just want to point out that I think allowing empty to have an *observable* side effect will have cataclysmic repercussion in validating a release build, what with all our assert(!empty); calls and all. This is an excellent point. assert(!empty) is everywhere. -Steve Yah, agreed. -- Andrei
Re: protocol for using InputRanges
On 3/27/2014 12:24 PM, Rainer Schuetze wrote: The caller is told to guarantee that empty must not return true. I think that is the very definition of looking under the hood in order to violate protocol. A range CANNOT be written generically and support that.
Re: protocol for using InputRanges
On 3/27/2014 12:21 PM, Rainer Schuetze wrote: This loop is intuitive. Not being allowed to call empty or front multiple times or not at all is unintuitive. They should not be named as if they are properties then. I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.
Re: protocol for using InputRanges
On 3/27/2014 9:12 AM, monarch_dodra wrote: If you initialized the next element in both constructor As mentioned earlier, requiring this of a range constructor makes it impossible to separate composition of a range with using it. As also mentioned earlier, this throws under the bus entire categories of use cases for ranges, all to avoid a single call to 'empty'.
Re: protocol for using InputRanges
On 3/27/2014 3:49 AM, Marc Schütz schue...@gmx.net wrote: I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): Calling r.front is allowed only if calling r.empty has, or would have, returned false. (And the same for `popFront()`.) That is, the documentation more or less explicitly states that you don't actually need to call `empty` if you know it returned `true`. [1] http://dlang.org/phobos/std_range.html Right, it does say that, and it is seriously wrong.
Re: protocol for using InputRanges
On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote: Yah, agreed. -- Andrei Completely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY.
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 16:53:35 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/27/2014 9:12 AM, monarch_dodra wrote: If you initialized the next element in both constructor As mentioned earlier, requiring this of a range constructor makes it impossible to separate composition of a range with using it. Not impossible. Use lazy initialization. As also mentioned earlier, this throws under the bus entire categories of use cases for ranges, all to avoid a single call to 'empty'. This is going to look weird: r.empty; // now we can use it... It only makes sense if you are using empty in your processing of the range. -Steve
Re: protocol for using InputRanges
On 3/27/2014 5:45 AM, Ola Fosheim Grøstad ola.fosheim.grostad+dl...@gmail.com wrote: On Thursday, 27 March 2014 at 12:35:04 UTC, Steven Schveighoffer wrote: I think we should work on making a protocol that does not require awkward calls, rather than alienating developers who don't follow the awkward protocol. This is it. Sure, but what is awkward about empty-front-popFront? You can still always use: foreach (e; range) { ... } which will follow the correct protocol automatically.
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 16:58:12 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote: Yah, agreed. -- Andrei Completely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY. In the land of ranges, it's construction and popFront that generally advances the range. Empty does not. This is why assert(!empty) is all over the place, it's considered to be non-destructive. Note that a stream makes a terrible range, simply because of what you say -- reading is destructive, and determining emptiness is dependent on reading. You need a buffered stream to make a good range, and then the logic becomes much more straightforward. The awkwardness for shoehorning streams into ranges I see is that the advancement (popFront) and the check for termination (empty) are decoupled, when in reality they are synced for a stream. This REQUIRES a sticky bit for popFront to communicate with empty on whether it is EOF or not. Even a single byte buffer is not enough, you need a bool to indicate the stream is done. -Steve
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 17:06:35 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/27/2014 5:45 AM, Ola Fosheim Grøstad ola.fosheim.grostad+dl...@gmail.com wrote: On Thursday, 27 March 2014 at 12:35:04 UTC, Steven Schveighoffer wrote: I think we should work on making a protocol that does not require awkward calls, rather than alienating developers who don't follow the awkward protocol. This is it. Sure, but what is awkward about empty-front-popFront? You can still always use: foreach (e; range) { ... } which will follow the correct protocol automatically. If this is all we needed, ranges would never have been invented. -Steve
Re: protocol for using InputRanges
On 3/27/2014 2:13 PM, Steven Schveighoffer wrote: Note that a stream makes a terrible range, simply because of what you say -- reading is destructive, and determining emptiness is dependent on reading. You need a buffered stream to make a good range, and then the logic becomes much more straightforward. The range becomes the one element buffer in this case. It is completely workable. The awkwardness for shoehorning streams into ranges I see is that the advancement (popFront) and the check for termination (empty) are decoupled, when in reality they are synced for a stream. This REQUIRES a sticky bit for popFront to communicate with empty on whether it is EOF or not. The range protocol is designed to work with streams. It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams. Even a single byte buffer is not enough, you need a bool to indicate the stream is done. Right. But empty for a stream still has to read. Just follow the protocol, and the range will work, even with streams.
Re: protocol for using InputRanges
On 3/27/2014 1:59 PM, Steven Schveighoffer wrote: On Thu, 27 Mar 2014 16:53:35 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/27/2014 9:12 AM, monarch_dodra wrote: If you initialized the next element in both constructor As mentioned earlier, requiring this of a range constructor makes it impossible to separate composition of a range with using it. Not impossible. Use lazy initialization. That still doesn't work, because then you could be calling front, and front might fail if there's no input. What is wrong with following the protocol? Why must we introduce all these caveats for ranges, like streams not working, front that can fail, can't do C# style LINQ, etc.? For what gain? It only makes sense if you are using empty in your processing of the range. I.e. follow the danged protocol and it works.
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 17:24:11 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/27/2014 2:13 PM, Steven Schveighoffer wrote: Note that a stream makes a terrible range, simply because of what you say -- reading is destructive, and determining emptiness is dependent on reading. You need a buffered stream to make a good range, and then the logic becomes much more straightforward. The range becomes the one element buffer in this case. It is completely workable. A 1 byte buffered stream? If it's performance you are looking for, you will not find it there. The range protocol is designed to work with streams. It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams. Ranges work well on top of buffered streams, but not AS streams. Even a single byte buffer is not enough, you need a bool to indicate the stream is done. Right. But empty for a stream still has to read. Just follow the protocol, and the range will work, even with streams. A stream requires one primitive -- read. In one function call, you get the data you want, you can tell if it's empty, and the stream object does not need to concern itself with projecting a cached element. Adding range primitives on top of a stream does not make sense. -Steve
Re: protocol for using InputRanges
On Thu, 27 Mar 2014 17:28:05 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/27/2014 1:59 PM, Steven Schveighoffer wrote: On Thu, 27 Mar 2014 16:53:35 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/27/2014 9:12 AM, monarch_dodra wrote: If you initialized the next element in both constructor As mentioned earlier, requiring this of a range constructor makes it impossible to separate composition of a range with using it. Not impossible. Use lazy initialization. That still doesn't work, because then you could be calling front, and front might fail if there's no input. In the cases where you know there is input, that's not true. What I am protesting is the idea that you *know* input exists, you must still call empty. Front is enough in that case. What is wrong with following the protocol? Why must we introduce all these caveats for ranges, like streams not working, front that can fail, can't do C# style LINQ, etc.? For what gain? There is no gain to requiring empty on ranges where you logically prove they are not empty. It's a wasted call. For things where you're not sure if they are empty or not (e.g. an input file), of course empty is required to be called, and of course it may do some processing to determine that. But I would only expect that on the first call. Subsequent checks would already be done by popFront. It only makes sense if you are using empty in your processing of the range. I.e. follow the danged protocol and it works. I mean, if you are calling empty regularly, because you aren't sure whether the elements are there. That's not always the case, sometimes you are sure the elements are there. Requiring a call to empty to do anything other than to check whether a range is empty, is just plain wrong. I'm sure there's better ways to implement what you want without stitching the functionality onto empty. -Steve
Re: protocol for using InputRanges
On 3/27/14, 1:58 PM, Walter Bright wrote: On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote: Yah, agreed. -- Andrei Completely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY. That's a good point too. Andrei
Re: protocol for using InputRanges
On 3/27/14, 2:24 PM, Walter Bright wrote: The range protocol is designed to work with streams. It's designed to work with containers. It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams. It's not a giant fail, we just need to adjust the notion. Andrei