Re: [rust-dev] Proposal for clarifying the iterator protocol

Kevin Ballard Sun, 04 Aug 2013 20:34:16 -0700

You could certainly design a non-blocking IO iterator that way. My example of a 
non-blocking IO iterator wasn't meant to illustrate something I'd actually 
recommend you do (since the iterator protocol isn't really a great fit for 
this), but rather show the flexibility of the approach.


That said, the non-blocking IO approach does still "exhaust the elements 
available"; it just so happens that more elements become available over time.
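
As a rough sketch of that behavior in present-day Rust (not the 2013 syntax 
quoted below), assume a hypothetical `Pending` wrapper around a 
`std::sync::mpsc::Receiver`. The names `Pending` and `drain_twice` are made up 
for illustration. Each for loop drains what is queued at that moment, and a 
later loop over the same iterator picks up whatever arrived in between:

```rust
use std::sync::mpsc::{channel, Receiver};

/// Hypothetical iterator: yields whatever is queued right now, then `None`.
struct Pending<T> {
    rx: Receiver<T>,
}

impl<T> Iterator for Pending<T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        // try_recv never blocks; an empty or disconnected channel maps to None.
        self.rx.try_recv().ok()
    }
}

/// Drains the iterator, observes `None`, then drains it again after more
/// data has been sent.
fn drain_twice() -> (Vec<i32>, Option<i32>, Vec<i32>) {
    let (tx, rx) = channel();
    let mut it = Pending { rx };

    tx.send(1).unwrap();
    tx.send(2).unwrap();

    // Iterating over `&mut it` keeps `it` usable after the loop ends.
    let mut first = Vec::new();
    for x in &mut it {
        first.push(x);
    }

    // Nothing is queued at this point, so the iterator reports None.
    let gap = it.next();

    // More data arrives; the same iterator starts yielding again.
    tx.send(3).unwrap();
    let mut second = Vec::new();
    for x in &mut it {
        second.push(x);
    }

    (first, gap, second)
}
```

Each for loop stops at the first `None` and never calls `next()` again, so it 
does not matter to the loop that the iterator later resumes.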

A better example might be an iterator that allows you to push more elements 
onto the end. Or one that allows you to move the iterator backwards in time, so 
to speak, so it re-emits the same elements it emitted previously (e.g. give it 
a .prev() method). Or merely one that allows you to push back a single element 
(that gets yielded on the next call to .next()), which is actually fairly 
common in character iterators in other languages (gives you the equivalent of a 
.peek() operation, by consuming a value and then immediately pushing it back if 
you don't want it). None of these examples are possible with approaches #1 or 
#2, only with approach #3. And all of them are still compatible with the basic 
for loop.
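
To make the push-back case concrete, here is a minimal sketch in present-day 
Rust. `PushBack`, `push_back`, and `push_back_demo` are hypothetical names 
chosen for illustration, not an existing library API. The adaptor holds at most 
one pushed-back element and yields it before consulting the wrapped iterator:

```rust
/// Hypothetical adaptor: wraps any iterator and holds one pushed-back item.
struct PushBack<I: Iterator> {
    inner: I,
    slot: Option<I::Item>,
}

impl<I: Iterator> PushBack<I> {
    fn new(inner: I) -> Self {
        PushBack { inner, slot: None }
    }

    /// The pushed-back item is yielded by the next call to `next()`.
    fn push_back(&mut self, item: I::Item) {
        self.slot = Some(item);
    }
}

impl<I: Iterator> Iterator for PushBack<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // take() empties the slot, so a pushed-back item is yielded only once;
        // otherwise fall through to the wrapped iterator.
        self.slot.take().or_else(|| self.inner.next())
    }
}

fn push_back_demo() -> (Option<char>, Vec<char>, Option<char>, Option<char>) {
    let mut it = PushBack::new("ab".chars());

    // Emulate peek: consume a value, then push it straight back.
    let peeked = it.next();
    if let Some(c) = peeked {
        it.push_back(c);
    }

    // The plain for loop still sees every element, including the peeked one.
    let mut seen = Vec::new();
    for c in &mut it {
        seen.push(c);
    }

    // The iterator is exhausted here and returns None.
    let after_end = it.next();

    // Pushing an element back makes it yield again after None.
    it.push_back('z');
    let revived = it.next();

    (peeked, seen, after_end, revived)
}
```

The last two steps are the part that only approach #3 permits: `next()` returns 
`None` and then `Some('z')` on a later call.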

-Kevin

On Aug 4, 2013, at 5:53 PM, Jason Fager <jfa...@gmail.com> wrote:

> Not confused, I understand your point about for loops not caring about what 
> happens with additional 'next' calls.  
> 
> But as a user of an iterator, I have the expectation that a for loop exhausts 
> the elements available from an iterator unless I return early or break.  
> Designing an iterator that intentionally sidesteps that expectation seems 
> like a bad idea.  Principle of least astonishment, etc.  
> 
> And yes, of course, iterators already return Option.  But they return Option 
> to satisfy the *iterator protocol*, not the use cases you described.  I'm 
> talking about adding another layer of Option.
> 
> So say I want to implement non-blocking IO that plays nicely with the
> iterator protocol.  An implementation taking advantage of option #3 would
> look something like:
> 
> loop {
>     for i in iter {
>         foo(i);
>     }
>     // other stuff
>     if(noReallyImDone) {
>         break;
>     }
> }
> 
> 
> While hoisting the Iterator's type into another layer of Option looks like:
> 
> for i in iter {
>     match i {
>         Some(i) => foo(i),
>         None => {
>            //other stuff
>         }
>     }
> }
> 
> The outer Option allows the iterator protocol to work as expected, i.e. 
> iterate over all available elements in the iterator, and the inner implements 
> the non-blocking protocol you're looking for.
> 
> 
> On Sun, Aug 4, 2013 at 8:12 PM, Kevin Ballard <ke...@sb.org> wrote:
> I suspect you're confused about something.
> 
> The for loop doesn't care in the slightest what an iterator does after it's 
> returned None. All 3 approaches work equally well as far as the for loop is 
> concerned.
> 
> And I'm not sure what you mean by "design Iterators that would take advantage 
> of the undefined behavior". If an iterator defines how it behaves after 
> returning None, then it's defined behavior. If you're using iterators and you 
> know your entire iterator pipeline, then you can use whatever behavior the 
> iterators involved define. You only need to restrict yourself to what the 
> iterator protocol defines if you don't know what iterator you're consuming.
> 
> I also don't understand your suggestion about using Option. Iterators already 
> return an Option.
> 
> -Kevin
> 
> On Aug 4, 2013, at 4:45 PM, Jason Fager <jfa...@gmail.com> wrote:
> 
>> Of course.  I think I'm reacting more to the possible use cases you 
>> described for option 3 than the actual meaning of it.  It seems like a 
>> really bad idea to design iterators that would take advantage of the 
>> undefined behavior, not least b/c it's unexpected and not supported by the 
>> most pervasive client of the iterator protocol (the for loop, in the sense 
>> of actually iterating through all elements available through the iterator), 
>> but that doesn't mean option 3 is in itself the wrong thing to do.  
>> 
>> But addressing the use cases you mentioned, if you need that kind of 
>> functionality, shouldn't you be hoisting the iterator's return type into its 
>> own Option?  i.e., an Iterator<T> should become an Iterator<Option<T>>?
>> 
>> 
>> On Sun, Aug 4, 2013 at 6:23 PM, Kevin Ballard <ke...@sb.org> wrote:
>> The new for loop works with all 3 of these. Your output shows that it 
>> queried .next() twice, and got a single Some(1) result back. Once it gets 
>> None, it never calls .next() again, whereas the 3 behaviors stated 
>> previously are exclusively concerned with what happens if you call .next() 
>> again after it has already returned None.
>> 
>> -Kevin
>> 
>> P.S. I changed the email address that I'm subscribed to this list with, so 
>> apologies for any potential confusion.
>> 
>> On Aug 4, 2013, at 6:18 AM, Jason Fager <jfa...@gmail.com> wrote:
>> 
>>> The new for loop already assumes #2, right?
>>> 
>>> let x = [1,2,3];
>>> let mut it = x.iter().peek_(|x| printfln!(*x))
>>>     .scan(true, |st, &x| { if *st { *st = false; Some(x) } else { None } });
>>> 
>>> for i in it {
>>>     printfln!("from for loop: %?", i);
>>> }
>>> 
>>> 
>>> Which produces:
>>> 
>>> &1
>>> from for loop: 1
>>> &2
>>> 
>>> 
>>> 
>>> On Sun, Aug 4, 2013 at 1:49 AM, Daniel Micay <danielmi...@gmail.com> wrote:
>>> On Sat, Aug 3, 2013 at 9:18 PM, Kevin Ballard <kball...@gmail.com> wrote:
>>> > The iterator protocol, as I'm sure you're aware, is the protocol that
>>> > defines the behavior of the Iterator trait. Unfortunately, at the moment
>>> > the trait does not document what happens if you call `.next()` on an iterator
>>> > after a previous call has returned `None`. According to Daniel Micay, the
>>> > intention was that the iterator would return `None` forever. However, this
>>> > is not guaranteed by at least one iterator adaptor (Scan), nor is it
>>> > documented. Furthermore, no thought has been given to what happens if an
>>> > iterator pipeline has side-effects. A trivial example of the side-effect
>>> > problem is this:
>>> >
>>> >     let x = [1,2,3];
>>> >     let mut it = x.iter().peek_(|x| printfln!(*x))
>>> >         .scan(true, |st, &x| { if *st { *st = false; Some(x) } else { None } });
>>> >     (it.next(), it.next(), it.next())
>>> >
>>> > This results in `(Some(1), None, None)` but it prints out
>>> >
>>> >     &1
>>> >     &2
>>> >     &3
>>> >
>>> > After giving it some thought, I came up with 3 possible definitions for
>>> > behavior in this case:
>>> >
>>> > 1. Once `.next()` has returned `None`, it will return `None` forever.
>>> > Furthermore, calls to `.next()` after `None` has been returned will not
>>> > trigger side-effects in the iterator pipeline. This means that once
>>> > `.next()` has returned `None`, it becomes idempotent.
>>> >
>>> >    This is most likely going to be what people will assume the iterator
>>> > protocol defines, in the absence of any explicit statement. What's more,
>>> > they probably won't even consider the side-effects case.
>>> >
>>> >    Implementing this will require care to be given to every single
>>> > iterator and iterator adaptor. Most iterators will probably behave like
>>> > this (unless they use a user-supplied closure), but a number of different
>>> > iterator adaptors will need to track this explicitly with a bool flag.
>>> > It's likely that user-supplied iterator adaptors will forget to enforce
>>> > this and will therefore behave subtly wrong in the face of side-effects.
>>> >
>>> > 2. Once `.next()` has returned `None`, it will return `None` forever. No
>>> > statement is made regarding side-effects.
>>> >
>>> >    This is what most people will think they're assuming, if asked. The
>>> > danger here is that they will almost certainly actually assume #1, and
>>> > thus may write subtly incorrect code if they're given an iterator
>>> > pipeline with side-effects.
>>> >
>>> >    This is easier to implement than #1. Most iterators will do this
>>> > already. Iterator adaptors will generally only have to take care when
>>> > they use a user-supplied closure (e.g. `scan()`).
>>> >
>>> > 3. The behavior of `.next()` after `None` has been returned is left
>>> > undefined. Individual iterators may choose to define behavior here however
>>> > they see fit.
>>> >
>>> >    This is what we actually have implemented in the standard libraries
>>> > today. It's also by far the easiest to implement, as iterators and
>>> > adaptors may simply choose to not define any particular behavior.
>>> >
>>> >    This is made more attractive by the fact that some iterators may choose
>>> > to actually define behavior that's different than "return `None` forever".
>>> > For example, a user may write an iterator that wraps non-blocking I/O,
>>> > returning `None` when there's no data available and returning `Some(x)`
>>> > again once more data comes in. Or if you don't like that example, they
>>> > could write an iterator that may be updated to contain more data after
>>> > being exhausted.
>>> >
>>> >    The downside is that users may assume #1 when #3 holds, which is why
>>> > this needs to be documented properly.
>>> >
>>> > ---
>>> >
>>> > I believe that #3 is the right behavior to define. This gives the most
>>> > flexibility to individual iterators, and we can provide an iterator
>>> > adaptor that gives any iterator the behavior defined by #1 (see Fuse in
>>> > PR #8276).
>>> >
>>> > I am not strongly opposed to defining #1 instead, but I am mildly worried
>>> > about the likelihood that users will implement iterators that don't have
>>> > this guarantee, as this is not something that can be statically checked by
>>> > the compiler. What's more, if an iterator breaks this guarantee, the
>>> > problem will show up in the code that calls it, rather than in the
>>> > iterator itself, which may make debugging harder.
>>> >
>>> > I am strongly opposed to #2. If we guarantee that an iterator that returns
>>> > `None` once will return `None` forever, users will assume that this means
>>> > that `.next()` becomes idempotent (with regards to side-effects) after
>>> > `None` is returned, but this will not be true. Furthermore, users will
>>> > probably not even realize they've made a bad assumption, as most users
>>> > will not be thinking about side-effects when consuming iterators.
>>> >
>>> > I've already gone ahead and implemented #3 in pull request #8276.
>>> >
>>> > -Kevin
>>> 
>>> I'm leaning towards #2 or #3, mostly because adaptors *not*
>>> dispatching to the underlying next() implementation are too complex.
>>> 
>>> I took a look at the behaviour of Python's iterators in these corner
>>> cases as a good baseline for comparison:
>>> 
>>> ~~~
>>> >>> def peek(it):
>>> ...     for x in it:
>>> ...         print(x)
>>> ...         yield x
>>> ...
>>> >>> xs = [1, 2, 3]
>>> >>> ys = [1, 2, 3, 4, 5]
>>> ~~~
>>> 
>>> You can tell their `zip` function short-circuits, and simply
>>> dispatches to the underlying implementations. Rust's `zip` is similar
>>> but doesn't currently short-circuit (it might as well).
>>> 
>>> ~~~
>>> >>> it = zip(peek(ys), xs)
>>> >>> next(it)
>>> 1
>>> (1, 1)
>>> >>> next(it)
>>> 2
>>> (2, 2)
>>> >>> next(it)
>>> 3
>>> (3, 3)
>>> >>> next(it)
>>> 4
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> StopIteration
>>> >>> next(it)
>>> 5
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> StopIteration
>>> >>> next(it)
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> StopIteration
>>> >>> it = zip(xs, peek(ys))
>>> >>> next(it)
>>> 1
>>> (1, 1)
>>> >>> next(it)
>>> 2
>>> (2, 2)
>>> >>> next(it)
>>> 3
>>> (3, 3)
>>> >>> next(it)
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> StopIteration
>>> ~~~
>>> 
>>> It also makes no attempt to store whether it has stopped internally,
>>> and will start yielding again if each iterator yields an element when
>>> zip asks for them one by one (keeping in mind that it short-circuits).
>>> 
>>> Most other languages keep `hasNext` and `next` separate (D and Scala,
>>> among others) leading to more corner cases, and they do not seem to
>>> clearly define the semantics for side effects down the pipeline.
>>> 
>>> http://dlang.org/phobos/std_range.html
>>> http://www.scala-lang.org/api/current/scala/collection/Iterator.html
>>> _______________________________________________
>>> Rust-dev mailing list
>>> Rust-dev@mozilla.org
>>> https://mail.mozilla.org/listinfo/rust-dev
>>> 
>> 
>> 
> 
>

