Re: [rust-dev] Proposal for clarifying the iterator protocol

Kevin Ballard Sun, 04 Aug 2013 15:24:10 -0700

The new for loop works with all 3 of these. Your output shows that it queried 
.next() twice, and got a single Some(1) result back. Once it gets None, it 
never calls .next() again, whereas the 3 behaviors stated previously are 
exclusively concerned with what happens if you call .next() again after it has 
already returned None.


-Kevin

P.S. I changed the email address that I'm subscribed to this list with, so 
apologies for any potential confusion.

On Aug 4, 2013, at 6:18 AM, Jason Fager <jfa...@gmail.com> wrote:

> The new for loop already assumes #2, right?
> 
> let x = [1,2,3];
> let mut it = x.iter().peek_(|x| printfln!(*x)).scan(true, |st, &x| { if *st { 
> *st = false; Some(x) } else { None } });
> 
> for i in it {
>     printfln!("from for loop: %?", i);
> }
> 
> 
> Which produces:
> 
> &1
> from for loop: 1
> &2
> 
> 
> 
> On Sun, Aug 4, 2013 at 1:49 AM, Daniel Micay <danielmi...@gmail.com> wrote:
> On Sat, Aug 3, 2013 at 9:18 PM, Kevin Ballard <kball...@gmail.com> wrote:
> > The iterator protocol, as I'm sure you're aware, is the protocol that
> > defines the behavior of the Iterator trait. Unfortunately, at the moment the
> > trait does not document what happens if you call `.next()` on an iterator
> > after a previous call has returned `None`. According to Daniel Micay, the
> > intention was that the iterator would return `None` forever. However, this
> > is not guaranteed by at least one iterator adaptor (Scan), nor is it
> > documented. Furthermore, no thought has been given to what happens if an
> > iterator pipeline has side-effects. A trivial example of the side-effect
> > problem is this:
> >
> >     let x = [1,2,3];
> >     let mut it = x.iter().peek_(|x| printfln!(*x)).scan(true, |st, &x| { if
> > *st { *st = false; Some(x) } else { None } });
> >     (it.next(), it.next(), it.next())
> >
> > This results in `(Some(1), None, None)` but it prints out
> >
> >     &1
> >     &2
> >     &3
> >
> > After giving it some thought, I came up with 3 possible definitions for
> > behavior in this case:
> >
> > 1. Once `.next()` has returned `None`, it will return None forever.
> > Furthermore, calls to `.next()` after `None` has been returned will not
> > trigger side-effects in the iterator pipeline. This means that once
> > `.next()` has returned `None`, it becomes idempotent.
> >
> >    This is most likely going to be what people will assume the iterator
> > protocol defines, in the absence of any explicit statement. What's more,
> > they probably won't even consider the side-effects case.
> >
> >    Implementing this will require care be given to every single iterator and
> > iterator adaptor. Most iterators will probably behave like this (unless they
> > use a user-supplied closure), but a number of different iterator adaptors
> > will need to track this explicitly with a bool flag. It's likely that
> > user-supplied iterator adaptors will forget to enforce this and will
> > therefore behave subtlely wrong in the face of side-effects.
> >
> > 2. Once `.next()` has returned `None`, it will return `None` forever. No
> > statement is made regarding side-effects.
> >
> >    This is what most people will think they're assuming, if asked. The
> > danger here is that they will almost certainly actaully assume #1, and thus
> > may write subtlely incorrect code if they're given an iterator pipeline with
> > side-effects.
> >
> >    This is easier to implement than #1. Most iterators will do this already.
> > Iterator adaptors will generally only have to take care when they use a
> > user-supplied closure (e.g. `scan()`).
> >
> > 3. The behavior of `.next()` after `None` has been returned is left
> > undefined. Individual iterators may choose to define behavior here however
> > they see fit.
> >
> >    This is what we actually have implemented in the standard libraries
> > today. It's also by far the easiest to implement, as iterators and adaptors
> > may simply choose to not define any particular behavior.
> >
> >    This is made more attractive by the fact that some iterators may choose
> > to actually define behavior that's different than "return `None` forever".
> > For example, a user may write an iterator that wraps non-blocking I/O,
> > returning `None` when there's no data available and returning `Some(x)`
> > again once more data comes in. Or if you don't like that example, they could
> > write an iterator that may be updated to contain more data after being
> > exhausted.
> >
> >    The downside is that users may assume #1 when #3 holds, which is why this
> > needs to be documented properly.
> >
> > ---
> >
> > I believe that #3 is the right behavior to define. This gives the most
> > flexibility to individual iterators, and we can provide an iterator adaptor
> > that gives any iterator the behavior defined by #1 (see Fuse in PR #8276).
> >
> > I am not strongly opposed to defining #1 instead, but I am mildly worried
> > about the likelihood that users will implement iterators that don't have
> > this guarantee, as this is not something that can be statically checked by
> > the compiler. What's more, if an iterator breaks this guarantee, the problem
> > will show up in the code that calls it, rather than in the iterator itself,
> > which may make debugging harder.
> >
> > I am strongly opposed to #2. If we guarantee that an iterator that returns
> > `None` once will return `None` forever, users will assume that this means
> > that `.next()` becomes idempotent (with regards to side-effects) after
> > `None` is returned, but this will not be true. Furthermore, users will
> > probably not even realize they've made a bad assumption, as most users will
> > not be thinking about side-effects when consuming iterators.
> >
> > I've already gone ahead and implemented #3 in pull request #8276.
> >
> > -Kevin
> 
> I'm leaning towards #2 or #3, mostly because adaptors *not*
> dispatching to the underlying next() implementation are too complex.
> 
> I took a look at the behaviour of Python's iterators in these corner
> cases as good baseline for comparison:
> 
> ~~~
> >>> def peek(it):
> ...     for x in it:
> ...         print(x)
> ...         yield x
> ...
> >>> xs = [1, 2, 3]
> >>> ys = [1, 2, 3, 4, 5]
> ~~~
> 
> You can tell their `zip` function short-circuits, and simply
> dispatches to the underlying implementations. Rust's `zip` is similar
> but doesn't currently short-circuit (it might as well).
> 
> ~~~
> >>> it = zip(peek(ys), xs)
> >>> next(it)
> 1
> (1, 1)
> >>> next(it)
> 2
> (2, 2)
> >>> next(it)
> 3
> (3, 3)
> >>> next(it)
> 4
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
> >>> next(it)
> 5
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
> >>> next(it)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
> >>> it = zip(xs, peek(ys))
> >>> next(it)
> 1
> (1, 1)
> >>> next(it)
> 2
> (2, 2)
> >>> next(it)
> 3
> (3, 3)
> >>> next(it)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
> ~~~
> 
> It also makes no attempt to store whether it has stopped internally,
> and will start yielding again if each iterator yields an element when
> zip asks for them one by one (keeping in mind that it short-circuits).
> 
> Most other language keep `hasNext` and `next` separate (D and Scala,
> among others) leading to more corner cases, and they do not seem to
> clearly define the semantics for side effects down the pipeline.
> 
> http://dlang.org/phobos/std_range.html
> http://www.scala-lang.org/api/current/scala/collection/Iterator.html
> _______________________________________________
> Rust-dev mailing list
> Rust-dev@mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev
> 
> _______________________________________________
> Rust-dev mailing list
> Rust-dev@mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev

_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev

Re: [rust-dev] Proposal for clarifying the iterator protocol

Reply via email to