Github user stefanobaghino commented on the pull request:
https://github.com/apache/flink/pull/1704#issuecomment-199837531
A quick status update: unfortunately I have had no time to work on this in
the past week; I plan to get back to the fixes on Thursday or Friday.
In the meantime, I have a small doubt about the usage of `Iterator`: it is
indeed a good way to give the user access either to the whole
`DataSet`/`DataStream` or to one item at a time; however, it is
not particularly useful with the case-style partial functions, whose
advantage is destructuring a single item like a tuple or a collection like
`Seq` (e.g. using the `_ +: rest` pattern to get only the items after the
first). The standard library offers no comparable extractor for `Iterator`.
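
To make the point concrete, here is a minimal sketch in plain Scala (no Flink
involved); the object and member names are made up for illustration only. It
shows what case-style functions buy on a tuple and on a `Seq`, and what the
equivalent looks like on an `Iterator`:

```scala
object DestructuringSketch {
  // Case-style function on a tuple: the fields get names instead of _1 / _2.
  val describe: ((Int, String)) => String = {
    case (id, name) => s"$id -> $name"
  }

  // Case-style function on a Seq: `_ +: rest` skips the head and binds the rest.
  val afterFirst: Seq[Int] => Seq[Int] = {
    case _ +: rest => rest
    case _         => Seq.empty
  }

  // No extractor applies to an Iterator, so the same logic falls back to
  // explicit method calls and gains nothing from the case-style syntax.
  def afterFirstOf(it: Iterator[Int]): Iterator[Int] = it.drop(1)
}
```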
Are we sure we want to keep the `Iterator`? Is there an advantage in having
an `Iterator` with this extension? I see two possible solutions:
1. the easy one: have two methods, one materializing the `Iterator` into
a collection and another accessing the items one at a time. The only issue
with this is the need for two methods with distinct names (otherwise
we would be back to square one); the user could then use the case-style
functions to destructure either the whole collection or each item separately.
2. a slightly more sophisticated one: wrap the `Iterator` in a
`Stream`, which is lazy but also fully destructurable in case-style functions
(e.g. using the `#::` operator). This would require some work, as the
`Iterator` is stateful with regard to traversal while the `Stream` is not;
a naive wrapper could let that semantic difference cause
some nasty bugs in user code.
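
A rough sketch of both options, in plain Scala and outside of Flink. All
method names here (`withAllItems`, `withEachItem`, `withStream`,
`traverseTwice`) are hypothetical and are not part of this pull request; the
Stream variant is exactly the naive wrapping mentioned above, shown only to
illustrate the semantic gap. `Stream` is deprecated in favor of `LazyList`
from Scala 2.13 on, so newer compilers will emit deprecation warnings.

```scala
object IteratorOptionsSketch {
  // Option 1: two entry points with distinct names, so overload resolution
  // never has to pick between a collection function and an item function.
  def withAllItems[T, R](it: Iterator[T])(f: Seq[T] => R): R =
    f(it.toVector) // materializes every item in memory

  def withEachItem[T, R](it: Iterator[T])(f: T => R): Iterator[R] =
    it.map(f) // stays lazy, one item at a time

  // Option 2 (naive version): expose the Iterator as a lazy Stream so that
  // `#::` patterns can be used on it.
  def withStream[T, R](it: Iterator[T])(f: Stream[T] => R): R =
    f(it.toStream)

  // The semantic gap: a Stream memoizes what it has read and can be traversed
  // again, while the underlying Iterator is consumed exactly once.
  def traverseTwice[T](it: Iterator[T]): (List[T], List[T], Boolean) = {
    val s = it.toStream
    (s.toList, s.toList, it.hasNext)
  }
}
```

A user could then write, for instance,
`withStream(items) { case a #:: b #:: _ => a + b; case _ => 0 }`. Note that
any code still holding the original `Iterator` and reading from it while the
`Stream` is only partially evaluated would take items away from the `Stream`,
which is the kind of bug a real implementation would have to rule out.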