GitHub user srowen commented on the pull request:

https://github.com/apache/spark/pull/2083#issuecomment-54124826

@nchammas to be clear, the question isn't really about ordering. The issue is that the result of the same RDD in this example changes when it is reevaluated. It's more like having a ResultSet change under you while iterating.

@mateiz explained that this is working as intended. I think some straightforward uses of `zipWithIndex` may surprise people, then. For example, I add an index to each datum, do some computation, and later go back to look up a datum by index on the very same RDD. I am not likely to get back the same datum unless the RDD has been persisted. Something to keep in mind, to see whether it actually bites people regularly.
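
To make the scenario concrete, here is a minimal sketch in plain Python rather than Spark. It is a toy model under stated assumptions, not Spark's API: `LazyDataset`, `make_unstable_source`, `zip_with_index` and `lookup_by_index` are hypothetical names, and the source simply rotates its items on every evaluation to stand in for an upstream whose order may differ between evaluations.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark)."""

    def __init__(self, compute):
        self._compute = compute      # zero-argument function returning a list
        self._persisted = False
        self._cached = None

    def persist(self):
        # Mark the dataset so its first evaluation is kept and reused.
        self._persisted = True
        return self

    def collect(self):
        if not self._persisted:
            return self._compute()   # recomputed from scratch on every call
        if self._cached is None:
            self._cached = self._compute()
        return self._cached

    def zip_with_index(self):
        # Pair each element with its position in the parent's current order.
        return LazyDataset(lambda: [(x, i) for i, x in enumerate(self.collect())])

    def lookup_by_index(self, idx):
        # Each lookup triggers an evaluation of this dataset.
        return next(x for x, i in self.collect() if i == idx)


def make_unstable_source(items):
    """Return a compute function whose order changes on each evaluation.

    Assumption for illustration: the order rotates by one position per
    evaluation, so the effect is deterministic and easy to observe.
    """
    evaluations = 0

    def compute():
        nonlocal evaluations
        shift = evaluations % len(items)
        evaluations += 1
        return items[shift:] + items[:shift]

    return compute


if __name__ == "__main__":
    indexed = LazyDataset(make_unstable_source(["a", "b", "c"])).zip_with_index()
    print(indexed.lookup_by_index(0))  # first evaluation
    print(indexed.lookup_by_index(0))  # second evaluation, different datum

    kept = LazyDataset(make_unstable_source(["a", "b", "c"])).zip_with_index().persist()
    print(kept.lookup_by_index(0))     # first evaluation, then cached
    print(kept.lookup_by_index(0))     # same datum, served from the cache
```

In this toy model, the unpersisted dataset returns a different datum for index 0 on the second lookup, while the persisted one returns the same datum both times. Whether a real RDD behaves this way depends on whether its upstream computation is deterministic.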