GitHub user nkronenfeld commented on the pull request:

https://github.com/apache/spark/pull/5565#issuecomment-94351472

No, we do that at the moment. But doing it that way results in some rather ugly constructs in the application code, and they become painful as soon as one starts passing data structures around. Once the collection structures are passed between modules, for instance between stages in a pipeline, one immediately needs to duplicate the entire pipeline for the batch and streaming cases. It isn't just one place where this replacement has to be made: it's every little pipeline operation, for every algorithm, 90% of which use only the most basic RDD and DStream functions and should be easy to consolidate.

I'd also note that, where there is an interface change, it is there because the original methods in RDD and DStream were declared inconsistently. Unless there is a good reason to keep them inconsistent (which so far I don't see in any of these three cases), I would suggest that the inconsistency isn't a good thing to begin with. Just in terms of consistency and usability of the library, where they can be the same, they should be. That reduces the learning curve and removes some esoteric, hard-to-track-down gotchas that are bound to bite people occasionally when they switch from one case to the other.

On a final note, if this is the intended use of DStream, why have the map, flatMap, reduceByKey, etc. functions on it at all? It seems clear it was intended to be used this way (hm, that reminds me of a fourth small interface change I'll add above, but as you'll see, it's very, very minor), so why not make sure the use is the same?
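As a rough illustration of the duplication described above, here is a minimal, dependency-free sketch. `DataLike`, `BatchData`, `StreamData`, and `Pipeline.tokenize` are hypothetical names invented for this example; they are toy stand-ins for RDD and DStream, not Spark classes and not the interface proposed in the pull request. Without a shared supertype such as `DataLike`, the `tokenize` stage would have to be written twice, once per collection type, with identical bodies.

```scala
// Hypothetical common interface (an assumption for illustration; not part of Spark).
trait DataLike[T] {
  def map[U](f: T => U): DataLike[U]
  def flatMap[U](f: T => Seq[U]): DataLike[U]
}

// Toy stand-in for a batch collection (plays the role of an RDD).
final case class BatchData[T](items: Seq[T]) extends DataLike[T] {
  def map[U](f: T => U): BatchData[U] = BatchData(items.map(f))
  def flatMap[U](f: T => Seq[U]): BatchData[U] = BatchData(items.flatMap(f))
}

// Toy stand-in for a stream, modeled as a sequence of micro-batches
// (plays the role of a DStream).
final case class StreamData[T](batches: Seq[Seq[T]]) extends DataLike[T] {
  def map[U](f: T => U): StreamData[U] = StreamData(batches.map(_.map(f)))
  def flatMap[U](f: T => Seq[U]): StreamData[U] =
    StreamData(batches.map(_.flatMap(f)))
}

object Pipeline {
  // One pipeline stage, written once against the shared interface,
  // usable for both the batch and the streaming stand-in.
  def tokenize(in: DataLike[String]): DataLike[String] =
    in.flatMap(_.split(" ").toSeq).map(_.toLowerCase)
}
```

The same `Pipeline.tokenize` call then accepts either a `BatchData[String]` or a `StreamData[String]`, so a multi-stage pipeline built from such functions needs only one definition per stage.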