Github user nkronenfeld commented on the pull request:

    https://github.com/apache/spark/pull/5565#issuecomment-94351472
  
    No, we do that at the moment.
    
    But doing it that way results in a few rather ugly and painful constructs 
in the application code.  As soon as one starts passing the collection 
structures between modules (say, between stages in a pipeline), one needs to 
duplicate the entire pipeline for the batch and streaming cases.
    
    It isn't just one place where one has to do this replacement - it's every 
little pipeline operation, for every algorithm, 90% of which use just the 
most basic RDD and DStream functions and should be easily consolidated.
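
    To illustrate the kind of consolidation meant here, below is a toy sketch in 
plain Python.  It does not use Spark: `ToyBatch` and `ToyStream` are 
hypothetical stand-ins for RDD and DStream, and `flat_map`, `map`, and 
`reduce_by_key` are assumed names chosen only to mirror the basic operations 
mentioned above.  The point is just that, once both types expose the same 
operations with the same shape, a pipeline stage such as `word_count` can be 
written once.

```python
class ToyBatch:
    """Hypothetical stand-in for a batch collection: one list of records."""

    def __init__(self, records):
        self.records = list(records)

    def flat_map(self, f):
        return ToyBatch(y for x in self.records for y in f(x))

    def map(self, f):
        return ToyBatch(f(x) for x in self.records)

    def reduce_by_key(self, f):
        # Combine the values of (key, value) pairs that share a key.
        acc = {}
        for k, v in self.records:
            acc[k] = f(acc[k], v) if k in acc else v
        return ToyBatch(acc.items())


class ToyStream:
    """Hypothetical stand-in for a stream: a sequence of micro-batches."""

    def __init__(self, batches):
        self.batches = list(batches)

    # Each operation is applied to every micro-batch independently.
    def flat_map(self, f):
        return ToyStream(b.flat_map(f) for b in self.batches)

    def map(self, f):
        return ToyStream(b.map(f) for b in self.batches)

    def reduce_by_key(self, f):
        return ToyStream(b.reduce_by_key(f) for b in self.batches)


def word_count(lines):
    """One pipeline stage, written once; works for either toy type."""
    return (lines.flat_map(str.split)
                 .map(lambda word: (word, 1))
                 .reduce_by_key(lambda a, b: a + b))


batch_counts = word_count(ToyBatch(["a b a", "b c"]))
stream_counts = word_count(ToyStream([ToyBatch(["a a"]), ToyBatch(["b"])]))
```

    In a statically typed language the same idea needs a shared interface that 
both types implement, which is why inconsistent method declarations get in 
the way.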
    
    I'd also note that, where there is an interface change, it is there because 
the original methods in RDD and DStream were declared inconsistently.  Unless 
there is a good reason to keep them inconsistent (which so far I don't see in 
any of these three cases), I would suggest the inconsistency isn't a good 
thing to begin with.  Just in terms of consistency and usability of the 
library, where they can be the same, they should be.  That reduces the 
learning curve and removes some esoteric, hard-to-track-down gotchas that are 
bound to occasionally bite people newly switching from one case to the other.
    
    On a final note, if this is the intended use of DStream, why have the map, 
flatMap, reduceByKey, etc. functions on it at all?  It seems clear it was 
intended to be used this way (hm, that reminds me of a fourth small interface 
change I'll add above, but as you'll see, it's very, very minor), so why not 
make sure the use is the same?



