Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/807#issuecomment-49703872

Hey folks - I just took a light pass on this. One thing I'm a bit confused about: the sink here is not really specific to Spark at all (there is no dependency on Spark). The classes used are entirely general Avro classes that other applications could use easily. So why is this being contributed to Spark instead of being put inside of Flume as a general-purpose sink that supports polling from external sources? I could imagine that other systems might also want to integrate with Flume via a pull/polling-based model rather than via a push model. Was there any previous consideration of putting this directly in the Flume project? Adding this to Spark introduces some nontrivial steps to the Spark build (such as an Avro compiler), so naturally the question is whether it belongs in Spark or Flume. I don't think we can easily change this later, since we expose the `SparkFlumePollingEvent` type directly to users, and that type is in the spark package.

Actually, one related thing: can we not convert these to `SparkFlumeEvent`s, so that the types do not differ depending on which type of Flume integration is being used? Having to deal with a different type could be confusing for users.