Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/807#issuecomment-49703872

Hey folks - I just took a light pass on this. One thing I'm a bit confused about: the sink here is not really specific to Spark at all (there is no dependency on Spark). The classes used are entirely general Avro classes that other applications could use easily. So why is this being contributed to Spark instead of being put inside of Flume as a general-purpose sink that supports polling from external sources? I could imagine that other systems might also want to integrate with Flume via a pull/polling-based model rather than via a push model. Was there any previous consideration of putting this directly in the Flume project? Adding this to Spark introduces some nontrivial steps to the Spark build (such as an Avro compiler), so naturally the question is whether it belongs in Spark or Flume. I don't think we can easily change this later, since we expose the `SparkFlumePollingEvent` type directly to users, and that type is in the spark package.

Actually, one related thing: can we not convert these to `SparkFlumeEvent`s, so that the types do not differ depending on which type of Flume integration is being used? Having to deal with a different type could be confusing for users.