Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/807#issuecomment-46398801

@tdas Use the following configuration file to start Flume:

```
agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = spark

# For each one of the sources, the type is defined
agent.sources.seqGenSrc.type = seq

# The channel can be defined as follows.
agent.sources.seqGenSrc.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.spark.type = org.apache.spark.flume.sink.SparkSink
agent.sinks.spark.hostname = 0.0.0.0
agent.sinks.spark.port = 9999

# Specify the channel the sink should use
agent.sinks.spark.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel (sink or source)
# can be defined as well. In this case, it specifies the capacity of
# the memory channel.
agent.channels.memoryChannel.capacity = 100
```

You can start the Flume agent by downloading the latest release binary tarball (or checking it out and building it), and running this command from the top level:

```
bin/flume-ng agent -n agent -f <path/to/config/file> -c conf
```

(Make sure you drop spark-streaming-flume-sink_2.10-1.0.0-SNAPSHOT.jar from the target directory (the mvn build generates this one; not sure how to get it with sbt) and scala-library.jar into the lib directory under the Flume top-level directory.)

On the Spark side, you can start a streaming application by mimicking the FlumePollingReceiverSuite class (FlumeUtils has a couple of methods to start the receiver).
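For reference, a minimal sketch of such a streaming application might look like the following. This assumes the polling-stream API shape exposed by FlumeUtils in the spark-streaming-flume module (`createPollingStream(ssc, hostname, port)`); method names and event accessors may differ slightly on this branch, so treat it as an outline rather than a drop-in program.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

// Sketch: polls events from the SparkSink configured above
// (hostname 0.0.0.0, port 9999 in the Flume config).
object FlumePollingExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("FlumePollingExample")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Connect the polling receiver to the SparkSink's host and port.
    val stream = FlumeUtils.createPollingStream(ssc, "localhost", 9999)

    // Each event body is a byte buffer; decode and count events per batch.
    stream
      .map(e => new String(e.event.getBody.array()))
      .count()
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```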