Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/807#issuecomment-46398801

@tdas Use the following configuration file to start Flume:

```
agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = spark

# For each one of the sources, the type is defined
agent.sources.seqGenSrc.type = seq

# The channel can be defined as follows.
agent.sources.seqGenSrc.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.spark.type = org.apache.spark.flume.sink.SparkSink
agent.sinks.spark.hostname = 0.0.0.0
agent.sinks.spark.port = 9999

# Specify the channel the sink should use
agent.sinks.spark.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel (sink or source)
# can be defined as well. In this case, it specifies the capacity of
# the memory channel.
agent.channels.memoryChannel.capacity = 100
```

You can start the Flume agent by downloading the latest release binary tarball (or checking it out and building it), and running this command from the top level:

```
bin/flume-ng agent -n agent -f <path/to/config/file> -c conf
```

(Make sure you drop spark-streaming-flume-sink_2.10-1.0.0-SNAPSHOT.jar from the target directory (the mvn build generates this one; not sure how to get it with sbt) and scala-library.jar into the lib directory under the Flume top-level directory.)

On the Spark side, you can start a streaming application by mimicking the FlumePollingReceiverSuite class (FlumeUtils has a couple of methods to start the receiver).
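For reference, a minimal sketch of such a streaming application might look like the following. This assumes the polling-stream API shape exposed by FlumeUtils in the spark-streaming-flume module (`createPollingStream(ssc, hostname, port)`); method names and event accessors may differ slightly on this branch, so treat it as an outline rather than a drop-in program.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

// Sketch: polls events from the SparkSink configured above
// (hostname 0.0.0.0, port 9999 in the Flume config).
object FlumePollingExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("FlumePollingExample")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Connect the polling receiver to the SparkSink's host and port.
    val stream = FlumeUtils.createPollingStream(ssc, "localhost", 9999)

    // Each event body is a byte buffer; decode and count events per batch.
    stream
      .map(e => new String(e.event.getBody.array()))
      .count()
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```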