[ 
https://issues.apache.org/jira/browse/FLUME-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated FLUME-506:
---------------------------------

    Resolution: Duplicate
      Assignee: Jonathan Hsieh
        Status: Resolved  (was: Patch Available)

This version was never completed -- it needed unit tests.  This functionality 
was later added by FLUME-635.

> Sink that writes output to a sequenceFile in hdfs
> -------------------------------------------------
>
>                 Key: FLUME-506
>                 URL: https://issues.apache.org/jira/browse/FLUME-506
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>    Affects Versions: v0.9.1u1
>            Reporter: Disabled imported user
>            Assignee: Jonathan Hsieh
>            Priority: Trivial
>              Labels: sink
>         Attachments: FLUME-506.patch, collector_sink_plugins.zip, 
> collector_sink_plugins.zip, sink_plugins.zip
>
>
> I have written code that writes to HDFS in this style:
> roll(1000) [ escapedCustomDfs("hdfs://namenode/flume/file-%{rolltag}") ]
> but writes each event body as the value in a SequenceFile (with NullWritable 
> as the key).
> As a possible scenario, a client sends Flume events to an rpcSource, and the 
> body of each event is a serialized Thrift object. I have a sink that opens 
> and closes a SequenceFile every X minutes, writing each event body (the 
> Thrift object converted to bytes) as a BytesWritable value of the 
> SequenceFile, with NullWritable as the key.
> The purpose of this is that these SequenceFiles can be used as input to a 
> MapReduce job. Each BytesWritable value can easily be deserialized in the 
> map phase into a Thrift object (the one initially generated by the client 
> and set as the event body).
> The plugins (placed in the plugins folder of the zip) are:
> com.cloudera.flume.handlers.hdfs.EscapedThriftSeqfileDfsSink
> com.cloudera.flume.handlers.hdfs.ThriftSeqfileDfsSink
> An example of node configuration:
> node.name : tSource(38575) | roll(60000) { 
> escapedThriftSeqfileDfsSink("hdfs://host.local/user/flume/file-%{rolltag}")}
> The zip also contains a couple of tests:
> 1. A PHP client that generates Thrift objects and sends them via RPC to the 
> source (the example objects are instances of the TObject class, generated 
> with Thrift). This can be found in /test/php_rpc_writer.
> 2. A Java app that reads the SequenceFile generated by Flume. The file has 
> to be downloaded manually from HDFS, and its local path must be specified 
> in the app. This app has the same TObject class as the PHP client, so it 
> can deserialize each BytesWritable from the SequenceFile into the proper 
> Thrift object.
> *The examples have hardcoded paths, which should be set appropriately.
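
[Editor's note] The write path and map-side read described above can be 
sketched roughly as follows. This is not the attached patch: the class name 
ThriftSeqfileSketch is invented, TObject stands in for the Thrift-generated 
class from the attached test, and the code assumes the Hadoop 0.20-era 
SequenceFile API and libthrift API that Flume v0.9.x built against.

```java
// Rough sketch, NOT the attached patch: append each serialized event body
// as a BytesWritable value keyed by NullWritable, then recover the Thrift
// object map-side. TObject is a placeholder for the generated Thrift class.
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.thrift.TDeserializer;
import org.apache.thrift.TException;

public class ThriftSeqfileSketch {

  /** Append each serialized event body as a value; the key carries nothing. */
  public static void writeBodies(Path file, byte[][] eventBodies)
      throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, file, NullWritable.class, BytesWritable.class);
    try {
      for (byte[] body : eventBodies) {
        writer.append(NullWritable.get(), new BytesWritable(body));
      }
    } finally {
      writer.close();
    }
  }

  /** Map-side: turn a BytesWritable value back into the Thrift object. */
  public static TObject toThrift(BytesWritable value) throws TException {
    // getBytes() may return a padded backing array, so trim to getLength().
    byte[] raw = Arrays.copyOf(value.getBytes(), value.getLength());
    TObject obj = new TObject();
    new TDeserializer().deserialize(obj, raw);
    return obj;
  }
}
```

In the described setup the roll(60000) decorator, not this sketch, drives the 
periodic open/close of each SequenceFile via the %{rolltag} in the path.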

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
