[
https://issues.apache.org/jira/browse/FLUME-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Hsieh updated FLUME-506:
---------------------------------
Resolution: Duplicate
Assignee: Jonathan Hsieh
Status: Resolved (was: Patch Available)
This version was never completed -- it needed unit tests. This functionality
was later added by FLUME-635.
> Sink that writes output to a sequenceFile in hdfs
> -------------------------------------------------
>
> Key: FLUME-506
> URL: https://issues.apache.org/jira/browse/FLUME-506
> Project: Flume
> Issue Type: New Feature
> Components: Sinks+Sources
> Affects Versions: v0.9.1u1
> Reporter: Disabled imported user
> Assignee: Jonathan Hsieh
> Priority: Trivial
> Labels: sink
> Attachments: FLUME-506.patch, collector_sink_plugins.zip,
> collector_sink_plugins.zip, sink_plugins.zip
>
>
> I have written code to write to HDFS in this style:
> roll(1000) [ escapedCustomDfs("hdfs://namenode/flume/file-%{rolltag}") ]
> but with each event body written as the value in a SequenceFile (and
> NullWritable as the key).
> As a possible scenario, a client would send Flume events to an rpcSource,
> where the body of each event is a Thrift object. I have a sink that
> opens/closes a SequenceFile every X minutes and writes the body of each
> event (a Thrift object converted to bytes) as a BytesWritable value, with
> NullWritable as the key.
> The purpose of this is that these SequenceFiles can be used as input to a
> MapReduce job. Each BytesWritable value can easily be deserialized in the
> map phase into a Thrift object (the one originally generated by the client
> and set as the event body).
> The plugins are (placed in the plugins folder in the zip):
> com.cloudera.flume.handlers.hdfs.EscapedThriftSeqfileDfsSink
> com.cloudera.flume.handlers.hdfs.ThriftSeqfileDfsSink
> An example of node configuration:
> node.name : tSource(38575) | roll(60000) {
> escapedThriftSeqfileDfsSink("hdfs://host.local/user/flume/file-%{rolltag}")}
> The zip also contains a couple of tests:
> 1.- A PHP client that generates Thrift objects and sends them via RPC to
> the source (the example objects are instances of the TObject class,
> generated with Thrift). This can be found in /test/php_rpc_writer.
> 2.- A Java app that reads the SequenceFile generated by Flume. The file
> has to be manually downloaded from HDFS, and its local path must be
> specified in the app. This app has the same TObject class as the PHP
> client, so it can deserialize the BytesWritable values from the
> SequenceFile into the proper Thrift objects.
> *The examples have hardcoded paths, which should be set appropriately.
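The reader described above can be sketched with Hadoop's SequenceFile API: iterate records whose keys are NullWritable and whose values are BytesWritable, then deserialize each value back into the Thrift object. This is a minimal sketch, not code from the attached patch; "TObject" stands in for the Thrift-generated class mentioned in the issue, and the file path is a placeholder.

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.thrift.TDeserializer;
import org.apache.thrift.protocol.TBinaryProtocol;

public class ThriftSeqfileReader {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path(args[0]); // e.g. a SequenceFile downloaded from HDFS
    FileSystem fs = FileSystem.getLocal(conf);
    TDeserializer deser = new TDeserializer(new TBinaryProtocol.Factory());

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      NullWritable key = NullWritable.get();
      BytesWritable value = new BytesWritable();
      while (reader.next(key, value)) {
        // BytesWritable's backing array may be larger than the record,
        // so trim it to getLength() before deserializing.
        byte[] bytes = Arrays.copyOf(value.getBytes(), value.getLength());
        TObject obj = new TObject(); // hypothetical Thrift-generated class
        deser.deserialize(obj, bytes);
        System.out.println(obj);
      }
    } finally {
      reader.close();
    }
  }
}
```

The same loop body would work inside a mapper taking NullWritable/BytesWritable input, which is the MapReduce use case the issue describes.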
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira