[
https://issues.apache.org/jira/browse/FLUME-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078955#comment-13078955
]
Jonathan Hsieh commented on FLUME-720:
--------------------------------------
Here's a straightforward way to reproduce this problem:
Create data that is supposed to be a seq file:
bin/flume sink 'collectorSink("file:///tmp/bz","bzip",5000, seqfile("bzip2"))'
...
Type some text to write a few events.
Read the file that is supposed to be a seq file:
bin/flume source 'seqfile("/tmp/bz/bzipxxxxxx")'
The latter command will fail if the file is not a seq file. If you look at the
generated files, you can see whether one is an avrojson text file, or look for
the magic bytes that say SEQ (sequence file) followed by the Java class names,
including the one for the selected codec.
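
The magic-byte check can also be scripted. This is only a rough sketch, not part of Flume: the function names sniff_format and sniff_path are made up for illustration, and treating a leading "{" as a hint of avrojson text is an assumption about what that output looks like. The SEQ, BZh, and gzip signatures are the standard ones for those formats.

```python
import bz2
import gzip


def sniff_format(header: bytes) -> str:
    """Guess a container format from the first few bytes of a file."""
    if header[:3] == b"SEQ":
        # Hadoop sequence files start with 'S', 'E', 'Q' and a version byte.
        return "sequence file"
    if header[:3] == b"BZh":
        # bzip2 streams start with 'BZh' followed by a block-size digit.
        return "bzip2 stream"
    if header[:2] == b"\x1f\x8b":
        # gzip streams start with the two bytes 0x1f 0x8b.
        return "gzip stream"
    if header[:1] == b"{":
        # Assumption: JSON-style text output begins with an opening brace.
        return "possibly JSON text"
    return "unknown"


def sniff_path(path: str) -> str:
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(4))
```

Run sniff_path on one of the generated files under /tmp/bz. If the whole file was wrapped by bzip2 on top of the sequence file's own compression, the outer bytes would be expected to read as a bzip2 stream rather than SEQ.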
> CollectorSink doesn't pass the new format parameter
> ----------------------------------------------------
>
> Key: FLUME-720
> URL: https://issues.apache.org/jira/browse/FLUME-720
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v0.9.5
> Reporter: Eran Kutner
>
> CollectorSink doesn't properly pass the format parameter down to the
> EscapedCustomDfs sink.
> For example, this is working fine:
> collectorSource(54001) | escapedCustomDfs("hdfs://hadoop1-m1:8020/", "test",
> seqfile("SnappyCodec") );
> However, this uses the codec defined in flume-conf.xml instead:
> collectorSource(54001) | collectorSink("hdfs://hadoop1-m1:8020/", "test-",
> 600000, seqfile("SnappyCodec") );
> By itself this bug would not be very serious. The problem, however, is that
> escapedCustomDfs/customDfs use the same compressor and apply it to the
> whole file, in addition to the compression done natively by the sequence
> file. This leaves the sequence file double compressed and invalid.
> As far as I can tell, the only way to get a valid compressed sequence file is
> to set flume.collector.dfs.compress.codec to "None" in flume-site.xml and
> use the format parameter to specify which compression to use for the sequence
> file, except that doesn't work...