[ https://issues.apache.org/jira/browse/FLUME-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078822#comment-13078822 ]

Jonathan Hsieh commented on FLUME-720:
--------------------------------------

You are correct about the double compression when the compression setting is set
in the xml file as well as passed as an argument to the seqfile format. I've
confirmed that collectorSink not taking the format as an argument is a problem.
For now, here is a workaround:

Replace the collectorSink with the following (escapedFormatDfs is the same as
escapedCustomDfs, but better named):

collector(600000) { escapedFormatDfs("hdfs://hadoop1-m1:8020/", 
"test-%{rolltag}", seqfile("SnappyCodec")) }

> CollectorSink doesn't pass the new format parameter 
> ----------------------------------------------------
>
>                 Key: FLUME-720
>                 URL: https://issues.apache.org/jira/browse/FLUME-720
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.5
>            Reporter: Eran Kutner
>
> CollectorSink doesn't properly pass the format parameter down to the 
> EscapedCustomDfs sink.
> For example, this works fine:
> collectorSource(54001) | escapedCustomDfs("hdfs://hadoop1-m1:8020/", "test", 
> seqfile("SnappyCodec") );
> However, this uses the codec defined in flume-conf.xml instead:
> collectorSource(54001) | collectorSink("hdfs://hadoop1-m1:8020/", "test-", 
> 600000, seqfile("SnappyCodec") );
> By itself this bug would not be very serious; however, the problem is that
> escapedCustomDfs/customDfs use the same compressor and apply it to the whole
> file, in addition to the compression done natively by the sequence file. This
> makes the sequence file double compressed and invalid.
> As far as I can tell, the only way to get a valid compressed sequence file is
> to set flume.collector.dfs.compress.codec to "None" in flume-site.xml and
> use the format parameter to specify which compression to use for the sequence
> file, except that doesn't work...
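To illustrate why the extra whole-file compression makes the output invalid, here is a minimal Java sketch using only the JDK. It is an analogy, not Flume or Hadoop code: GZIP stands in for SnappyCodec, and the toy format (a 3-byte SEQ header followed by internally compressed records) is a hypothetical simplification loosely modeled on the fact that Hadoop sequence files begin with the bytes SEQ. The class and method names are made up for illustration. Once a second codec wraps the whole file, a reader looking for the header at the start of the file finds compressed bytes instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical toy model of double compression; not the Flume or Hadoop API.
public class DoubleCompression {

    // One compression layer. GZIP stands in for SnappyCodec so only the JDK is needed.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Removes exactly one compression layer.
    static byte[] decompress(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return gz.readAllBytes();
        }
    }

    // Toy "sequence file": a 3-byte SEQ magic followed by internally compressed records.
    static byte[] writeFormat(byte[] records) throws IOException {
        byte[] payload = compress(records);
        byte[] file = new byte[3 + payload.length];
        file[0] = 'S';
        file[1] = 'E';
        file[2] = 'Q';
        System.arraycopy(payload, 0, file, 3, payload.length);
        return file;
    }

    // Reader for the toy format: it expects the magic bytes at the very start of the file.
    static byte[] readFormat(byte[] file) throws IOException {
        if (file.length < 3 || file[0] != 'S' || file[1] != 'E' || file[2] != 'Q') {
            throw new IOException("not a valid file: missing SEQ header");
        }
        return decompress(Arrays.copyOfRange(file, 3, file.length));
    }

    public static void main(String[] args) throws IOException {
        byte[] records = "event-1\nevent-2\n".getBytes(StandardCharsets.UTF_8);

        // Single layer: only the format compresses (file-level codec set to "None").
        byte[] valid = writeFormat(records);
        System.out.println("single layer readable: " + Arrays.equals(readFormat(valid), records));

        // Double layer: the sink compresses the already-compressed file a second time.
        byte[] doubled = compress(valid);
        try {
            readFormat(doubled);
            System.out.println("double layer readable: true");
        } catch (IOException e) {
            System.out.println("double layer readable: false (" + e.getMessage() + ")");
        }
    }
}
```

Running main prints that the single-layer file reads back correctly while the double-layer file fails the header check. This mirrors the workaround above: keep exactly one compression layer, chosen through the format argument.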
