[
https://issues.apache.org/jira/browse/FLUME-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292638#comment-13292638
]
Leslin (Hong Xiang Lin) commented on FLUME-1200:
------------------------------------------------
Below are findings from today's investigation:
1. When the native snappy library is not available, the sink throws an exception like the one below.
12/06/11 12:06:19 ERROR hdfs.HDFSEventSink: process failed
java.lang.RuntimeException: native snappy library not available
    at org.apache.hadoop.io.compress.SnappyCodec.createCompressor(SnappyCodec.java:135)
    at org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:84)
    at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.open(HDFSCompressedDataStream.java:66)
    at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:189)
    at org.apache.flume.sink.hdfs.BucketWriter.access$0(BucketWriter.java:160)
    at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:150)
    at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:1)
    at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:120)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:147)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:291)
    at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:671)
    at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
2. The agent does not fail; it keeps retrying delivery and writes .snappy-extension files into HDFS again and again. All of the files are 0 bytes because of the exception above.
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579269.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579270.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579271.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579272.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579273.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579274.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579275.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:07 /tmp/data/flume/FlumeData.1339387579276.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:07 /tmp/data/flume/FlumeData.1339387579277.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:07 /tmp/data/flume/FlumeData.1339387579278.snappy
3. If the user wants compressed data, hdfs.fileType must be set to CompressedStream together with hdfs.codeC set to an available codec such as SnappyCodec. If hdfs.fileType is DataStream, the file is NOT compressed even though it carries a compressed-file extension like .snappy or .gz. The misunderstanding was caused by an incomplete user doc; I have updated the user guide in FLUME-1270.
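For reference, a minimal sink configuration that actually produces snappy-compressed output would look like the fragment below (host, port, and path are placeholders, as in the reporter's config):

```properties
# Compression requires CompressedStream; with DataStream, hdfs.codeC only
# changes the file extension, not the file contents.
agent.sinks.k1.channel = c1
agent.sinks.k1.type = HDFS
agent.sinks.k1.hdfs.path = hdfs://<host>:<port>/<path>
agent.sinks.k1.hdfs.fileType = CompressedStream
agent.sinks.k1.hdfs.codeC = SnappyCodec
```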
I agree with Mingjie that this warning can't be changed. As a compromise, Flume can do some pre-checks to make sure:
(1) if the user sets hdfs.codeC while fileType is DataStream, the agent halts and logs an error instead of going on. (In fact, the same mistake can also be made in HBase.)
(2) if the user sets hdfs.codeC while fileType is CompressedStream but the codec class is unavailable, the agent halts instead of going on to write many 0-size files again and again.
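A rough sketch of pre-checks (1) and (2) is below. The class and method names are hypothetical, not actual Flume code; a real implementation in the sink's configure() would also have to instantiate the codec (e.g. call createCompressor()) so that a missing *native* library is caught at startup, since the simple class-load check shown here would not detect that case.

```java
// Hypothetical fail-fast validation of hdfs.fileType / hdfs.codeC,
// sketched as a standalone class (not actual Flume source).
public class CodecPrecheck {

    static void validate(String fileType, String codeC) {
        // Check (1): codec configured but fileType won't compress anything.
        if ("DataStream".equals(fileType) && codeC != null) {
            throw new IllegalArgumentException(
                "hdfs.codeC is set but hdfs.fileType is DataStream; "
                + "output would not be compressed. Use CompressedStream.");
        }
        // Check (2): CompressedStream needs a codec that is actually loadable.
        if ("CompressedStream".equals(fileType)) {
            if (codeC == null) {
                throw new IllegalArgumentException(
                    "hdfs.fileType is CompressedStream but hdfs.codeC is not set");
            }
            try {
                // Stand-in for a real availability check; see lead-in note
                // about native libraries.
                Class.forName("org.apache.hadoop.io.compress." + codeC);
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException(
                    "codec class not available: " + codeC, e);
            }
        }
    }

    public static void main(String[] args) {
        try {
            validate("DataStream", "SnappyCodec");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this in place the agent would halt at configuration time with a clear error, instead of opening a new 0-size .snappy file on every retry.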
Any more suggestions? Thanks!
> HDFSEventSink causes *.snappy file to be created in HDFS even when snappy
> isn't used (due to missing lib)
> ---------------------------------------------------------------------------------------------------------
>
> Key: FLUME-1200
> URL: https://issues.apache.org/jira/browse/FLUME-1200
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.2.0
> Environment: RHEL 6.2 64-bit
> Reporter: Will McQueen
> Assignee: Leslin (Hong Xiang Lin)
> Fix For: v1.2.0
>
>
> If I use HDFSEventSink and specify the codec to be snappy, then the sink
> writes data to HDFS with the ".snappy" extension... but the content of those
> HDFS files is not in snappy format when the snappy libs aren't found. The log
> files mention this:
> 2012-05-11 19:38:49,868 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> 2012-05-11 19:38:49,868 WARN snappy.LoadSnappy: Snappy native library
> not loaded
> ...and I think it should be an error rather than a warning... the sink
> shouldn't write data at all to HDFS if it's not in the format expected by the
> config file (ie, not compressed with snappy). The config file I used is:
> agent.channels = c1
> agent.sources = r1
> agent.sinks = k1
> #
> agent.channels.c1.type = MEMORY
> #
> agent.sources.r1.channels = c1
> agent.sources.r1.type = SEQ
> #
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = LOGGER
> #
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = HDFS
> agent.sinks.k1.hdfs.path = hdfs://<host>:<port>:<path>
> agent.sinks.k1.hdfs.fileType = DataStream
> agent.sinks.k1.hdfs.codeC = SnappyCodec
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira