[
https://issues.apache.org/jira/browse/FLUME-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292638#comment-13292638
]
Leslin (Hong Xiang Lin) commented on FLUME-1200:
------------------------------------------------
Below are findings from today's investigation:
1. When the native snappy library is not available, the sink throws an exception like the one below.
12/06/11 12:06:19 ERROR hdfs.HDFSEventSink: process failed
java.lang.RuntimeException: native snappy library not available
    at org.apache.hadoop.io.compress.SnappyCodec.createCompressor(SnappyCodec.java:135)
    at org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:84)
    at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.open(HDFSCompressedDataStream.java:66)
    at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:189)
    at org.apache.flume.sink.hdfs.BucketWriter.access$0(BucketWriter.java:160)
    at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:150)
    at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:1)
    at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:120)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:147)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:291)
    at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:671)
    at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
2. The agent does not fail; it keeps retrying delivery and writes .snappy-extension files into HDFS again and again. All of the files are 0 bytes because of the exception above.
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579269.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579270.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579271.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579272.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579273.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579274.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:06 /tmp/data/flume/FlumeData.1339387579275.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:07 /tmp/data/flume/FlumeData.1339387579276.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:07 /tmp/data/flume/FlumeData.1339387579277.snappy.tmp
-rw-r--r-- 1 leslin supergroup 0 2012-06-11 12:07 /tmp/data/flume/FlumeData.1339387579278.snappy
3. If the user wants compressed data, hdfs.fileType must be set to CompressedStream together with hdfs.codeC set to an available codec such as SnappyCodec. If hdfs.fileType is DataStream, the file is NOT compressed even though it carries a compressed-file extension like .snappy or .gz. The misunderstanding was caused by an incomplete user doc; I have updated the user guide in FLUME-1270.
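For reference, a minimal sink configuration that actually produces snappy-compressed output would look like the fragment below (host, port, and path are placeholders, as in the reporter's config):

```properties
# Compression requires CompressedStream; with DataStream, hdfs.codeC only
# changes the file extension, not the file contents.
agent.sinks.k1.channel = c1
agent.sinks.k1.type = HDFS
agent.sinks.k1.hdfs.path = hdfs://<host>:<port>/<path>
agent.sinks.k1.hdfs.fileType = CompressedStream
agent.sinks.k1.hdfs.codeC = SnappyCodec
```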
I agree with Mingjie that this warning can't be changed. As a compromise, Flume can do some pre-checks to make sure:
(1) if the user sets hdfs.codeC while fileType is DataStream, the agent halts and logs an error instead of going on. (In fact, the same mistake can also be made in HBase.)
(2) if the user sets hdfs.codeC while fileType is CompressedStream but the codec class is unavailable, the agent halts instead of going on to write many 0-size files again and again.
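A rough sketch of pre-checks (1) and (2) is below. The class and method names are hypothetical, not actual Flume code; a real implementation in the sink's configure() would also have to instantiate the codec (e.g. call createCompressor()) so that a missing *native* library is caught at startup, since the simple class-load check shown here would not detect that case.

```java
// Hypothetical fail-fast validation of hdfs.fileType / hdfs.codeC,
// sketched as a standalone class (not actual Flume source).
public class CodecPrecheck {

    static void validate(String fileType, String codeC) {
        // Check (1): codec configured but fileType won't compress anything.
        if ("DataStream".equals(fileType) && codeC != null) {
            throw new IllegalArgumentException(
                "hdfs.codeC is set but hdfs.fileType is DataStream; "
                + "output would not be compressed. Use CompressedStream.");
        }
        // Check (2): CompressedStream needs a codec that is actually loadable.
        if ("CompressedStream".equals(fileType)) {
            if (codeC == null) {
                throw new IllegalArgumentException(
                    "hdfs.fileType is CompressedStream but hdfs.codeC is not set");
            }
            try {
                // Stand-in for a real availability check; see lead-in note
                // about native libraries.
                Class.forName("org.apache.hadoop.io.compress." + codeC);
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException(
                    "codec class not available: " + codeC, e);
            }
        }
    }

    public static void main(String[] args) {
        try {
            validate("DataStream", "SnappyCodec");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this in place the agent would halt at configuration time with a clear error, instead of opening a new 0-size .snappy file on every retry.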
Any more suggestions? Thanks!
> HDFSEventSink causes *.snappy file to be created in HDFS even when snappy
> isn't used (due to missing lib)
> ---------------------------------------------------------------------------------------------------------
>
> Key: FLUME-1200
> URL: https://issues.apache.org/jira/browse/FLUME-1200
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.2.0
> Environment: RHEL 6.2 64-bit
> Reporter: Will McQueen
> Assignee: Leslin (Hong Xiang Lin)
> Fix For: v1.2.0
>
>
> If I use HDFSEventSink and specify the codec to be snappy, then the sink
> writes data to HDFS with the ".snappy" extension... but the content of those
> HDFS files is not in snappy format when the snappy libs aren't found. The log
> files mention this:
> 2012-05-11 19:38:49,868 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> 2012-05-11 19:38:49,868 WARN snappy.LoadSnappy: Snappy native library
> not loaded
> ...and I think it should be an error rather than a warning... the sink
> shouldn't write data at all to HDFS if it's not in the format expected by the
> config file (ie, not compressed with snappy). The config file I used is:
> agent.channels = c1
> agent.sources = r1
> agent.sinks = k1
> #
> agent.channels.c1.type = MEMORY
> #
> agent.sources.r1.channels = c1
> agent.sources.r1.type = SEQ
> #
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = LOGGER
> #
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = HDFS
> agent.sinks.k1.hdfs.path = hdfs://<host>:<port>:<path>
> agent.sinks.k1.hdfs.fileType = DataStream
> agent.sinks.k1.hdfs.codeC = SnappyCodec
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira