Chang Zong created FLUME-3079:
---------------------------------
Summary: HDFS sink using Snappy compression: .tmp file cannot be processed
correctly while data is still being written
Key: FLUME-3079
URL: https://issues.apache.org/jira/browse/FLUME-3079
Project: Flume
Issue Type: Bug
Components: Sinks+Sources
Affects Versions: 1.5.2
Reporter: Chang Zong
I'm using the HDFS sink with the Snappy compression codec. While JSON events are
being written into HDFS, a .snappy.tmp file is generated. If I try to access the
data in that tmp file with Hive, I get a JSON parsing error.
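For reference, my sink configuration is roughly like the sketch below (the agent,
sink, and channel names plus the HDFS path are placeholders, not my exact values):

    agent1.sinks.hdfs-sink.type = hdfs
    agent1.sinks.hdfs-sink.channel = mem-channel
    agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
    agent1.sinks.hdfs-sink.hdfs.filePrefix = events
    # Write the JSON event bodies as text and compress the output stream with Snappy
    agent1.sinks.hdfs-sink.hdfs.fileType = CompressedStream
    agent1.sinks.hdfs-sink.hdfs.writeFormat = Text
    agent1.sinks.hdfs-sink.hdfs.codeC = snappy
    # Roll by time only; until a roll happens the in-progress file keeps the .tmp suffix
    agent1.sinks.hdfs-sink.hdfs.rollInterval = 3600
    agent1.sinks.hdfs-sink.hdfs.rollSize = 0
    agent1.sinks.hdfs-sink.hdfs.rollCount = 0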
I think the reason is that the HDFS sink has already written Snappy-formatted
content into the tmp file, but because the file is not yet closed, the Snappy
stream is incomplete and cannot be recognised by the Hive JSON SerDe. After the
file is rolled over to a normal Snappy file, it can be processed correctly.
So, is there a way to keep plain-text format while data is being written into the
tmp file, and convert it to Snappy format only after the tmp file is rolled?
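One idea I have considered (only a guess on my part, not something I have verified)
is to hide the in-progress file from Hive instead, e.g. by giving it a leading dot
so the input format treats it as hidden until it is rolled:

    # Hypothetical workaround: prefix in-progress files with "." so Hive/MapReduce
    # input formats skip them until they are rolled to their final name
    agent1.sinks.hdfs-sink.hdfs.inUsePrefix = .
    agent1.sinks.hdfs-sink.hdfs.inUseSuffix = .tmp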
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)