[
https://issues.apache.org/jira/browse/FLUME-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943005#comment-15943005
]
darkz edited comment on FLUME-1702 at 3/27/17 11:16 AM:
--------------------------------------------------------
I queried the .tmp data in Hive, and it threw an error:
Failed with exception
java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException:
org.codehaus.jackson.JsonParseException: Illegal character ((CTRL-CHAR, code
1)): only regular white space (\r, \n, \t) is allowed between tokens
at [Source: java.io.ByteArrayInputStream@7730ef88; line: 1, column: 2]
I think the compressed file with the '.tmp' suffix is still in use and is not
yet a complete compressed file, so the Hadoop codec could not recognize its
content.
After all: yes, I use the "." prefix to skip the ".tmp" file, but the Flume
documentation does not mention it...
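
For readers following along, the "." prefix workaround mentioned above can be
sketched as the sink configuration below. The agent name "a1", the sink name
"k1", and the path are hypothetical placeholders; hdfs.inUsePrefix and
hdfs.inUseSuffix are the HDFS sink properties that control the name of the
file while it is being written. This is a minimal fragment under those
assumptions, not a complete agent definition.

```properties
# Hypothetical agent "a1" with an HDFS sink "k1" (names are placeholders).
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
# Prefix in-flight files with "." so that tools which skip hidden files ignore them.
a1.sinks.k1.hdfs.inUsePrefix = .
# Suffix for in-flight files; ".tmp" is the default and is shown only for clarity.
a1.sinks.k1.hdfs.inUseSuffix = .tmp
```

With this in place, a file being written would appear as a hidden name with a
".tmp" suffix, and it loses both the prefix and the suffix once it is closed.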
was (Author: darkz):
Yes, I use the "." prefix to skip the ".tmp" file, but the Flume documentation
does not mention it...
> HDFSEventSink should write to a hidden file as opposed to a .tmp file
> ---------------------------------------------------------------------
>
> Key: FLUME-1702
> URL: https://issues.apache.org/jira/browse/FLUME-1702
> Project: Flume
> Issue Type: Improvement
> Reporter: Brock Noland
> Assignee: Jarek Jarcec Cecho
> Fix For: 1.4.0
>
> Attachments: bugFLUME-1702.patch, bugFLUME-1702.patch
>
>
> Currently we write to a .tmp file. The problem is that if MR jobs are being
> run on the directory we are writing to, then it's common for an MR job to
> list the directory, get a .tmp file, and then in the meantime the .tmp file
> is renamed, causing the job to fail when run.
> Using Java MR you can use a PathFilter to avoid this; however, a custom
> solution is required for Pig, Hive, etc.
> Perhaps we should write to a hidden file so that MR never tries to process
> data in flight.
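
The filtering idea in the description above can be sketched in plain Java. The
class and method names are hypothetical, and the code works on bare file names
so that it runs without Hadoop on the classpath. In a real MR job the same
predicate would presumably sit inside an implementation of
org.apache.hadoop.fs.PathFilter, applied to path.getName(). The hidden-file
rule (names starting with "_" or ".") follows the convention Hadoop's
FileInputFormat uses for its default input listing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: decides which file names a job should process.
public class InFlightFileFilterSketch {

    // Hidden-file convention: names starting with '_' or '.' are skipped.
    static boolean isHidden(String fileName) {
        return fileName.startsWith("_") || fileName.startsWith(".");
    }

    // Files still being written by the sink carry the ".tmp" suffix.
    static boolean isInFlight(String fileName) {
        return fileName.endsWith(".tmp");
    }

    // Accept only files that are neither hidden nor in flight.
    static boolean accept(String fileName) {
        return !isHidden(fileName) && !isInFlight(fileName);
    }

    // Apply the predicate to a directory listing.
    static List<String> visibleFiles(List<String> names) {
        return names.stream()
                .filter(InFlightFileFilterSketch::accept)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up file names, for illustration only.
        List<String> listing = Arrays.asList(
                "FlumeData.1490612345678.gz",
                "FlumeData.1490612399999.gz.tmp",
                ".FlumeData.1490612400000.gz.tmp",
                "_SUCCESS");
        System.out.println(visibleFiles(listing));
        // prints [FlumeData.1490612345678.gz]
    }
}
```

Writing in-flight data to a hidden file makes the explicit ".tmp" check
unnecessary for tools that already skip hidden names, which is the point of
the proposal; the suffix check is kept here only to show the custom filter
that would otherwise be needed.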
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)