[ https://issues.apache.org/jira/browse/FLUME-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389178#comment-15389178 ]
Mike Percy commented on FLUME-2458:
-----------------------------------
[~jfield], to me this sounds like a bug in HDFS snapshots, or maybe a bug in
distcp.
In particular, if I take a snapshot and then copy the snapshotted data with
distcp, I would expect the resulting filesystem state to be isolated from
renames that occur after the snapshot was taken. It sounds like there are
holes in that isolation. I would also expect a consistent snapshot across the
whole filesystem.
Admittedly, I am not really familiar with HDFS snapshot semantics or internals,
or with how distcp interacts with those snapshots.
Would you agree? Is this a bug in one of those systems?
> Separate hdfs tmp directory for flume hdfs sink
> -----------------------------------------------
>
> Key: FLUME-2458
> URL: https://issues.apache.org/jira/browse/FLUME-2458
> Project: Flume
> Issue Type: Improvement
> Components: Sinks+Sources
> Affects Versions: v1.5.0.1
> Reporter: Sverre Bakke
> Assignee: Neerja Khattar
> Priority: Minor
> Attachments: FLUME-2458.patch, patch-2458.txt
>
>
> The current HDFS sink writes temporary files to the same directory where the
> final file will be stored. This is a problem for several reasons:
> 1) File moving
> When mapreduce fetches a list of files to process and some of those files
> are then gone (i.e. they were moved from .tmp to whatever final name they are
> supposed to have), the mapreduce job will crash.
> 2) File type
> When mapreduce decides how to process a file, it looks at the file's
> extension. If the file is compressed, mapreduce will decompress it for you. If
> the file has a .tmp extension (in the same folder), mapreduce will treat a
> compressed file as an uncompressed file, thus breaking the results of the
> mapreduce job.
> I propose that the sink get an optional tmp path for storing these files to
> avoid these issues.
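
The two problems in the description above, and the effect of the proposed
separate tmp path, can be reproduced on a local filesystem. The following is
only a minimal sketch: it does not use HDFS, Flume, or mapreduce, and the codec
table, the helper names (pick_codec, read_listed_files, simulate) and the file
name events.1.gz are all hypothetical, chosen for illustration.

```python
import os
import tempfile

# Hypothetical suffix-to-codec table; a real framework keeps its own registry.
CODECS_BY_SUFFIX = {".gz": "gzip", ".bz2": "bzip2"}


def pick_codec(filename):
    """Return the codec implied by the file's last extension, or None."""
    _, suffix = os.path.splitext(filename)
    return CODECS_BY_SUFFIX.get(suffix)


def read_listed_files(directory, listing):
    """Open every file from an earlier listing; return the names that vanished."""
    missing = []
    for name in listing:
        try:
            with open(os.path.join(directory, name), "rb"):
                pass
        except FileNotFoundError:
            missing.append(name)
    return missing


def simulate(separate_tmp_dir):
    """Write an in-flight file, list the final directory, then finish the file.

    Returns (listing seen by the job, files from that listing that are gone).
    """
    with tempfile.TemporaryDirectory() as root:
        final_dir = os.path.join(root, "final")
        os.mkdir(final_dir)
        work_dir = final_dir
        if separate_tmp_dir:
            # Stand-in for the proposed optional tmp path.
            work_dir = os.path.join(root, "tmp")
            os.mkdir(work_dir)

        # The sink is still writing: the file carries a .tmp suffix.
        in_flight = os.path.join(work_dir, "events.1.gz.tmp")
        with open(in_flight, "wb") as handle:
            handle.write(b"data")

        # The job lists its input directory before the sink finishes.
        listing = sorted(os.listdir(final_dir))

        # The sink finishes and renames the file to its final name.
        os.rename(in_flight, os.path.join(final_dir, "events.1.gz"))

        # The job now tries to read what it listed earlier.
        return listing, read_listed_files(final_dir, listing)


if __name__ == "__main__":
    print("same directory:", simulate(separate_tmp_dir=False))
    print("separate tmp directory:", simulate(separate_tmp_dir=True))
    print("codec for .tmp name:", pick_codec("events.1.gz.tmp"))
    print("codec for final name:", pick_codec("events.1.gz"))
```

With a shared directory the job lists events.1.gz.tmp and then fails to find
it after the rename, and a suffix-based lookup finds no codec for the .tmp
name. With a separate tmp directory the in-flight file never appears in the
listing of the final directory, so neither problem arises in this sketch.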