[ 
https://issues.apache.org/jira/browse/FLUME-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597940#comment-14597940
 ] 

Harsh J commented on FLUME-2458:
--------------------------------

The downside of using a path-like string in the filePrefix attribute is that 
you also need to make sure the provided path pre-exists. Otherwise, creating 
the file will simply fail, reporting that its parent directory does not exist.

It's a fine workaround, but more manual than Flume should require.
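
For illustration, a minimal sketch of that kind of configuration, with 
hypothetical agent/sink names (a1, k1) and placeholder paths. The path-like 
prefix is shown here on hdfs.inUsePrefix, the property that names in-progress 
files, which is an assumption about the attribute being discussed; either way, 
the subdirectory named in the prefix has to be created up front, which is the 
manual step noted above:

    # Hypothetical agent/sink names and placeholder paths
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events
    a1.sinks.k1.hdfs.filePrefix = events
    # Path-like prefix: in-progress files land under .tmp/ inside hdfs.path,
    # but that .tmp/ directory must already exist or the create fails
    a1.sinks.k1.hdfs.inUsePrefix = .tmp/
    a1.sinks.k1.hdfs.inUseSuffix = .tmp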

> Separate hdfs tmp directory for flume hdfs sink
> -----------------------------------------------
>
>                 Key: FLUME-2458
>                 URL: https://issues.apache.org/jira/browse/FLUME-2458
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.5.0.1
>            Reporter: Sverre Bakke
>            Priority: Minor
>         Attachments: FLUME-2458.patch, patch-2458.txt
>
>
> The current HDFS sink writes temporary files to the same directory where the 
> final file will be stored. This is a problem for several reasons:
> 1) File moving
> When MapReduce fetches a list of files to process and some of those files are 
> gone by the time they are processed (i.e. they have been renamed from .tmp to 
> whatever final name they are supposed to have), the MapReduce job will crash.
> 2) File type
> When MapReduce decides how to process a file, it looks at the file's 
> extension. If the file is compressed, it will decompress it for you. If a 
> compressed file has a .tmp extension (in the same folder), it will be treated 
> as an uncompressed file, thus breaking the results of the MapReduce job.
> I propose that the sink get an optional tmp path for storing these files, to 
> avoid these issues.
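
To make the proposal concrete, a hedged sketch of what such an option could 
look like; the property name hdfs.tmpDir is invented here purely for 
illustration and is not an existing sink setting (the attached patches may 
name it differently):

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events
    # Hypothetical option: write in-progress files under a separate
    # directory, then rename them into hdfs.path once they are closed
    a1.sinks.k1.hdfs.tmpDir = hdfs://namenode/flume/tmp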



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
