[ https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691008#comment-13691008 ]
Kousuke Saruta commented on MAPREDUCE-5247:
-------------------------------------------

As Devaraj said, we can use "mapred.input.pathFilter.class". However, as far as I know, the name of the temporary file is undocumented, and I think changes to the specification or implementation of HDFS should not affect users who already rely on HDFS. So I think we should consider the name of the temporary file. It may be good for the name of the temporary file to start with "." or "_".

> FileInputFormat should filter files with '._COPYING_' sufix
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5247
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5247
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Stan Rosenberg
>
> FsShell copy/put creates staging files with the '._COPYING_' suffix. These files
> should be considered hidden by FileInputFormat. A simple fix is to add the
> following conjunct to the existing hiddenFilter:
> {code}
> !name.endsWith("._COPYING_")
> {code}
> After upgrading to CDH 4.2.0 we encountered this bug. We have a legacy data
> loader which uses 'hadoop fs -put' to load data into hourly partitions. We
> also have intra-hourly jobs which are scheduled to execute several times per
> hour using the same hourly partition as input. Thus, as new data is
> continuously loaded, these staging files (i.e., ._COPYING_) are breaking our
> jobs (since the staging files are moved when copy/put completes).
> As a workaround, we've defined a custom input path filter and loaded it with
> "mapred.input.pathFilter.class".
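
For illustration, here is a minimal sketch of the name check that such a custom path filter might perform. It is deliberately free of Hadoop dependencies so it can be run on its own; the class name StagingFileFilter and the sample file names are hypothetical. The first two checks are my understanding of what the existing hiddenFilter does (skip names starting with "_" or "."), and the third is the conjunct proposed in the issue. In a real job this logic would presumably sit in the accept(Path) method of a class implementing org.apache.hadoop.fs.PathFilter, applied to path.getName(), and be registered through "mapred.input.pathFilter.class".

```java
// Hypothetical, Hadoop-free sketch of the filtering logic discussed above.
class StagingFileFilter {
    // Suffix that FsShell copy/put appends to staging files, per the issue description.
    static final String COPYING_SUFFIX = "._COPYING_";

    // Returns true if a file with this name should be read as job input.
    static boolean accept(String name) {
        return !name.startsWith("_")              // assumed existing hidden-file rule
            && !name.startsWith(".")              // assumed existing hidden-file rule
            && !name.endsWith(COPYING_SUFFIX);    // conjunct proposed in the issue
    }

    public static void main(String[] args) {
        // Sample names are made up for the demonstration.
        String[] names = {"part-00000", "part-00001._COPYING_", "_SUCCESS", ".hidden"};
        for (String name : names) {
            System.out.println(name + " -> " + (accept(name) ? "read" : "skip"));
        }
    }
}
```

Note that if the temporary file name started with "." or "_", as suggested in the comment, the third check would be unnecessary, because the first two would already cover it.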