[ https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680846#comment-13680846 ]
Stan Rosenberg commented on MAPREDUCE-5247:
-------------------------------------------

Correct, the above holds in the community version; before submitting this jira I checked the (apache) trunk.

> FileInputFormat should filter files with '._COPYING_' suffix
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-5247
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5247
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Stan Rosenberg
>
> FsShell copy/put creates staging files with a '._COPYING_' suffix. These files
> should be considered hidden by FileInputFormat. A simple fix is to add the
> following conjunct to the existing hiddenFilter:
> {code}
> !name.endsWith("._COPYING_")
> {code}
> After upgrading to CDH 4.2.0 we encountered this bug. We have a legacy data
> loader which uses 'hadoop fs -put' to load data into hourly partitions. We
> also have intra-hourly jobs which are scheduled to execute several times per
> hour using the same hourly partition as input. Thus, as new data is
> continuously loaded, these staging files (i.e., those ending in ._COPYING_)
> break our jobs, since a staging file is moved to its final name when the
> copy/put completes and a job that picked it up as input can no longer find it.
> As a workaround, we've defined a custom input path filter and loaded it with
> "mapred.input.pathFilter.class".
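
The filtering rule proposed above can be sketched as a plain predicate on file names. This is only an illustration, not the patch or the reporter's actual workaround: the class name CopyingFilterSketch and the helpers accept and visible are hypothetical, and the sketch works on strings so that it runs without Hadoop on the classpath. The first two conjuncts are assumed to mirror the existing hiddenFilter (names starting with "_" or "."); the third is the conjunct suggested in the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Hadoop.
final class CopyingFilterSketch {
    // Suffix that FsShell copy/put gives to staging files, per the issue text.
    static final String STAGING_SUFFIX = "._COPYING_";

    // Returns true if a file with this name should be visible as job input.
    static boolean accept(String name) {
        return !name.startsWith("_")              // assumed existing rule
            && !name.startsWith(".")              // assumed existing rule
            && !name.endsWith(STAGING_SUFFIX);    // conjunct proposed in the issue
    }

    // Keeps only the names that pass the filter, preserving their order.
    static List<String> visible(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (accept(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
            "part-00000", "_SUCCESS", ".hidden", "events.log._COPYING_", "events.log");
        System.out.println(visible(listing));
    }
}
```

In a real job the same test would presumably live in the accept(Path) method of a class implementing org.apache.hadoop.fs.PathFilter, applied to path.getName(), with that class registered through "mapred.input.pathFilter.class" as described in the workaround.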