[ https://issues.apache.org/jira/browse/HBASE-21183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Robertson updated HBASE-21183:
----------------------------------
    Summary: loadIncrementalHFiles sometimes throws FileNotFoundException on retry  (was: loadincrementalHFiles sometimes throws FileNotFoundException on retry)

> loadIncrementalHFiles sometimes throws FileNotFoundException on retry
> ---------------------------------------------------------------------
>
>                 Key: HBASE-21183
>                 URL: https://issues.apache.org/jira/browse/HBASE-21183
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Tim Robertson
>            Priority: Major
>
> On a nightly batch job which prepares hundreds of well-balanced HFiles at
> around 2GB each, we see sporadic failures in a bulk load.
> I'm unable to paste the logs here (different network), but on a failing day
> they show, for example, the following:
> {code:java}
> Trying to load hfile... /my/input/path/...
> Attempt to bulk load region containing ... failed. This is recoverable and will be retried
> Attempt to bulk load region containing ... failed. This is recoverable and will be retried
> Attempt to bulk load region containing ... failed. This is recoverable and will be retried
> Split occurred while grouping HFiles, retry attempt 1 with 3 files remaining to group or split
> Trying to load hfile...
> IOException during splitting
> java.io.FileNotFoundException: File does not exist: /my/input/path/...
> {code}
> The exception gets thrown from [this
> line|https://github.com/apache/hbase/blob/branch-1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java#L685].
>
> I should note that this is a secure cluster (CDH 5.12.x).
> I've tried to go through the code, and don't spot an obvious race condition.
> I don't spot any changes related to this in the later 1.x versions, so I
> presume this exists in 1.5.
> I have yet to get access to the NameNode audit logs when this occurs to trace
> through the rename() calls around these particular files.
> I don't see timeouts like HBASE-4030



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)