Chunyi Yang created HADOOP-18847:
------------------------------------

             Summary: mapreduce job encounters java.io.IOException when dfs.client.rbf.observer.read.enable is true
                 Key: HADOOP-18847
                 URL: https://issues.apache.org/jira/browse/HADOOP-18847
             Project: Hadoop Common
          Issue Type: Bug
          Components: common
            Reporter: Chunyi Yang

When running a MapReduce job in an environment with Router-Based Federation and Observer reads enabled, we see the following error with a probability of approximately 1%.

{code:java}
"java.io.IOException: Resource hdfs://XXXX/user/XXXX/.staging/job_XXXXXX/.tez/application_XXXXXX/tez-conf.pb changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: \"2023-07-07T12:41:16.822+0900\", current time: \"2023-07-07T12:41:22.386+0900\"",
{code}

This error is raised in the function verifyAndCopy in FSDownload.java when the NodeManager tries to download a file right after the file has been written to HDFS. As expected, the write operation runs on the active NameNode and the read operation runs on the observer NameNode.

The edits file and the hdfs-audit files show that the expected time in the error message is the OP_CLOSE MTIME of the tez-conf.pb file (which is correct), while the timestamp actually returned by the read operation is the OP_ADD MTIME of the same tez-conf.pb file (which is wrong). This mismatch shows that the observer NameNode responds to the client before it has applied its edits up to the latest state ID, which causes the issue.
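To make the suspected race easier to discuss, here is a minimal, self-contained sketch of it. It is a toy model and not Hadoop code: the class ObserverStaleReadSketch, the NameNodeState type, the methods verifyTimestamp and readMtime, and the mtime and transaction id values are all hypothetical and chosen only for illustration. It assumes, as described in this report, that the expected timestamp is the OP_CLOSE mtime recorded on the active namenode and that a lagging observer still returns the OP_ADD mtime.

```java
import java.io.IOException;

/** Toy model of the stale observer read described in this report (not Hadoop code). */
final class ObserverStaleReadSketch {

    /** Hypothetical stand-in for a namenode: last applied transaction id and the file's mtime. */
    static final class NameNodeState {
        long lastAppliedTxId;
        long fileMtime;

        /** Applies one edit-log entry that sets the file's mtime. */
        void applyEdit(long txId, long mtime) {
            lastAppliedTxId = txId;
            fileMtime = mtime;
        }
    }

    /** Same kind of comparison as the failing check: expected mtime vs. the mtime just read. */
    static void verifyTimestamp(long expectedMtime, long actualMtime) throws IOException {
        if (expectedMtime != actualMtime) {
            throw new IOException("Resource changed on src filesystem - expected: "
                    + expectedMtime + ", was: " + actualMtime);
        }
    }

    /**
     * Reads the mtime from the observer. With waitForCatchUp set, a read is refused
     * while the observer is behind the transaction id the client has already seen.
     * Refusing is a simplification made for this sketch.
     */
    static long readMtime(NameNodeState observer, long clientLastSeenTxId, boolean waitForCatchUp) {
        if (waitForCatchUp && observer.lastAppliedTxId < clientLastSeenTxId) {
            throw new IllegalStateException("observer at txid " + observer.lastAppliedTxId
                    + " is behind client txid " + clientLastSeenTxId);
        }
        return observer.fileMtime;
    }

    public static void main(String[] args) {
        NameNodeState active = new NameNodeState();
        NameNodeState observer = new NameNodeState();

        // OP_ADD reaches both namenodes (hypothetical txid 1, mtime 1000).
        active.applyEdit(1, 1000L);
        observer.applyEdit(1, 1000L);
        // OP_CLOSE is applied on the active only; the observer has not tailed it yet.
        active.applyEdit(2, 1021L);

        long expected = active.fileMtime;           // timestamp registered for the resource
        long clientTxId = active.lastAppliedTxId;   // state the client has already observed

        try {
            verifyTimestamp(expected, readMtime(observer, clientTxId, false));
            System.out.println("stale read: timestamps match");
        } catch (IOException e) {
            System.out.println("stale read: " + e.getMessage());
        }

        // Once the observer has applied OP_CLOSE, the same check passes.
        observer.applyEdit(2, 1021L);
        try {
            verifyTimestamp(expected, readMtime(observer, clientTxId, true));
            System.out.println("consistent read: timestamps match");
        } catch (IOException e) {
            System.out.println("consistent read: " + e.getMessage());
        }
    }
}
```

In this model the first check fails only because the read is served from state older than what the client has already seen. If the last seen transaction id is lost or not enforced somewhere between the client and the observer, the same kind of mismatch would be expected; whether that is what happens in the RBF path is an assumption to be confirmed.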