[jira] [Commented] (SPARK-39269) spark3.2.0 commit tmp file is not found when rename
[ https://issues.apache.org/jira/browse/SPARK-39269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553957#comment-17553957 ]

cxb commented on SPARK-39269:
-----------------------------

[~srowen] My problem is that Spark still renames the temporary commit file even though that file has already been renamed to the final file, so I get an error and the Spark program exits. The problem only occurs when the namenode is busy. I agree this may not be actionable, but the error is still thrown by Spark; I don't know whether this is a Spark problem or an HDFS problem.

> spark3.2.0 commit tmp file is not found when rename
>
>                 Key: SPARK-39269
>                 URL: https://issues.apache.org/jira/browse/SPARK-39269
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL, Structured Streaming
>    Affects Versions: 3.2.0
>         Environment: spark 3.2.0
>                      yarn
>                      2 executors and 1 driver
>                      a job consisting of 4 streaming queries
>            Reporter: cxb
>            Priority: Major
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> A job includes 4 streaming queries, and while running, one of the queries throws "offset tmp file is not found", which causes the job to exit. This never happened to me when I was using Spark 3.0.0. I looked at the implementation in Spark 3.2 and it is not very different from Spark 3.0. Could it be a problem with the newer Jackson version?
>
> {code:java}
> java.io.FileNotFoundException: rename source /tmp/chenxiaobin/regist_gp_bmhb_v2/commits/.35362.b4684b94-c0bb-4d87-baf0-cd1a508d7be7.tmp is not found.
>   at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.validateRenameSource(FSDirRenameOp.java:561)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.unprotectedRenameTo(FSDirRenameOp.java:361)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:300)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:247)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3931)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename2(NameNodeRpcServer.java:1039)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename2(ClientNamenodeProtocolServerSideTranslatorPB.java:610)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>   at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1991)
>   at org.apache.hadoop.fs.Hdfs.renameInternal(Hdfs.java:341)
>   at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:690)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958)
>   at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.renameTempFile(CheckpointFileManager.scala:346)
>   at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.close(CheckpointFileManager.scala:154)
>   at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.$anonfun$addNewBatchByStream$2(HDFSMetadataLog.scala:176)
>   at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.addNewBatchByStream(HDFSMetadataLog.scala:171)
>   at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:116)
>   at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$18(MicroBatchExecution.scala:615)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:627)
>   at
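The race cxb describes can be reproduced in miniature with local files: a blind retry of an already-completed rename fails exactly like the HDFS call in the stack trace, while checking the destination first would tolerate the retry. This is only an illustrative sketch of the failure mode, not Spark's actual CheckpointFileManager code; the `rename_temp_file` helper and the file names are hypothetical.

```python
import os
import tempfile

def rename_temp_file(src: str, dst: str) -> None:
    """Idempotent rename: if src is gone but dst exists, assume an
    earlier attempt (e.g. a retried RPC) already completed the rename."""
    if not os.path.exists(src) and os.path.exists(dst):
        return  # rename already happened; nothing left to do
    os.rename(src, dst)  # raises FileNotFoundError if src is missing

# Simulate the reported failure mode with local files.
d = tempfile.mkdtemp()
src = os.path.join(d, ".35362.tmp")  # stand-in for the temp commit file
dst = os.path.join(d, "35362")       # stand-in for the final commit file
open(src, "w").close()

os.rename(src, dst)        # first attempt succeeds on the namenode
try:
    os.rename(src, dst)    # blind retry: the source is gone -> error
except FileNotFoundError:
    print("retry failed: rename source is not found")

rename_temp_file(src, dst)  # idempotent version tolerates the retry
```

If the HDFS client silently retries the rename RPC under namenode load, the second attempt sees exactly this state: the source already renamed away and the destination present, which would explain why the error only appears when the namenode is busy.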
[jira] [Commented] (SPARK-39269) spark3.2.0 commit tmp file is not found when rename
[ https://issues.apache.org/jira/browse/SPARK-39269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553826#comment-17553826 ]

Sean R. Owen commented on SPARK-39269:
--------------------------------------

I think this just isn't actionable - it is not clear how this is reproduced or what you are saying the problem is.