arw357 edited a comment on issue #590: Failed to APPEND_FILE /HUDI_MOR/2019/03/01/.abca886a-e213-4199-b571-475919d34fe5_20190301123358.log.1 for DFSClient_NONMAPREDUCE_-263902482_1 on 172.29.0.10 because lease recovery is in progress
URL: https://github.com/apache/incubator-hudi/issues/590#issuecomment-516377266

I am seeing something similar. I have only one datanode, and the failure happens when the archive log is being appended. I can reproduce it. I see that this build failed as well: https://api.travis-ci.org/v3/job/547004744/log.txt

```
11:08:04.613 [DataLakeSystem-akka.actor.default-dispatcher-5] INFO com.uber.hoodie.common.table.log.HoodieLogFormat$WriterBuilder - Building HoodieLogFormat Writer
11:08:04.613 [DataLakeSystem-akka.actor.default-dispatcher-5] INFO com.uber.hoodie.common.table.log.HoodieLogFormat$WriterBuilder - Computing the next log version for commits in hdfs://namenode:8020/data/lake/3ecfc2fc-8f48-46e1-8e0c-be8fc13eb596/converted/.hoodie/archived
11:08:04.621 [DataLakeSystem-akka.actor.default-dispatcher-5] INFO com.uber.hoodie.common.table.log.HoodieLogFormat$WriterBuilder - Computed the next log version for commits in hdfs://namenode:8020/data/lake/3ecfc2fc-8f48-46e1-8e0c-be8fc13eb596/converted/.hoodie/archived as 1
11:08:04.621 [DataLakeSystem-akka.actor.default-dispatcher-5] INFO com.uber.hoodie.common.table.log.HoodieLogFormat$WriterBuilder - HoodieLogFile on path hdfs://namenode:8020/data/lake/3ecfc2fc-8f48-46e1-8e0c-be8fc13eb596/converted/.hoodie/archived/.commits_.archive.1
11:08:04.630 [DataLakeSystem-akka.actor.default-dispatcher-5] INFO com.uber.hoodie.common.table.log.HoodieLogFormatWriter - HoodieLogFile {hdfs://namenode:8020/data/lake/3ecfc2fc-8f48-46e1-8e0c-be8fc13eb596/converted/.hoodie/archived/.commits_.archive.1} exists.
Appending to existing file 11:08:04.652 [DataLakeSystem-akka.actor.default-dispatcher-5] INFO com.uber.hoodie.io.HoodieCommitArchiveLog - Archiving instants [[20190730104047__clean__COMPLETED], [20190730104512__clean__COMPLETED], [20190730104605__clean__COMPLETED], [20190730104628__clean__COMPLETED], [20190730104651__clean__COMPLETED], [20190730104721__clean__COMPLETED], [20190730104758__clean__COMPLETED], [20190730104821__clean__COMPLETED], [20190730104843__clean__COMPLETED], [20190730104905__clean__COMPLETED], [20190730104926__clean__COMPLETED]] 11:08:04.653 [DataLakeSystem-akka.actor.default-dispatcher-5] INFO com.uber.hoodie.io.HoodieCommitArchiveLog - Wrapper schema {"type":"record","name":"HoodieArchivedMetaEntry","namespace":"com.uber.hoodie.avro.model","fields":[{"name":"hoodieCommitMetadata","type":["null",{"type":"record","name":"HoodieCommitMetadata","fields":[{"name":"partitionToWriteStats","type":["null",{"type":"map","values":{"type":"array","items":{"type":"record","name":"HoodieWriteStat","fields":[{"name":"fileId","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"path","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"prevCommit","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"numWrites","type":["null","long"],"default":null},{"name":"numDeletes","type":["null","long"],"default":null},{"name":"numUpdateWrites","type":["null","long"],"default":null},{"name":"totalWriteBytes","type":["null","long"],"default":null},{"name":"totalWriteErrors","type":["null","long"],"default":null},{"name":"partitionPath","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"totalLogRecords","type":["null","long"],"default":null},{"name":"totalLogFiles","type":["null","long"],"default":null},{"name":"totalUpdatedRecordsCompacted","type":["null","long"],"default":null},{"name":"numInserts","type":["null","long"],"default":null},
{"name":"totalLogBlocks","type":["null","long"],"default":null},{"name":"totalCorruptLogBlock","type":["null","long"],"default":null},{"name":"totalRollbackBlocks","type":["null","long"],"default":null}]}},"avro.java.string":"String"}]},{"name":"extraMetadata","type":["null",{"type":"map","values":{"type":"string","avro.java.string":"String"},"avro.java.string":"String"}]}]}],"default":"null"},{"name":"hoodieCleanMetadata","type":["null",{"type":"record","name":"HoodieCleanMetadata","fields":[{"name":"startCleanTime","type":{"type":"string","avro.java.string":"String"}},{"name":"timeTakenInMillis","type":"long"},{"name":"totalFilesDeleted","type":"int"},{"name":"earliestCommitToRetain","type":{"type":"string","avro.java.string":"String"}},{"name":"partitionMetadata","type":{"type":"map","values":{"type":"record","name":"HoodieCleanPartitionMetadata","fields":[{"name":"partitionPath","type":{"type":"string","avro.java.string":"String"}},{"name":"policy","type":{"type":"string","avro.java.string":"String"}},{"name":"deletePathPatterns","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"successDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"failedDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}}]},"avro.java.string":"String"}}]}],"default":"null"},{"name":"hoodieCompactionMetadata","type":["null",{"type":"record","name":"HoodieCompactionMetadata","fields":[{"name":"partitionToCompactionWriteStats","type":["null",{"type":"map","values":{"type":"array","items":{"type":"record","name":"HoodieCompactionWriteStat","fields":[{"name":"partitionPath","type":["null",{"type":"string","avro.java.string":"String"}]},{"name":"totalLogRecords","type":["null","long"]},{"name":"totalLogFiles","type":["null","long"]},{"name":"totalUpdatedRecordsCompacted","type":["null","long"]},{"name":"hoodieWriteStat","type":["null","HoodieWriteStat"]}]}},"avro.java.str
ing":"String"}]}]}],"default":"null"},{"name":"hoodieRollbackMetadata","type":["null",{"type":"record","name":"HoodieRollbackMetadata","fields":[{"name":"startRollbackTime","type":{"type":"string","avro.java.string":"String"}},{"name":"timeTakenInMillis","type":"long"},{"name":"totalFilesDeleted","type":"int"},{"name":"commitsRollback","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"partitionMetadata","type":{"type":"map","values":{"type":"record","name":"HoodieRollbackPartitionMetadata","fields":[{"name":"partitionPath","type":{"type":"string","avro.java.string":"String"}},{"name":"successDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"failedDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}}]},"avro.java.string":"String"}}]}],"default":"null"},{"name":"hoodieSavePointMetadata","type":["null",{"type":"record","name":"HoodieSavepointMetadata","fields":[{"name":"savepointedBy","type":{"type":"string","avro.java.string":"String"}},{"name":"savepointedAt","type":"long"},{"name":"comments","type":{"type":"string","avro.java.string":"String"}},{"name":"partitionMetadata","type":{"type":"map","values":{"type":"record","name":"HoodieSavepointPartitionMetadata","fields":[{"name":"partitionPath","type":{"type":"string","avro.java.string":"String"}},{"name":"savepointDataFile","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}}]},"avro.java.string":"String"}}]}],"default":"null"},{"name":"commitTime","type":["null",{"type":"string","avro.java.string":"String"}]},{"name":"actionType","type":["null",{"type":"string","avro.java.string":"String"}]}]} 11:08:04.706 [DataLakeSystem-akka.actor.default-dispatcher-5] ERROR com.uber.hoodie.exception.HoodieException: Unable to close HoodieLogFormat writer at com.uber.hoodie.io.HoodieCommitArchiveLog.close(HoodieCommitArchiveLog.java:103) at 
com.uber.hoodie.io.HoodieCommitArchiveLog.archiveIfRequired(HoodieCommitArchiveLog.java:124)
    at com.uber.hoodie.HoodieWriteClient.commit(HoodieWriteClient.java:542)
    at com.uber.hoodie.HoodieWriteClient.commit(HoodieWriteClient.java:489)
    at com.uber.hoodie.HoodieWriteClient.commit(HoodieWriteClient.java:480)
    at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:155)
    at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:91)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
    ....
    at akka.stream.impl.fusing.Map$$anon$10.onPush(Ops.scala:52)
    at akka.stream.impl.fusing.GraphInterpreter.processPush(GraphInterpreter.scala:519)
    at akka.stream.impl.fusing.GraphInterpreter.execute(GraphInterpreter.scala:411)
    at akka.stream.impl.fusing.GraphInterpreterShell.runBatch(ActorGraphInterpreter.scala:588)
    at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:472)
    at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:563)
    at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:745)
    at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:760)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
    at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:670)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
    at akka.actor.ActorCell.invoke(ActorCell.scala:561)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[172.21.0.14:9866,DS-3b46b080-0c8e-4d74-a090-734caebd4a16,DISK]], original=[DatanodeInfoWithStorage[172.21.0.14:9866,DS-3b46b080-0c8e-4d74-a090-734caebd4a16,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1044)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1107)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1276)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:560)
```

Any clue?
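For what it's worth, the `Caused by` message points at HDFS pipeline recovery: with a single datanode there is no replacement node available, so the DEFAULT replace-datanode-on-failure policy can never succeed during append. One possible client-side workaround (a sketch for single-datanode dev setups, not a confirmed fix for this issue) is to relax that policy in the client's `hdfs-site.xml`; both properties below are standard Hadoop settings:

```xml
<!-- hdfs-site.xml on the HDFS client (e.g. the Spark/Hudi writer).
     Sketch for a single-datanode environment only: with replication = 1
     there is no node to swap in, so tell the client not to try. -->
<configuration>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>NEVER</value>
  </property>
  <!-- Alternatively, disable the replacement feature entirely. -->
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>false</value>
  </property>
</configuration>
```

On a real multi-node cluster these defaults exist for data-safety reasons, so this is only appropriate where a single datanode is intentional.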
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services