[ https://issues.apache.org/jira/browse/SPARK-37350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shefron Yudy updated SPARK-37350:
---------------------------------

Description:

I saw the same error repeatedly in the SparkThriftServer process's log after I restarted all datanodes of HDFS. The log is as follows:

{code:java}
2021-11-16 13:52:11,044 ERROR [spark-listener-group-eventLog] scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception
java.io.IOException: All datanodes [DatanodeInfoWithStorage[10.121.23.101:1019,DS-90cb8066-8e5c-443f-804b-20c3ad01851b,DISK]] are bad. Aborting...
        at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1561)
        at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1495)
        at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1481)
        at org.apache.hadoop.hdfs.DataStreamer.processDatanodeErrorOrExternalError(DataStreamer.java:1256)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
{code}

The event log only becomes available again after I restart the SparkThriftServer. I suggest that the EventLoggingListener's DFS writer reconnect after all datanodes stop and later come back up.


> EventLoggingListener keeps logging errors after HDFS restarts all datanodes
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-37350
>                 URL: https://issues.apache.org/jira/browse/SPARK-37350
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>         Environment: Spark-2.4.0
>                      Hadoop-3.0.0
>                      Hive-2.1.1
>            Reporter: Shefron Yudy
>            Priority: Major
>
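A minimal sketch of the reconnect behavior suggested in the description above. The class name `ReconnectingEventLogWriter` and its `open`/`writeEvent` methods are hypothetical illustrations, not Spark's actual EventLoggingListener code; it assumes the listener can close the broken stream and reopen the same event-log path via the standard Hadoop FileSystem API:

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical wrapper sketching the suggested fix; not Spark's real implementation.
public class ReconnectingEventLogWriter {
  private final Configuration conf;
  private final Path logPath;
  private FSDataOutputStream out;

  public ReconnectingEventLogWriter(Configuration conf, Path logPath) throws IOException {
    this.conf = conf;
    this.logPath = logPath;
    this.out = open();
  }

  // Open a fresh FileSystem instance so we do not reuse a client whose
  // datanode pipeline is already marked bad.
  private FSDataOutputStream open() throws IOException {
    FileSystem fs = FileSystem.newInstance(logPath.toUri(), conf);
    // Append if the file already exists so earlier events are preserved.
    return fs.exists(logPath) ? fs.append(logPath) : fs.create(logPath);
  }

  public synchronized void writeEvent(String json) throws IOException {
    byte[] line = (json + "\n").getBytes(StandardCharsets.UTF_8);
    try {
      out.write(line);
      out.hflush();
    } catch (IOException e) {
      // Pipeline is broken ("All datanodes ... are bad"): reopen once and retry
      // instead of throwing the same exception from the listener forever.
      try { out.close(); } catch (IOException ignored) { }
      out = open();
      out.write(line);
      out.hflush();
    }
  }
}
{code}

In practice a real fix would likely need bounded retries and HDFS lease recovery before re-appending (a file left half-open by the dead stream may still hold a lease), but the sketch shows the idea: treat a dead write pipeline as recoverable instead of letting AsyncEventQueue log the same IOException until the server is restarted.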