Dmitry Kravchuk created SPARK-41163:
---------------------------------------

             Summary: Spark 3.2.2
                 Key: SPARK-41163
                 URL: https://issues.apache.org/jira/browse/SPARK-41163
             Project: Spark
          Issue Type: Bug
          Components: Build, Deploy
    Affects Versions: 3.2.2
         Environment: * spark 3.2.2
                      * hadoop 3.1.2
                      * hive 3.1.1
                      * scala 2.12
            Reporter: Dmitry Kravchuk
             Fix For: 3.2.3

Hello there.

I've built spark 3.2.2 for my cluster, which has hadoop 3.1.2 and scala 2.12 (pom.xml is attached).

Build script:
{code:java}
cd spark && \
./build/mvn -Pyarn -Dhadoop.version=3.1.2 -Pscala-2.12 -Phive -Phive-thriftserver -DskipTests clean package
{code}

It was working fine, but a few applications get strange errors and warnings from time to time. It always looks like a lost datanode connection followed by shuffle read issues.
{code:java}
2022-11-16 22:18:25,423 ERROR server.TransportChannelHandler: Connection to s00abd02node9.company.com/10.x.y.163:35143 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.shuffle.io.connectionTimeout if this is wrong.
2022-11-16 22:18:25,423 ERROR client.TransportResponseHandler: Still have 5 requests outstanding when connection from s00abd02node9.company.com/10.x.y.163:35143 is closed
2022-11-16 22:18:25,423 WARN netty.NettyBlockTransferService: Error while trying to get the host local dirs for [16]
2022-11-16 22:18:25,425 ERROR storage.ShuffleBlockFetcherIterator: Error occurred while fetching host local blocks
{code}

When this happens, the application goes into a retry and fails after the 2nd start.

Can anybody help?
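
The first ERROR line suggests adjusting spark.shuffle.io.connectionTimeout. A minimal sketch of what such an override might look like at submit time is below. The timeout and retry values are untested guesses, my_app.jar is a placeholder and not a file from this report, and turning off spark.shuffle.readHostLocalDisk is only an assumption about a possible workaround for the "host local dirs" warning, not a confirmed fix. The script only prints the command, so it can be reviewed before anything is submitted.

```shell
#!/bin/sh
# Sketch only: all values below are assumptions, not tested recommendations.

CONF=""

# Named in the ERROR message; the log shows a 120000 ms quiet period being hit.
CONF="$CONF --conf spark.shuffle.io.connectionTimeout=300s"

# Retry failed shuffle fetches more times and wait longer between attempts.
CONF="$CONF --conf spark.shuffle.io.maxRetries=10"
CONF="$CONF --conf spark.shuffle.io.retryWait=30s"

# Assumed workaround: avoid the host-local disk read path that logged
# "Error while trying to get the host local dirs".
CONF="$CONF --conf spark.shuffle.readHostLocalDisk=false"

# my_app.jar is a hypothetical placeholder for the failing application.
CMD="spark-submit --master yarn$CONF my_app.jar"

# Dry run: print the command instead of executing it.
echo "$CMD"
```

If the errors stop with these settings, re-enabling them one at a time should show which one actually matters.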