[ https://issues.apache.org/jira/browse/SPARK-26728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liupengcheng updated SPARK-26728:
---------------------------------
    Summary: Make rdd.unpersist blocking configurable  (was: Make rdd.unpersist and broadcast.unpersist blocking configurable)

> Make rdd.unpersist blocking configurable
> ----------------------------------------
>
>                 Key: SPARK-26728
>                 URL: https://issues.apache.org/jira/browse/SPARK-26728
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.4.0
>            Reporter: liupengcheng
>            Priority: Major
>
> Currently, rdd.unpersist's blocking argument defaults to true. In real production clusters (especially large ones), however, node loss and network issues can happen at any time.
> Users generally expect rdd.unpersist not to throw, so a blocking unpersist that hits an RPC failure can fail the whole job; this has happened many times in our cluster:
> {code:java}
> 2018-05-16,13:28:33,489 WARN org.apache.spark.storage.BlockManagerMaster: Failed to remove RDD 15 - Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
> java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
>     at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
>     at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
>     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>     at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
>     at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
>     at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
>     at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
>     at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
>     at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
>     at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
>     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
> 2018-05-16,13:28:33,489 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: User class threw exception: java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
> java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
>     at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
>     at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
>     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>     at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
>     at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
>     at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
>     at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
>     at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
>     at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
>     at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
>     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
> {code}
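> Note that callers can already opt out of the synchronous wait by passing blocking = false explicitly. A minimal sketch, assuming a SparkContext {{sc}} as in the shell (the input path and storage level are illustrative):
> {code:scala}
> import org.apache.spark.storage.StorageLevel
>
> val cached = sc.textFile("hdfs:///data/input").persist(StorageLevel.MEMORY_AND_DISK)
> // ... run jobs that reuse `cached` ...
>
> // Fire-and-forget removal: returns immediately instead of waiting for every
> // executor to confirm, so a lost node or closed channel surfaces as a warning
> // in the logs rather than an exception in the caller.
> cached.unpersist(blocking = false)
> {code}
> This works per call site, but it requires touching every caller, which is exactly what a global config would avoid.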
> I think we can expose this blocking argument as a configuration, so that its default value can be controlled cluster-wide and rolled out gradually, e.g. through a gray-release (canary) system.
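> A sketch of what the user-facing side might look like, assuming a hypothetical key spark.rdd.unpersist.blocking (the key name and wiring are assumptions for illustration, not the final implementation):
> {code:scala}
> import org.apache.spark.SparkContext
> import org.apache.spark.rdd.RDD
>
> // Hypothetical config key -- not an existing Spark setting.
> val UnpersistBlockingKey = "spark.rdd.unpersist.blocking"
>
> // Resolve the blocking behaviour from the job's SparkConf, keeping the
> // current Spark default (blocking = true) when the key is unset.
> def unpersistPerConf[T](sc: SparkContext, rdd: RDD[T]): Unit = {
>   val blocking = sc.getConf.getBoolean(UnpersistBlockingKey, defaultValue = true)
>   rdd.unpersist(blocking)
> }
> {code}
> With such a config, operators could flip the default to non-blocking for a subset of jobs first and widen the rollout once it proves stable.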