[ 
https://issues.apache.org/jira/browse/SPARK-26728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liupengcheng updated SPARK-26728:
---------------------------------
    Summary: Make rdd.unpersist blocking configurable  (was: Make rdd.unpersist 
and broadcast.unpersist blocking configurable)

> Make rdd.unpersist blocking configurable
> ----------------------------------------
>
>                 Key: SPARK-26728
>                 URL: https://issues.apache.org/jira/browse/SPARK-26728
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.4.0
>            Reporter: liupengcheng
>            Priority: Major
>
> Currently, the blocking argument of rdd.unpersist defaults to true. However, 
> on a real production cluster (especially a large one), node loss and network 
> issues can happen at any time.
> Users generally treat rdd.unpersist as a call that cannot fail, so a blocking 
> unpersist that hits a lost node can unexpectedly fail the whole job; this has 
> happened many times in our cluster:
> {code:java}
> 2018-05-16,13:28:33,489 WARN org.apache.spark.storage.BlockManagerMaster: 
> Failed to remove RDD 15 - Failed to send RPC 7571440800577648876 to 
> c3-hadoop-prc-st2325.bj/10.136.136.25:43474: 
> java.nio.channels.ClosedChannelException
> java.io.IOException: Failed to send RPC 7571440800577648876 to 
> c3-hadoop-prc-st2325.bj/10.136.136.25:43474: 
> java.nio.channels.ClosedChannelException
>       at 
> org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
>       at 
> org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
>       at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>       at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
>       at 
> io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
>       at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
>       at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
>       at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
>       at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
>       at 
> io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
>       at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
>       at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>       at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
> 2018-05-16,13:28:33,489 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: 
> User class threw exception: java.io.IOException: Failed to send RPC 
> 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: 
> java.nio.channels.ClosedChannelException
> java.io.IOException: Failed to send RPC 7571440800577648876 to 
> c3-hadoop-prc-st2325.bj/10.136.136.25:43474: 
> java.nio.channels.ClosedChannelException
>       at 
> org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
>       at 
> org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
>       at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>       at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
>       at 
> io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
>       at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
>       at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
>       at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
>       at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
>       at 
> io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
>       at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
>       at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>       at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
> {code}
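> A per-call workaround already exists, because unpersist takes the blocking 
> flag directly; the sketch below (assuming a cached RDD named rdd) shows the 
> non-blocking form, but it has to be applied at every call site by hand:
> {code:scala}
> import org.apache.spark.{SparkConf, SparkContext}
>
> val sc = new SparkContext(
>   new SparkConf().setAppName("unpersist-demo").setMaster("local[*]"))
> val rdd = sc.parallelize(1 to 1000).cache()
> rdd.count() // materialize the cached blocks
>
> // blocking = false sends the remove-RDD messages asynchronously, so a lost
> // node or a closed channel cannot fail the calling job at this line.
> rdd.unpersist(blocking = false)
> {code}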
> I think we can expose this blocking argument as a configuration, so that its 
> default value can be controlled cluster-wide and rolled out gradually via a 
> gray-release (canary) process, without touching every call site.
>  
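> For illustration, a minimal sketch of what the change could look like inside 
> RDD.unpersist, assuming a hypothetical config key 
> spark.rdd.unpersist.blocking (the real name and wiring would be settled in 
> review):
> {code:scala}
> // Hypothetical sketch, not the actual patch: the cluster-wide config
> // supplies the default, while an explicit argument still takes precedence.
> def unpersist(
>     blocking: Boolean =
>       sc.getConf.getBoolean("spark.rdd.unpersist.blocking", defaultValue = true))
>   : this.type = {
>   logInfo(s"Removing RDD $id from persistence list")
>   sc.unpersistRDD(id, blocking)
>   storageLevel = StorageLevel.NONE
>   this
> }
> {code}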


