[ https://issues.apache.org/jira/browse/SPARK-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782678#comment-16782678 ]

Mohamed Mehdi BEN AISSA commented on SPARK-24346:
-------------------------------------------------

Any news!? I have exactly the same issue in the same context (same HDP version):

ERROR TransportRequestHandler: Error opening block StreamChunkId{streamId=1377556883266, chunkIndex=9} for request from /10.147.167.40:39050
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at java.io.DataInputStream.readLong(DataInputStream.java:416)
        at org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:209)
        at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:375)
        at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$1.apply(NettyBlockRpcServer.scala:61)
        at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$1.apply(NettyBlockRpcServer.scala:60)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:31)
        at org.apache.spark.network.server.OneForOneStreamManager.getChunk(OneForOneStreamManager.java:92)
        at org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:137)
        at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:138)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        at java.lang.Thread.run(Thread.java:745)
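
For context on where the EOFException comes from: IndexShuffleBlockResolver.getBlockData reads two 8-byte offsets out of the per-map shuffle index file to locate the requested block inside the data file. Below is a minimal Scala sketch of that lookup, assuming a hypothetical helper readBlockOffsets (this is not Spark's actual source); it shows why a truncated or partially written index file surfaces as an EOFException inside DataInputStream.readLong:

import java.io.{DataInputStream, EOFException, File, FileInputStream}

// Sketch of the shuffle index lookup: the index file holds
// (numPartitions + 1) longs, and block r spans [offset(r), offset(r + 1))
// in the data file. If the file is shorter than (reduceId + 2) * 8 bytes,
// one of the readLong() calls hits end of file, as in the trace above.
def readBlockOffsets(indexFile: File, reduceId: Int): (Long, Long) = {
  val in = new DataInputStream(new FileInputStream(indexFile))
  try {
    var toSkip = reduceId * 8L          // each offset is an 8-byte long
    while (toSkip > 0) {
      val skipped = in.skip(toSkip)     // skip() may skip fewer bytes
      if (skipped <= 0) throw new EOFException("truncated shuffle index file")
      toSkip -= skipped
    }
    val start = in.readLong()           // where block reduceId starts
    val end   = in.readLong()           // where the next block starts
    (start, end)
  } finally {
    in.close()
  }
}

So the top two frames of the trace suggest the serving executor's index file was shorter than expected when the fetch request arrived.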


> Executors are unable to fetch remote cache blocks
> -------------------------------------------------
>
>                 Key: SPARK-24346
>                 URL: https://issues.apache.org/jira/browse/SPARK-24346
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.3.0
>         Environment: OS: CentOS 7.3
> Cluster: Hortonworks HDP 2.6.5 with Spark 2.3.0
>            Reporter: Truong Duc Kien
>            Priority: Major
>
> After we upgraded from Spark 2.2.1 to Spark 2.3.0, our Spark jobs took a 
> massive performance hit because executors became unable to fetch remote cache 
> blocks from each other. The scenario is:
> 1. An executor creates a connection and sends a ChunkFetchRequest message to 
> another executor. 
> 2. This request arrives at the target executor, which sends back a 
> ChunkFetchSuccess response.
> 3. The ChunkFetchSuccess message never arrives.
> 4. The connection between these two executors is killed by the originating 
> executor after 120s of idleness. At the same time, the other executor reports 
> that it failed to send the ChunkFetchSuccess because the pipe is closed.
> This process repeats itself three times, delaying our jobs by 6 minutes; then 
> the originating executor gives up fetching, calculates the block by itself, 
> and the job can continue.
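
A note on step 4: the 120 s idle window matches Spark's default spark.network.timeout. As a hedged workaround sketch, not a fix for the underlying regression, one can lengthen that timeout and the shuffle retry settings so each fetch attempt gets more time before the executor falls back to recomputing the block:

import org.apache.spark.sql.SparkSession

// Tuning sketch only: these settings stretch the window in which the
// ChunkFetchSuccess can still arrive; they do not address why it is lost.
val spark = SparkSession.builder()
  .appName("fetch-timeout-tuning")
  .config("spark.network.timeout", "300s")     // idle timeout, default 120s
  .config("spark.shuffle.io.maxRetries", "6")  // default 3
  .config("spark.shuffle.io.retryWait", "10s") // default 5s
  .getOrCreate()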



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
