[
https://issues.apache.org/jira/browse/SPARK-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jack Hu updated SPARK-7183:
---------------------------
Summary: Memory leak in netty shuffle with spark standalone cluster (was:
Memory leak with netty shuffle with spark standalone cluster)
> Memory leak in netty shuffle with spark standalone cluster
> ----------------------------------------------------------
>
> Key: SPARK-7183
> URL: https://issues.apache.org/jira/browse/SPARK-7183
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 1.3.0
> Reporter: Jack Hu
> Labels: memory-leak, netty, shuffle
>
> There is a slow memory leak in the netty shuffle on a Spark cluster, in
> {{TransportRequestHandler.streamIds}}.
> In a Spark cluster, reusable netty connections between two block managers
> are used to fetch and send blocks between workers and the driver. On the
> server side these connections are handled by
> {{org.apache.spark.network.server.TransportRequestHandler}}. This handler
> keeps track of every stream id negotiated over RPC when shuffle data needs to
> be transferred between the two block managers. The stream ids keep
> accumulating and are never removed unless the connection is dropped (which
> seems never to happen in normal operation).
> Here are some detailed logs from this {{TransportRequestHandler}}. (Note: we
> added a log statement that prints the total size of
> {{TransportRequestHandler.streamIds}}; the message is "Current set size is N of
> org.apache.spark.network.server.TransportRequestHandler@ADDRESS". This set
> size kept increasing in our test.)
> {quote}
> 15/04/22 21:00:16 DEBUG TransportServer: Shuffle server started on port :46288
> 15/04/22 21:00:16 INFO NettyBlockTransferService: Server created on 46288
> 15/04/22 21:00:31 INFO TransportRequestHandler: Created
> TransportRequestHandler
> org.apache.spark.network.server.TransportRequestHandler@29a4f3e7
> 15/04/22 21:00:32 TRACE MessageDecoder: Received message RpcRequest:
> RpcRequest\{requestId=6655045571437304938, message=[B@59778678\}
> 15/04/22 21:00:32 TRACE NettyBlockRpcServer: Received request:
> OpenBlocks\{appId=app-20150422210016-0000, execId=<driver>,
> blockIds=[broadcast_1_piece0]}
> 15/04/22 21:00:32 TRACE NettyBlockRpcServer: Registered streamId
> 1387459488000 with 1 buffers
> 15/04/22 21:00:33 TRACE TransportRequestHandler: Sent result
> RpcResponse\{requestId=6655045571437304938, response=[B@d2840b\} to client
> /10.111.7.150:33802
> 15/04/22 21:00:33 TRACE MessageDecoder: Received message ChunkFetchRequest:
> ChunkFetchRequest\{streamChunkId=StreamChunkId\{streamId=1387459488000,
> chunkIndex=0}}
> 15/04/22 21:00:33 TRACE TransportRequestHandler: Received req from
> /10.111.7.150:33802 to fetch block StreamChunkId\{streamId=1387459488000,
> chunkIndex=0\}
> 15/04/22 21:00:33 INFO TransportRequestHandler: Current set size is 1 of
> org.apache.spark.network.server.TransportRequestHandler@29a4f3e7
> 15/04/22 21:00:33 TRACE OneForOneStreamManager: Removing stream id
> 1387459488000
> 15/04/22 21:00:33 TRACE TransportRequestHandler: Sent result
> ChunkFetchSuccess\{streamChunkId=StreamChunkId\{streamId=1387459488000,
> chunkIndex=0}, buffer=NioManagedBuffer\{buf=java.nio.HeapByteBuffer[pos=0
> lim=3839 cap=3839]}} to client /10.111.7.150:33802
> 15/04/22 21:00:34 TRACE MessageDecoder: Received message RpcRequest:
> RpcRequest\{requestId=6660601528868866371, message=[B@42bed1b8\}
> 15/04/22 21:00:34 TRACE NettyBlockRpcServer: Received request:
> OpenBlocks\{appId=app-20150422210016-0000, execId=<driver>,
> blockIds=[broadcast_3_piece0]}
> 15/04/22 21:00:34 TRACE NettyBlockRpcServer: Registered streamId
> 1387459488001 with 1 buffers
> 15/04/22 21:00:34 TRACE TransportRequestHandler: Sent result
> RpcResponse\{requestId=6660601528868866371, response=[B@7fa3fb60\} to client
> /10.111.7.150:33802
> 15/04/22 21:00:34 TRACE MessageDecoder: Received message ChunkFetchRequest:
> ChunkFetchRequest\{streamChunkId=StreamChunkId\{streamId=1387459488001,
> chunkIndex=0}}
> 15/04/22 21:00:34 TRACE TransportRequestHandler: Received req from
> /10.111.7.150:33802 to fetch block StreamChunkId\{streamId=1387459488001,
> chunkIndex=0\}
> 15/04/22 21:00:34 INFO TransportRequestHandler: Current set size is 2 of
> org.apache.spark.network.server.TransportRequestHandler@29a4f3e7
> 15/04/22 21:00:34 TRACE OneForOneStreamManager: Removing stream id
> 1387459488001
> 15/04/22 21:00:34 TRACE TransportRequestHandler: Sent result
> ChunkFetchSuccess\{streamChunkId=StreamChunkId\{streamId=1387459488001,
> chunkIndex=0}, buffer=NioManagedBuffer\{buf=java.nio.HeapByteBuffer[pos=0
> lim=4277 cap=4277]}} to client /10.111.7.150:33802
> 15/04/22 21:00:34 TRACE MessageDecoder: Received message RpcRequest:
> RpcRequest\{requestId=8454597410163901330, message=[B@19c673d1\}
> 15/04/22 21:00:34 TRACE NettyBlockRpcServer: Received request:
> OpenBlocks\{appId=app-20150422210016-0000, execId=<driver>,
> blockIds=[broadcast_2_piece0]}
> 15/04/22 21:00:34 TRACE NettyBlockRpcServer: Registered streamId
> 1387459488002 with 1 buffers
> 15/04/22 21:00:34 TRACE TransportRequestHandler: Sent result
> RpcResponse\{requestId=8454597410163901330, response=[B@35dbdac2\} to client
> /10.111.7.150:33802
> 15/04/22 21:00:34 TRACE MessageDecoder: Received message ChunkFetchRequest:
> ChunkFetchRequest\{streamChunkId=StreamChunkId\{streamId=1387459488002,
> chunkIndex=0}}
> 15/04/22 21:00:34 TRACE TransportRequestHandler: Received req from
> /10.111.7.150:33802 to fetch block StreamChunkId\{streamId=1387459488002,
> chunkIndex=0\}
> 15/04/22 21:00:34 INFO TransportRequestHandler: Current set size is 3 of
> org.apache.spark.network.server.TransportRequestHandler@29a4f3e7
> 15/04/22 21:00:34 TRACE OneForOneStreamManager: Removing stream id
> 1387459488002
> ......
> 15/04/22 23:59:50 TRACE MessageDecoder: Received message RpcRequest:
> RpcRequest\{requestId=5718124278216696100, message=[B@7ade3ea3\}
> 15/04/22 23:59:50 TRACE NettyBlockRpcServer: Received request:
> OpenBlocks\{appId=app-20150422210016-0000, execId=<driver>,
> blockIds=[broadcast_14679_piece0]}
> 15/04/22 23:59:50 TRACE NettyBlockRpcServer: Registered streamId
> 1387459501252 with 1 buffers
> 15/04/22 23:59:50 TRACE TransportRequestHandler: Sent result
> RpcResponse\{requestId=5718124278216696100, response=[B@40c07a63\} to client
> /10.111.7.150:33802
> 15/04/22 23:59:50 TRACE MessageDecoder: Received message ChunkFetchRequest:
> ChunkFetchRequest\{streamChunkId=StreamChunkId\{streamId=1387459501252,
> chunkIndex=0}}
> 15/04/22 23:59:50 TRACE TransportRequestHandler: Received req from
> /10.111.7.150:33802 to fetch block StreamChunkId\{streamId=1387459501252,
> chunkIndex=0\}
> 15/04/22 23:59:50 INFO TransportRequestHandler: Current set size is 13253 of
> org.apache.spark.network.server.TransportRequestHandler@29a4f3e7
> 15/04/22 23:59:50 TRACE OneForOneStreamManager: Removing stream id
> 1387459501252
> 15/04/22 23:59:50 TRACE TransportRequestHandler: Sent result
> ChunkFetchSuccess\{streamChunkId=StreamChunkId\{streamId=1387459501252,
> chunkIndex=0}, buffer=NioManagedBuffer\{buf=java.nio.HeapByteBuffer[pos=0
> lim=31474 cap=31474]}} to client /10.111.7.150:33802
> 15/04/22 23:59:50 TRACE MessageDecoder: Received message RpcRequest:
> RpcRequest\{requestId=8663805364150028136, message=[B@5974f9b4\}
> 15/04/22 23:59:50 TRACE NettyBlockRpcServer: Received request:
> OpenBlocks\{appId=app-20150422210016-0000, execId=<driver>,
> blockIds=[broadcast_14688_piece0]}
> 15/04/22 23:59:50 TRACE NettyBlockRpcServer: Registered streamId
> 1387459501253 with 1 buffers
> 15/04/22 23:59:50 TRACE TransportRequestHandler: Sent result
> RpcResponse\{requestId=8663805364150028136, response=[B@122023c6\} to client
> /10.111.7.150:33802
> 15/04/22 23:59:50 TRACE MessageDecoder: Received message ChunkFetchRequest:
> ChunkFetchRequest\{streamChunkId=StreamChunkId\{streamId=1387459501253,
> chunkIndex=0}}
> 15/04/22 23:59:50 TRACE TransportRequestHandler: Received req from
> /10.111.7.150:33802 to fetch block StreamChunkId\{streamId=1387459501253,
> chunkIndex=0\}
> 15/04/22 23:59:50 INFO TransportRequestHandler: Current set size is 13254 of
> org.apache.spark.network.server.TransportRequestHandler@29a4f3e7
> 15/04/22 23:59:50 TRACE OneForOneStreamManager: Removing stream id
> 1387459501253
> 15/04/22 23:59:50 TRACE TransportRequestHandler: Sent result
> ChunkFetchSuccess\{streamChunkId=StreamChunkId\{streamId=1387459501253,
> chunkIndex=0}, buffer=NioManagedBuffer\{buf=java.nio.HeapByteBuffer[pos=0
> lim=4047 cap=4047]}} to client /10.111.7.150:33802
> {quote}
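
The pattern described in the report can be modelled with a small, self-contained sketch. This is not the Spark source: {{SketchStreamManager}} and {{SketchRequestHandler}} are hypothetical stand-ins for {{OneForOneStreamManager}} and {{TransportRequestHandler}}, and the assumption (taken from the description and the log above) is that the handler adds every fetched stream id to a per-connection set that is cleared only when the connection closes, while the stream manager removes each stream as soon as its last chunk is served.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StreamIdLeakSketch {

    /** Hypothetical stand-in for the stream manager: drops a stream once its last chunk is served. */
    static final class SketchStreamManager {
        private final Map<Long, Integer> remainingChunks = new HashMap<>();
        private long nextStreamId = 1387459488000L; // first id seen in the log above

        long registerStream(int numChunks) {
            long id = nextStreamId++;
            remainingChunks.put(id, numChunks);
            return id;
        }

        void serveChunk(long streamId) {
            int left = remainingChunks.merge(streamId, -1, Integer::sum);
            if (left <= 0) {
                remainingChunks.remove(streamId); // "Removing stream id ..." in the log
            }
        }

        int openStreams() {
            return remainingChunks.size();
        }
    }

    /** Hypothetical stand-in for the per-connection request handler. */
    static final class SketchRequestHandler {
        private final SketchStreamManager manager;
        // Assumed behaviour: ids are only ever added while the connection is alive.
        private final Set<Long> streamIds = new HashSet<>();

        SketchRequestHandler(SketchStreamManager manager) {
            this.manager = manager;
        }

        void handleChunkFetch(long streamId) {
            streamIds.add(streamId);      // tracked for cleanup on disconnect
            manager.serveChunk(streamId); // the manager forgets the stream, the handler does not
        }

        void channelClosed() {
            streamIds.clear(); // the only place the set shrinks
        }

        int trackedStreamIds() {
            return streamIds.size();
        }
    }

    /** Runs `requests` single-chunk fetches over one long-lived connection. */
    static int[] simulate(int requests) {
        SketchStreamManager manager = new SketchStreamManager();
        SketchRequestHandler handler = new SketchRequestHandler(manager);
        for (int i = 0; i < requests; i++) {
            long id = manager.registerStream(1); // one buffer per OpenBlocks request
            handler.handleChunkFetch(id);
        }
        return new int[] {handler.trackedStreamIds(), manager.openStreams()};
    }

    public static void main(String[] args) {
        int[] result = simulate(13254);
        System.out.println("tracked=" + result[0] + " open=" + result[1]);
        // prints "tracked=13254 open=0"
    }
}
```

In this model the manager holds no open streams after each fetch, yet the handler's set grows by one entry per request for as long as the connection stays up, which matches the "Current set size is N" lines in the log.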