[ https://issues.apache.org/jira/browse/SPARK-20426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978394#comment-15978394 ]
jin xing commented on SPARK-20426:
----------------------------------

Currently in the code, the shuffle-read process works like this:

1. *TransportClient* sends *OpenBlocks* to the shuffle service;
2. On receiving OpenBlocks, the shuffle service registers the requested blocks:
{code}
message match {
  case openBlocks: OpenBlocks =>
    val blocks: Seq[ManagedBuffer] =
      openBlocks.blockIds.map(BlockId.apply).map(blockManager.getBlockData)
    val streamId = streamManager.registerStream(appId, blocks.iterator.asJava)
    ...
{code}
The shuffle service caches the block data (i.e. FileSegmentManagedBuffer) as a *StreamState* in StreamManager, then sends the *streamId* back to the TransportClient;
3. The TransportClient sends a *ChunkFetchRequest* (with the streamId and chunkIndex wrapped inside) to the shuffle service to request the data.

Brainstorm: the problem here is that too many FileSegmentManagedBuffers are registered in StreamManager. To alleviate the memory pressure on the external shuffle service, could the TransportClient send a ChunkFetchRequest with the blockId wrapped inside and skip *OpenBlocks* altogether? That is, we would not register the stream at all. (See the sketches at the end of this message.)

> OneForOneStreamManager occupies too much memory.
> ------------------------------------------------
>
>                 Key: SPARK-20426
>                 URL: https://issues.apache.org/jira/browse/SPARK-20426
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>         Attachments: screenshot-1.png, screenshot-2.png
>
>
> Spark jobs are running on a YARN cluster in my warehouse. We enabled the external shuffle service (*--conf spark.shuffle.service.enabled=true*). Recently the NodeManager has been hitting OOM now and then. Dumping heap memory, we found that *OneForOneStreamManager*'s footprint is huge: the NodeManager is configured with 5G of heap, of which *OneForOneStreamManager* costs 2.5G, holding 5503233 *FileSegmentManagedBuffer* objects. Are there any suggestions to avoid this other than just increasing the NodeManager's memory? Is it possible to stop *registerStream* in OneForOneStreamManager, so that we don't need to cache so much metadata (i.e. StreamState)?
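To make the memory cost concrete, here is a minimal Scala sketch of the bookkeeping that step 2 above implies. This is an invented, simplified model, not Spark's actual OneForOneStreamManager; the ManagedBuffer trait stands in for org.apache.spark.network.buffer.ManagedBuffer.
{code}
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// Stand-in for org.apache.spark.network.buffer.ManagedBuffer.
trait ManagedBuffer

// One entry per OpenBlocks request; it pins every requested
// FileSegmentManagedBuffer until the stream is closed.
final case class StreamState(appId: String, buffers: Iterator[ManagedBuffer])

class SimpleStreamManager {
  private val nextStreamId = new AtomicLong(0)
  private val streams = new ConcurrentHashMap[Long, StreamState]()

  // Every OpenBlocks request adds a StreamState to this map. With millions
  // of buffers in flight (5503233 in the reported heap dump), this map is
  // what dominates the shuffle service's heap.
  def registerStream(appId: String, buffers: Iterator[ManagedBuffer]): Long = {
    val streamId = nextStreamId.getAndIncrement()
    streams.put(streamId, StreamState(appId, buffers))
    streamId
  }

  // Memory is released only when the client finishes (or fails) the stream.
  def closeStream(streamId: Long): Unit = streams.remove(streamId)
}
{code}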
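And a sketch of the brainstormed alternative: the fetch request itself carries the blockId, so the server resolves the buffer on demand and keeps no per-stream state. FetchBlockRequest, BlockResolver and StatelessFetchHandler are hypothetical names invented for illustration, not existing Spark APIs.
{code}
// Stand-in for org.apache.spark.network.buffer.ManagedBuffer, as above.
trait ManagedBuffer

// Hypothetical message replacing OpenBlocks + ChunkFetchRequest: the blockId
// travels with every fetch, so nothing has to be registered up front.
final case class FetchBlockRequest(appId: String, blockId: String)

// Stand-in for the shuffle service's block resolution (in real Spark this
// role is played by the block manager's getBlockData lookup).
trait BlockResolver {
  def getBlockData(appId: String, blockId: String): ManagedBuffer
}

class StatelessFetchHandler(resolver: BlockResolver) {
  // No registerStream and no StreamState map: the FileSegmentManagedBuffer
  // is materialized per request and can be released as soon as it is sent.
  def receive(req: FetchBlockRequest): ManagedBuffer =
    resolver.getBlockData(req.appId, req.blockId)
}
{code}
The trade-off is that every fetch pays a block-resolution lookup and the server can no longer set up a whole batch of blocks up front, but the long-lived StreamState cache, the thing dominating the NodeManager heap in the dump, disappears.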