[ https://issues.apache.org/jira/browse/SPARK-20426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978409#comment-15978409 ]
jin xing commented on SPARK-20426: ---------------------------------- [~jerryshao] {quote} Brain storm: The problem here is that too many FileSegmentManagedBuffers are registered in StreamManager. To allieviate the memory stress in external shuffle service, could TransportClient just sends ChunkFetchRequest with blockId wrapped inside and doesn't send the OpenBlocks? i.e. we don't register the stream any more. {quote} How do you think about this. If this sounds plausible, I can post a design doc :) > OneForOneStreamManager occupies too much memory. > ------------------------------------------------ > > Key: SPARK-20426 > URL: https://issues.apache.org/jira/browse/SPARK-20426 > Project: Spark > Issue Type: Improvement > Components: Shuffle > Affects Versions: 2.1.0 > Reporter: jin xing > Attachments: screenshot-1.png, screenshot-2.png > > > Spark jobs are running on yarn cluster in my warehouse. We enabled the > external shuffle service(*--conf spark.shuffle.service.enabled=true*). > Recently NodeManager runs OOM now and then. Dumping heap memory, we find that > *OneFroOneStreamManager*'s footprint is huge. NodeManager is configured with > 5G heap memory. While *OneForOneManager* costs 2.5G and there are 5503233 > *FileSegmentManagedBuffer* objects. Is there any suggestions to avoid this > other than just keep increasing NodeManager's memory? Is it possible to stop > *registerStream* in OneForOneStreamManager? Thus we don't need to cache so > many metadatas(i.e. StreamState). -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org