[ https://issues.apache.org/jira/browse/SPARK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908309#comment-14908309 ]
Imran Rashid commented on SPARK-9103:
-------------------------------------

Hi [~liyezhang556520],

thanks for posting the design doc. Looks good, just a couple of questions.

1) Will the proposed design cover SPARK-9111, getting the memory when the executor dies abnormally (esp. when killed by YARN)? It seems to me the answer is "no", which is fine; that can be tackled separately. I just wanted to clarify.

2) I see the complexity of having overlapping stages, but I wonder if it could be simplified somewhat. It seems to me you just need to maintain a {{executorToLatestMetrics: Map[executor, metrics]}}, and then on every stage completion, you just log them all? Maybe this is what you are already describing in the doc, but it seems like there is more state and a bit more logging going on. E.g., I don't fully understand why you need to log both "CHB1" and "HB3" in your example.

thanks

> Tracking spark's memory usage
> -----------------------------
>
>                 Key: SPARK-9103
>                 URL: https://issues.apache.org/jira/browse/SPARK-9103
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Spark Core, Web UI
>            Reporter: Zhang, Liye
>         Attachments: Tracking Spark Memory Usage - Phase 1.pdf
>
>
> Currently Spark provides only a little memory usage information (RDD cache on
> the web UI) for the executors. Users have no idea what the memory consumption
> is when they run Spark applications that use a lot of memory in the
> executors. Especially when they encounter an OOM, it is really hard to know
> the cause of the problem. So it would be helpful to expose detailed memory
> consumption information for each part of Spark, so that users can get a clear
> picture of where exactly the memory is used.
> The memory usage info to expose should include, but not be limited to,
> shuffle, cache, network, serializer, etc.
> Users can optionally enable this functionality, since it is mainly for
> debugging and tuning.
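
To make the simplification asked about in question 2 of the comment concrete, here is a minimal sketch of "keep only the latest metrics per executor, and log them all whenever a stage completes". It is an illustration under assumptions, not the design from the attached doc and not Spark code: {{MemoryMetrics}}, {{LatestMetricsTracker}}, {{onHeartbeat}}, the string executor IDs and the log line format are all hypothetical names invented for this example.

```scala
import scala.collection.mutable

// Hypothetical metrics record; the real design doc may track different fields.
final case class MemoryMetrics(onHeapBytes: Long, offHeapBytes: Long)

// Hypothetical tracker, not part of Spark's API.
class LatestMetricsTracker {
  // Latest metrics received from each executor, keyed by executor ID.
  private val executorToLatestMetrics = mutable.Map.empty[String, MemoryMetrics]

  // Lines that would go to the event log; kept in memory for this sketch.
  val logged = mutable.ArrayBuffer.empty[String]

  // On each heartbeat, overwrite the previous entry for that executor.
  def onHeartbeat(executorId: String, metrics: MemoryMetrics): Unit =
    executorToLatestMetrics(executorId) = metrics

  // On any stage completion, log the latest metrics of every known executor.
  def onStageCompleted(stageId: Int): Unit =
    executorToLatestMetrics.toSeq.sortBy(_._1).foreach { case (executorId, m) =>
      logged += s"stage=$stageId executor=$executorId " +
        s"onHeap=${m.onHeapBytes} offHeap=${m.offHeapBytes}"
    }
}
```

With this shape, overlapping stages need no extra bookkeeping: each stage completion simply snapshots whatever the map holds at that moment. The trade-off is that a peak reported between two stage completions is lost if a later heartbeat overwrites it, which may be part of why the doc keeps more state.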