[ 
https://issues.apache.org/jira/browse/SPARK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908309#comment-14908309
 ] 

Imran Rashid commented on SPARK-9103:
-------------------------------------

Hi [~liyezhang556520], thanks for posting the design doc.  It looks good; I 
just have a couple of questions.

1) Will the proposed design cover SPARK-9111, i.e. getting the memory usage 
when the executor dies abnormally (esp. when killed by YARN)?  It seems to me 
the answer is "no", which is fine, since that can be tackled separately.  I 
just wanted to clarify.

2) I see the complexity of having overlapping stages, but I wonder if it could 
be simplified somewhat.  It seems to me you just need to maintain a 
{{executorToLatestMetrics: Map[executor, metrics]}}, and then on every stage 
completion, you just log them all.  Maybe this is what you are already 
describing in the doc, but it seems like there is more state and a bit more 
logging going on.  E.g., I don't fully understand why you need to log both 
"CHB1" and "HB3" in your example.
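
To make the simplification in (2) concrete, here is a rough sketch of what I 
have in mind.  All of the names ({{ExecutorMetrics}}, {{LatestMetricsTracker}}, 
{{onHeartbeat}}, {{onStageCompleted}}) and the metric fields are made up for 
illustration; they are not existing Spark APIs and not what the design doc 
proposes.

```scala
import scala.collection.mutable

// Hypothetical metrics payload; the fields are placeholders, not Spark's.
final case class ExecutorMetrics(timestampMs: Long, usedMemoryBytes: Long)

// Keeps only the most recent metrics per executor and writes them all out
// whenever a stage completes. `log` stands in for the event log writer.
class LatestMetricsTracker(log: String => Unit) {
  private val executorToLatestMetrics =
    mutable.Map.empty[String, ExecutorMetrics]

  // Called on each executor heartbeat: overwrite the previous value.
  def onHeartbeat(executorId: String, metrics: ExecutorMetrics): Unit =
    executorToLatestMetrics(executorId) = metrics

  // Called on stage completion: log the latest metrics of every executor,
  // sorted by executor id so the output order is deterministic.
  def onStageCompleted(stageId: Int): Unit =
    executorToLatestMetrics.toSeq.sortBy(_._1).foreach {
      case (executorId, m) =>
        log(s"stage=$stageId executor=$executorId " +
          s"ts=${m.timestampMs} usedMemoryBytes=${m.usedMemoryBytes}")
    }
}
```

Note that this keeps only the last value reported by each executor, so 
anything that happened between heartbeats, or earlier in the stage, is lost.  
If the extra state in the doc is there to preserve that, then it is probably 
needed after all.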

thanks

> Tracking spark's memory usage
> -----------------------------
>
>                 Key: SPARK-9103
>                 URL: https://issues.apache.org/jira/browse/SPARK-9103
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Spark Core, Web UI
>            Reporter: Zhang, Liye
>         Attachments: Tracking Spark Memory Usage - Phase 1.pdf
>
>
> Currently Spark provides only a little memory usage information (the RDD 
> cache on the web UI) for the executors. Users have no idea what the memory 
> consumption is when they run Spark applications that use a lot of memory in 
> the executors. Especially when they encounter an OOM, it’s really hard to 
> know what the cause of the problem is. So it would be helpful to expose 
> detailed memory consumption information for each part of Spark, so that 
> users can clearly see where exactly the memory is used. 
> The memory usage info to expose should include, but is not limited to, 
> shuffle, cache, network, serializer, etc.
> Users can optionally enable this functionality, since it is mainly for 
> debugging and tuning.



