[ https://issues.apache.org/jira/browse/SPARK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181543#comment-16181543 ]

Imran Rashid commented on SPARK-9103:
-------------------------------------

Hi [~jerryshao], sorry it took me a little while to respond.

I think having the info available in the metrics system is great, but I see 
two different types of shortcomings:

1) MetricsSystem / graphite etc. are great, but it's really hard to correlate 
the timeline view you get with what is actually going on in your job.  Did the 
tasks that took the longest also correlate with the tasks that used the most 
memory?  Were there some phases of your application that put pressure on one 
part of memory (e.g., execution memory) and other phases that put pressure on 
another part (e.g., user memory)?  What was the memory usage for tasks that 
failed?  And for tasks that were slow?

It seems *really* hard to answer those questions via graphite, as you have to 
do some mental joins from task to executor to the graphite view in the right 
time frame, the rollups aren't exactly right, etc.  (Or maybe there is some 
sophisticated way to do this in graphite that I don't know about?)  It also 
just seems like something that should be baked into the Spark UI, and not 
require a third-party tool ... perhaps it's overkill for the Spark UI to 
handle this (too big a load on the driver for large apps?), but I'd like to 
see if something better is possible.
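
To make the "mental join" concrete, here's a minimal sketch of mine (not 
something Spark ships today) of a SparkListener that logs peak execution 
memory next to task duration and status, so the correlations above can at 
least be grepped out of the logs.  Note it only sees what 
TaskMetrics.peakExecutionMemory tracks, which is part of the gap point 2 
below gets at.

{code:scala}
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Illustrative only: log per-task peak execution memory alongside duration
// and status, so task-level correlations can be grepped from driver logs.
class TaskMemoryLogger extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info = taskEnd.taskInfo
    val metrics = taskEnd.taskMetrics  // can be null for some failure modes
    if (info != null && metrics != null) {
      println(s"stage=${taskEnd.stageId} task=${info.taskId} " +
        s"status=${info.status} duration=${info.duration}ms " +
        s"peakExecutionMemory=${metrics.peakExecutionMemory}B")
    }
  }
}

// Register it on the driver with: sc.addSparkListener(new TaskMemoryLogger)
{code}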

2) Even within the metrics system, I'm not sure we are capturing everything we 
need.  The most obvious thing to me is capturing the total process memory -- 
including off-heap memory that isn't part of the JVM-managed memory, whether 
it's memory managed by Spark or by a third-party lib (e.g. Parquet).  I have a 
feeling there are more things that would be useful to capture, though to be 
honest I haven't done a full audit of what is currently exposed in the metrics 
system.
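
As a rough illustration of that gap, here's a sketch (mine, Linux-only, not 
wired into Spark's metrics) that compares what the JVM accounts for against 
the resident set size the OS sees; the difference is roughly the off-heap 
usage (Spark-managed, netty, a third-party lib like parquet, ...) that we 
don't capture today.

{code:scala}
import java.lang.management.{BufferPoolMXBean, ManagementFactory}
import scala.collection.JavaConverters._
import scala.io.Source

object ProcessMemorySnapshot {
  def main(args: Array[String]): Unit = {
    val mem = ManagementFactory.getMemoryMXBean
    val heapUsed = mem.getHeapMemoryUsage.getUsed
    val nonHeapUsed = mem.getNonHeapMemoryUsage.getUsed

    // Direct and mapped NIO buffer pools, as tracked by the JVM itself.
    val bufferPools = ManagementFactory
      .getPlatformMXBeans(classOf[BufferPoolMXBean]).asScala
      .map(b => s"${b.getName}=${b.getMemoryUsed}B").mkString(" ")

    // Total resident set size from the OS's point of view (Linux only).
    val rssKb = Source.fromFile("/proc/self/status").getLines()
      .find(_.startsWith("VmRSS:"))
      .map(_.split("\\s+")(1).toLong)
      .getOrElse(-1L)

    println(s"heapUsed=${heapUsed}B nonHeapUsed=${nonHeapUsed}B " +
      s"$bufferPools rss=${rssKb}kB")
  }
}
{code}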

> Tracking spark's memory usage
> -----------------------------
>
>                 Key: SPARK-9103
>                 URL: https://issues.apache.org/jira/browse/SPARK-9103
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Spark Core, Web UI
>            Reporter: Zhang, Liye
>         Attachments: Tracking Spark Memory Usage - Phase 1.pdf
>
>
> Currently Spark provides only a little memory usage information (RDD cache 
> on the web UI) for the executors. Users have no idea what the memory 
> consumption is when they are running Spark applications that use a lot of 
> memory in the executors. Especially when they encounter an OOM, it's really 
> hard to know the cause of the problem. So it would be helpful to expose 
> detailed memory consumption information for each part of Spark, so that 
> users can have a clear picture of exactly where the memory is used. 
> The memory usage info to expose should include, but not be limited to, 
> shuffle, cache, network, serializer, etc.
> Users can optionally choose to enable this functionality, since it is mainly 
> for debugging and tuning.


