[ 
https://issues.apache.org/jira/browse/SPARK-20923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029917#comment-16029917
 ] 

Thomas Graves commented on SPARK-20923:
---------------------------------------

[~joshrosen] [~zsxwing] [~eseyfe] I think you have looked at this fairly 
recently; do you know if this is used by anything or anybody? I'm not finding 
it used anywhere in the code or the UI, but maybe I'm missing some obscure 
reference.

> TaskMetrics._updatedBlockStatuses uses a lot of memory
> ------------------------------------------------------
>
>                 Key: SPARK-20923
>                 URL: https://issues.apache.org/jira/browse/SPARK-20923
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>
> The driver appears to use a ton of memory in certain cases to store the task 
> metrics' updated block statuses.  For instance, I had a user reading data from 
> Hive and caching it.  The number of tasks to read was around 62,000, they were 
> using 1,000 executors, and it ended up caching a couple of TBs of data.  The 
> driver kept running out of memory. 
> I investigated, and it looks like 5 GB of a 10 GB heap was being used up 
> by TaskMetrics._updatedBlockStatuses, because there are a lot of blocks.
> The updatedBlockStatuses field was already removed from the task end event under 
> SPARK-20084.  I don't see anything else that seems to be using this.  Does anybody 
> know if I missed something?
>  If it's not being used, we should remove it; otherwise, we need to figure out a 
> better way of doing it so it doesn't use so much memory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
