[ https://issues.apache.org/jira/browse/SPARK-20923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029962#comment-16029962 ]

Josh Rosen commented on SPARK-20923:
------------------------------------

It doesn't seem to be used, as far as I can tell from a quick skim. The best 
way to confirm would probably be to start removing it, deleting things which 
depend on it as you go (e.g. the TaskMetrics getter method for accessing the 
current value), and see if you run into anything that looks like a non-test 
use. I'll be happy to review a patch to clean this up.
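
For context, the field and getter under discussion have roughly the following 
shape in Spark 2.1's org.apache.spark.executor.TaskMetrics. This is a 
paraphrased sketch from memory, not verbatim source; everything beyond the 
_updatedBlockStatuses / updatedBlockStatuses names is approximate:

    package org.apache.spark.executor

    import scala.collection.JavaConverters._

    import org.apache.spark.storage.{BlockId, BlockStatus}
    import org.apache.spark.util.CollectionAccumulator

    class TaskMetrics private[spark] () extends Serializable {
      // Accumulates every (BlockId, BlockStatus) a task updates; the driver
      // ends up retaining these values per task, which is the memory usage
      // described in the issue below.
      private val _updatedBlockStatuses =
        new CollectionAccumulator[(BlockId, BlockStatus)]

      // The getter a cleanup patch would delete along with the field itself.
      def updatedBlockStatuses: Seq[(BlockId, BlockStatus)] =
        _updatedBlockStatuses.value.asScala
    }

A removal patch would delete both, then chase the resulting compile errors to 
surface any remaining non-test callers.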

> TaskMetrics._updatedBlockStatuses uses a lot of memory
> ------------------------------------------------------
>
>                 Key: SPARK-20923
>                 URL: https://issues.apache.org/jira/browse/SPARK-20923
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>
> The driver appears to use a lot of memory in certain cases to store the task 
> metrics' updated block statuses.  For instance, I had a user reading data from 
> Hive and caching it.  The # of tasks to read was around 62,000, they were 
> using 1000 executors, and the job ended up caching a couple of TB of data.  The 
> driver kept running out of memory. 
> I investigated, and it looks like 5GB of a 10GB heap was being used up by 
> TaskMetrics._updatedBlockStatuses because there are a lot of blocks (a rough 
> per-task estimate is sketched after this quote).
> updatedBlockStatuses was already removed from the task end event under 
> SPARK-20084.  I don't see anything else that seems to be using this.  Anybody 
> know if I missed something?
> If it's not being used we should remove it; otherwise we need to figure out a 
> better way of doing it so it doesn't use so much memory.
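
Taking the numbers above at face value, here is a quick back-of-envelope 
estimate in Scala. All inputs are the reporter's figures; the per-entry byte 
cost (assumedBytesPerEntry) is an assumption for illustration, not a measured 
value:

    object UpdatedBlockStatusesEstimate {
      def main(args: Array[String]): Unit = {
        val retainedBytes = 5L * 1024 * 1024 * 1024 // ~5GB reported retained on the driver
        val tasks = 62000L                          // ~62,000 tasks reported
        val bytesPerTask = retainedBytes / tasks    // ~86KB of block statuses per task
        // Assume a few hundred bytes per retained (BlockId, BlockStatus) entry
        // once object headers, strings and boxing are counted (an assumption).
        val assumedBytesPerEntry = 200L
        println(s"~$bytesPerTask bytes/task, " +
          s"~${bytesPerTask / assumedBytesPerEntry} entries/task")
      }
    }

At that rate the retained size scales with the number of blocks each task 
updates, so caching jobs with many partitions are hit hardest.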


