[ https://issues.apache.org/jira/browse/SPARK-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333719#comment-15333719 ]

Sean Owen commented on SPARK-15904:
-----------------------------------

[~Purple] I don't understand why you reopened this. Please don't, unless new 
information has meaningfully changed the discussion. What you present here 
clearly shows memory settings that aren't consistent with your physical 
memory or problem size, so unsurprisingly it doesn't work. The links you 
added have nothing to do with this problem. I am re-closing this because we 
did explain and resolve your problem.
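
As an illustration only (the master, app name, and heap value below are 
assumptions, not settings from this thread): in local mode the driver JVM 
does all of the work, so its heap plus JVM and OS overhead has to fit 
comfortably inside physical RAM, and the heap size must be fixed before the 
JVM starts.

{code}
// Illustrative Scala sketch; the values here are assumptions, not
// settings recommended anywhere in this thread.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[4]")              // dual-core i5 with hyper-threading => 4 threads
  .setAppName("kmeans-sizing-check")
// The driver heap must be sized before the JVM launches, e.g.
//   spark-submit --driver-memory 6g ...
// Setting spark.driver.memory on this SparkConf would be too late in
// local/client mode, because the driver JVM is already running.
val sc = new SparkContext(conf)
{code}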

> High Memory Pressure using MLlib K-means
> ----------------------------------------
>
>                 Key: SPARK-15904
>                 URL: https://issues.apache.org/jira/browse/SPARK-15904
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.6.1
>         Environment: Mac OS X 10.11.6 beta on a 13" MacBook Pro (mid-2012), 
> 16 GB of RAM.
>            Reporter: Alessio
>            Priority: Minor
>
> *Please Note*: even though the issue has been marked as "not a problem" and 
> "resolved", this is actually a problem and wasn't resolved at all, which is 
> why I reopened it. Several people have encountered memory issues using MLlib 
> for large and complex problems (see 
> http://stackoverflow.com/questions/32621267/spark-1-4-0-hangs-running-randomforest
>  and 
> http://stackoverflow.com/questions/27367804/how-do-i-get-spark-submit-to-close).
> I'm running MLlib K-Means on a ~400MB dataset (12 partitions), persisted at 
> the Memory-and-Disk storage level.
> Everything's fine, although at the end of K-Means, after the number of 
> iterations, the cost function value and the running time are printed, there 
> is a "Removing RDD <idx> from persistence list" stage. However, during this 
> stage there's high memory pressure, which is odd, since the RDDs are about 
> to be removed. Full log of this stage:
> 16/06/12 20:37:33 INFO clustering.KMeans: Run 0 finished in 14 iterations
> 16/06/12 20:37:33 INFO clustering.KMeans: Iterations took 694.544 seconds.
> 16/06/12 20:37:33 INFO clustering.KMeans: KMeans converged in 14 iterations.
> 16/06/12 20:37:33 INFO clustering.KMeans: The cost for the best run is 49784.87126751288.
> 16/06/12 20:37:33 INFO rdd.MapPartitionsRDD: Removing RDD 781 from persistence list
> 16/06/12 20:37:33 INFO storage.BlockManager: Removing RDD 781
> 16/06/12 20:37:33 INFO rdd.MapPartitionsRDD: Removing RDD 780 from persistence list
> 16/06/12 20:37:33 INFO storage.BlockManager: Removing RDD 780
> I'm running this K-Means on a 16GB machine, with the Spark context set to 
> local[*]. My machine has a hyper-threaded dual-core i5, so [*] means 4.
> I'm launching this application through spark-submit with --driver-memory 9G.
> _Further tests:_ the problem also appears without persisting/caching in 
> memory (i.e. persisting on disk only, or no caching/persisting at all).
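
Below is a minimal, hypothetical reconstruction of the setup described above, 
using the Spark 1.6-era RDD-based MLlib API. The input path, the number of 
clusters, and the iteration count are placeholders I've assumed, not values 
from the report; the "Removing RDD ... from persistence list" lines appear to 
come from K-Means unpersisting intermediate RDDs it caches internally (e.g. 
the precomputed vector norms) once training ends.

{code}
// Hypothetical sketch of the reported setup, not the reporter's actual code.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.storage.StorageLevel

object KMeansMemoryRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("kmeans-repro"))

    // ~400MB of dense points, read into 12 partitions and persisted with
    // MEMORY_AND_DISK; the report says DISK_ONLY and no persistence at all
    // show the same behaviour. The path is a placeholder.
    val points = sc.textFile("data/points.csv", 12)
      .map(line => Vectors.dense(line.split(',').map(_.toDouble)))
      .persist(StorageLevel.MEMORY_AND_DISK)

    // k and maxIterations are arbitrary placeholders.
    val model = KMeans.train(points, 100, 20)
    println(s"Cost for the best run: ${model.computeCost(points)}")

    // Training done: MLlib now unpersists the RDDs it cached internally,
    // which is what produces the "Removing RDD ... from persistence list"
    // log output quoted above.
    points.unpersist()
    sc.stop()
  }
}
{code}

Per the report, this would be launched with something like 
spark-submit --master "local[*]" --driver-memory 9G ... (the exact command 
isn't given in the issue).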



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
