Xiangrui Meng created SPARK-3541:
------------------------------------

             Summary: Improve ALS internal storage
                 Key: SPARK-3541
                 URL: https://issues.apache.org/jira/browse/SPARK-3541
             Project: Spark
          Issue Type: Improvement
          Components: ML, MLlib
            Reporter: Xiangrui Meng


The internal storage of ALS uses many small objects, which increases the GC 
pressure and makes ALS difficult to scale to very large scale, e.g., 50 billion 
ratings. In such cases, the full GC may take more than 10 minutes to finish. 
That is longer than the default heartbeat timeout and hence executors will be 
removed under default settings.

We can use primitive arrays to reduce the number of objects significantly. This 
requires big change to the ALS implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to