[ https://issues.apache.org/jira/browse/SPARK-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng reassigned SPARK-3541: ------------------------------------ Assignee: Xiangrui Meng > Improve ALS internal storage > ---------------------------- > > Key: SPARK-3541 > URL: https://issues.apache.org/jira/browse/SPARK-3541 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib > Reporter: Xiangrui Meng > Assignee: Xiangrui Meng > Original Estimate: 96h > Remaining Estimate: 96h > > The internal storage of ALS uses many small objects, which increases the GC > pressure and makes ALS difficult to scale to very large scale, e.g., 50 > billion ratings. In such cases, the full GC may take more than 10 minutes to > finish. That is longer than the default heartbeat timeout and hence executors > will be removed under default settings. > We can use primitive arrays to reduce the number of objects significantly. > This requires big change to the ALS implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org