Xiangrui Meng created SPARK-3541: ------------------------------------ Summary: Improve ALS internal storage Key: SPARK-3541 URL: https://issues.apache.org/jira/browse/SPARK-3541 Project: Spark Issue Type: Improvement Components: ML, MLlib Reporter: Xiangrui Meng
The internal storage of ALS uses many small objects, which increases the GC pressure and makes ALS difficult to scale to very large scale, e.g., 50 billion ratings. In such cases, the full GC may take more than 10 minutes to finish. That is longer than the default heartbeat timeout and hence executors will be removed under default settings. We can use primitive arrays to reduce the number of objects significantly. This requires big change to the ALS implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org