[ https://issues.apache.org/jira/browse/SPARK-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085794#comment-15085794 ]
Alger Remirata commented on SPARK-5955:
---------------------------------------

Hi Xiangrui Meng,

First of all, I would like to thank you for developing Spark and making it open source so that we can use it. I'm Alger Remirata, a researcher from the Philippines. I'm new to Spark and Scala, and I am working on a project involving matrix factorization in Spark.

I have a problem running ALS in Spark: it fails with a stack overflow which, according to comments on the internet, is caused by a long lineage chain. One suggestion is to use setCheckpointInterval so that the RDDs are checkpointed every 10-20 iterations, which prevents the error. I would like to ask for details on how to do checkpointing with ALS. I am using spark-kernel, developed by IBM (https://github.com/ibm-et/spark-kernel), instead of spark-shell.

Here are my specific questions about checkpointing:

1. When setting the checkpoint directory through SparkContext.setCheckpointDir(), it needs to be a Hadoop-compatible directory. Can we use any available HDFS-compatible directory?
2. What is meant by this comment in the ALS checkpointing code: "If the checkpoint directory is not set in [[org.apache.spark.SparkContext]], this setting is ignored."?
3. Is calling setCheckpointInterval the only code I need to add for checkpointing to work with ALS?
4. I am getting this error: Name: java.lang.IllegalArgumentException, Message: Wrong FS: expected file :///. How can I solve this?

What is the proper way to use checkpointing?

Thanks a lot!

> Add checkpointInterval to ALS
> -----------------------------
>
>                 Key: SPARK-5955
>                 URL: https://issues.apache.org/jira/browse/SPARK-5955
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib
>    Affects Versions: 1.3.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>             Fix For: 1.3.1, 1.4.0
>
>
> We should add checkpoint interval to ALS to prevent the following:
> 1. storing large shuffle files
> 2. stack overflow (SPARK-1106)
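
For reference, the setup asked about in questions 1-3 can be sketched as follows, using the RDD-based org.apache.spark.mllib.recommendation.ALS. As the quoted code comment says, the interval is ignored unless a checkpoint directory has been set on the SparkContext, so both calls are needed. The object name AlsCheckpointSketch, the train helper, the toy ratings, the parameter values, and the temporary local directory are all hypothetical choices for illustration. The sketch passes a fully qualified file:// URI because it runs in local mode; on a cluster, the assumption is that the directory should be a fully qualified URI on the same filesystem that Hadoop is configured to use by default. A mismatch there is one plausible cause of a "Wrong FS" error, though that is not a confirmed diagnosis of the error reported above.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; not part of Spark.
object AlsCheckpointSketch {

  // Sets the checkpoint directory on the SparkContext, then trains ALS with
  // checkpointing every 10 iterations. Rank, iteration count, and lambda are
  // illustrative values only.
  def train(sc: SparkContext,
            ratings: RDD[Rating],
            checkpointDir: String): MatrixFactorizationModel = {
    // Must be called before running ALS; without it the interval is ignored.
    sc.setCheckpointDir(checkpointDir)

    new ALS()
      .setRank(5)
      .setIterations(30)
      .setLambda(0.01)
      .setCheckpointInterval(10)
      .run(ratings)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("als-checkpoint-sketch")
      .setMaster("local[2]") // local mode, so a file:// checkpoint directory is used
    val sc = new SparkContext(conf)
    try {
      // Tiny made-up data set: Rating(user, product, rating).
      val ratings = sc.parallelize(Seq(
        Rating(1, 10, 5.0), Rating(1, 11, 1.0),
        Rating(2, 10, 4.0), Rating(2, 12, 2.0),
        Rating(3, 11, 3.0), Rating(3, 12, 5.0)
      ))

      // Fully qualified URI of a fresh temporary directory on the local filesystem.
      val checkpointDir = Files.createTempDirectory("als-checkpoint").toUri.toString

      val model = train(sc, ratings, checkpointDir)
      println(s"Trained model with rank ${model.rank}")
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell or spark-kernel, where a SparkContext named sc is already provided, only the body of train would be needed: call sc.setCheckpointDir(...) once, then configure ALS with setCheckpointInterval.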