I have some suggestions you may try:
1) Call persist on the input RDD; this can save a lot of running time, since ALS is iterative and would otherwise recompute the input on every iteration.
2) From the UI you can see the cluster spends much of its time in the shuffle stage; this can be tuned through conf parameters such as "spark.shuffle.memoryFraction" and "spark.memory.fraction".
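A minimal sketch of both suggestions (it needs a running Spark cluster; the fraction values, storage level, and input path are illustrative assumptions, not measured settings):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.storage.StorageLevel

// Illustrative shuffle-memory settings; tune these for your own cluster.
val conf = new SparkConf()
  .setAppName("ALSExample")
  .set("spark.shuffle.memoryFraction", "0.4") // legacy (pre-1.6) shuffle setting
  .set("spark.memory.fraction", "0.6")        // unified memory manager (1.6+)
val sc = new SparkContext(conf)

// Hypothetical input path; replace with your ratings file (user,product,rating).
val ratings = sc.textFile("hdfs:///path/to/ratings").map { line =>
  val Array(user, product, rating) = line.split(',')
  Rating(user.toInt, product.toInt, rating.toDouble)
}

// Persist the input RDD so ALS's iterations do not recompute it from scratch.
ratings.persist(StorageLevel.MEMORY_AND_DISK)

val model = ALS.train(ratings, 10, 10, 0.01) // rank, iterations, lambda
```

MEMORY_AND_DISK is a reasonable default here because spilling partitions to disk is still much cheaper than re-reading and re-parsing the source data each iteration.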
good luck

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Execution-error-during-ALS-execution-in-spark-tp26644p26652.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.