[ https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308797#comment-15308797 ]
Nick Pentreath commented on SPARK-15447:
----------------------------------------

Created a Google sheet with initial results: https://docs.google.com/spreadsheets/d/1iX5LisfXcZSTCHp8VPoo5z-eCO85A5VsZDtZ5e475ks/edit?usp=sharing

So far, for SPARK-6717 I've just used {{spark-perf}} to compare the RDD-based APIs (the checkpointing only impacts the RDD-based {{train}} method). These results show no red flags, and 2.0 is actually faster in general relative to 1.6. Checkpointing does add a minor overhead, but the overhead is consistent across versions and again smaller in 2.0. There is something a little odd about the 1.6 results for the 10m-ratings case, but I'm not sure what's going on there - I've rerun it a few times with the same result. Also, I haven't managed to get to 1b ratings yet due to cluster size; I will keep working on it.

> Performance test for ALS in Spark 2.0
> -------------------------------------
>
>                 Key: SPARK-15447
>                 URL: https://issues.apache.org/jira/browse/SPARK-15447
>             Project: Spark
>          Issue Type: Task
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Nick Pentreath
>            Priority: Critical
>              Labels: QA
>
> We made several changes to ALS in 2.0. It is necessary to run some tests to
> avoid performance regressions. We should test (synthetic) datasets from 1
> million ratings to 1 billion ratings.
> cc [~mlnick] [~holdenk] Do you have time to run some large-scale performance
> tests?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)