[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50579245
QA results for PR 929:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17422/consoleFull
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50654846 Merged into master. Thanks!
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/929
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50484656
QA tests have started for PR 929. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17358/consoleFull
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50492147 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50492523
QA tests have started for PR 929. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17359/consoleFull
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50491613
QA results for PR 929:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17358/consoleFull
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/929#discussion_r15533428

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -255,6 +260,9 @@ class ALS private (
           rank, lambda, alpha, YtY)
         previousProducts.unpersist()
         logInfo("Re-computing U given I (Iteration %d/%d)".format(iter, iterations))
+        if (sc.checkpointDir.isDefined && (iter % 3 == 1)) {
+          products.checkpoint()
--- End diff --

It may be unnecessary to checkpoint both RDDs in a single iteration. We can checkpoint only the products, which are usually smaller than the users:

~~~
user - product - user - product - user - product - user - product - user - product - user - product
check check
~~~
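To see why checkpointing only the products every few iterations is enough, here is a Spark-free sketch that models lineage length as a plain integer. It is a simplification, not Spark's actual bookkeeping: it assumes each marked checkpoint is materialized before the users are recomputed from it, and it ignores the in/out link RDDs and shuffles. The names `LineageModel` and `chainLengths` are hypothetical.

```scala
// Hypothetical, Spark-free model of ALS lineage growth (not Spark's real bookkeeping).
object LineageModel {
  /** Length of the uncheckpointed dependency chain behind the user factors
    * after each of iterations 1..iterations. Each iteration recomputes the
    * products from the users and then the users from the products; when
    * `shouldCheckpoint(iter)` holds, the products are assumed to be
    * checkpointed (and materialized), which cuts the chain to zero. */
  def chainLengths(iterations: Int, shouldCheckpoint: Int => Boolean): Seq[Int] =
    (1 to iterations).scanLeft(0) { (len, iter) =>
      val products = if (shouldCheckpoint(iter)) 0 else len + 1
      products + 1 // users are always recomputed from the products
    }.tail // drop the seed value that stands for the initial (random) users

  def main(args: Array[String]): Unit = {
    println(chainLengths(6, _ => false).mkString(" ")) // no checkpointing
    println(chainLengths(6, _ % 3 == 0).mkString(" ")) // products only, every 3rd iteration
  }
}
```

In this model the chain grows by two per iteration without checkpointing (`2 4 6 8 10 12`), while checkpointing only the products every third iteration keeps it bounded (`2 4 1 3 5 1`, never above 5 however long the run).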
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50499597
QA results for PR 929:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17359/consoleFull
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50500525
QA tests have started for PR 929. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17362/consoleFull
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/929#discussion_r15534935

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -255,6 +255,9 @@ class ALS private (
           rank, lambda, alpha, YtY)
         previousProducts.unpersist()
         logInfo("Re-computing U given I (Iteration %d/%d)".format(iter, iterations))
+        if (sc.checkpointDir.isDefined && (iter % 3 == 1)) {
--- End diff --

Sorry, I didn't notice this in the first pass. Do we want to checkpoint in the second iteration? How about `(iter > 0) && (iter % 3 == 0)`?
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50503214 @mengxr Done. I will test it in detail tomorrow.
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50506986
QA results for PR 929:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17362/consoleFull
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/929#discussion_r15566577

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -255,6 +255,9 @@ class ALS private (
           rank, lambda, alpha, YtY)
         previousProducts.unpersist()
         logInfo("Re-computing U given I (Iteration %d/%d)".format(iter, iterations))
+        if (sc.checkpointDir.isDefined && (iter % 3 == 1)) {
--- End diff --

`iter` runs from 1 to `iterations`, so the checkpointed RDDs are `products-1`, `products-4`, `products-7`, `products-10`, ...
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/929#discussion_r15566629

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -255,6 +255,9 @@ class ALS private (
           rank, lambda, alpha, YtY)
         previousProducts.unpersist()
         logInfo("Re-computing U given I (Iteration %d/%d)".format(iter, iterations))
+        if (sc.checkpointDir.isDefined && (iter % 3 == 1)) {
--- End diff --

Do we need to checkpoint the first RDD? If `iter` starts at `1`, we can use `iter % 3 == 0`, and then the checkpointed RDDs are `product-3`, `product-6`, `product-9`, etc.
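The two checkpoint predicates discussed in this thread can be compared directly. The sketch below assumes, as the thread states, that `iter` runs from 1 to `iterations`; `CheckpointIterations` and `checkpointed` are hypothetical names, and the `sc.checkpointDir.isDefined` guard is left out because it does not depend on `iter`.

```scala
// Hypothetical helper: which ALS iterations would trigger a checkpoint
// under a given predicate, assuming `iter` runs from 1 to `iterations`.
object CheckpointIterations {
  def checkpointed(iterations: Int, predicate: Int => Boolean): Seq[Int] =
    (1 to iterations).filter(predicate)

  def main(args: Array[String]): Unit = {
    val original = checkpointed(10, iter => iter % 3 == 1)
    val revised  = checkpointed(10, iter => iter % 3 == 0)
    val guarded  = checkpointed(10, iter => iter > 0 && iter % 3 == 0)
    println(s"iter % 3 == 1: ${original.mkString(", ")}") // 1, 4, 7, 10
    println(s"iter % 3 == 0: ${revised.mkString(", ")}")  // 3, 6, 9
    // Because iter never reaches 0, the extra `iter > 0` guard changes nothing here.
    println(s"guarded == revised: ${guarded == revised}") // true
  }
}
```

With `iter % 3 == 0`, the first iteration is no longer checkpointed, so no time is spent writing out an RDD whose lineage is still short.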
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/929#discussion_r15566754

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -255,6 +255,9 @@ class ALS private (
           rank, lambda, alpha, YtY)
         previousProducts.unpersist()
         logInfo("Re-computing U given I (Iteration %d/%d)".format(iter, iterations))
+        if (sc.checkpointDir.isDefined && (iter % 3 == 1)) {
--- End diff --

It's a good idea. Done.
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50574626 LGTM. Waiting for Jenkins. Btw, @witgo, if you have a big dataset to test, could you try setting the storage level of the ratings and the user/product in/out links to `MEMORY_AND_DISK_SER` and enabling `spark.rdd.compress`? It will save a lot of memory at a small cost in speed.
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50574926 Ok, I will try it.
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50575931 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50576082
QA tests have started for PR 929. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17422/consoleFull
[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50380948 @witgo We don't need to checkpoint both users and products, only the smaller one. For the initial version, it is fine to checkpoint either of them. We should also checkpoint in explicit ALS to clean up intermediate shuffle data. Could you add that as well and help test it? Thanks!
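A sketch of the policy described in this comment: checkpoint only whichever factor set is smaller, on the same periodic schedule, whether implicit or explicit feedback is used. It is Spark-free and hypothetical (`CheckpointSide`, `smallerSide` and `toCheckpoint` are made-up names, not the code merged in this PR). It assumes both sides share the same rank, so the number of factor vectors stands in for size, and it breaks ties in favor of products.

```scala
// Hypothetical policy sketch, not the code merged in this PR.
object CheckpointSide {
  sealed trait Side
  case object Users extends Side
  case object Products extends Side

  /** Both factor sets have `rank` columns, so the row count is a size proxy.
    * Ties go to products (an arbitrary assumption). */
  def smallerSide(numUsers: Long, numProducts: Long): Side =
    if (numUsers < numProducts) Users else Products

  /** Which side (if any) to checkpoint in iteration `iter`. The same rule is
    * meant to apply to both implicit and explicit ALS. */
  def toCheckpoint(
      numUsers: Long,
      numProducts: Long,
      checkpointDirSet: Boolean,
      iter: Int,
      interval: Int = 3): Option[Side] =
    if (checkpointDirSet && iter % interval == 0) Some(smallerSide(numUsers, numProducts))
    else None

  def main(args: Array[String]): Unit = {
    println(toCheckpoint(1000000L, 20000L, checkpointDirSet = true, iter = 3))  // Some(Products)
    println(toCheckpoint(1000000L, 20000L, checkpointDirSet = true, iter = 4))  // None
    println(toCheckpoint(1000000L, 20000L, checkpointDirSet = false, iter = 3)) // None
  }
}
```

Keeping the side choice separate from the iteration check means the same schedule can be reused in both the implicit and explicit code paths.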