[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50579245
  
QA results for PR 929:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17422/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50654846
  
Merged into master. Thanks!




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/929




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50484656
  
QA tests have started for PR 929. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17358/consoleFull




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50492147
  
Jenkins, test this please.




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50492523
  
QA tests have started for PR 929. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17359/consoleFull




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50491613
  
QA results for PR 929:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17358/consoleFull




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/929#discussion_r15533428
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -255,6 +260,9 @@ class ALS private (
   rank, lambda, alpha, YtY)
 previousProducts.unpersist()
 logInfo("Re-computing U given I (Iteration %d/%d)".format(iter, iterations))
+if (sc.checkpointDir.isDefined && (iter % 3 == 1)) {
+  products.checkpoint()
--- End diff --

It may be unnecessary to checkpoint both RDDs in a single iteration. We can 
checkpoint products only, which is usually smaller than users:

~~~
user -> product -> user -> product -> user -> product -> user -> product -> 
user -> product -> user -> product
   
check   
 check
~~~
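
The point that checkpointing one side of the alternation is enough can be illustrated with a toy model. This is a minimal sketch in plain Scala, not Spark code: `LineageSketch` and `maxLineage` are hypothetical names, and the model simply assumes that each recomputation adds one link to the dependency chain and that a checkpoint on products resets the chain to zero.

```scala
object LineageSketch {
  /** Longest dependency chain seen over `iterations` rounds when products are
    * checkpointed every `interval` rounds (toy model, not Spark's bookkeeping). */
  def maxLineage(iterations: Int, interval: Int): Int = {
    var depth = 0     // length of the current dependency chain
    var maxDepth = 0  // longest chain observed so far
    for (iter <- 1 to iterations) {
      depth += 1                           // products recomputed from users
      maxDepth = math.max(maxDepth, depth)
      if (iter % interval == 0) depth = 0  // checkpointing products cuts the chain
      depth += 1                           // users recomputed from products
      maxDepth = math.max(maxDepth, depth)
    }
    maxDepth
  }
}
```

Under these assumptions the chain stays bounded by twice the interval however many iterations run, because users are always rebuilt from the checkpointed products.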




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50499597
  
QA results for PR 929:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17359/consoleFull




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50500525
  
QA tests have started for PR 929. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17362/consoleFull




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/929#discussion_r15534935
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -255,6 +255,9 @@ class ALS private (
   rank, lambda, alpha, YtY)
 previousProducts.unpersist()
 logInfo("Re-computing U given I (Iteration %d/%d)".format(iter, iterations))
+if (sc.checkpointDir.isDefined && (iter % 3 == 1)) {
--- End diff --

Sorry I didn't notice this in the first pass. Do we want to checkpoint in 
the second iteration? How about `(iter > 0) && (iter % 3 == 0)`?




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50503214
  
@mengxr   Done. Tomorrow, I will test in detail.




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50506986
  
QA results for PR 929:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17362/consoleFull




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/929#discussion_r15566577
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -255,6 +255,9 @@ class ALS private (
   rank, lambda, alpha, YtY)
 previousProducts.unpersist()
 logInfo("Re-computing U given I (Iteration %d/%d)".format(iter, iterations))
+if (sc.checkpointDir.isDefined && (iter % 3 == 1)) {
--- End diff --

`iter` runs from 1 to `iterations`. The checkpointed RDDs are `products-1`, 
`products-4`, `products-7`, `products-10`, ...




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/929#discussion_r15566629
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -255,6 +255,9 @@ class ALS private (
   rank, lambda, alpha, YtY)
 previousProducts.unpersist()
 logInfo("Re-computing U given I (Iteration %d/%d)".format(iter, iterations))
+if (sc.checkpointDir.isDefined && (iter % 3 == 1)) {
--- End diff --

Do we need to checkpoint the first RDD? If `iter` starts at `1`, we can use 
`iter % 3 == 0` and hence the checkpoint RDDs are `product-3`, `product-6`, 
`product-9`, etc.
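
The difference between the two conditions discussed in this thread can be checked with a few lines of plain Scala. This is a minimal sketch, assuming `iter` runs from 1 to `iterations` as stated earlier in the thread; `CheckpointSchedule` and `checkpointed` are hypothetical names used only for illustration.

```scala
object CheckpointSchedule {
  /** Iterations in 1..iterations at which `cond` would trigger a checkpoint. */
  def checkpointed(iterations: Int)(cond: Int => Boolean): Seq[Int] =
    (1 to iterations).filter(cond)
}

// `iter % 3 == 1` checkpoints the very first products RDD,
// while `iter % 3 == 0` waits until the third iteration.
val everyThirdFromOne = CheckpointSchedule.checkpointed(10)(_ % 3 == 1)
val everyThirdFromThree = CheckpointSchedule.checkpointed(10)(_ % 3 == 0)
```

With ten iterations the first schedule is 1, 4, 7, 10 and the second is 3, 6, 9, which matches the RDD names quoted in the two comments.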




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/929#discussion_r15566754
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -255,6 +255,9 @@ class ALS private (
   rank, lambda, alpha, YtY)
 previousProducts.unpersist()
 logInfo("Re-computing U given I (Iteration %d/%d)".format(iter, iterations))
+if (sc.checkpointDir.isDefined && (iter % 3 == 1)) {
--- End diff --

It's a good idea. Done




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50574626
  
LGTM. Waiting for Jenkins. Btw, @witgo if you have a big dataset to test, 
could you try setting the storage level of ratings and user/product in/out links 
to `MEMORY_AND_DISK_SER` and enabling `spark.rdd.compress`? It should save a lot 
of memory at a small cost in speed.
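
For reference, the two settings mentioned above could be applied from user code roughly as follows. This is a sketch under stated assumptions, not the change made in this PR: the in/out links are internal to ALS and are not touched here, only a ratings RDD is persisted, and the application name, local master, and sample ratings are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.storage.StorageLevel

object CompressedRatingsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("als-storage-sketch")   // hypothetical application name
      .setMaster("local[2]")              // assumption: local run for testing
      .set("spark.rdd.compress", "true")  // compress serialized RDD partitions
    val sc = new SparkContext(conf)

    // Made-up ratings; a real test would load a large dataset instead.
    val ratings = sc.parallelize(Seq(Rating(1, 10, 4.0), Rating(2, 10, 3.0)))

    // Store partitions serialized, spilling to disk when memory runs short.
    ratings.persist(StorageLevel.MEMORY_AND_DISK_SER)
    println(ratings.count())

    sc.stop()
  }
}
```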




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50574926
  
Ok, I will try it.




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50575931
  
Jenkins, test this please.




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50576082
  
QA tests have started for PR 929. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17422/consoleFull




[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/929#issuecomment-50380948
  
@witgo We don't need to checkpoint both users and products, but only the 
smaller one. For the initial version, it is fine to checkpoint either of them. 
We should also do checkpointing in explicit ALS to clean intermediate shuffle 
data. Could you add that as well and help test? Thanks!

