[GitHub] spark pull request: [SPARK-3550][MLLIB] Disable automatic rdd cach...

2014-09-25 Thread staple
Github user staple closed the pull request at:

https://github.com/apache/spark/pull/2412


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3550][MLLIB] Disable automatic rdd cach...

2014-09-25 Thread staple
Github user staple commented on the pull request:

https://github.com/apache/spark/pull/2412#issuecomment-56887646
  
@davies, sure will do





[GitHub] spark pull request: [SPARK-3550][MLLIB] Disable automatic rdd cach...

2014-09-25 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2412#issuecomment-56886980
  
@staple thanks. I'd like to keep it as before for ALS. Could you close this
PR (and maybe the issue as well)?





[GitHub] spark pull request: [SPARK-3550][MLLIB] Disable automatic rdd cach...

2014-09-25 Thread staple
Github user staple commented on the pull request:

https://github.com/apache/spark/pull/2412#issuecomment-56865408
  
@davies It looks like you already disabled caching for NaiveBayes and
DecisionTree in your #2378. The only difference from this patch is that I
disabled caching for ALS as well.

We discussed this a bit here:
https://github.com/apache/spark/pull/2378#discussion_r17686208. I filed this
ticket as a follow-up to the work on uncached input warnings
(https://github.com/apache/spark/pull/2347). The warnings are only supposed to
be printed if the input data is accessed repeatedly across many iterations
during learning. That's not the case with ALS, so a warning shouldn't be
printed there. But I can see there's a case for caching, because the input
data is accessed twice when constructing an intermediate representation of
the data. I don't have a strong preference on whether or not we should cache
in Python for the ALS learner.
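
The distinction drawn above (warn only when a learner re-reads its raw input on every iteration and that input is not cached) can be sketched roughly as follows. This is a minimal illustration, not Spark's actual warning code: the function name `should_warn_uncached`, its parameters, and the `LEARNER_PROFILES` table are all hypothetical, and the per-learner flags simply restate what this thread says about each learner.

```python
# Hypothetical sketch of the warning policy discussed in this thread.
# None of these names come from Spark; they are for illustration only.

def should_warn_uncached(input_is_cached: bool,
                         rereads_input_each_iteration: bool) -> bool:
    """Return True when an uncached-input warning would make sense."""
    # A warning is only useful if the learner walks the raw input on
    # every learning iteration AND the caller has not cached that input.
    return rereads_input_each_iteration and not input_is_cached


# Assumed profiles, taken from the descriptions in this thread:
# NaiveBayes reads its input once; ALS and DecisionTree persist their own
# intermediate representations instead of re-reading the raw input.
LEARNER_PROFILES = {
    "NaiveBayes": False,
    "ALS": False,
    "DecisionTree": False,
    "GenericIterativeLearner": True,  # hypothetical learner, for contrast
}

for name, rereads in LEARNER_PROFILES.items():
    print(name, should_warn_uncached(input_is_cached=False,
                                     rereads_input_each_iteration=rereads))
```

Under these assumptions ALS never triggers the warning, which matches the point above; whether ALS input should still be cached for its two construction passes is a separate question.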

If you are fine with continuing to cache in Python for ALS, then there's no
more work to be done for this ticket, SPARK-3550.





[GitHub] spark pull request: [SPARK-3550][MLLIB] Disable automatic rdd cach...

2014-09-23 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2412#issuecomment-56610305
  
@staple could you rebase this PR?





[GitHub] spark pull request: [SPARK-3550][MLLIB] Disable automatic rdd cach...

2014-09-17 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2412#issuecomment-55929756
  
@staple I also addressed this in #2378. Could you help review that part?





[GitHub] spark pull request: [SPARK-3550][MLLIB] Disable automatic rdd cach...

2014-09-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2412#issuecomment-55788543
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20396/consoleFull)
for PR 2412 at commit
[`c8ff120`](https://github.com/apache/spark/commit/c8ff120945da4c1aa2d0c9ba81fbed79de6cab66).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class NonASCIICharacterChecker extends ScalariformChecker`






[GitHub] spark pull request: [SPARK-3550][MLLIB] Disable automatic rdd cach...

2014-09-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2412#issuecomment-55776764
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20396/consoleFull)
for PR 2412 at commit
[`c8ff120`](https://github.com/apache/spark/commit/c8ff120945da4c1aa2d0c9ba81fbed79de6cab66).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3550][MLLIB] Disable automatic rdd cach...

2014-09-16 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2412#issuecomment-55776094
  
add to whitelist





[GitHub] spark pull request: [SPARK-3550][MLLIB] Disable automatic rdd cach...

2014-09-16 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2412#issuecomment-55776119
  
this is ok to test





[GitHub] spark pull request: [SPARK-3550][MLLIB] Disable automatic rdd cach...

2014-09-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2412#issuecomment-55762318
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-3550][MLLIB] Disable automatic rdd cach...

2014-09-16 Thread staple
GitHub user staple opened a pull request:

https://github.com/apache/spark/pull/2412

[SPARK-3550][MLLIB] Disable automatic rdd caching for relevant learners.

The NaiveBayes, ALS, and DecisionTree learners do not require external 
caching to prevent repeated RDD re-evaluation during learning iterations. 
NaiveBayes only evaluates its input RDD once, while ALS and DecisionTree 
internally persist transformations of their input RDDs.
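
To make the reasoning concrete, here is a toy model in plain Python of why external caching only pays off when a learner re-reads its input on each iteration. It is a sketch under stated assumptions, not Spark code: `LazyDataset`, `single_pass_learner`, and `iterative_learner` are hypothetical names that merely imitate an RDD being recomputed on each pass unless it has been cached.

```python
class LazyDataset:
    """Toy stand-in for an RDD: recomputed on every pass unless cached."""

    def __init__(self, compute):
        self._compute = compute      # zero-argument callable producing data
        self._use_cache = False
        self._cached = None
        self.evaluations = 0         # how many times the data was recomputed

    def cache(self):
        # Mark the dataset so the first evaluation is kept for reuse.
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.evaluations += 1
        data = list(self._compute())
        if self._use_cache:
            self._cached = data
        return data


def single_pass_learner(dataset):
    # Reads the input exactly once, as described for NaiveBayes.
    return sum(dataset.collect())


def iterative_learner(dataset, iterations):
    # Re-reads the raw input on every iteration (hypothetical learner).
    total = 0
    for _ in range(iterations):
        total += sum(dataset.collect())
    return total


uncached = LazyDataset(lambda: range(5))
iterative_learner(uncached, 10)
print(uncached.evaluations)   # recomputed once per iteration

cached = LazyDataset(lambda: range(5)).cache()
iterative_learner(cached, 10)
print(cached.evaluations)     # computed once, then reused
```

In this model the single-pass learner evaluates its input once whether or not it is cached, so caching on its behalf only costs memory. A learner that persists its own intermediate representation behaves the same way from the caller's point of view.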

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/staple/spark SPARK-3550

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2412.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2412


commit c8ff120945da4c1aa2d0c9ba81fbed79de6cab66
Author: Aaron Staple 
Date:   2014-09-15T20:22:27Z

[SPARK-3550][MLLIB] Disable automatic rdd caching for relevant learners.



