[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-27 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-215318737
  
@jkbradley Another bug found:

The `CrossValidator` and `TrainValidationSplit` miss the `seed` when saving 
and loading. I'd prefer to create a JIRA and fix them in this PR since they 
will block the integration test here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-215309644
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57216/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-215309599
  
**[Test build #57216 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57216/consoleFull)**
 for PR 12604 at commit 
[`fa8a05c`](https://github.com/apache/spark/commit/fa8a05c6acd955670af64637caa9c1f38c5e9f09).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-215309643
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-215308604
  
**[Test build #57216 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57216/consoleFull)**
 for PR 12604 at commit 
[`fa8a05c`](https://github.com/apache/spark/commit/fa8a05c6acd955670af64637caa9c1f38c5e9f09).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214952694
  
Thanks, the updates look good.  I'll try to comment on the tuning.py part 
later today or early tomorrow.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12604#discussion_r61196124
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -657,32 +656,104 @@ def test_logistic_regression(self):
 except OSError:
 pass
 
+def _compare_params(self, m1, m2, param):
+"""
+Compare 2 ML params, assert they have the same param.
--- End diff --

Document that param must be a Param member of m1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12604#discussion_r61196122
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -657,32 +656,104 @@ def test_logistic_regression(self):
 except OSError:
 pass
 
+def _compare_params(self, m1, m2, param):
--- End diff --

This can be used in a lot of other places below (everywhere you compare 2 
param values & parents).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214918093
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57037/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214918091
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214917892
  
**[Test build #57037 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57037/consoleFull)**
 for PR 12604 at commit 
[`c348bf0`](https://github.com/apache/spark/commit/c348bf03b99eeb8363fb20f7069595f5d9fd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214899308
  
**[Test build #57037 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57037/consoleFull)**
 for PR 12604 at commit 
[`c348bf0`](https://github.com/apache/spark/commit/c348bf03b99eeb8363fb20f7069595f5d9fd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214898290
  
**[Test build #57034 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57034/consoleFull)**
 for PR 12604 at commit 
[`c14b25e`](https://github.com/apache/spark/commit/c14b25e0ae8b0dd66bce777e5104d456dc629f89).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214898301
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57034/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214898299
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214898112
  
**[Test build #57034 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57034/consoleFull)**
 for PR 12604 at commit 
[`c14b25e`](https://github.com/apache/spark/commit/c14b25e0ae8b0dd66bce777e5104d456dc629f89).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214897400
  
@jkbradley Here we go.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214894928
  
For the tuning.py updates, I want to mess with the code locally to test 
possibilities.  Could you please update + fix merge conflicts? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214890020
  
I'll do it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214889349
  
Nice catch about logreg.  It would be great to develop a generic way to 
test the JavaParams wrappers.  Could you please create a JIRA for that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214852153
  
@jkbradley Find another bug: For LogisticRegression in PySpark, if you 
write it then reload it, the result is not identical with the previous one. 
Because some params in `ml.LogisticRegression` are defined with default value, 
while those params in `pyspark.ml.LogisticRegression` have no default value. 
With `JavaParams`, we're going to set default values for those params. So it 
will fail the following test:

```python
def _compare_param(self, m1, m2, param):
"""
Compare 2 ML params, assert they have the same param.
"""
# Prevent key not found error in case of some param neither in 
paramMap and
# defaultParamMap.
if m1.isDefined(param):
self.assertEqual(m1.getOrDefault(param), m2.getOrDefault(param))
self.assertEqual(param.parent, m2.getParam(param.name).parent)
else:
# If m1 is not defined param, then m2 should not, too.
self.assertEqual(m2.isDefined(m2.getParam(param.name)), False)
```

How about we setup a new JIRA to check the default values?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214834313
  
JIRA created: https://issues.apache.org/jira/browse/SPARK-14924 for 
OneVsRest with classifier in estimatorParamMaps of tuning fail to persistence


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12604#discussion_r61016865
  
--- Diff: python/pyspark/ml/tuning.py ---
@@ -141,6 +144,71 @@ def getEvaluator(self):
 """
 return self.getOrDefault(self.evaluator)
 
+def _transfer_param_map_to_java_impl(self, pyParamMap, java_obj):
+"""
+Transfer a Python ParamMap to a Java ParamMap which belongs to an 
Java estimator of
+ValidatorParams.
+This utility method helps CrossValidator and TrainValidationSplit 
implementing their
+_transfer_param_map_to_java().
+"""
+estimator, epms, evaluator, seed = self._to_java_impl()
--- End diff --

This is getting the Python Params from the class, not from pyParamMap.  It 
should get them from pyParamMap.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214580036
  
I added some comments.  I think the tuning.py changes could be simplified 
somewhat, but I'll need to return to this later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12604#discussion_r61016852
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -684,6 +689,52 @@ def _compare_pipelines(self, m1, m2):
 self.assertEqual(len(m1.stages), len(m2.stages))
 for s1, s2 in zip(m1.stages, m2.stages):
 self._compare_pipelines(s1, s2)
+elif isinstance(m1, OneVsRestParams):
+# Check the equality of classifiers (value and parent).
+self._compare_pipelines(m1.getClassifier(), m2.getClassifier())
+self.assertEqual(m1.classifier.parent, m2.classifier.parent)
+
+# Check the equality of other params (value and parent).
+for p in m1.params:
+if p.name != "classifier":
+self.assertEqual(m1.getOrDefault(p), 
m2.getOrDefault(p))
+self.assertEqual(p.parent, m2.getParam(p.name).parent)
+
+# Check extra attributes of OneVsRestModel.
+if isinstance(m1, OneVsRestModel):
+self.assertEqual(len(m1.models), len(m2.models))
+for x, y in zip(m1.models, m2.models):
+self._compare_pipelines(x, y)
+elif isinstance(m1, ValidatorParams):
+# Check the equality of estimators (value and parent).
+self._compare_pipelines(m1.getEstimator(), m2.getEstimator())
+self.assertEqual(m1.estimator.parent, m2.estimator.parent)
+
+# Check the equality of evaluators (value and parent).
+self._compare_pipelines(m1.getEvaluator(), m2.getEvaluator())
+self.assertEqual(m1.evaluator.parent, m2.evaluator.parent)
+
+# Check the equality of estimator parameter maps (value and 
parent).
+self.assertEqual(len(m1.getEstimatorParamMaps()), 
len(m2.getEstimatorParamMaps()))
+for epm1, epm2 in zip(m1.getEstimatorParamMaps(), 
m2.getEstimatorParamMaps()):
+self.assertEqual(len(epm1), len(epm2))
+for pair in epm1:
+self.assertIn(pair, epm2)
--- End diff --

Check value.  If value is an instance of ```Params```, then call 
```_compare_pipelines``` recursively on it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12604#discussion_r61016862
  
--- Diff: python/pyspark/ml/tuning.py ---
@@ -26,10 +27,10 @@
 from pyspark.ml.util import JavaMLWriter, JavaMLReader, MLReadable, 
MLWritable
 from pyspark.ml.wrapper import JavaParams
 from pyspark.sql.functions import rand
-from pyspark.mllib.common import inherit_doc, _py2java
+from pyspark.mllib.common import inherit_doc, _py2java, _java2py
 
 __all__ = ['ParamGridBuilder', 'CrossValidator', 'CrossValidatorModel', 
'TrainValidationSplit',
-   'TrainValidationSplitModel']
+   'TrainValidationSplitModel', 'ValidatorParams']
--- End diff --

Could you please put these on 4 different lines?
* ParamGridBuilder
* CrossValidator, CrossValidatorModel
* TrainValidationSplit, TrainValidationSplitModel
* ValidatorParams


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12604#discussion_r61016854
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -746,6 +797,70 @@ def test_nested_pipeline_persistence(self):
 except OSError:
 pass
 
+# TODO: Add OneVsRest as part of nested meta-algorithms.
+def test_nested_meta_algorithms_persistence(self):
+sqlContext = SQLContext(self.sc)
+temp_path = tempfile.mkdtemp()
+try:
+df = sqlContext.createDataFrame([
+Row(label=0.0, features=Vectors.dense(1.0, 0.8)),
+Row(label=0.0, features=Vectors.dense(0.8, 0.8)),
+Row(label=0.0, features=Vectors.dense(1.0, 1.2)),
+Row(label=1.0, features=Vectors.sparse(2, [], [])),
+Row(label=1.0, features=Vectors.sparse(2, [0], [0.1])),
+Row(label=1.0, features=Vectors.sparse(2, [1], [0.1]))])
+
+lr = LogisticRegression()
+
+# Check the estimator of 
CrossValidator(TrainValidationSplit(LogisticRegression))
+tvs_grid = ParamGridBuilder().addGrid(lr.maxIter, [5, 
10]).build()
--- End diff --

Use maxIter = 1,2 to make it faster.  Same elsewhere.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12604#discussion_r61016847
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -674,8 +676,11 @@ def _compare_pipelines(self, m1, m2):
 if isinstance(m1, JavaParams):
 self.assertEqual(len(m1.params), len(m2.params))
 for p in m1.params:
-self.assertEqual(m1.getOrDefault(p), m2.getOrDefault(p))
-self.assertEqual(p.parent, m2.getParam(p.name).parent)
+# Prevent key not found error in case of some param 
neither in paramMap and
+# defaultParamMap.
+if p in m1._paramMap or p in m1._defaultParamMap:
--- End diff --

Use ```m1.isDefined```.
Also, how about creating a helper method:
```
def _compare_param(m1, m2, paramName)
```
which implements these 3 lines?  That should eliminate some duplicate code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214562765
  
> OneVsRest with classifier in estimatorParamMaps of tuning fail to 
persistence

Nice catch. Could you please create a JIRA and note this issue in the 
Python docstring, with a link to the JIRA?

Your proposal might work, but it munges some ideas in OneVsRest and 
validation together.  I think we can devise a better long-term solution, but 
it's not a critical issue since it's a pretty fancy use case for OneVsRest.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-25 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-214559956
  
I'll take a look now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-22 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-213578858
  
@jkbradley There is also a possible enhancement for `OneVsRest`. Currently 
`OneVsRest` requires `classifier` as its parameter, which is not easy-to-use 
especially when using in CrossValidator and TrainValidationSplit.

First, it can not be persisted if the `classifier` param appears in 
`estimatorParamMaps`;
Second, if we want to wrap OneVsRest in CrossValidator, we need to 
construct the estimatorParamMaps like below:
```scala
ovr = OneVsRest()
epms = [{ovr.classifier: LogisticRegression(maxIter=5)}, {ovr.classifier: 
LogisticRegression(maxIter=10)}]
cv = CrossValidator(estimator=ovr, estimatorParamMaps=epms, ...)
``` 

So I think, how about to add a estimatorParamMaps param to OneVsRest just 
like what we do in CrossValidator and TrainValidationSplit, say,

```scala
lr = LogisticRegression()
epms = [{lr.maxIter: 5}, {lr.maxIter: 10}]
ovr = OneVsRest(classifier=lr, estimatorParamMaps=epms)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-22 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-213571053
  
@jkbradley Here are what I found in the integration test. The UIDs of 
non-JavaParams classes look good so far with current tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-213300230
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56670/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-213300229
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-213300068
  
**[Test build #56670 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56670/consoleFull)**
 for PR 12604 at commit 
[`622e564`](https://github.com/apache/spark/commit/622e5647a271e68854d75aafa10972a89585df56).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12604#issuecomment-213295082
  
**[Test build #56670 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56670/consoleFull)**
 for PR 12604 at commit 
[`622e564`](https://github.com/apache/spark/commit/622e5647a271e68854d75aafa10972a89585df56).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org