[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-215318737 @jkbradley Another bug found: The `CrossValidator` and `TrainValidationSplit` miss the `seed` when saving and loading. I'd prefer to create a JIRA and fix them in this PR since they will block the integration test here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-215309644 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57216/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-215309599 **[Test build #57216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57216/consoleFull)** for PR 12604 at commit [`fa8a05c`](https://github.com/apache/spark/commit/fa8a05c6acd955670af64637caa9c1f38c5e9f09). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-215309643 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-215308604 **[Test build #57216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57216/consoleFull)** for PR 12604 at commit [`fa8a05c`](https://github.com/apache/spark/commit/fa8a05c6acd955670af64637caa9c1f38c5e9f09). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214952694 Thanks, the updates look good. I'll try to comment on the tuning.py part later today or early tomorrow. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12604#discussion_r61196124 --- Diff: python/pyspark/ml/tests.py --- @@ -657,32 +656,104 @@ def test_logistic_regression(self): except OSError: pass +def _compare_params(self, m1, m2, param): +""" +Compare 2 ML params, assert they have the same param. --- End diff -- Document that param must be a Param member of m1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12604#discussion_r61196122 --- Diff: python/pyspark/ml/tests.py --- @@ -657,32 +656,104 @@ def test_logistic_regression(self): except OSError: pass +def _compare_params(self, m1, m2, param): --- End diff -- This can be used in a lot of other places below (everywhere you compare 2 param values & parents). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214918093 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57037/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214918091 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214917892 **[Test build #57037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57037/consoleFull)** for PR 12604 at commit [`c348bf0`](https://github.com/apache/spark/commit/c348bf03b99eeb8363fb20f7069595f5d9fd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214899308 **[Test build #57037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57037/consoleFull)** for PR 12604 at commit [`c348bf0`](https://github.com/apache/spark/commit/c348bf03b99eeb8363fb20f7069595f5d9fd). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214898290 **[Test build #57034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57034/consoleFull)** for PR 12604 at commit [`c14b25e`](https://github.com/apache/spark/commit/c14b25e0ae8b0dd66bce777e5104d456dc629f89). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214898301 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57034/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214898299 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214898112 **[Test build #57034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57034/consoleFull)** for PR 12604 at commit [`c14b25e`](https://github.com/apache/spark/commit/c14b25e0ae8b0dd66bce777e5104d456dc629f89). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214897400 @jkbradley Here we go. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214894928 For the tuning.py updates, I want to mess with the code locally to test possibilities. Could you please update + fix merge conflicts? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214890020 I'll do it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214889349 Nice catch about logreg. It would be great to develop a generic way to test the JavaParams wrappers. Could you please create a JIRA for that? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214852153 @jkbradley Find another bug: For LogisticRegression in PySpark, if you write it then reload it, the result is not identical with the previous one. Because some params in `ml.LogisticRegression` are defined with default value, while those params in `pyspark.ml.LogisticRegression` have no default value. With `JavaParams`, we're going to set default values for those params. So it will fail the following test: ```python def _compare_param(self, m1, m2, param): """ Compare 2 ML params, assert they have the same param. """ # Prevent key not found error in case of some param neither in paramMap and # defaultParamMap. if m1.isDefined(param): self.assertEqual(m1.getOrDefault(param), m2.getOrDefault(param)) self.assertEqual(param.parent, m2.getParam(param.name).parent) else: # If m1 is not defined param, then m2 should not, too. self.assertEqual(m2.isDefined(m2.getParam(param.name)), False) ``` How about we setup a new JIRA to check the default values? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214834313 JIRA created: https://issues.apache.org/jira/browse/SPARK-14924 for OneVsRest with classifier in estimatorParamMaps of tuning fail to persistence --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12604#discussion_r61016865 --- Diff: python/pyspark/ml/tuning.py --- @@ -141,6 +144,71 @@ def getEvaluator(self): """ return self.getOrDefault(self.evaluator) +def _transfer_param_map_to_java_impl(self, pyParamMap, java_obj): +""" +Transfer a Python ParamMap to a Java ParamMap which belongs to an Java estimator of +ValidatorParams. +This utility method helps CrossValidator and TrainValidationSplit implementing their +_transfer_param_map_to_java(). +""" +estimator, epms, evaluator, seed = self._to_java_impl() --- End diff -- This is getting the Python Params from the class, not from pyParamMap. It should get them from pyParamMap. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214580036 I added some comments. I think the tuning.py changes could be simplified somewhat, but I'll need to return to this later. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12604#discussion_r61016852 --- Diff: python/pyspark/ml/tests.py --- @@ -684,6 +689,52 @@ def _compare_pipelines(self, m1, m2): self.assertEqual(len(m1.stages), len(m2.stages)) for s1, s2 in zip(m1.stages, m2.stages): self._compare_pipelines(s1, s2) +elif isinstance(m1, OneVsRestParams): +# Check the equality of classifiers (value and parent). +self._compare_pipelines(m1.getClassifier(), m2.getClassifier()) +self.assertEqual(m1.classifier.parent, m2.classifier.parent) + +# Check the equality of other params (value and parent). +for p in m1.params: +if p.name != "classifier": +self.assertEqual(m1.getOrDefault(p), m2.getOrDefault(p)) +self.assertEqual(p.parent, m2.getParam(p.name).parent) + +# Check extra attributes of OneVsRestModel. +if isinstance(m1, OneVsRestModel): +self.assertEqual(len(m1.models), len(m2.models)) +for x, y in zip(m1.models, m2.models): +self._compare_pipelines(x, y) +elif isinstance(m1, ValidatorParams): +# Check the equality of estimators (value and parent). +self._compare_pipelines(m1.getEstimator(), m2.getEstimator()) +self.assertEqual(m1.estimator.parent, m2.estimator.parent) + +# Check the equality of evaluators (value and parent). +self._compare_pipelines(m1.getEvaluator(), m2.getEvaluator()) +self.assertEqual(m1.evaluator.parent, m2.evaluator.parent) + +# Check the equality of estimator parameter maps (value and parent). +self.assertEqual(len(m1.getEstimatorParamMaps()), len(m2.getEstimatorParamMaps())) +for epm1, epm2 in zip(m1.getEstimatorParamMaps(), m2.getEstimatorParamMaps()): +self.assertEqual(len(epm1), len(epm2)) +for pair in epm1: +self.assertIn(pair, epm2) --- End diff -- Check value. If value is an instance of ```Params```, then call ```_compare_pipelines``` recursively on it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12604#discussion_r61016862 --- Diff: python/pyspark/ml/tuning.py --- @@ -26,10 +27,10 @@ from pyspark.ml.util import JavaMLWriter, JavaMLReader, MLReadable, MLWritable from pyspark.ml.wrapper import JavaParams from pyspark.sql.functions import rand -from pyspark.mllib.common import inherit_doc, _py2java +from pyspark.mllib.common import inherit_doc, _py2java, _java2py __all__ = ['ParamGridBuilder', 'CrossValidator', 'CrossValidatorModel', 'TrainValidationSplit', - 'TrainValidationSplitModel'] + 'TrainValidationSplitModel', 'ValidatorParams'] --- End diff -- Could you please put these on 4 different lines? * ParamGridBuilder * CrossValidator, CrossValidatorModel * TrainValidationSplit, TrainValidationSplitModel * ValidatorParams --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12604#discussion_r61016854 --- Diff: python/pyspark/ml/tests.py --- @@ -746,6 +797,70 @@ def test_nested_pipeline_persistence(self): except OSError: pass +# TODO: Add OneVsRest as part of nested meta-algorithms. +def test_nested_meta_algorithms_persistence(self): +sqlContext = SQLContext(self.sc) +temp_path = tempfile.mkdtemp() +try: +df = sqlContext.createDataFrame([ +Row(label=0.0, features=Vectors.dense(1.0, 0.8)), +Row(label=0.0, features=Vectors.dense(0.8, 0.8)), +Row(label=0.0, features=Vectors.dense(1.0, 1.2)), +Row(label=1.0, features=Vectors.sparse(2, [], [])), +Row(label=1.0, features=Vectors.sparse(2, [0], [0.1])), +Row(label=1.0, features=Vectors.sparse(2, [1], [0.1]))]) + +lr = LogisticRegression() + +# Check the estimator of CrossValidator(TrainValidationSplit(LogisticRegression)) +tvs_grid = ParamGridBuilder().addGrid(lr.maxIter, [5, 10]).build() --- End diff -- Use maxIter = 1,2 to make it faster. Same elsewhere. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12604#discussion_r61016847 --- Diff: python/pyspark/ml/tests.py --- @@ -674,8 +676,11 @@ def _compare_pipelines(self, m1, m2): if isinstance(m1, JavaParams): self.assertEqual(len(m1.params), len(m2.params)) for p in m1.params: -self.assertEqual(m1.getOrDefault(p), m2.getOrDefault(p)) -self.assertEqual(p.parent, m2.getParam(p.name).parent) +# Prevent key not found error in case of some param neither in paramMap and +# defaultParamMap. +if p in m1._paramMap or p in m1._defaultParamMap: --- End diff -- Use ```m1.isDefined```. Also, how about creating a helper method: ``` def _compare_param(m1, m2, paramName) ``` which implements these 3 lines? That should eliminate some duplicate code. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214562765 > OneVsRest with classifier in estimatorParamMaps of tuning fail to persistence Nice catch. Could you please create a JIRA and note this issue in the Python docstring, with a link to the JIRA? Your proposal might work, but it munges some ideas in OneVsRest and validation together. I think we can devise a better long-term solution, but it's not a critical issue since it's a pretty fancy use case for OneVsRest. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214559956 I'll take a look now --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-213578858 @jkbradley There is also a possible enhancement for `OneVsRest`. Currently `OneVsRest` requires `classifier` as its parameter, which is not easy-to-use especially when using in CrossValidator and TrainValidationSplit. First, it can not be persisted if the `classifier` param appears in `estimatorParamMaps`; Second, if we want to wrap OneVsRest in CrossValidator, we need to construct the estimatorParamMaps like below: ```scala ovr = OneVsRest() epms = [{ovr.classifier: LogisticRegression(maxIter=5)}, {ovr.classifier: LogisticRegression(maxIter=10)}] cv = CrossValidator(estimator=ovr, estimatorParamMaps=epms, ...) ``` So I think, how about to add a estimatorParamMaps param to OneVsRest just like what we do in CrossValidator and TrainValidationSplit, say, ```scala lr = LogisticRegression() epms = [{lr.maxIter: 5}, {lr.maxIter: 10}] ovr = OneVsRest(classifier=lr, estimatorParamMaps=epms) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-213571053 @jkbradley Here are what I found in the integration test. The UIDs of non-JavaParams classes look good so far with current tests. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-213300230 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56670/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-213300229 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-213300068 **[Test build #56670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56670/consoleFull)** for PR 12604 at commit [`622e564`](https://github.com/apache/spark/commit/622e5647a271e68854d75aafa10972a89585df56). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-213295082 **[Test build #56670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56670/consoleFull)** for PR 12604 at commit [`622e564`](https://github.com/apache/spark/commit/622e5647a271e68854d75aafa10972a89585df56). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org