[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18982

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/18982#discussion_r176828963

--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,18 @@ def _transfer_params_to_java(self):
         """
         Transforms the embedded params to the companion Java object.
         """
-        paramMap = self.extractParamMap()
+        pair_defaults = []
         for param in self.params:
-            if param in paramMap:
-                pair = self._make_java_param_pair(param, paramMap[param])
+            if self.isSet(param):
+                pair = self._make_java_param_pair(param, self._paramMap[param])
                 self._java_obj.set(pair)
+            if self.hasDefault(param):
+                pair = self._make_java_param_pair(param, self._defaultParamMap[param])
+                pair_defaults.append(pair)
+        if len(pair_defaults) > 0:
+            sc = SparkContext._active_spark_context
+            pair_defaults_seq = sc._jvm.PythonUtils.toSeq(pair_defaults)
+            self._java_obj.setDefault(pair_defaults_seq)
--- End diff --

I think this is reasonable; a few extra lines to avoid potential unwanted user surprise is worth it.
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18982#discussion_r176825209

--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,18 @@ def _transfer_params_to_java(self):
         """
         Transforms the embedded params to the companion Java object.
         """
-        paramMap = self.extractParamMap()
+        pair_defaults = []
         for param in self.params:
-            if param in paramMap:
-                pair = self._make_java_param_pair(param, paramMap[param])
+            if self.isSet(param):
+                pair = self._make_java_param_pair(param, self._paramMap[param])
                 self._java_obj.set(pair)
+            if self.hasDefault(param):
+                pair = self._make_java_param_pair(param, self._defaultParamMap[param])
+                pair_defaults.append(pair)
+        if len(pair_defaults) > 0:
+            sc = SparkContext._active_spark_context
+            pair_defaults_seq = sc._jvm.PythonUtils.toSeq(pair_defaults)
+            self._java_obj.setDefault(pair_defaults_seq)
--- End diff --

My take is that while they should be the same, it's still possible they might not be. The user could extend their own classes, and it's quite easy to change in Python. Although we don't really support this, if there were a mismatch the user would probably just get bad results, and it would be really hard to figure out why: from the Python API it would look like one value was in effect, while another was actually being used in Scala. If you all think it's overly cautious to do this, I can take it out. I just thought it would be cheap insurance to set these values regardless.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18982#discussion_r176316173

--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,18 @@ def _transfer_params_to_java(self):
         """
         Transforms the embedded params to the companion Java object.
         """
-        paramMap = self.extractParamMap()
+        pair_defaults = []
         for param in self.params:
-            if param in paramMap:
-                pair = self._make_java_param_pair(param, paramMap[param])
+            if self.isSet(param):
+                pair = self._make_java_param_pair(param, self._paramMap[param])
                 self._java_obj.set(pair)
+            if self.hasDefault(param):
+                pair = self._make_java_param_pair(param, self._defaultParamMap[param])
+                pair_defaults.append(pair)
+        if len(pair_defaults) > 0:
+            sc = SparkContext._active_spark_context
+            pair_defaults_seq = sc._jvm.PythonUtils.toSeq(pair_defaults)
+            self._java_obj.setDefault(pair_defaults_seq)
--- End diff --

If the default params are the same on the Java side and the Python side, do we still need to set the default params on the Java object? Aren't they already set if they are default params?
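The scenario the reviewers are weighing - a Python-side default diverging from the Java-side default when a user subclasses - can be illustrated with a small self-contained sketch. This is not PySpark code; `ToyParams`, `Base`, and `UserSubclass` are hypothetical stand-ins for the `_paramMap` / `_defaultParamMap` bookkeeping that `pyspark.ml.param.Params` does.

```python
class ToyParams:
    """Tracks explicitly set params separately from library defaults."""

    def __init__(self):
        self._paramMap = {}         # explicitly set by the user
        self._defaultParamMap = {}  # library-provided defaults

    def _setDefault(self, **kwargs):
        self._defaultParamMap.update(kwargs)

    def isSet(self, name):
        return name in self._paramMap

    def hasDefault(self, name):
        return name in self._defaultParamMap

    def getOrDefault(self, name):
        return self._paramMap.get(name, self._defaultParamMap[name])


class Base(ToyParams):
    def __init__(self):
        super().__init__()
        self._setDefault(threshold=0.0)


class UserSubclass(Base):
    # A user can easily override a default in Python. The companion Java
    # object would still carry the original default unless defaults are
    # transferred explicitly - which is the "cheap insurance" above.
    def __init__(self):
        super().__init__()
        self._setDefault(threshold=0.5)


m = UserSubclass()
print(m.isSet("threshold"))        # False: never explicitly set
print(m.getOrDefault("threshold"))  # 0.5 on the Python side only
```

Without transferring defaults, the Java side would keep using its own `threshold` default while the Python API reports 0.5, which is exactly the hard-to-debug mismatch described in the comment above.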
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/18982#discussion_r137360294

--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,13 @@ def _transfer_params_to_java(self):
         """
         Transforms the embedded params to the companion Java object.
         """
-        paramMap = self.extractParamMap()
         for param in self.params:
-            if param in paramMap:
-                pair = self._make_java_param_pair(param, paramMap[param])
+            if param in self._paramMap:
+                pair = self._make_java_param_pair(param, self._paramMap[param])
                 self._java_obj.set(pair)
+            if param in self._defaultParamMap:
--- End diff --

Sounds reasonable.
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/18982#discussion_r137360175

--- Diff: python/pyspark/ml/tests.py ---
@@ -455,6 +455,14 @@ def test_logistic_regression_check_thresholds(self):
             LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
         )
+    def test_preserve_set_state(self):
+        model = Binarizer()
+        self.assertFalse(model.isSet("threshold"))
+        model._transfer_params_to_java()
--- End diff --

I think that would be a reasonable thing to do; the slight increase in testing overhead is probably worth it. It keeps us from being too closely tied to the implementation details, and we already use `SparkSessionTestCase` in a lot of places.
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18982#discussion_r134380704

--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,13 @@ def _transfer_params_to_java(self):
         """
         Transforms the embedded params to the companion Java object.
         """
-        paramMap = self.extractParamMap()
         for param in self.params:
-            if param in paramMap:
-                pair = self._make_java_param_pair(param, paramMap[param])
+            if param in self._paramMap:
+                pair = self._make_java_param_pair(param, self._paramMap[param])
                 self._java_obj.set(pair)
+            if param in self._defaultParamMap:
--- End diff --

We usually make the assumption that Python defines the same default values as Java, in Spark ML at least, but given the circumstances of the JIRA - they defined their own Model - it's still possible for `hasDefault` or the default value to return something different than Python would. So I'm just being overly cautious here, but it's pretty cheap to just transfer the default values anyway, right?
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18982#discussion_r134380293

--- Diff: python/pyspark/ml/tests.py ---
@@ -455,6 +455,14 @@ def test_logistic_regression_check_thresholds(self):
             LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
         )
+    def test_preserve_set_state(self):
+        model = Binarizer()
+        self.assertFalse(model.isSet("threshold"))
+        model._transfer_params_to_java()
--- End diff --

Yeah, it would be a little better to call the actual `transform`, but we would still need to call `_transfer_params_from_java` or check `isSet` with a direct call to Java via py4j. I was going to do this, but the `ParamTests` class doesn't already create a `SparkSession` - I'm sure it's just a small amount of overhead, but that's why I thought to just use `_transfer_params_to_java`. Do you think it would be worth it to change `ParamTests` to inherit from `SparkSessionTestCase` so a session is created and I could make a `DataFrame` to transform?
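The `SparkSessionTestCase` approach discussed here amortizes session startup by building the expensive resource once per test class in `setUpClass` rather than once per test method. A generic, Spark-free sketch of that pattern (the class names `SharedResourceTestCase` and `PreserveSetStateTest` are hypothetical, and a dict stands in for the real `SparkSession`):

```python
import unittest


class SharedResourceTestCase(unittest.TestCase):
    # Stand-in for SparkSessionTestCase: the expensive resource is created
    # once for the whole class, so many test methods can share it.
    @classmethod
    def setUpClass(cls):
        cls.resource = {"started": True}  # a SparkSession in the real pattern

    @classmethod
    def tearDownClass(cls):
        cls.resource = None


class PreserveSetStateTest(SharedResourceTestCase):
    def test_resource_is_shared(self):
        # Every test method sees the class-level resource without paying
        # the startup cost again.
        self.assertTrue(self.resource["started"])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PreserveSetStateTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

This is the trade-off weighed above: a one-time `setUpClass` cost buys the ability to run an end-to-end `transform` in the test instead of poking at `_transfer_params_to_java` directly.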
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/18982#discussion_r134075412

--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,13 @@ def _transfer_params_to_java(self):
         """
         Transforms the embedded params to the companion Java object.
         """
-        paramMap = self.extractParamMap()
         for param in self.params:
-            if param in paramMap:
-                pair = self._make_java_param_pair(param, paramMap[param])
+            if param in self._paramMap:
+                pair = self._make_java_param_pair(param, self._paramMap[param])
                 self._java_obj.set(pair)
+            if param in self._defaultParamMap:
--- End diff --

Should this be an else if? No need to transfer the default value if we've explicitly set it to another value.
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/18982#discussion_r134075445

--- Diff: python/pyspark/ml/tests.py ---
@@ -455,6 +455,14 @@ def test_logistic_regression_check_thresholds(self):
             LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
         )
+    def test_preserve_set_state(self):
+        model = Binarizer()
+        self.assertFalse(model.isSet("threshold"))
+        model._transfer_params_to_java()
--- End diff --

Would it make sense to do an actual transform here instead of the two inner parts of the transform?
GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/18982

[SPARK-21685][PYTHON][ML] PySpark Params isSet state should not change after transform

## What changes were proposed in this pull request?

Currently, when a PySpark Model is transformed, default params that have not been explicitly set are set on the Java side during the call to `wrapper._transfer_params_to_java`. This incorrectly changes the state of the Param, which should still be marked as a default value only.

## How was this patch tested?

Added a new test to verify that when transferring Params to Java, default params have their state preserved.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark pyspark-ml-param-to-java-defaults-SPARK-21685

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18982.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18982

commit bbe3ef70d2acb3647171621b02b4f73c001d5caf
Author: Bryan Cutler
Date: 2017-08-17T23:59:43Z

    added regression test for preserving Param set state when transferring to Java

commit 12030569df3aa92a227d6e7d06ceeb5121d853f9
Author: Bryan Cutler
Date: 2017-08-17T23:59:54Z

    changed _transfer_params_to_java to not set param in Java unless explicitly set
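The fix described in the PR can be sketched end to end with mock objects: only explicitly set params go through `set()`, while defaults go through `setDefault()`, so the Java side's notion of "isSet" matches Python's. This is a hedged illustration, not the real PySpark wrapper - `MockJavaObj` and `MockModel` are invented stand-ins for the py4j-backed Java object and the Python `Params` holder, and plain tuples replace real Java param pairs.

```python
class MockJavaObj:
    """Stand-in for the companion Java object behind py4j."""

    def __init__(self):
        self.set_params = {}      # params the Java side considers "set"
        self.default_params = {}  # params the Java side considers defaults

    def set(self, pair):
        name, value = pair
        self.set_params[name] = value

    def setDefault(self, pairs):
        for name, value in pairs:
            self.default_params[name] = value

    def isSet(self, name):
        return name in self.set_params


class MockModel:
    def __init__(self):
        self.params = ["threshold", "inputCol"]
        self._paramMap = {"inputCol": "features"}   # explicitly set by user
        self._defaultParamMap = {"threshold": 0.0}  # default only
        self._java_obj = MockJavaObj()

    def isSet(self, param):
        return param in self._paramMap

    def hasDefault(self, param):
        return param in self._defaultParamMap

    def _transfer_params_to_java(self):
        # Mirrors the shape of the patch: set() only for explicitly set
        # params, setDefault() for defaults, so isSet state is preserved.
        pair_defaults = []
        for param in self.params:
            if self.isSet(param):
                self._java_obj.set((param, self._paramMap[param]))
            if self.hasDefault(param):
                pair_defaults.append((param, self._defaultParamMap[param]))
        if pair_defaults:
            self._java_obj.setDefault(pair_defaults)


m = MockModel()
m._transfer_params_to_java()
print(m._java_obj.isSet("threshold"))  # False: stays a default, not "set"
print(m._java_obj.isSet("inputCol"))   # True: was explicitly set
```

Before the patch, the old code pushed every value from `extractParamMap()` through `set()`, which is why a default like `threshold` came back marked as explicitly set after a transform.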