This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push: new 8556710 [SPARK-28985][PYTHON][ML][FOLLOW-UP] Add _IsotonicRegressionBase 8556710 is described below commit 8556710409d9f2fbaee9dbf76a2ea70218316693 Author: zero323 <mszymkiew...@gmail.com> AuthorDate: Fri Oct 4 18:06:10 2019 -0500 [SPARK-28985][PYTHON][ML][FOLLOW-UP] Add _IsotonicRegressionBase ### What changes were proposed in this pull request? Adds ```python class _IsotonicRegressionBase(HasFeaturesCol, HasLabelCol, HasPredictionCol, HasWeightCol): ... ``` with related `Params` and uses it to replace `JavaPredictor` and `HasWeightCol` in `IsotonicRegression` base classes and `JavaPredictionModel,` in `IsotonicRegressionModel` base classes. ### Why are the changes needed? Previous work (#25776) on [SPARK-28985](https://issues.apache.org/jira/browse/SPARK-28985) replaced `JavaEstimator`, `HasFeaturesCol`, `HasLabelCol`, `HasPredictionCol` in `IsotonicRegression` and `JavaModel` in `IsotonicRegressionModel` with newly added `JavaPredictor`: https://github.com/apache/spark/blob/e97b55d32285052a1f76cca35377c4b21eb2e7d7/python/pyspark/ml/wrapper.py#L377 and `JavaPredictionModel` https://github.com/apache/spark/blob/e97b55d32285052a1f76cca35377c4b21eb2e7d7/python/pyspark/ml/wrapper.py#L405 respectively. This however is inconsistent with Scala counterpart where both classes extend private `IsotonicRegressionBase` https://github.com/apache/spark/blob/3cb1b57809d0b4a93223669f5c10cea8fc53eff6/mllib/src/main/scala/org/apache/spark/ml/regression/IsotonicRegression.scala#L42-L43 This preserves some of the existing inconsistencies (`model` as defined in [the official example](https://github.com/apache/spark/blob/master/examples/src/main/python/ml/isotonic_regression_example.py)), i.e. ```python from pyspark.ml.regression impor IsotonicRegressionMode from pyspark.ml.param.shared import HasWeightCol issubclass(IsotonicRegressionModel, HasWeightCol) # False hasattr(model, "weightCol") # True ``` as well as introduces a bug, by adding unsupported `predict` method: ```python import inspect hasattr(model, "predict") # True inspect.getfullargspec(IsotonicRegressionModel.predict) # FullArgSpec(args=['self', 'value'], varargs=None, varkw=None, defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={}) IsotonicRegressionModel.predict.__doc__ # Predict label for the given features.\n\n .. versionadded:: 3.0.0' model.predict(dataset.first().features) # Py4JError: An error occurred while calling o49.predict. Trace: # py4j.Py4JException: Method predict([class org.apache.spark.ml.linalg.SparseVector]) does not exist # ... ``` Furthermore existing implementation can cause further problems in the future, if `Predictor` / `PredictionModel` API changes. ### Does this PR introduce any user-facing change? Yes. It: - Removes invalid `IsotonicRegressionModel.predict` method. - Adds `HasWeightColumn` to `IsotonicRegressionModel`. however the faulty implementation hasn't been released yet, and proposed additions have negligible potential for breaking existing code (and none, compared to changes already made in #25776). ### How was this patch tested? - Existing unit tests. - Manual testing. CC huaxingao, zhengruifeng Closes #26023 from zero323/SPARK-28985-FOLLOW-UP-isotonic-regression. Authored-by: zero323 <mszymkiew...@gmail.com> Signed-off-by: Sean Owen <sean.o...@databricks.com> --- python/pyspark/ml/regression.py | 57 ++++++++++++++++++++++++----------------- 1 file changed, 33 insertions(+), 24 deletions(-) diff --git a/python/pyspark/ml/regression.py b/python/pyspark/ml/regression.py index 1a7d39b..a1e8217 100644 --- a/python/pyspark/ml/regression.py +++ b/python/pyspark/ml/regression.py @@ -465,8 +465,38 @@ class LinearRegressionTrainingSummary(LinearRegressionSummary): return self._call_java("totalIterations") +class _IsotonicRegressionBase(HasFeaturesCol, HasLabelCol, HasPredictionCol, HasWeightCol): + """ + Params for :py:class:`IsotonicRegression` and :py:class:`IsotonicRegressionModel`. + + .. versionadded:: 3.0.0 + """ + + isotonic = Param( + Params._dummy(), "isotonic", + "whether the output sequence should be isotonic/increasing (true) or" + + "antitonic/decreasing (false).", typeConverter=TypeConverters.toBoolean) + featureIndex = Param( + Params._dummy(), "featureIndex", + "The index of the feature if featuresCol is a vector column, no effect otherwise.", + typeConverter=TypeConverters.toInt) + + def getIsotonic(self): + """ + Gets the value of isotonic or its default value. + """ + return self.getOrDefault(self.isotonic) + + def getFeatureIndex(self): + """ + Gets the value of featureIndex or its default value. + """ + return self.getOrDefault(self.featureIndex) + + @inherit_doc -class IsotonicRegression(JavaPredictor, HasWeightCol, JavaMLWritable, JavaMLReadable): +class IsotonicRegression(JavaEstimator, _IsotonicRegressionBase, HasWeightCol, + JavaMLWritable, JavaMLReadable): """ Currently implemented using parallelized pool adjacent violators algorithm. Only univariate (single feature) algorithm supported. @@ -503,16 +533,6 @@ class IsotonicRegression(JavaPredictor, HasWeightCol, JavaMLWritable, JavaMLRead .. versionadded:: 1.6.0 """ - - isotonic = \ - Param(Params._dummy(), "isotonic", - "whether the output sequence should be isotonic/increasing (true) or" + - "antitonic/decreasing (false).", typeConverter=TypeConverters.toBoolean) - featureIndex = \ - Param(Params._dummy(), "featureIndex", - "The index of the feature if featuresCol is a vector column, no effect otherwise.", - typeConverter=TypeConverters.toInt) - @keyword_only def __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction", weightCol=None, isotonic=True, featureIndex=0): @@ -547,26 +567,15 @@ class IsotonicRegression(JavaPredictor, HasWeightCol, JavaMLWritable, JavaMLRead """ return self._set(isotonic=value) - def getIsotonic(self): - """ - Gets the value of isotonic or its default value. - """ - return self.getOrDefault(self.isotonic) - def setFeatureIndex(self, value): """ Sets the value of :py:attr:`featureIndex`. """ return self._set(featureIndex=value) - def getFeatureIndex(self): - """ - Gets the value of featureIndex or its default value. - """ - return self.getOrDefault(self.featureIndex) - -class IsotonicRegressionModel(JavaPredictionModel, JavaMLWritable, JavaMLReadable): +class IsotonicRegressionModel(JavaModel, _IsotonicRegressionBase, + JavaMLWritable, JavaMLReadable): """ Model fitted by :class:`IsotonicRegression`. --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org