[spark] branch master updated: [SPARK-28985][PYTHON][ML][FOLLOW-UP] Add _IsotonicRegressionBase

srowen Fri, 04 Oct 2019 16:06:51 -0700

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 8556710  [SPARK-28985][PYTHON][ML][FOLLOW-UP] Add 
_IsotonicRegressionBase
8556710 is described below

commit 8556710409d9f2fbaee9dbf76a2ea70218316693
Author: zero323 <mszymkiew...@gmail.com>
AuthorDate: Fri Oct 4 18:06:10 2019 -0500

    [SPARK-28985][PYTHON][ML][FOLLOW-UP] Add _IsotonicRegressionBase
    
    ### What changes were proposed in this pull request?
    
    Adds
    
    ```python
    class _IsotonicRegressionBase(HasFeaturesCol, HasLabelCol, 
HasPredictionCol, HasWeightCol): ...
    ```
    
    with related `Params` and uses it to replace `JavaPredictor` and 
`HasWeightCol` in `IsotonicRegression` base classes and `JavaPredictionModel,` 
in `IsotonicRegressionModel` base classes.
    
    ### Why are the changes needed?
    
    Previous work (#25776) on 
[SPARK-28985](https://issues.apache.org/jira/browse/SPARK-28985) replaced 
`JavaEstimator`, `HasFeaturesCol`, `HasLabelCol`, `HasPredictionCol` in 
`IsotonicRegression` and `JavaModel` in `IsotonicRegressionModel` with newly 
added `JavaPredictor`:
    
    
https://github.com/apache/spark/blob/e97b55d32285052a1f76cca35377c4b21eb2e7d7/python/pyspark/ml/wrapper.py#L377
    
    and `JavaPredictionModel`
    
    
https://github.com/apache/spark/blob/e97b55d32285052a1f76cca35377c4b21eb2e7d7/python/pyspark/ml/wrapper.py#L405
    
    respectively.
    
    This however is inconsistent with Scala counterpart where both  classes 
extend private `IsotonicRegressionBase`
    
    
https://github.com/apache/spark/blob/3cb1b57809d0b4a93223669f5c10cea8fc53eff6/mllib/src/main/scala/org/apache/spark/ml/regression/IsotonicRegression.scala#L42-L43
    
    This preserves some of the existing inconsistencies (`model` as defined in 
[the official 
example](https://github.com/apache/spark/blob/master/examples/src/main/python/ml/isotonic_regression_example.py)),
 i.e.
    
    ```python
    from pyspark.ml.regression impor IsotonicRegressionMode
    from pyspark.ml.param.shared import HasWeightCol
    
    issubclass(IsotonicRegressionModel, HasWeightCol)
    # False
    
    hasattr(model, "weightCol")
    # True
    ```
    
    as well as introduces a bug, by adding unsupported `predict` method:
    
    ```python
    import inspect
    
    hasattr(model, "predict")
    # True
    
    inspect.getfullargspec(IsotonicRegressionModel.predict)
    # FullArgSpec(args=['self', 'value'], varargs=None, varkw=None, 
defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={})
    
    IsotonicRegressionModel.predict.__doc__
    # Predict label for the given features.\n\n        .. versionadded:: 3.0.0'
    
    model.predict(dataset.first().features)
    
    # Py4JError: An error occurred while calling o49.predict. Trace:
    # py4j.Py4JException: Method predict([class 
org.apache.spark.ml.linalg.SparseVector]) does not exist
    # ...
    
    ```
    
    Furthermore existing implementation can cause further problems in the 
future, if `Predictor` / `PredictionModel` API changes.
    
    ### Does this PR introduce any user-facing change?
    
    Yes. It:
    
    - Removes invalid `IsotonicRegressionModel.predict` method.
    - Adds `HasWeightColumn` to `IsotonicRegressionModel`.
    
    however the faulty implementation hasn't been released yet, and proposed 
additions have negligible potential for breaking existing code (and none, 
compared to changes already made in #25776).
    
    ### How was this patch tested?
    
    - Existing unit tests.
    - Manual testing.
    
    CC huaxingao, zhengruifeng
    
    Closes #26023 from zero323/SPARK-28985-FOLLOW-UP-isotonic-regression.
    
    Authored-by: zero323 <mszymkiew...@gmail.com>
    Signed-off-by: Sean Owen <sean.o...@databricks.com>
---
 python/pyspark/ml/regression.py | 57 ++++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/python/pyspark/ml/regression.py b/python/pyspark/ml/regression.py
index 1a7d39b..a1e8217 100644
--- a/python/pyspark/ml/regression.py
+++ b/python/pyspark/ml/regression.py
@@ -465,8 +465,38 @@ class 
LinearRegressionTrainingSummary(LinearRegressionSummary):
         return self._call_java("totalIterations")
 
 
+class _IsotonicRegressionBase(HasFeaturesCol, HasLabelCol, HasPredictionCol, 
HasWeightCol):
+    """
+    Params for :py:class:`IsotonicRegression` and 
:py:class:`IsotonicRegressionModel`.
+
+    .. versionadded:: 3.0.0
+    """
+
+    isotonic = Param(
+        Params._dummy(), "isotonic",
+        "whether the output sequence should be isotonic/increasing (true) or" +
+        "antitonic/decreasing (false).", 
typeConverter=TypeConverters.toBoolean)
+    featureIndex = Param(
+        Params._dummy(), "featureIndex",
+        "The index of the feature if featuresCol is a vector column, no effect 
otherwise.",
+        typeConverter=TypeConverters.toInt)
+
+    def getIsotonic(self):
+        """
+        Gets the value of isotonic or its default value.
+        """
+        return self.getOrDefault(self.isotonic)
+
+    def getFeatureIndex(self):
+        """
+        Gets the value of featureIndex or its default value.
+        """
+        return self.getOrDefault(self.featureIndex)
+
+
 @inherit_doc
-class IsotonicRegression(JavaPredictor, HasWeightCol, JavaMLWritable, 
JavaMLReadable):
+class IsotonicRegression(JavaEstimator, _IsotonicRegressionBase, HasWeightCol,
+                         JavaMLWritable, JavaMLReadable):
     """
     Currently implemented using parallelized pool adjacent violators algorithm.
     Only univariate (single feature) algorithm supported.
@@ -503,16 +533,6 @@ class IsotonicRegression(JavaPredictor, HasWeightCol, 
JavaMLWritable, JavaMLRead
 
     .. versionadded:: 1.6.0
     """
-
-    isotonic = \
-        Param(Params._dummy(), "isotonic",
-              "whether the output sequence should be isotonic/increasing 
(true) or" +
-              "antitonic/decreasing (false).", 
typeConverter=TypeConverters.toBoolean)
-    featureIndex = \
-        Param(Params._dummy(), "featureIndex",
-              "The index of the feature if featuresCol is a vector column, no 
effect otherwise.",
-              typeConverter=TypeConverters.toInt)
-
     @keyword_only
     def __init__(self, featuresCol="features", labelCol="label", 
predictionCol="prediction",
                  weightCol=None, isotonic=True, featureIndex=0):
@@ -547,26 +567,15 @@ class IsotonicRegression(JavaPredictor, HasWeightCol, 
JavaMLWritable, JavaMLRead
         """
         return self._set(isotonic=value)
 
-    def getIsotonic(self):
-        """
-        Gets the value of isotonic or its default value.
-        """
-        return self.getOrDefault(self.isotonic)
-
     def setFeatureIndex(self, value):
         """
         Sets the value of :py:attr:`featureIndex`.
         """
         return self._set(featureIndex=value)
 
-    def getFeatureIndex(self):
-        """
-        Gets the value of featureIndex or its default value.
-        """
-        return self.getOrDefault(self.featureIndex)
-
 
-class IsotonicRegressionModel(JavaPredictionModel, JavaMLWritable, 
JavaMLReadable):
+class IsotonicRegressionModel(JavaModel, _IsotonicRegressionBase,
+                              JavaMLWritable, JavaMLReadable):
     """
     Model fitted by :class:`IsotonicRegression`.
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-28985][PYTHON][ML][FOLLOW-UP] Add _IsotonicRegressionBase

Reply via email to