[GitHub] spark pull request #21828: Update regression.py

2018-07-23 Thread woodthom2
Github user woodthom2 closed the pull request at:

https://github.com/apache/spark/pull/21828


---




[GitHub] spark pull request #21828: Update regression.py

2018-07-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21828#discussion_r204012751
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -1116,7 +1116,7 @@ def setParams(self, featuresCol="features", labelCol="label", predictionCol="pre
                  maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                  maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0,
                  checkpointInterval=10, lossType="squared", maxIter=20, stepSize=0.1, seed=None,
-                 impuriy="variance", featureSubsetStrategy="all"):
+                 impurity="variance", featureSubsetStrategy="all"):
--- End diff --

We could. I would rather use a `_NoValue` instance for that purpose, though. 
Also, I would issue a warning via the `warnings` package, as we do elsewhere in the code base. 
Can we add a simple test for that as well while we are here?
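
A minimal sketch of that approach, assuming a module-level `_NoValue` sentinel (defined locally here for illustration; the class name, warning message, and trimmed signature are placeholders, not the real PySpark code):
```
import warnings

# Hypothetical sentinel used to detect whether the caller actually passed the
# deprecated keyword; a bare object() is enough for identity checks.
_NoValue = object()


class GBTRegressorSketch(object):
    """Illustrative stand-in, not PySpark's GBTRegressor."""

    def setParams(self, impuriy=_NoValue, impurity="variance",
                  featureSubsetStrategy="all"):
        # The remaining keyword arguments from the real signature are omitted.
        if impuriy is not _NoValue:
            warnings.warn(
                "'impuriy' is deprecated; use 'impurity' instead.",
                DeprecationWarning)
            impurity = impuriy
        self.impurity = impurity
        self.featureSubsetStrategy = featureSubsetStrategy
        return self


# A simple test could assert that the old spelling still works and warns:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    est = GBTRegressorSketch().setParams(impuriy="variance")
assert est.impurity == "variance"
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```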


---




[GitHub] spark pull request #21828: Update regression.py

2018-07-20 Thread woodthom2
Github user woodthom2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21828#discussion_r203997163
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -1116,7 +1116,7 @@ def setParams(self, featuresCol="features", labelCol="label", predictionCol="pre
                  maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                  maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0,
                  checkpointInterval=10, lossType="squared", maxIter=20, stepSize=0.1, seed=None,
-                 impuriy="variance", featureSubsetStrategy="all"):
+                 impurity="variance", featureSubsetStrategy="all"):
--- End diff --



What about this until the next major release?
```
def setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction",
              maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
              maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0,
              checkpointInterval=10, lossType="squared", maxIter=20, stepSize=0.1, seed=None,
              impuriy=None, impurity="variance", featureSubsetStrategy="all"):
    if impuriy is not None:  # for backward compatibility with the misspelled keyword
        impurity = impuriy
```


---




[GitHub] spark pull request #21828: Update regression.py

2018-07-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21828#discussion_r203995996
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -1116,7 +1116,7 @@ def setParams(self, featuresCol="features", labelCol="label", predictionCol="pre
                  maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                  maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0,
                  checkpointInterval=10, lossType="squared", maxIter=20, stepSize=0.1, seed=None,
-                 impuriy="variance", featureSubsetStrategy="all"):
+                 impurity="variance", featureSubsetStrategy="all"):
--- End diff --

Of course that's possible: a user upgrades Spark and their existing code suddenly 
fails with an unexpected keyword argument error.
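
A hedged illustration of that failure mode, using plain functions as stand-ins for the old and new `setParams` signatures:
```
# Old signature: only the misspelled keyword exists.
def set_params_old(impuriy="variance", featureSubsetStrategy="all"):
    return impuriy

# New signature after a plain rename.
def set_params_new(impurity="variance", featureSubsetStrategy="all"):
    return impurity

set_params_old(impuriy="variance")       # fine on the old version
try:
    set_params_new(impuriy="variance")   # same call site after upgrading
except TypeError as e:
    print(e)  # ... got an unexpected keyword argument 'impuriy'
```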


---




[GitHub] spark pull request #21828: Update regression.py

2018-07-20 Thread woodthom2
Github user woodthom2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21828#discussion_r203994440
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -1116,7 +1116,7 @@ def setParams(self, featuresCol="features", labelCol="label", predictionCol="pre
                  maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                  maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0,
                  checkpointInterval=10, lossType="squared", maxIter=20, stepSize=0.1, seed=None,
-                 impuriy="variance", featureSubsetStrategy="all"):
+                 impurity="variance", featureSubsetStrategy="all"):
--- End diff --

is anyone depending on the typoed version?


---




[GitHub] spark pull request #21828: Update regression.py

2018-07-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21828#discussion_r203994108
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -1116,7 +1116,7 @@ def setParams(self, featuresCol="features", labelCol="label", predictionCol="pre
                  maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                  maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0,
                  checkpointInterval=10, lossType="squared", maxIter=20, stepSize=0.1, seed=None,
-                 impuriy="variance", featureSubsetStrategy="all"):
+                 impurity="variance", featureSubsetStrategy="all"):
--- End diff --

I think this can't just be changed like this, since it's going to break other 
users' code.


---




[GitHub] spark pull request #21828: Update regression.py

2018-07-20 Thread woodthom2
GitHub user woodthom2 opened a pull request:

https://github.com/apache/spark/pull/21828

Update regression.py

Correct typo impuriy -> impurity
(this would have stopped GBT working for some hyperparameter configurations)
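
For context, a stand-alone sketch of what the typo breaks (illustrative names, not the real PySpark API): with the misspelled keyword in the signature, the documented spelling is rejected.
```
# Signature as it stands with the typo: only 'impuriy' is accepted.
def set_params(impuriy="variance", featureSubsetStrategy="all"):
    return impuriy

try:
    set_params(impurity="variance")  # the spelling users would reasonably write
except TypeError as e:
    print(e)  # ... got an unexpected keyword argument 'impurity'
```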

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/woodthom2/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21828.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21828


commit d3973d0149ce313210c1f1930a9450bb33d70960
Author: woodthom2 
Date:   2018-07-20T09:51:59Z

Update regression.py

Correct typo impuriy -> impurity
(this would have stopped GBT working for some hyperparameter configurations)




---
