[ https://issues.apache.org/jira/browse/SPARK-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Shearer updated SPARK-14740:
---------------------------------
    Description: 
If you tune hyperparameters using a CrossValidator object in PySpark, you may 
not be able to extract the parameter values of the best model.

{noformat}
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.mllib.linalg import Vectors
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

dataset = sqlContext.createDataFrame(
    [(Vectors.dense([0.0]), 0.0),
     (Vectors.dense([0.4]), 1.0),
     (Vectors.dense([0.5]), 0.0),
     (Vectors.dense([0.6]), 1.0),
     (Vectors.dense([1.0]), 1.0)] * 10,
    ["features", "label"])
lr = LogisticRegression()
grid = ParamGridBuilder().addGrid(lr.regParam, [0.1, 0.01, 0.001, 0.0001]).build()
evaluator = BinaryClassificationEvaluator()
cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, evaluator=evaluator)
cvModel = cv.fit(dataset)
{noformat}

I can get the regression coefficients out, but I can't get the regularization parameter:

{noformat}
In [3]: cvModel.bestModel.coefficients
Out[3]: DenseVector([3.1573])

In [4]: cvModel.bestModel.explainParams()
Out[4]: ''

In [5]: cvModel.bestModel.extractParamMap()
Out[5]: {}

In [15]: cvModel.params
Out[15]: []

In [36]: cvModel.bestModel.params
Out[36]: []
{noformat}
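
In the meantime, a possible workaround (only a sketch; it relies on the wrapper's private {{_java_obj}} attribute, which is not a public API and may change between releases) is to read the tuned value from the underlying Java model:

{noformat}
# Workaround sketch: bestModel is a thin Python wrapper around the fitted Java
# LogisticRegressionModel, so the selected hyper-parameter can be read from the
# Java side. Note that _java_obj is an internal attribute, not a public API.
java_model = cvModel.bestModel._java_obj
java_model.getRegParam()   # regParam chosen by the grid search, e.g. 0.01
{noformat}

This appears to work because the Scala model copies its params (including the value set by the grid search) from the estimator, even though the Python wrapper in 1.6 does not expose them.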

For the original issue raised on Stack Overflow, please see http://stackoverflow.com/questions/36697304/how-to-extract-model-hyper-parameters-from-spark-ml-in-pyspark




> CrossValidatorModel.bestModel does not include hyper-parameters
> ---------------------------------------------------------------
>
>                 Key: SPARK-14740
>                 URL: https://issues.apache.org/jira/browse/SPARK-14740
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Paul Shearer
>

