[jira] [Commented] (SPARK-29691) Estimator fit method fails to copy params (in PySpark)

2019-11-14 Thread John Bauer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974527#comment-16974527
 ] 

John Bauer commented on SPARK-29691:


[[SPARK-29691] ensure Param objects are valid in fit, 
transform|https://github.com/apache/spark/pull/26527]

> Estimator fit method fails to copy params (in PySpark)
> --
>
> Key: SPARK-29691
> URL: https://issues.apache.org/jira/browse/SPARK-29691
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: John Bauer
>Priority: Minor
>
> Estimator `fit` method is supposed to copy a dictionary of params, 
> overwriting the estimator's previous values, before fitting the model. 
> However, the parameter values are not updated.  This was observed in PySpark, 
> but may be present in the Java objects, as the PySpark code appears to be 
> functioning correctly.   (The copy method that interacts with Java is 
> actually implemented in Params.)
> For example, this prints
> Before: 0.8
> After: 0.8
> but After should be 0.75
> {code:python}
> from pyspark.ml.classification import LogisticRegression
> # Load training data
> training = spark \
> .read \
> .format("libsvm") \
> .load("data/mllib/sample_multiclass_classification_data.txt")
> lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
> print("Before:", lr.getOrDefault("elasticNetParam"))
> # Fit the model, but with an updated parameter setting:
> lrModel = lr.fit(training, params={"elasticNetParam" : 0.75})
> print("After:", lr.getOrDefault("elasticNetParam"))
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29691) Estimator fit method fails to copy params (in PySpark)

2019-11-05 Thread John Bauer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967773#comment-16967773
 ] 

John Bauer commented on SPARK-29691:


Yes, I can do that.  An error message suggesting a call to getParam would get 
people on track.  (I think that extending the API to include parameter names as 
above could be done safely, with a check that they could be bound to self, and 
an additional check in Pipeline.fit to prevent them being broadcast across a 
pipeline.)

> Estimator fit method fails to copy params (in PySpark)
> --
>
> Key: SPARK-29691
> URL: https://issues.apache.org/jira/browse/SPARK-29691
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: John Bauer
>Priority: Minor
>
> Estimator `fit` method is supposed to copy a dictionary of params, 
> overwriting the estimator's previous values, before fitting the model. 
> However, the parameter values are not updated.  This was observed in PySpark, 
> but may be present in the Java objects, as the PySpark code appears to be 
> functioning correctly.   (The copy method that interacts with Java is 
> actually implemented in Params.)
> For example, this prints
> Before: 0.8
> After: 0.8
> but After should be 0.75
> {code:python}
> from pyspark.ml.classification import LogisticRegression
> # Load training data
> training = spark \
> .read \
> .format("libsvm") \
> .load("data/mllib/sample_multiclass_classification_data.txt")
> lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
> print("Before:", lr.getOrDefault("elasticNetParam"))
> # Fit the model, but with an updated parameter setting:
> lrModel = lr.fit(training, params={"elasticNetParam" : 0.75})
> print("After:", lr.getOrDefault("elasticNetParam"))
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29691) Estimator fit method fails to copy params (in PySpark)

2019-11-05 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967748#comment-16967748
 ] 

Bryan Cutler commented on SPARK-29691:
--

[~JohnHBauer] I'm not sure we should extend the API to accept parameter names 
too, but it should definitely check that values  in {{extra}} are an instance 
of {{Param}} and raise an error if not. Would that be ok with you and could you 
do a PR for this?

> Estimator fit method fails to copy params (in PySpark)
> --
>
> Key: SPARK-29691
> URL: https://issues.apache.org/jira/browse/SPARK-29691
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: John Bauer
>Priority: Minor
>
> Estimator `fit` method is supposed to copy a dictionary of params, 
> overwriting the estimator's previous values, before fitting the model. 
> However, the parameter values are not updated.  This was observed in PySpark, 
> but may be present in the Java objects, as the PySpark code appears to be 
> functioning correctly.   (The copy method that interacts with Java is 
> actually implemented in Params.)
> For example, this prints
> Before: 0.8
> After: 0.8
> but After should be 0.75
> {code:python}
> from pyspark.ml.classification import LogisticRegression
> # Load training data
> training = spark \
> .read \
> .format("libsvm") \
> .load("data/mllib/sample_multiclass_classification_data.txt")
> lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
> print("Before:", lr.getOrDefault("elasticNetParam"))
> # Fit the model, but with an updated parameter setting:
> lrModel = lr.fit(training, params={"elasticNetParam" : 0.75})
> print("After:", lr.getOrDefault("elasticNetParam"))
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29691) Estimator fit method fails to copy params (in PySpark)

2019-11-05 Thread John Bauer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967741#comment-16967741
 ] 

John Bauer commented on SPARK-29691:


I wonder if it would make sense to do this:

{code:java}
def _copyValues(self, to, extra=None):
"""
Copies param values from this instance to another instance for
params shared by them.

:param to: the target instance
:param extra: extra params to be copied
:return: the target instance with param values copied
"""
paramMap = self._paramMap.copy()
if extra is not None:
paramMap.update(extra)
for param in self.params:
# copy default params
if param in self._defaultParamMap and to.hasParam(param.name):
to._defaultParamMap[to.getParam(param.name)] = 
self._defaultParamMap[param]
# copy explicitly set params
if param in paramMap and to.hasParam(param.name):
to._set(**{param.name: paramMap[param]})
# allow extra to update parameters on self by name, without having 
to call getParam first
elif self.hasParam(param):
to._set(**{param: paramMap[param]})
else:
pass
return to
{code}
This should allow:
lr.fit(df, extra={"elasticNetParam": 0.3}) 
to produce the same result as:
lr.fit(df, extra={lr.getParam("elasticNetParam"): 0.3})

> Estimator fit method fails to copy params (in PySpark)
> --
>
> Key: SPARK-29691
> URL: https://issues.apache.org/jira/browse/SPARK-29691
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: John Bauer
>Priority: Minor
>
> Estimator `fit` method is supposed to copy a dictionary of params, 
> overwriting the estimator's previous values, before fitting the model. 
> However, the parameter values are not updated.  This was observed in PySpark, 
> but may be present in the Java objects, as the PySpark code appears to be 
> functioning correctly.   (The copy method that interacts with Java is 
> actually implemented in Params.)
> For example, this prints
> Before: 0.8
> After: 0.8
> but After should be 0.75
> {code:python}
> from pyspark.ml.classification import LogisticRegression
> # Load training data
> training = spark \
> .read \
> .format("libsvm") \
> .load("data/mllib/sample_multiclass_classification_data.txt")
> lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
> print("Before:", lr.getOrDefault("elasticNetParam"))
> # Fit the model, but with an updated parameter setting:
> lrModel = lr.fit(training, params={"elasticNetParam" : 0.75})
> print("After:", lr.getOrDefault("elasticNetParam"))
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29691) Estimator fit method fails to copy params (in PySpark)

2019-11-04 Thread John Bauer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966941#comment-16966941
 ] 

John Bauer commented on SPARK-29691:


OK that works.  I worked with fit doing a grid search some time ago, and don't 
remember it working like this.  I will check some of my earlier projects to see 
if my memory fails me...

> Estimator fit method fails to copy params (in PySpark)
> --
>
> Key: SPARK-29691
> URL: https://issues.apache.org/jira/browse/SPARK-29691
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: John Bauer
>Priority: Minor
>
> Estimator `fit` method is supposed to copy a dictionary of params, 
> overwriting the estimator's previous values, before fitting the model. 
> However, the parameter values are not updated.  This was observed in PySpark, 
> but may be present in the Java objects, as the PySpark code appears to be 
> functioning correctly.   (The copy method that interacts with Java is 
> actually implemented in Params.)
> For example, this prints
> Before: 0.8
> After: 0.8
> but After should be 0.75
> {code:python}
> from pyspark.ml.classification import LogisticRegression
> # Load training data
> training = spark \
> .read \
> .format("libsvm") \
> .load("data/mllib/sample_multiclass_classification_data.txt")
> lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
> print("Before:", lr.getOrDefault("elasticNetParam"))
> # Fit the model, but with an updated parameter setting:
> lrModel = lr.fit(training, params={"elasticNetParam" : 0.75})
> print("After:", lr.getOrDefault("elasticNetParam"))
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29691) Estimator fit method fails to copy params (in PySpark)

2019-11-04 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966929#comment-16966929
 ] 

Huaxin Gao commented on SPARK-29691:


Could you please change this line 
{code:java}
lrModel1 = lr.fit(training, params={"elasticNetParam": 0.3}){code}
to 
{code:java}
lrModel1 = lr.fit(training, params={lr.elasticNetParam : 0.3}){code}
and try again?

 

> Estimator fit method fails to copy params (in PySpark)
> --
>
> Key: SPARK-29691
> URL: https://issues.apache.org/jira/browse/SPARK-29691
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: John Bauer
>Priority: Minor
>
> Estimator `fit` method is supposed to copy a dictionary of params, 
> overwriting the estimator's previous values, before fitting the model. 
> However, the parameter values are not updated.  This was observed in PySpark, 
> but may be present in the Java objects, as the PySpark code appears to be 
> functioning correctly.   (The copy method that interacts with Java is 
> actually implemented in Params.)
> For example, this prints
> Before: 0.8
> After: 0.8
> but After should be 0.75
> {code:python}
> from pyspark.ml.classification import LogisticRegression
> # Load training data
> training = spark \
> .read \
> .format("libsvm") \
> .load("data/mllib/sample_multiclass_classification_data.txt")
> lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
> print("Before:", lr.getOrDefault("elasticNetParam"))
> # Fit the model, but with an updated parameter setting:
> lrModel = lr.fit(training, params={"elasticNetParam" : 0.75})
> print("After:", lr.getOrDefault("elasticNetParam"))
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29691) Estimator fit method fails to copy params (in PySpark)

2019-11-04 Thread John Bauer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966768#comment-16966768
 ] 

John Bauer commented on SPARK-29691:


I will update the example shortly - I was using this in the context of an 
MLflow Hyperparameter search, and none of the model outputs changed either.

> Estimator fit method fails to copy params (in PySpark)
> --
>
> Key: SPARK-29691
> URL: https://issues.apache.org/jira/browse/SPARK-29691
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: John Bauer
>Priority: Minor
>
> Estimator `fit` method is supposed to copy a dictionary of params, 
> overwriting the estimator's previous values, before fitting the model. 
> However, the parameter values are not updated.  This was observed in PySpark, 
> but may be present in the Java objects, as the PySpark code appears to be 
> functioning correctly.   (The copy method that interacts with Java is 
> actually implemented in Params.)
> For example, this prints
> Before: 0.8
> After: 0.8
> but After should be 0.75
> {code:python}
> from pyspark.ml.classification import LogisticRegression
> # Load training data
> training = spark \
> .read \
> .format("libsvm") \
> .load("data/mllib/sample_multiclass_classification_data.txt")
> lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
> print("Before:", lr.getOrDefault("elasticNetParam"))
> # Fit the model, but with an updated parameter setting:
> lrModel = lr.fit(training, params={"elasticNetParam" : 0.75})
> print("After:", lr.getOrDefault("elasticNetParam"))
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29691) Estimator fit method fails to copy params (in PySpark)

2019-11-01 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965128#comment-16965128
 ] 

Huaxin Gao commented on SPARK-29691:


I checked the doc and implementation. The Estimator fits the model using the 
passed in optional params instead of the embedded params, but it doesn't 
overwrite the estimator's embedded params values. In your case, the estimator 
uses 0.75 to fit the model, but it still keeps 0.8 for it's own 
elasticNetParam. If you get the model's parameters, it should have 0.75 for 
elasticNetParam. This seems to work as designed. 
# Fit the model, but with an updated parameter setting:lrModel = 
lr.fit(training, params={lor.elasticNetParam : 0.75})print("After:", 
lrModel.getOrDefault("elasticNetParam"))  # print 0.75

> Estimator fit method fails to copy params (in PySpark)
> --
>
> Key: SPARK-29691
> URL: https://issues.apache.org/jira/browse/SPARK-29691
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: John Bauer
>Priority: Minor
>
> Estimator `fit` method is supposed to copy a dictionary of params, 
> overwriting the estimator's previous values, before fitting the model. 
> However, the parameter values are not updated.  This was observed in PySpark, 
> but may be present in the Java objects, as the PySpark code appears to be 
> functioning correctly.   (The copy method that interacts with Java is 
> actually implemented in Params.)
> For example, this prints
> Before: 0.8
> After: 0.8
> but After should be 0.75
> {code:python}
> from pyspark.ml.classification import LogisticRegression
> # Load training data
> training = spark \
> .read \
> .format("libsvm") \
> .load("data/mllib/sample_multiclass_classification_data.txt")
> lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
> print("Before:", lr.getOrDefault("elasticNetParam"))
> # Fit the model, but with an updated parameter setting:
> lrModel = lr.fit(training, params={"elasticNetParam" : 0.75})
> print("After:", lr.getOrDefault("elasticNetParam"))
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org