[ 
https://issues.apache.org/jira/browse/SPARK-29691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Bauer updated SPARK-29691:
-------------------------------
    Description: 
The Estimator `fit` method (implemented in Params) is supposed to copy a 
dictionary of params, overwriting the estimator's previous values, before 
fitting the model.  However, the parameter values are not updated.  This was 
observed in PySpark, but the problem may lie in the underlying Java objects, 
since the PySpark code itself appears to be functioning correctly.

For example, the snippet below prints

{{Before: 0.8
After: 0.8}}

but the After value should be 0.75:

{code:python}
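from pyspark.ml.classification import LogisticRegression
from pyspark.sql import SparkSession

# Reuse the active session (e.g. the one in the pyspark shell) or create one
spark = SparkSession.builder.getOrCreate()
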
# Load training data
training = spark \
    .read \
    .format("libsvm") \
    .load("data/mllib/sample_multiclass_classification_data.txt")

lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
print("Before:", lr.getOrDefault("elasticNetParam"))

# Fit the model, but with an updated parameter setting:
lrModel = lr.fit(training, params={"elasticNetParam" : 0.75})

print("After:", lr.getOrDefault("elasticNetParam"))
{code}
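
To make the expectation above concrete, here is a minimal pure-Python sketch (not PySpark code) of the two behaviors at issue: applying the override to a copy of the estimator versus writing it back to the estimator itself. `ToyEstimator`, `fit_on_copy`, and `fit_in_place` are hypothetical names introduced only for illustration; the sketch makes no claim about which variant PySpark implements.

```python
import copy


class ToyEstimator:
    """Hypothetical stand-in for an estimator; not part of PySpark."""

    def __init__(self, **params):
        self.params = dict(params)

    def get(self, name):
        return self.params[name]

    def fit_on_copy(self, overrides):
        # Apply the overrides to a copy; the original estimator is untouched.
        clone = copy.deepcopy(self)
        clone.params.update(overrides)
        return clone._fit()

    def fit_in_place(self, overrides):
        # Apply the overrides to the estimator itself, then fit.
        self.params.update(overrides)
        return self._fit()

    def _fit(self):
        # The "model" here is just a snapshot of the params used for fitting.
        return dict(self.params)


est = ToyEstimator(elasticNetParam=0.8)

model = est.fit_on_copy({"elasticNetParam": 0.75})
print("copy:", est.get("elasticNetParam"), model["elasticNetParam"])

model = est.fit_in_place({"elasticNetParam": 0.75})
print("in place:", est.get("elasticNetParam"), model["elasticNetParam"])
```

With the first variant the estimator still reports 0.8 after fitting while the fitted result uses 0.75; with the second, both report 0.75, which is the outcome this report expects.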


> Estimator fit method fails to copy params (in PySpark)
> ------------------------------------------------------
>
>                 Key: SPARK-29691
>                 URL: https://issues.apache.org/jira/browse/SPARK-29691
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.4
>            Reporter: John Bauer
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
