[ 
https://issues.apache.org/jira/browse/SPARK-42825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Partridge updated SPARK-42825:
------------------------------------
    Description: 
The Python signature and docstring of the setParams() method for the estimators 
and transformers under pyspark.ml imply that any named param you don't pass in 
the call will be reset to its default value.

Example from 
[https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.clustering.GaussianMixture.html#pyspark.ml.clustering.GaussianMixture.setParams]
 :
{code:python}
setParams(self, *, featuresCol="features", predictionCol="prediction", k=2, 
probabilityCol="probability", tol=0.01, maxIter=100, seed=None, 
aggregationDepth=2, weightCol=None){code}
Taken to the extreme, this would imply that calling setParams() with no args 
resets _all_ the params to their default values.

But what actually happens is that _only_ the params passed in the call get 
changed; the values of any other params aren't affected. So if you call 
setParams() with no args then _no_ params get changed!
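
The behavior described above can be reproduced without a Spark session. The sketch below is a simplified stand-in, not pyspark's code: ToyEstimator and its two params are hypothetical, and the decorator is only modeled on the keyword_only pattern that pyspark.ml appears to use, where the call's keyword arguments are recorded and only those are applied. Under that assumption, the defaults in the signature are never read:

```python
import functools


def keyword_only(func):
    """Record only the keyword arguments the caller actually passed.

    Simplified stand-in written for illustration; an assumption about
    the pattern, not a copy of the library's decorator.
    """
    @functools.wraps(func)
    def wrapper(self, **kwargs):
        self._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper


class ToyEstimator:
    """Hypothetical estimator with two params, k and tol."""

    def __init__(self):
        self._values = {"k": 2, "tol": 0.01}

    @keyword_only
    def setParams(self, *, k=2, tol=0.01):
        # The defaults in the signature are never consulted here; only
        # the keyword arguments recorded by the decorator are applied.
        self._values.update(self._input_kwargs)
        return self


est = ToyEstimator()
est.setParams(k=5, tol=0.5)
est.setParams(tol=0.1)   # k stays 5; it is not reset to 2
after_partial = dict(est._values)
est.setParams()          # nothing passed, so nothing changes
after_empty = dict(est._values)
print(after_partial, after_empty)
```

In this sketch the signature defaults serve only as documentation, which is why the rendered signature can suggest a reset that never happens.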

Is this behavior by design? The name of the method suggests that it is, but 
the signature and docstring suggest the opposite. If the behavior is 
intentional, perhaps the default docstring should make it explicit by saying 
something like:

"Sets the named params. The values of other params are not affected."


> setParams() only sets explicitly named params. Is this intentional or a bug?
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-42825
>                 URL: https://issues.apache.org/jira/browse/SPARK-42825
>             Project: Spark
>          Issue Type: Question
>          Components: ML, PySpark
>    Affects Versions: 3.3.2
>            Reporter: Lucas Partridge
>            Priority: Minor
>
> The Python signature and docstring of the setParams() method for the 
> estimators and transformers under pyspark.ml imply that any named param you 
> don't pass in the call will be reset to its default value.
> Example from 
> [https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.clustering.GaussianMixture.html#pyspark.ml.clustering.GaussianMixture.setParams]
>  :
> {code:python}
> setParams(self, *, featuresCol="features", predictionCol="prediction", k=2, 
> probabilityCol="probability", tol=0.01, maxIter=100, seed=None, 
> aggregationDepth=2, weightCol=None){code}
> Taken to the extreme, this would imply that calling setParams() with no args 
> resets _all_ the params to their default values.
> But what actually happens is that _only_ the params passed in the call get 
> changed; the values of any other params aren't affected. So if you call 
> setParams() with no args then _no_ params get changed!
> Is this behavior by design? The name of the method suggests that it is, but 
> the signature and docstring suggest the opposite. If the behavior is 
> intentional, perhaps the default docstring should make it explicit by 
> saying something like:
> "Sets the named params. The values of other params are not affected."



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
