[ 
https://issues.apache.org/jira/browse/SPARK-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357473#comment-16357473
 ] 

Marco Gaido commented on SPARK-23244:
-------------------------------------

The change is related because your problem is caused by the Python API 
(wrongly) recording all param values, default or not, as explicitly set 
values. The model is therefore persisted with every default stored as if the 
user had set it. That PR stops default values from being explicitly set, so 
the persisted model will treat them as defaults and the newly loaded model 
will be correct.
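
To make the mechanism concrete, here is a toy sketch in plain Python. It is
not the PySpark implementation: ToyParams, save_buggy, save_fixed and load are
hypothetical names, and giving the loaded object a fresh uid is an assumption
that simply mirrors the transcript quoted below.

```python
import uuid


class ToyParams:
    """Toy stand-in for a params holder (hypothetical, not PySpark code)."""

    def __init__(self):
        self.uid = "Toy_" + uuid.uuid4().hex[:20]
        # Default value derived from the uid, like Bucketizer's outputCol.
        self._default = {"outputCol": self.uid + "__output"}
        # Values explicitly set by the user; empty for a fresh object.
        self._set = {}

    def is_set(self, name):
        return name in self._set

    def get(self, name):
        # An explicitly set value wins over the default.
        return self._set.get(name, self._default[name])


def save_buggy(obj):
    # Problem case: defaults and set values are merged and persisted together.
    merged = dict(obj._default)
    merged.update(obj._set)
    return {"paramMap": merged}


def save_fixed(obj):
    # Intended behaviour: only explicitly set values are persisted as set.
    return {"paramMap": dict(obj._set)}


def load(blob):
    obj = ToyParams()  # assumption: a new uid, hence a new uid-based default
    obj._set.update(blob["paramMap"])
    return obj


a = ToyParams()
b_buggy = load(save_buggy(a))
b_fixed = load(save_fixed(a))

print(a.is_set("outputCol"))        # False
print(b_buggy.is_set("outputCol"))  # True
print(b_buggy.get("outputCol") == a.get("outputCol"))  # True
print(b_fixed.is_set("outputCol"))  # False
```

With save_buggy, the old uid-based default of a comes back as an explicitly
set value on the loaded object, which is the symptom in the report. With
save_fixed, the loaded object keeps outputCol unset and falls back to its own
default.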

If you have more questions, feel free to ask. Also feel free to try my patch 
yourself to check whether it solves your problem, and to provide feedback on 
it if you want.

> Incorrect handling of default values when deserializing python wrappers of 
> scala transformers
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23244
>                 URL: https://issues.apache.org/jira/browse/SPARK-23244
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 2.2.1
>            Reporter: Tomas Nykodym
>            Priority: Minor
>
> Default values are not handled properly when serializing/deserializing Python 
> transformers which are wrappers around Scala objects. It looks like, after 
> deserialization, the default values which were based on uid do not get 
> properly restored, and values which were not set are set to their (original) 
> default values.
> Here's a simple code example using Bucketizer:
> {code:python}
> >>> from pyspark.ml.feature import Bucketizer
> >>> a = Bucketizer() 
> >>> a.save("bucketizer0")
> >>> b = Bucketizer.load("bucketizer0") 
> >>> a._defaultParamMap[a.outputCol]
> u'Bucketizer_440bb49206c148989db7__output'
> >>> b._defaultParamMap[b.outputCol]
> u'Bucketizer_41cf9afbc559ca2bfc9a__output'
> >>> a.isSet(a.outputCol)
> False 
> >>> b.isSet(b.outputCol)
> True
> >>> a.getOutputCol()
> u'Bucketizer_440bb49206c148989db7__output'
> >>> b.getOutputCol()
> u'Bucketizer_440bb49206c148989db7__output'
> {code}



