[ https://issues.apache.org/jira/browse/SPARK-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407507#comment-16407507 ]
Bryan Cutler commented on SPARK-23244:
--------------------------------------

I looked into this, and it is a little bit different because with save/load, params are only transferred from Java to Python. So the actual problem is in Scala:

{code:java}
scala> import org.apache.spark.ml.feature.Bucketizer
import org.apache.spark.ml.feature.Bucketizer

scala> val a = new Bucketizer()
a: org.apache.spark.ml.feature.Bucketizer = bucketizer_30c66d09db18

scala> a.isSet(a.outputCol)
res2: Boolean = false

scala> a.save("bucketizer0")

scala> val b = Bucketizer.load("bucketizer0")
b: org.apache.spark.ml.feature.Bucketizer = bucketizer_30c66d09db18

scala> b.isSet(b.outputCol)
res4: Boolean = true
{code}

It seems this is being worked on in SPARK-23455, so I'll still close this as a duplicate.

> Incorrect handling of default values when deserializing python wrappers of scala transformers
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23244
>                 URL: https://issues.apache.org/jira/browse/SPARK-23244
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 2.2.1
>            Reporter: Tomas Nykodym
>            Priority: Minor
>
> Default values are not handled properly when serializing/deserializing python
> transformers which are wrappers around scala objects. It looks like, after
> deserialization, the default values which were based on uid do not get
> properly restored, and values which were not set are set to their (original)
> default values.
> Here's a simple code example using Bucketizer:
> {code:python}
> >>> from pyspark.ml.feature import Bucketizer
> >>> a = Bucketizer()
> >>> a.save("bucketizer0")
> >>> b = Bucketizer.load("bucketizer0")
> >>> a._defaultParamMap[a.outputCol]
> u'Bucketizer_440bb49206c148989db7__output'
> >>> b._defaultParamMap[b.outputCol]
> u'Bucketizer_41cf9afbc559ca2bfc9a__output'
> >>> a.isSet(a.outputCol)
> False
> >>> b.isSet(b.outputCol)
> True
> >>> a.getOutputCol()
> u'Bucketizer_440bb49206c148989db7__output'
> >>> b.getOutputCol()
> u'Bucketizer_440bb49206c148989db7__output'
> {code}
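
For illustration only, the isSet flip reported in this issue can be reproduced with a small toy model in plain Python. This is a sketch under an assumption, not Spark's code: it assumes the writer persists every resolved param value (defaults included) and the reader restores each one through an ordinary set call. The names ToyParams, save, load and include_defaults are hypothetical and do not exist in Spark or PySpark.

```python
class ToyParams:
    """Hypothetical stand-in for an ML Params object (not Spark's class)."""

    def __init__(self, uid):
        self.uid = uid
        # Default derived from the uid, like outputCol in the report.
        self._default = {"outputCol": uid + "__output"}
        # Values the user set explicitly.
        self._set = {}

    def set(self, name, value):
        self._set[name] = value

    def is_set(self, name):
        return name in self._set

    def get(self, name):
        # An explicitly set value wins over the default.
        return self._set.get(name, self._default.get(name))


def save(obj, include_defaults):
    """Return a dict playing the role of the saved metadata."""
    if include_defaults:
        # Assumed buggy behaviour: set and default values are merged.
        names = set(obj._default) | set(obj._set)
    else:
        # Alternative: persist only what the user set explicitly.
        names = set(obj._set)
    return {"uid": obj.uid, "params": {n: obj.get(n) for n in names}}


def load(blob):
    """Rebuild an object, restoring every saved value through set()."""
    obj = ToyParams(blob["uid"])
    for name, value in blob["params"].items():
        obj.set(name, value)
    return obj


a = ToyParams("bucketizer_30c66d09db18")
b = load(save(a, include_defaults=True))   # defaults come back as "set"
c = load(save(a, include_defaults=False))  # set/default distinction kept
```

In this toy, a.is_set("outputCol") is False while b.is_set("outputCol") is True, which mirrors the Scala session. Saving only the explicitly set values (object c) keeps the distinction, because the default is rebuilt from the uid on load.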