Saif Addin created SPARK-28902:
----------------------------------

             Summary: Spark ML Pipeline with nested Pipelines fails to load when saved from Python
                 Key: SPARK-28902
                 URL: https://issues.apache.org/jira/browse/SPARK-28902
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.4.3
            Reporter: Saif Addin

Hi, this error affects several of our nested-pipeline use cases. A *PipelineModel* that has another *PipelineModel* as one of its stages fails to load from Scala when it was saved from Python.

*Python side:*
{code:python}
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer

# Assumes an active SparkSession named `spark`, as in the pyspark shell.
t = Tokenizer()
p = Pipeline().setStages([t])
d = spark.createDataFrame([["Hello Peter Parker"]])
pm = p.fit(d)  # inner PipelineModel

# Outer pipeline whose only stage is the fitted inner PipelineModel.
np = Pipeline().setStages([pm])
npm = np.fit(d)
npm.write().save('./npm_test')
{code}

*Scala side:*
{code:scala}
scala> import org.apache.spark.ml.PipelineModel
scala> val pp = PipelineModel.load("./npm_test")
java.lang.IllegalArgumentException: requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.PipelineModel but found class name pyspark.ml.pipeline.PipelineModel
  at scala.Predef$.require(Predef.scala:224)
  at org.apache.spark.ml.util.DefaultParamsReader$.parseMetadata(ReadWrite.scala:638)
  at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:616)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:267)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:348)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:342)
  at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:380)
  at org.apache.spark.ml.PipelineModel$.load(Pipeline.scala:332)
  ... 50 elided
{code}
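
The error message above says the loader compares a class name recorded in the saved metadata against the one it expects. To see which class name each saved stage actually recorded, something like the following sketch may help. It uses only the Python standard library and makes two assumptions that are not confirmed in this report: that every saved stage has a {{metadata/part-*}} text file containing one JSON object per line, and that this object stores the class name under the key {{class}}. The helper name {{saved_class_names}} is made up for illustration.

```python
import json
from pathlib import Path


def saved_class_names(model_dir):
    """Return {stage directory (relative): recorded class name}.

    Hypothetical helper. Assumes each saved stage keeps a
    metadata/part-* file holding one JSON object per line,
    with the class name stored under the "class" key.
    """
    root = Path(model_dir)
    found = {}
    # rglob also matches the top-level metadata directory.
    for part in sorted(root.rglob("metadata/part-*")):
        stage_dir = part.parent.parent.relative_to(root)
        for line in part.read_text().splitlines():
            if line.strip():
                found[str(stage_dir)] = json.loads(line).get("class")
    return found
```

Calling {{saved_class_names("./npm_test")}} on the directory written by the Python snippet should then list the outer pipeline (reported as {{.}}) and each nested stage next to the class name it stored, which shows where the {{pyspark.ml.pipeline.PipelineModel}} entry comes from.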