[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python
[ https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255158#comment-17255158 ] Weichen Xu commented on SPARK-28902:

[~nmarcott] I also considered this issue when I added the `MetaAlgorithmReadWrite`-related code, but I would like to keep the pipeline checkStagesForJava logic unchanged because:
* (This is not a bug.) Saving a model in PySpark and then loading it from Java is not a supported requirement. We can make a best effort to support it, but we do not need to guarantee it.
* The more important reason is that we need to ensure this case keeps working: Pipeline([..., CrossValidator(estimator=XXX)]). If we convert this case into a JavaModel before saving, the `_to_java` implementation becomes very complicated, because we would need to work out how to pass the `CrossValidator.estimatorParamMaps` param to the Java side (to see why this is complicated, refer to its implementation on the Java and Python sides).

So I suggest not changing any related code here unless you find actual bugs. The related code is already complicated, and changing it may introduce new bugs.

CC [~ajaysaini] [~podongfeng]

> Spark ML Pipeline with nested Pipelines fails to load when saved from Python
>
> Key: SPARK-28902
> URL: https://issues.apache.org/jira/browse/SPARK-28902
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.4.3
> Reporter: Saif Addin
> Priority: Minor
>
> Hi, this error is affecting a bunch of our nested use cases.
> Saving a *PipelineModel* with one of its stages being another *PipelineModel* fails when loading it from Scala if it was saved from Python.
> *Python side:*
>
> {code:java}
> from pyspark.ml import Pipeline
> from pyspark.ml.feature import Tokenizer
> t = Tokenizer()
> p = Pipeline().setStages([t])
> d = spark.createDataFrame([["Hello Peter Parker"]])
> pm = p.fit(d)
> np = Pipeline().setStages([pm])
> npm = np.fit(d)
> npm.write().save('./npm_test')
> {code}
>
> *Scala side:*
>
> {code:java}
> scala> import org.apache.spark.ml.PipelineModel
> scala> val pp = PipelineModel.load("./npm_test")
> java.lang.IllegalArgumentException: requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.PipelineModel but found class name pyspark.ml.pipeline.PipelineModel
>   at scala.Predef$.require(Predef.scala:224)
>   at org.apache.spark.ml.util.DefaultParamsReader$.parseMetadata(ReadWrite.scala:638)
>   at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:616)
>   at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:267)
>   at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:348)
>   at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:342)
>   at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:380)
>   at org.apache.spark.ml.PipelineModel$.load(Pipeline.scala:332)
>   ... 50 elided
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
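For context, the Scala-side failure above is a plain class-name comparison against the saved metadata JSON: the outer PipelineModel's metadata records the Python class name, which the Scala reader rejects. The following is a rough, simplified Python sketch of that check; the JSON shape and function signature here are assumptions for illustration, the real logic lives in Scala's DefaultParamsReader.parseMetadata:

```python
import json

# Simplified stand-in (an assumption, not Spark's actual code) for the
# class-name check that produces the "Expected class name ... but found
# class name ..." error when loading saved ML metadata.
def parse_metadata(metadata_json, expected_class_name=""):
    metadata = json.loads(metadata_json)
    class_name = metadata["class"]
    # Reject metadata whose recorded class does not match the loader's class.
    if expected_class_name and class_name != expected_class_name:
        raise ValueError(
            "requirement failed: Error loading metadata: "
            f"Expected class name {expected_class_name} "
            f"but found class name {class_name}")
    return metadata
```

Loading metadata written as `pyspark.ml.pipeline.PipelineModel` with an expected class of `org.apache.spark.ml.PipelineModel` fails this check, which matches the stack trace above.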
[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python
[ https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255007#comment-17255007 ] Nicholas Brett Marcott commented on SPARK-28902:

It seems [this PR|https://github.com/apache/spark/pull/1/files] and [this PR|https://github.com/apache/spark/commit/7e759b2d95eb3592d62ec010297c39384173a93c#diff-43bf01d52810ead40daf5a967f807a6c6b99d66959ad531617f10c1535503192R291-R295] combined (and possibly others) are breaking this. Both of these PRs exist to support Python-only stages.

The [implementation|https://github.com/apache/spark/blob/master/python/pyspark/ml/pipeline.py#L351-L352] treats anything that doesn't inherit JavaMLWritable as Python-only, and since the second PR mentioned this includes several "meta" stages such as PipelineModel and CrossValidatorModel:

{code:java}
def checkStagesForJava(stages):
    return all(isinstance(stage, JavaMLWritable) for stage in stages)
{code}

[Similar logic|https://github.com/apache/spark/blob/master/python/pyspark/ml/tuning.py#L291-L295] to check whether nested stages have Java equivalents exists in the second PR mentioned above:

{code:java}
def is_java_convertible(instance):
    allNestedStages = MetaAlgorithmReadWrite.getAllNestedStages(instance.getEstimator())
    evaluator_convertible = isinstance(instance.getEvaluator(), JavaParams)
    estimator_convertible = all(map(lambda stage: hasattr(stage, '_to_java'), allNestedStages))
    return estimator_convertible and evaluator_convertible
{code}

There needs to be a consistent, clean way to check whether all stages can be converted to Java / support being written from Java. Perhaps something similar to the is_java_convertible function above could be used instead of checkStagesForJava for Pipelines. Another alternative is to add an abstraction around the '_to_java'/'_from_java' functions (i.e. having a Java equivalent) and check that all stages inherit it.
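The two snippets above suggest one possible unification: recursively collect all nested stages, then duck-type on `_to_java`. A minimal, self-contained sketch of that idea follows; `JavaStage`, `PythonOnlyStage`, and `PipelineStage` are illustrative stand-ins for pyspark types, not pyspark's actual API:

```python
# Sketch of a unified "java-convertible" check over nested stages.
# All class names here are hypothetical stand-ins for pyspark classes.

class JavaStage:
    """Stand-in for a stage that has a Java counterpart (exposes _to_java)."""
    def _to_java(self):
        return "javaObject"

class PythonOnlyStage:
    """Stand-in for a Python-only stage with no Java counterpart."""

class PipelineStage:
    """Stand-in for a meta stage that nests other stages."""
    def __init__(self, stages):
        self.stages = stages

def all_nested_stages(stage):
    # Recursively flatten meta stages, loosely mirroring
    # MetaAlgorithmReadWrite.getAllNestedStages.
    if isinstance(stage, PipelineStage):
        result = [stage]
        for s in stage.stages:
            result.extend(all_nested_stages(s))
        return result
    return [stage]

def is_java_convertible(stage):
    # Convertible only if every non-meta stage in the tree exposes _to_java;
    # meta stages themselves are handled structurally.
    return all(
        isinstance(s, PipelineStage) or hasattr(s, '_to_java')
        for s in all_nested_stages(stage)
    )
```

With this shape, a nested pipeline of Java-backed stages is reported convertible, while any Python-only stage anywhere in the tree makes the whole pipeline non-convertible, which is the behavior the comment above argues checkStagesForJava should have.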
+ [~ajaysaini95700], [~weichenxu123], [~podongfeng]
[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python
[ https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154234#comment-17154234 ] Makarov Vasiliy Nicolaevich commented on SPARK-28902:

Also reproduced this with a CrossValidatorModel in the pipeline:

{code:java}
pipelineModel.stages
Out[23]: [VectorAssembler_28864f6124b9, CrossValidatorModel_acf8596f410b]
{code}
[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python
[ https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079165#comment-17079165 ] chiranjeevi commented on SPARK-28902:

Hi team, I am having the same issue. May I know how this can be fixed?
[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python
[ https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927077#comment-16927077 ] Saif Addin commented on SPARK-28902:

Ah, here I thought you said you couldn't reproduce it. Gladly hoping to see this fixed :)
[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python
[ https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927076#comment-16927076 ] Junichi Koizumi commented on SPARK-28902:

Since versions aren't the main concern here, should I create a PR?
[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python
[ https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926226#comment-16926226 ] Saif Addin commented on SPARK-28902:

Hi [~reconjun], I am not sure how you made it work; it fails on my end, and the error actually makes sense. The nested pipeline class is not mapped to a PySpark class name.
[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python
[ https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920478#comment-16920478 ] Junichi Koizumi commented on SPARK-28902:

Could you tell a little bit more about the workaround? It turns out to be fine on my version.

pyspark:

>>> from pyspark.ml import Pipeline
>>> from pyspark.ml.feature import Tokenizer
>>> t = Tokenizer()
>>> p = Pipeline().setStages([t])
>>> d = spark.createDataFrame([["Apache spark logistic regression "]])
>>> pm = p.fit(d)
>>> np = Pipeline().setStages([pm])
>>> npm = np.fit(d)
>>> npm.write().save('./npm_test')

Scala side:

scala> import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.PipelineModel

scala> val pp = PipelineModel.load("./npm_test")
pp: org.apache.spark.ml.PipelineModel = PipelineModel_4d879f6b2b02c8d3d467