[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python

2020-12-26 Thread Weichen Xu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255158#comment-17255158
 ] 

Weichen Xu commented on SPARK-28902:


[~nmarcott]
I also considered this issue when I added the `MetaAlgorithmReadWrite`-related 
code, but I would like to keep the pipeline `checkStagesForJava` logic unchanged 
because:

* (This is not a bug.) Saving a model in PySpark and then loading it from Java 
is not a hard requirement. We can make a best effort to support it, but we do 
not have to guarantee it.

* The more important reason is that we need to ensure this case works:
   Pipeline([..., CrossValidator(estimator=XXX)])
   If we converted this case into a JavaModel before saving, the `_to_java` 
implementation would become very complicated, because we would need to work out 
how to pass the `CrossValidator.estimatorParamMaps` param to the Java side (to 
see why this is complicated, refer to its implementation on the Java and Python 
sides).

So I suggest not changing any of the related code here unless you find actual 
bugs. That code is already complicated, and changing it may introduce new bugs.
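To make the concern concrete, here is a minimal, hedged sketch using toy stand-ins (these are dummy classes, not the real pyspark API) of why a flat per-stage check rejects a Pipeline that nests a CrossValidator, even when the nested estimator is Java-backed:

```python
class JavaMLWritable:
    """Toy marker: the stage has a JVM-backed writer."""

class Tokenizer(JavaMLWritable):
    """Toy Java-backed stage."""

class CrossValidator:
    """Toy meta-stage wrapping an estimator plus estimatorParamMaps."""
    def __init__(self, estimator):
        self.estimator = estimator

def check_stages_for_java(stages):
    # Mirrors the spirit of pipeline.py's checkStagesForJava: a flat check
    # over the stage list, with no recursion into meta-stages.
    return all(isinstance(s, JavaMLWritable) for s in stages)

# A pipeline of plain Java-backed stages passes...
assert check_stages_for_java([Tokenizer()]) is True
# ...but nesting a CrossValidator fails the check, even with a Java-backed
# estimator inside: converting it would require translating
# estimatorParamMaps to the JVM side, which is the complication above.
assert check_stages_for_java([Tokenizer(), CrossValidator(Tokenizer())]) is False
```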

CC [~ajaysaini] [~podongfeng]




> Spark ML Pipeline with nested Pipelines fails to load when saved from Python
> 
>
> Key: SPARK-28902
> URL: https://issues.apache.org/jira/browse/SPARK-28902
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.3
>Reporter: Saif Addin
>Priority: Minor
>
> Hi, this error is affecting a bunch of our nested use cases.
> Saving a *PipelineModel* with one of its stages being another 
> *PipelineModel*, fails when loading it from Scala if it is saved in Python.
> *Python side:*
>  
> {code:java}
> from pyspark.ml import Pipeline
> from pyspark.ml.feature import Tokenizer
> t = Tokenizer()
> p = Pipeline().setStages([t])
> d = spark.createDataFrame([["Hello Peter Parker"]])
> pm = p.fit(d)
> np = Pipeline().setStages([pm])
> npm = np.fit(d)
> npm.write().save('./npm_test')
> {code}
>  
>  
> *Scala side:*
>  
> {code:java}
> scala> import org.apache.spark.ml.PipelineModel
> scala> val pp = PipelineModel.load("./npm_test")
> java.lang.IllegalArgumentException: requirement failed: Error loading 
> metadata: Expected class name org.apache.spark.ml.PipelineModel but found 
> class name pyspark.ml.pipeline.PipelineModel
>  at scala.Predef$.require(Predef.scala:224)
>  at 
> org.apache.spark.ml.util.DefaultParamsReader$.parseMetadata(ReadWrite.scala:638)
>  at 
> org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:616)
>  at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:267)
>  at 
> org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:348)
>  at 
> org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:342)
>  at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:380)
>  at org.apache.spark.ml.PipelineModel$.load(Pipeline.scala:332)
>  ... 50 elided
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python

2020-12-26 Thread Nicholas Brett Marcott (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255007#comment-17255007
 ] 

Nicholas Brett Marcott commented on SPARK-28902:


It seems [this PR|https://github.com/apache/spark/pull/1/files] and [this 
PR|https://github.com/apache/spark/commit/7e759b2d95eb3592d62ec010297c39384173a93c#diff-43bf01d52810ead40daf5a967f807a6c6b99d66959ad531617f10c1535503192R291-R295] 
combined (and possibly others) are what break this. Both PRs were made to 
support Python-only stages.

The 
[implementation|https://github.com/apache/spark/blob/master/python/pyspark/ml/pipeline.py#L351-L352]
 treats anything that does not inherit JavaMLWritable as Python-only, which, 
since the second PR mentioned, includes several "meta" stages such as 
PipelineModel and CrossValidatorModel.
{code:java}
def checkStagesForJava(stages):
    return all(isinstance(stage, JavaMLWritable) for stage in stages)
{code}
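As a toy illustration (dummy classes and placeholder names, not the real pyspark API), a meta-stage can expose `_to_java` and still fail this `isinstance` test, so the pipeline falls back to the Python-only writer:

```python
class JavaMLWritable:
    """Toy marker for stages with a JVM-backed writer."""

class CrossValidatorModel:
    """Toy meta-stage: convertible via _to_java, but not JavaMLWritable."""
    def _to_java(self):
        return "jvm-handle"  # placeholder for a real py4j object

def check_stages_for_java(stages):
    # The flat check from pipeline.py: meta-stages fail it...
    return all(isinstance(s, JavaMLWritable) for s in stages)

def has_java_equivalent(stages):
    # ...while a hasattr-style check, as in tuning.py, accepts them.
    return all(hasattr(s, '_to_java') for s in stages)

stages = [CrossValidatorModel()]
assert check_stages_for_java(stages) is False  # Python-only writer chosen
assert has_java_equivalent(stages) is True     # yet conversion is possible
```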
 

[Similar 
logic|https://github.com/apache/spark/blob/master/python/pyspark/ml/tuning.py#L291-L295]
 to check whether nested stages have Java equivalents exists in the second PR 
mentioned above:

 
{code:java}
def is_java_convertible(instance):
    allNestedStages = MetaAlgorithmReadWrite.getAllNestedStages(instance.getEstimator())
    evaluator_convertible = isinstance(instance.getEvaluator(), JavaParams)
    estimator_convertible = all(map(lambda stage: hasattr(stage, '_to_java'), allNestedStages))
    return estimator_convertible and evaluator_convertible
{code}
 

It seems there needs to be a consistent and clean way to check whether all 
stages can be converted to Java / support being written from Java. Perhaps 
something similar to the is_java_convertible function above could be used 
instead of checkStagesForJava for pipelines. Another alternative is to add an 
abstraction around the '_to_java'/'_from_java' functions (i.e., having a Java 
equivalent) and check that all stages inherit it.
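One possible shape for that abstraction (hypothetical class and function names, sketched with stubs rather than real pyspark classes): a single mixin that every Java-convertible stage inherits, so both pipeline.py and tuning.py could run the same check.

```python
from abc import ABC, abstractmethod

class JavaConvertible(ABC):
    """Hypothetical mixin: a stage that can round-trip to a JVM object."""

    @abstractmethod
    def _to_java(self):
        ...

    @classmethod
    @abstractmethod
    def _from_java(cls, java_obj):
        ...

class TokenizerStub(JavaConvertible):
    """Toy stage implementing the mixin."""

    def _to_java(self):
        return "jvm-tokenizer"  # placeholder handle

    @classmethod
    def _from_java(cls, java_obj):
        return cls()

def stages_are_java_convertible(stages):
    # One check used everywhere, replacing the mix of
    # isinstance(stage, JavaMLWritable) and hasattr(stage, '_to_java')
    # currently spread across pipeline.py and tuning.py.
    return all(isinstance(s, JavaConvertible) for s in stages)

assert stages_are_java_convertible([TokenizerStub()]) is True
assert stages_are_java_convertible([object()]) is False
```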

+ [~ajaysaini95700], [~weichenxu123], [~podongfeng]

 




[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python

2020-07-09 Thread Makarov Vasiliy Nicolaevich (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154234#comment-17154234
 ] 

Makarov Vasiliy Nicolaevich commented on SPARK-28902:
-

I have also reproduced it with a CrossValidatorModel inside the pipeline.

 
{code:java}
pipelineModel.stages

Out[23]: [VectorAssembler_28864f6124b9, CrossValidatorModel_acf8596f410b]   
{code}
 




[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python

2020-04-09 Thread chiranjeevi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079165#comment-17079165
 ] 

chiranjeevi commented on SPARK-28902:
-

Hi team,

I am having the same issue. May I know how this can be fixed?




[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python

2019-09-10 Thread Saif Addin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927077#comment-16927077
 ] 

Saif Addin commented on SPARK-28902:


Ah, here I thought you said you couldn't reproduce it. Hoping to see this 
fixed :)




[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python

2019-09-10 Thread Junichi Koizumi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927076#comment-16927076
 ] 

Junichi  Koizumi  commented on SPARK-28902:
---

Since versions aren't the main concern here, should I create a PR?







[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python

2019-09-09 Thread Saif Addin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926226#comment-16926226
 ] 

Saif Addin commented on SPARK-28902:


Hi [~reconjun], I am not sure how you made it work; it fails on mine, and the 
error actually makes sense. The nested pipeline class is not mapped to a 
PySpark class name.




[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python

2019-09-01 Thread Junichi Koizumi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920478#comment-16920478
 ] 

Junichi  Koizumi  commented on SPARK-28902:
---

Could you tell us a little more about the workaround? It turns out to work fine 
on my version.

pyspark :

>>> from pyspark.ml import Pipeline
>>> from pyspark.ml.feature import Tokenizer
>>> t = Tokenizer()
>>> p = Pipeline().setStages([t])
>>> d = spark.createDataFrame([["Apache spark logistic regression "]])
>>> pm = p.fit(d)
>>> np = Pipeline().setStages([pm])
>>> npm = np.fit(d)
>>> npm.write().save('./npm_test')

scala side :

scala> import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.PipelineModel

scala> val pp = PipelineModel.load("./npm_test")
pp: org.apache.spark.ml.PipelineModel = PipelineModel_4d879f6b2b02c8d3d467
