Re: [Spark ML Pipeline]: Error Loading Pipeline Model with Custom Transformer

2022-01-12 Thread Alana Young
I have updated the gist
(https://gist.github.com/ally1221/5acddd9650de3dc67f6399a4687893aa).
Please let me know if there are any additional questions.

Re: [Spark ML Pipeline]: Error Loading Pipeline Model with Custom Transformer

2022-01-12 Thread Gourav Sengupta
Hi,

I may be short on time, but could you please add some inline comments to
your code explaining what you are trying to do?

Regards,
Gourav Sengupta



On Tue, Jan 11, 2022 at 5:29 PM Alana Young  wrote:

> I am experimenting with creating and persisting ML pipelines using custom
> transformers (I am using Spark 3.1.2). I was able to create a transformer
> class (for testing purposes, I modeled the code off the SQLTransformer
> class) and save the pipeline model. When I attempt to load the saved
> pipeline model, I am running into the following error:
>
> [stack trace snipped; identical to the trace in the original message below]
>
>
> Here is a gist
> (https://gist.github.com/ally1221/5acddd9650de3dc67f6399a4687893aa)
> containing the relevant code. Any feedback and advice would be
> appreciated. Thank you.
>


[Spark ML Pipeline]: Error Loading Pipeline Model with Custom Transformer

2022-01-11 Thread Alana Young
I am experimenting with creating and persisting ML pipelines using custom 
transformers (I am using Spark 3.1.2). I was able to create a transformer class 
(for testing purposes, I modeled the code off the SQLTransformer class) and 
save the pipeline model. When I attempt to load the saved pipeline model, I am 
running into the following error: 

java.lang.NullPointerException
  at java.base/java.lang.reflect.Method.invoke(Method.java:559)
  at org.apache.spark.ml.util.DefaultParamsReader$.loadParamsInstanceReader(ReadWrite.scala:631)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$4(Pipeline.scala:276)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$3(Pipeline.scala:274)
  at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:268)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$7(Pipeline.scala:356)
  at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:160)
  at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:155)
  at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:42)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$6(Pipeline.scala:355)
  at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:355)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:349)
  ... 38 elided


Here is a gist
(https://gist.github.com/ally1221/5acddd9650de3dc67f6399a4687893aa)
containing the relevant code. Any feedback and advice would be appreciated.
Thank you.
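[Editor's note] Without seeing the gist, one common cause of an NPE at
DefaultParamsReader$.loadParamsInstanceReader is that PipelineModel.load
resolves the reader for each stage by reflection: it looks up the companion
object of the class name recorded in the saved metadata and invokes its
`read` method. If the custom transformer has no companion object extending
DefaultParamsReadable (or the class name recorded at save time cannot be
resolved on the classpath at load time, e.g. a class defined in a
spark-shell or notebook session), the reflective Method.invoke fails with a
NullPointerException. A minimal sketch of an SQLTransformer-style custom
transformer that survives save/load is below; the class and param names
(MyStatementTransformer, statement) are illustrative, not from the gist:

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types.StructType

// Hypothetical SQLTransformer-style transformer with persistence support.
// DefaultParamsWritable gives it a `write` that saves uid + params as JSON.
class MyStatementTransformer(override val uid: String)
    extends Transformer with DefaultParamsWritable {

  def this() = this(Identifiable.randomUID("myStatementTransformer"))

  // "__THIS__" in the statement is replaced with a temp view over the input,
  // mirroring what SQLTransformer does.
  final val statement: Param[String] =
    new Param[String](this, "statement", "SQL statement to apply")

  def setStatement(value: String): this.type = set(statement, value)

  override def transform(dataset: Dataset[_]): DataFrame = {
    val tableName = Identifiable.randomUID(uid)
    dataset.createOrReplaceTempView(tableName)
    dataset.sparkSession.sql($(statement).replace("__THIS__", tableName))
  }

  override def transformSchema(schema: StructType): StructType = schema

  override def copy(extra: ParamMap): MyStatementTransformer = defaultCopy(extra)
}

// This companion object is the piece the loader needs: PipelineModel.load
// reflectively calls `read` on "<className>$". If it is missing, loading
// the saved stage fails even though saving appeared to succeed.
object MyStatementTransformer
    extends DefaultParamsReadable[MyStatementTransformer]
```

One further note: because loading is reflection-based, the transformer's
fully qualified class name must be stable between the save and the load.
Classes defined interactively get wrapped in synthetic REPL classes, so
packaging the transformer in a jar and adding it via --jars (or the
application classpath) is the usual way to make save/load reliable.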