I am experimenting with creating and persisting ML pipelines that use custom transformers (I am using Spark 3.1.2). I was able to create a transformer class (for testing purposes, I modeled the code on the SQLTransformer class) and save the pipeline model. However, when I attempt to load the saved pipeline model, I run into the following error:

java.lang.NullPointerException
  at java.base/java.lang.reflect.Method.invoke(Method.java:559)
  at org.apache.spark.ml.util.DefaultParamsReader$.loadParamsInstanceReader(ReadWrite.scala:631)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$4(Pipeline.scala:276)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$3(Pipeline.scala:274)
  at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:268)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$7(Pipeline.scala:356)
  at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:160)
  at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:155)
  at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:42)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$6(Pipeline.scala:355)
  at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:355)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:349)
  ... 38 elided

Here is a gist <https://gist.github.com/ally1221/5acddd9650de3dc67f6399a4687893aa> containing the relevant code.

Any feedback and advice would be appreciated. Thank you.
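
A note on reading the trace, in case it helps: the top two frames suggest that the loader looks up a method on the saved stage's class by reflection and invokes it without a receiver object. The sketch below is plain Java, not Spark code, and only illustrates the general JVM behavior: Method.invoke(null) works for a static method and throws a NullPointerException for an instance method. The names ReflectiveReadDemo, StaticRead, InstanceRead, loadReader and failsWithNpe are hypothetical, and whether this is what actually happens with the transformer in the gist is an assumption that has not been confirmed.

```java
import java.lang.reflect.Method;

public class ReflectiveReadDemo {

    // Hypothetical class whose read() is static: it can be invoked with a null receiver.
    public static class StaticRead {
        public static String read() {
            return "reader";
        }
    }

    // Hypothetical class whose read() is an instance method: it needs a receiver object.
    public static class InstanceRead {
        public String read() {
            return "reader";
        }
    }

    // Looks up a public no-argument method named "read" and invokes it with a null receiver.
    // This is an assumed approximation of a reflective loader, not Spark's actual code.
    public static String loadReader(Class<?> cls) {
        try {
            Method read = cls.getMethod("read");
            return (String) read.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflective lookup or call failed", e);
        }
    }

    // Returns true if invoking read() with a null receiver throws a NullPointerException.
    public static boolean failsWithNpe(Class<?> cls) {
        try {
            loadReader(cls);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadReader(StaticRead.class));     // prints "reader"
        System.out.println(failsWithNpe(InstanceRead.class)); // prints "true"
    }
}
```

If this reading applies, the thing to check in the gist would be whether the method found on the transformer class is reachable as a static method, which in Scala usually depends on how the companion object is defined and on where the class is compiled.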