I am experimenting with creating and persisting ML pipelines that use custom 
transformers (I am using Spark 3.1.2). I was able to create a transformer class 
(for testing purposes, I modeled the code on the SQLTransformer class) and 
save the pipeline model. When I attempt to load the saved pipeline model, I 
get the following error: 

java.lang.NullPointerException
  at java.base/java.lang.reflect.Method.invoke(Method.java:559)
  at org.apache.spark.ml.util.DefaultParamsReader$.loadParamsInstanceReader(ReadWrite.scala:631)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$4(Pipeline.scala:276)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$3(Pipeline.scala:274)
  at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:268)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$7(Pipeline.scala:356)
  at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:160)
  at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:155)
  at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:42)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$6(Pipeline.scala:355)
  at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:355)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:349)
  ... 38 elided
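
One possible reading of the top two frames, offered as an assumption and not as a confirmed diagnosis of the code in the gist: a NullPointerException raised directly inside Method.invoke is what the JVM throws when an instance (non-static) method is invoked reflectively with a null receiver. The sketch below reproduces only that JVM behavior in plain Java. The class names WithStaticRead and WithInstanceRead and the helper describeLoad are hypothetical stand-ins and are not Spark code; whether the saved transformer class actually exposes `read` this way would need to be checked.

```java
import java.lang.reflect.Method;

public class ReflectiveReadDemo {

    // Hypothetical stand-in: `read` is static, so no receiver object is needed.
    public static class WithStaticRead {
        public static String read() { return "static reader"; }
    }

    // Hypothetical stand-in: `read` is an instance method and needs a receiver.
    public static class WithInstanceRead {
        public String read() { return "instance reader"; }
    }

    // Looks up a public no-argument method named "read" on the given class and
    // invokes it with a null receiver, reporting what happened as a string.
    public static String describeLoad(Class<?> cls) {
        try {
            Method read = cls.getMethod("read");
            // A null receiver is only legal when the method is static.
            return "ok: " + read.invoke(null);
        } catch (NullPointerException e) {
            // Thrown by Method.invoke itself for an instance method with a null receiver.
            return "NullPointerException: read() is not static";
        } catch (ReflectiveOperationException e) {
            // Covers a missing method, access problems, or an exception thrown by read().
            return "reflection failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLoad(WithStaticRead.class));
        System.out.println(describeLoad(WithInstanceRead.class));
    }
}
```

Running main prints "ok: static reader" for the first class and "NullPointerException: read() is not static" for the second. If the same pattern applies here, the thing to check would be whether the custom transformer class provides a static `read` entry point.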


Here is a gist 
<https://gist.github.com/ally1221/5acddd9650de3dc67f6399a4687893aa> containing 
the relevant code. Any feedback and advice would be appreciated. Thank you. 
