Hi! I have a PipelineModel (using RandomForestClassifier) that I am trying to save locally. I can save it using:

    // save locally
    val fileOut = new FileOutputStream("file:///home/user/forest.model")
    val out = new ObjectOutputStream(fileOut)
    out.writeObject(model)
    out.close()
    fileOut.close()

Then I deserialize it using:

    val fileIn = new FileInputStream("/home/forest.model")
    val in = new ObjectInputStream(fileIn)
    val cvModel = in.readObject().asInstanceOf[org.apache.spark.ml.PipelineModel]
    in.close()
    fileIn.close()

but when I try to use it:

    val predictions2 = cvModel.transform(testingData)

It throws an exception:

java.lang.IllegalArgumentException: Field "browser_index" does not exist.
    at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:212)
    at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:212)
    at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
    at scala.collection.AbstractMap.getOrElse(Map.scala:58)
    at org.apache.spark.sql.types.StructType.apply(StructType.scala:211)
    at org.apache.spark.ml.feature.VectorAssembler$$anonfun$5.apply(VectorAssembler.scala:111)
    at org.apache.spark.ml.feature.VectorAssembler$$anonfun$5.apply(VectorAssembler.scala:111)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:111)
    at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:301)
    at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:301)
    at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
    at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
    at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:108)
    at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:301)
    at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:68)
    at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:296)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:60)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:62)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:64)
    at $iwC$$iwC$$iwC.<init>(<console>:66)
    at $iwC$$iwC.<init>(<console>:68)
    at $iwC.<init>(<console>:70)
    at <init>(<console>:72)
    at .<init>(<console>:76)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:664)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:629)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:622)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

I am not using .save and .load because they do not work in Spark 1.6 for RandomForest.

Any idea how to do this? any alternatives?

Thanks!

--
*Mario Lazaro* | Software Engineer, Big Data
*GumGum* <http://www.gumgum.com/> | *Ads that stick*
310-985-3792 | ma...@gumgum.com