Re: How to save mllib model to hdfs and reload it
Shixiong, these two snippets behave differently in Scala. In the second snippet, you define a variable named m and evaluate the right-hand side as part of the definition. In other words, in the subsequent code the variable is replaced by the pre-computed value of Array(1.0). So in the second snippet you do not need to serialize the class, and it works well even in a distributed environment, because only the pre-computed value, rather than the whole class, is sent to the executor nodes.
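The value-versus-reference distinction described above can be sketched outside Spark with plain Java serialization. This is only an illustration under stated assumptions: Foo, SerSupplier, and the serializable helper are invented names, and Spark's closure serializer differs in detail from raw java.io serialization, but the capture rule is the same — a closure that calls t.foo() must drag the (non-serializable) Foo instance along, while a closure over the pre-computed array only carries the array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Supplier;

public class ClosureCaptureDemo {

    // Mirrors the thread's Foo: note it is NOT Serializable.
    static class Foo {
        double[] foo() { return new double[] { 1.0 }; }
    }

    // A Supplier that can (in principle) be serialized, like a Spark closure.
    interface SerSupplier extends Supplier<double[]>, Serializable {}

    // Returns true if o survives java.io serialization,
    // false if writing it throws NotSerializableException.
    static boolean serializable(Object o) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Foo t = new Foo();

        // This closure calls t.foo() at evaluation time, so it captures t itself.
        SerSupplier capturesInstance = () -> t.foo();

        // Pre-computing the value means the closure captures only a double[].
        double[] m = t.foo();
        SerSupplier capturesValue = () -> m;

        System.out.println(serializable(capturesInstance)); // false: Foo is dragged along
        System.out.println(serializable(capturesValue));    // true: only the array is carried
    }
}
```

Running it prints false then true: the same function body succeeds or fails depending only on what it closes over.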
Re: How to save mllib model to hdfs and reload it
A man in this community gave me a video: https://www.youtube.com/watch?v=sPhyePwo7FA. I had the same question in this community, and other people helped me solve the problem. I was trying to load a MatrixFactorizationModel from an object file, but the compiler said that I could not create the object because the constructor is private. To solve this, I put my new object in the same package as MatrixFactorizationModel. Luckily, it works.

-- Thu.
Re: How to save mllib model to hdfs and reload it
Hi Hoai-Thu, the issue of the private default constructor is unlikely to be the cause here, since Lance was already able to load/deserialize the model object. And on that side topic, I wish all serdes libraries would just use constructor.setAccessible(true) by default :-) Most of the time that privacy is not about serdes reflection restrictions.

Sent while mobile. Pls excuse typos etc.
Re: How to save mllib model to hdfs and reload it
I think I can reproduce this error. The following code does not work, and reports that Foo cannot be serialized (log in gist https://gist.github.com/zsxwing/4f9f17201d4378fe3e16):

class Foo { def foo() = Array(1.0) }
val t = new Foo
val m = t.foo
val r1 = sc.parallelize(List(1, 2, 3))
val r2 = r1.map(_ + m(0))
r2.toArray

But the following code works (log in gist https://gist.github.com/zsxwing/802cade0facb36a37656):

class Foo { def foo() = Array(1.0) }
var m: Array[Double] = null
{
  val t = new Foo
  m = t.foo
}
val r1 = sc.parallelize(List(1, 2, 3))
val r2 = r1.map(_ + m(0))
r2.toArray

Best Regards,
Shixiong Zhu
Re: How to save mllib model to hdfs and reload it
The following code works, too:

class Foo1 extends Serializable { def foo() = Array(1.0) }
val t1 = new Foo1
val m1 = t1.foo
val r11 = sc.parallelize(List(1, 2, 3))
val r22 = r11.map(_ + m1(0))
r22.toArray

--
张喜升

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12114.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
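The Foo1 variant above works because the instance the closure captures is itself serializable. That fix can be sketched in plain Java (hypothetical names; java.io serialization standing in for Spark's closure serializer): marking the captured class Serializable lets the whole closure be serialized.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Supplier;

public class SerializableFixDemo {

    // Mirrors Foo1 from the thread: the class itself implements Serializable.
    static class Foo1 implements Serializable {
        private static final long serialVersionUID = 1L;
        double[] foo() { return new double[] { 1.0 }; }
    }

    interface SerSupplier extends Supplier<double[]>, Serializable {}

    // Serializes o and returns the resulting bytes; throws if o is not serializable.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Foo1 t1 = new Foo1();
        // The closure still captures t1, but t1 is Serializable now, so this succeeds.
        SerSupplier s = () -> t1.foo();
        System.out.println(serialize(s).length > 0); // prints "true"
    }
}
```

The trade-off the later replies discuss still applies: making a class Serializable ships the whole instance to every task, whereas pre-computing the value ships only the value.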
Re: How to save mllib model to hdfs and reload it
I think that in the following case:

class Foo { def foo() = Array(1.0) }
val t = new Foo
val m = t.foo
val r1 = sc.parallelize(List(1, 2, 3))
val r2 = r1.map(_ + m(0))
r2.toArray

Spark should not serialize t. But it looks like it does.

Best Regards,
Shixiong Zhu
Re: How to save mllib model to hdfs and reload it
I finally solved the problem with the following code:

var m: org.apache.spark.mllib.classification.LogisticRegressionModel = null
m = newModel // newModel is the loaded one; see my post above
val labelsAndPredsOnGoodData = goodDataPoints.map { point =>
  val prediction = m.predict(point.features)
  (point.label, prediction)
} // this works!

Thanks to Shixiong for the heuristic code that led me to this solution. BTW, according to this git commit https://www.mail-archive.com/commits@spark.apache.org/msg01988.html, private[mllib] will be removed from the linear models' constructors. This is part of SPARK-2495, to allow users to construct linear models manually.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12154.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: How to save mllib model to hdfs and reload it
Let's say you have a model which is of class org.apache.spark.mllib.classification.LogisticRegressionModel. You can save the model to disk as follows:

import java.io.FileOutputStream
import java.io.ObjectOutputStream
val fos = new FileOutputStream("e:/model.obj")
val oos = new ObjectOutputStream(fos)
oos.writeObject(model)
oos.close()

and load it back in:

import java.io.FileInputStream
import java.io.ObjectInputStream
val fis = new FileInputStream("e:/model.obj")
val ois = new ObjectInputStream(fis)
val newModel = ois.readObject().asInstanceOf[org.apache.spark.mllib.classification.LogisticRegressionModel]

You can check that newModel.weights gives you the weights, implying that newModel was loaded successfully. There remains, however, another problem, which confuses me badly: when I use the loaded newModel to predict on LabeledPoints, there is always a "Task not serializable" exception! Detailed logs:

INFO DAGScheduler: Failed to run count at <console>:49
org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSeri
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndInd
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissing
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(D
  at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:697)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGSch
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Any help here? PS. Does anyone know the *constructor function* of the model, assuming you have the weights and intercept?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12030.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
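The save/load snippets above are plain JVM object serialization, nothing Spark-specific. A minimal self-contained sketch of the same round trip in Java — Model is a hypothetical stand-in for LogisticRegressionModel, and the save/load helper names are invented for illustration:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class ModelSerDemo {

    // Hypothetical stand-in for a linear model: just weights and an intercept.
    static class Model implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] weights;
        final double intercept;
        Model(double[] weights, double intercept) {
            this.weights = weights;
            this.intercept = intercept;
        }
    }

    // writeObject to a file, as in the save snippet above.
    static void save(Object o, File f) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(f))) {
            oos.writeObject(o);
        }
    }

    // readObject plus a checked cast, as in the load snippet above.
    static <T> T load(File f, Class<T> cls) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(f))) {
            return cls.cast(ois.readObject());
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("model", ".obj");
        f.deleteOnExit();
        save(new Model(new double[] { 0.5, -1.25 }, 0.1), f);
        Model loaded = load(f, Model.class);
        System.out.println(Arrays.toString(loaded.weights)); // prints "[0.5, -1.25]"
        System.out.println(loaded.intercept);                // prints "0.1"
    }
}
```

As the thread goes on to establish, the round trip itself is not the problem; the "Task not serializable" failure comes from what the prediction closure captures afterwards.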
Re: How to save mllib model to hdfs and reload it
Hi, I have faced a similar issue when trying to run a map function with predict. In my case I had some non-serializable fields in my calling class. After making those fields transient, the error went away.
Re: How to save mllib model to hdfs and reload it
PS: I think that solving not-serializable exceptions by adding 'transient' is usually a mistake. It's a band-aid on a design problem. transient causes the default serialization mechanism to skip the field when the object is serialized. When deserialized, the field will be null, which often compromises the class's assumptions about its state. The keyword is only appropriate when the field can safely be recreated at any time -- things like cached values. In Java, this commonly comes up when declaring anonymous (therefore non-static) inner classes, which hold an invisible reference to the containing instance; that can easily cause the enclosing class to be serialized when it's not necessary at all. Inner classes should be static in this case, if possible. Passing values as constructor params takes more code, but lets you tightly control what the function references.
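The point above about transient state silently coming back as null can be demonstrated in a few lines of Java; Cache and its fields are hypothetical names for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    // Hypothetical class with a cached value that is deliberately not serialized.
    static class Cache implements Serializable {
        private static final long serialVersionUID = 1L;
        final String key;
        transient StringBuilder expensive; // skipped by serialization, null after readObject
        Cache(String key) {
            this.key = key;
            this.expensive = new StringBuilder("recomputable state");
        }
    }

    // Serialize and immediately deserialize an object in memory.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (T) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Cache c = roundTrip(new Cache("k"));
        System.out.println(c.key);               // prints "k"
        System.out.println(c.expensive == null); // prints "true": the transient field is gone
    }
}
```

Any method of Cache that assumes expensive is non-null will break after deserialization, which is exactly why transient is only safe for state that can be lazily recomputed.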
Re: How to save mllib model to hdfs and reload it
My prediction code is simple enough, as follows:

val labelsAndPredsOnGoodData = goodDataPoints.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}

When model is the loaded one, the above code just doesn't work. Can you catch the error? Thanks. PS. I use spark-shell in standalone mode, version 1.0.0.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12035.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: How to save mllib model to hdfs and reload it
+1 to what Sean said. And if there are too many state/argument parameters for your taste, you can always create a dedicated (serializable) class to encapsulate them.

Sent while mobile. Pls excuse typos etc.
Re: How to save mllib model to hdfs and reload it
Lance, some debugging ideas: you might try model.predict(RDD[Vector]) to isolate the cause to serialization of the loaded model. And also try to serialize the deserialized (loaded) model manually, to see if that throws any visible exceptions.

Sent while mobile. Pls excuse typos etc.
Re: How to save mllib model to hdfs and reload it
For linear models, the constructors are now public. You can save the weights to HDFS, then load the weights back and use the constructor to create the model.

-Xiangrui

On Mon, Aug 11, 2014 at 10:27 PM, XiaoQinyu xiaoqinyu_sp...@outlook.com wrote:

Hello: I want to know, if I use historical data to train a model and I want to use this model in another app, what should I do? Should I save this model to disk, and load it from disk when I use it? But I don't know how to save the mllib model and reload it. I would be very pleased if anyone could give some tips. Thanks,

XiaoQinyu

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org