Re: How to save mllib model to hdfs and reload it

2014-09-13 Thread Yanbo Liang
Shixiong,
These two snippets behave differently in Scala.
In the second snippet, t.foo is evaluated inside the block and its result is
stored in the variable m as part of the definition.
In other words, the subsequent code sees only the pre-computed value
Array(1.0), not the Foo instance that produced it.
So in the second snippet you do not need to serialize the class, and it works
even in a distributed environment, because only the pre-computed value,
rather than the whole class, is sent to the executor nodes.
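
The same capture problem shows up outside the REPL whenever a closure reaches
through "this" to a field. A minimal sketch of the fix, assuming a plain
compiled job rather than the shell (the class and method names here are made
up for illustration):

class NonSerializableHolder { val values: Array[Double] = Array(1.0) }

class Job(sc: org.apache.spark.SparkContext) {
  val holder = new NonSerializableHolder

  def bad(): Array[Double] = {
    // Referencing holder.values inside the closure drags in "this" (the whole
    // Job, including the non-serializable holder), so the task fails to serialize.
    sc.parallelize(List(1.0, 2.0, 3.0)).map(_ + holder.values(0)).collect()
  }

  def good(): Array[Double] = {
    // Copy the needed value into a local val first; the closure then captures
    // only the Array, which serializes fine.
    val localValues = holder.values
    sc.parallelize(List(1.0, 2.0, 3.0)).map(_ + localValues(0)).collect()
  }
}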

2014-08-14 22:54 GMT+08:00 Shixiong Zhu zsxw...@gmail.com:

 I think I can reproduce this error.

 The following code cannot work and report Foo cannot be serialized. (log
 in gist https://gist.github.com/zsxwing/4f9f17201d4378fe3e16):

 class Foo { def foo() = Array(1.0) }
 val t = new Foo
 val m = t.foo
 val r1 = sc.parallelize(List(1, 2, 3))
 val r2 = r1.map(_ + m(0))
 r2.toArray

 But the following code can work (log in gist
 https://gist.github.com/zsxwing/802cade0facb36a37656):

  class Foo { def foo() = Array(1.0) }
 var m: Array[Double] = null
 {
 val t = new Foo
 m = t.foo
 }
 val r1 = sc.parallelize(List(1, 2, 3))
 val r2 = r1.map(_ + m(0))
 r2.toArray


 Best Regards,
 Shixiong Zhu


 2014-08-14 22:11 GMT+08:00 Christopher Nguyen c...@adatao.com:

 Hi Hoai-Thu, the issue of private default constructor is unlikely the
 cause here, since Lance was already able to load/deserialize the model
 object.

 And on that side topic, I wish all serdes libraries would just use
 constructor.setAccessible(true) by default :-) Most of the time that
 privacy is not about serdes reflection restrictions.

 Sent while mobile. Pls excuse typos etc.
 On Aug 14, 2014 1:58 AM, Hoai-Thu Vuong thuv...@gmail.com wrote:

 A man in this community give me a video:
 https://www.youtube.com/watch?v=sPhyePwo7FA. I've got a same question
 in this community and other guys helped me to solve this problem. I'm
 trying to load MatrixFactorizationModel from object file, but compiler said
 that, I can not create object because the constructor is private. To solve
 this, I put my new object to same package as MatrixFactorizationModel.
 Luckly it works.


 On Wed, Aug 13, 2014 at 9:20 PM, Christopher Nguyen c...@adatao.com
 wrote:

 Lance, some debugging ideas: you might try model.predict(RDD[Vector])
 to isolate the cause to serialization of the loaded model. And also try to
 serialize the deserialized (loaded) model manually to see if that throws
 any visible exceptions.

 Sent while mobile. Pls excuse typos etc.
 On Aug 13, 2014 7:03 AM, lancezhange lancezha...@gmail.com wrote:

 my prediction codes are simple enough as follows:

   *val labelsAndPredsOnGoodData = goodDataPoints.map { point =
   val prediction = model.predict(point.features)
   (point.label, prediction)
   }*

 when model is the loaded one, above code just can't work. Can you
 catch the
 error?
 Thanks.

 PS. i use spark-shell under standalone mode, version 1.0.0




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12035.html
 Sent from the Apache Spark User List mailing list archive at
 Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




 --
 Thu.





Re: How to save mllib model to hdfs and reload it

2014-08-14 Thread Hoai-Thu Vuong
Someone in this community gave me a link to a video:
https://www.youtube.com/watch?v=sPhyePwo7FA. I had the same question in
this community and other people helped me solve it. I was trying to load a
MatrixFactorizationModel from an object file, but the compiler said I could
not create the object because its constructor is private. To solve this, I
put my new object in the same package as MatrixFactorizationModel. Luckily
it works.


On Wed, Aug 13, 2014 at 9:20 PM, Christopher Nguyen c...@adatao.com wrote:

 Lance, some debugging ideas: you might try model.predict(RDD[Vector]) to
 isolate the cause to serialization of the loaded model. And also try to
 serialize the deserialized (loaded) model manually to see if that throws
 any visible exceptions.

 Sent while mobile. Pls excuse typos etc.
 On Aug 13, 2014 7:03 AM, lancezhange lancezha...@gmail.com wrote:

 my prediction codes are simple enough as follows:

   *val labelsAndPredsOnGoodData = goodDataPoints.map { point =
   val prediction = model.predict(point.features)
   (point.label, prediction)
   }*

 when model is the loaded one, above code just can't work. Can you catch
 the
 error?
 Thanks.

 PS. i use spark-shell under standalone mode, version 1.0.0




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12035.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




-- 
Thu.


Re: How to save mllib model to hdfs and reload it

2014-08-14 Thread Christopher Nguyen
Hi Hoai-Thu, the private default constructor is unlikely to be the cause
here, since Lance was already able to load/deserialize the model object.

And on that side topic, I wish all serdes libraries would just use
constructor.setAccessible(true) by default :-) Most of the time that
privacy is not meant as a restriction on serdes reflection.

Sent while mobile. Pls excuse typos etc.
On Aug 14, 2014 1:58 AM, Hoai-Thu Vuong thuv...@gmail.com wrote:

 A man in this community give me a video:
 https://www.youtube.com/watch?v=sPhyePwo7FA. I've got a same question in
 this community and other guys helped me to solve this problem. I'm trying
 to load MatrixFactorizationModel from object file, but compiler said that,
 I can not create object because the constructor is private. To solve this,
 I put my new object to same package as MatrixFactorizationModel. Luckly it
 works.


 On Wed, Aug 13, 2014 at 9:20 PM, Christopher Nguyen c...@adatao.com
 wrote:

 Lance, some debugging ideas: you might try model.predict(RDD[Vector]) to
 isolate the cause to serialization of the loaded model. And also try to
 serialize the deserialized (loaded) model manually to see if that throws
 any visible exceptions.

 Sent while mobile. Pls excuse typos etc.
 On Aug 13, 2014 7:03 AM, lancezhange lancezha...@gmail.com wrote:

 my prediction codes are simple enough as follows:

   *val labelsAndPredsOnGoodData = goodDataPoints.map { point =
   val prediction = model.predict(point.features)
   (point.label, prediction)
   }*

 when model is the loaded one, above code just can't work. Can you catch
 the
 error?
 Thanks.

 PS. i use spark-shell under standalone mode, version 1.0.0




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12035.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




 --
 Thu.



Re: How to save mllib model to hdfs and reload it

2014-08-14 Thread Shixiong Zhu
I think I can reproduce this error.

The following code does not work and reports that Foo cannot be serialized
(log in gist https://gist.github.com/zsxwing/4f9f17201d4378fe3e16):

class Foo { def foo() = Array(1.0) }
val t = new Foo
val m = t.foo
val r1 = sc.parallelize(List(1, 2, 3))
val r2 = r1.map(_ + m(0))
r2.toArray

But the following code works (log in gist
https://gist.github.com/zsxwing/802cade0facb36a37656):

class Foo { def foo() = Array(1.0) }
var m: Array[Double] = null
{
val t = new Foo
m = t.foo
}
val r1 = sc.parallelize(List(1, 2, 3))
val r2 = r1.map(_ + m(0))
r2.toArray


Best Regards,
Shixiong Zhu


2014-08-14 22:11 GMT+08:00 Christopher Nguyen c...@adatao.com:

 Hi Hoai-Thu, the issue of private default constructor is unlikely the
 cause here, since Lance was already able to load/deserialize the model
 object.

 And on that side topic, I wish all serdes libraries would just use
 constructor.setAccessible(true) by default :-) Most of the time that
 privacy is not about serdes reflection restrictions.

 Sent while mobile. Pls excuse typos etc.
 On Aug 14, 2014 1:58 AM, Hoai-Thu Vuong thuv...@gmail.com wrote:

 A man in this community give me a video:
 https://www.youtube.com/watch?v=sPhyePwo7FA. I've got a same question in
 this community and other guys helped me to solve this problem. I'm trying
 to load MatrixFactorizationModel from object file, but compiler said that,
 I can not create object because the constructor is private. To solve this,
 I put my new object to same package as MatrixFactorizationModel. Luckly it
 works.


 On Wed, Aug 13, 2014 at 9:20 PM, Christopher Nguyen c...@adatao.com
 wrote:

 Lance, some debugging ideas: you might try model.predict(RDD[Vector]) to
 isolate the cause to serialization of the loaded model. And also try to
 serialize the deserialized (loaded) model manually to see if that throws
 any visible exceptions.

 Sent while mobile. Pls excuse typos etc.
 On Aug 13, 2014 7:03 AM, lancezhange lancezha...@gmail.com wrote:

 my prediction codes are simple enough as follows:

   *val labelsAndPredsOnGoodData = goodDataPoints.map { point =
   val prediction = model.predict(point.features)
   (point.label, prediction)
   }*

 when model is the loaded one, above code just can't work. Can you catch
 the
 error?
 Thanks.

 PS. i use spark-shell under standalone mode, version 1.0.0




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12035.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




 --
 Thu.




Re: How to save mllib model to hdfs and reload it

2014-08-14 Thread lancezhange
The following code works, too:

class Foo1 extends Serializable { def foo() = Array(1.0) }
val t1 = new Foo1
val m1 = t1.foo
val r11 = sc.parallelize(List(1, 2, 3))
val r22 = r11.map(_ + m1(0))
r22.toArray





On Thu, Aug 14, 2014 at 10:55 PM, Shixiong Zhu [via Apache Spark User List]
ml-node+s1001560n12112...@n3.nabble.com wrote:

 I think I can reproduce this error.

 The following code cannot work and report Foo cannot be serialized. (log
 in gist https://gist.github.com/zsxwing/4f9f17201d4378fe3e16):

 class Foo { def foo() = Array(1.0) }
 val t = new Foo
 val m = t.foo
 val r1 = sc.parallelize(List(1, 2, 3))
 val r2 = r1.map(_ + m(0))
 r2.toArray

 But the following code can work (log in gist
 https://gist.github.com/zsxwing/802cade0facb36a37656):

  class Foo { def foo() = Array(1.0) }
 var m: Array[Double] = null
 {
 val t = new Foo
 m = t.foo
 }
 val r1 = sc.parallelize(List(1, 2, 3))
 val r2 = r1.map(_ + m(0))
 r2.toArray


 Best Regards,
 Shixiong Zhu


 2014-08-14 22:11 GMT+08:00 Christopher Nguyen [hidden email]
 http://user/SendEmail.jtp?type=nodenode=12112i=0:

 Hi Hoai-Thu, the issue of private default constructor is unlikely the
 cause here, since Lance was already able to load/deserialize the model
 object.

 And on that side topic, I wish all serdes libraries would just use
 constructor.setAccessible(true) by default :-) Most of the time that
 privacy is not about serdes reflection restrictions.

 Sent while mobile. Pls excuse typos etc.
 On Aug 14, 2014 1:58 AM, Hoai-Thu Vuong [hidden email]
 http://user/SendEmail.jtp?type=nodenode=12112i=1 wrote:

 A man in this community give me a video:
 https://www.youtube.com/watch?v=sPhyePwo7FA. I've got a same question
 in this community and other guys helped me to solve this problem. I'm
 trying to load MatrixFactorizationModel from object file, but compiler said
 that, I can not create object because the constructor is private. To solve
 this, I put my new object to same package as MatrixFactorizationModel.
 Luckly it works.


 On Wed, Aug 13, 2014 at 9:20 PM, Christopher Nguyen [hidden email]
 http://user/SendEmail.jtp?type=nodenode=12112i=2 wrote:

 Lance, some debugging ideas: you might try model.predict(RDD[Vector])
 to isolate the cause to serialization of the loaded model. And also try to
 serialize the deserialized (loaded) model manually to see if that throws
 any visible exceptions.

 Sent while mobile. Pls excuse typos etc.
 On Aug 13, 2014 7:03 AM, lancezhange [hidden email]
 http://user/SendEmail.jtp?type=nodenode=12112i=3 wrote:

 my prediction codes are simple enough as follows:

   *val labelsAndPredsOnGoodData = goodDataPoints.map { point =
   val prediction = model.predict(point.features)
   (point.label, prediction)
   }*

 when model is the loaded one, above code just can't work. Can you
 catch the
 error?
 Thanks.

 PS. i use spark-shell under standalone mode, version 1.0.0




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12035.html
 Sent from the Apache Spark User List mailing list archive at
 Nabble.com.

 -
 To unsubscribe, e-mail: [hidden email]
 http://user/SendEmail.jtp?type=nodenode=12112i=4
 For additional commands, e-mail: [hidden email]
 http://user/SendEmail.jtp?type=nodenode=12112i=5




 --
 Thu.








-- 
 -- 张喜升




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12114.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to save mllib model to hdfs and reload it

2014-08-14 Thread Shixiong Zhu
I think in the following case

class Foo { def foo() = Array(1.0) }
val t = new Foo
val m = t.foo
val r1 = sc.parallelize(List(1, 2, 3))
val r2 = r1.map(_ + m(0))
r2.toArray

Spark should not need to serialize t, but it looks like it does.


Best Regards,
Shixiong Zhu


2014-08-14 23:22 GMT+08:00 lancezhange lancezha...@gmail.com:

 Following codes  works, too

 class Foo1 extends Serializable { def foo() = Array(1.0) }
 val t1 = new Foo1
 val m1 = t1.foo
 val r11 = sc.parallelize(List(1, 2, 3))
 val r22 = r11.map(_ + m1(0))
 r22.toArray





 On Thu, Aug 14, 2014 at 10:55 PM, Shixiong Zhu [via Apache Spark User
 List] [hidden email] http://user/SendEmail.jtp?type=nodenode=12114i=0
  wrote:

 I think I can reproduce this error.

 The following code cannot work and report Foo cannot be
 serialized. (log in gist
 https://gist.github.com/zsxwing/4f9f17201d4378fe3e16):

 class Foo { def foo() = Array(1.0) }
 val t = new Foo
 val m = t.foo
 val r1 = sc.parallelize(List(1, 2, 3))
 val r2 = r1.map(_ + m(0))
 r2.toArray

 But the following code can work (log in gist
 https://gist.github.com/zsxwing/802cade0facb36a37656):

  class Foo { def foo() = Array(1.0) }
 var m: Array[Double] = null
 {
 val t = new Foo
 m = t.foo
 }
 val r1 = sc.parallelize(List(1, 2, 3))
 val r2 = r1.map(_ + m(0))
 r2.toArray


 Best Regards,
 Shixiong Zhu


 2014-08-14 22:11 GMT+08:00 Christopher Nguyen [hidden email]
 http://user/SendEmail.jtp?type=nodenode=12112i=0:

 Hi Hoai-Thu, the issue of private default constructor is unlikely the
 cause here, since Lance was already able to load/deserialize the model
 object.

 And on that side topic, I wish all serdes libraries would just use
 constructor.setAccessible(true) by default :-) Most of the time that
 privacy is not about serdes reflection restrictions.

 Sent while mobile. Pls excuse typos etc.
 On Aug 14, 2014 1:58 AM, Hoai-Thu Vuong [hidden email]
 http://user/SendEmail.jtp?type=nodenode=12112i=1 wrote:

 A man in this community give me a video:
 https://www.youtube.com/watch?v=sPhyePwo7FA. I've got a same question
 in this community and other guys helped me to solve this problem. I'm
 trying to load MatrixFactorizationModel from object file, but compiler said
 that, I can not create object because the constructor is private. To solve
 this, I put my new object to same package as MatrixFactorizationModel.
 Luckly it works.


 On Wed, Aug 13, 2014 at 9:20 PM, Christopher Nguyen [hidden email]
 http://user/SendEmail.jtp?type=nodenode=12112i=2 wrote:

 Lance, some debugging ideas: you might try model.predict(RDD[Vector])
 to isolate the cause to serialization of the loaded model. And also try to
 serialize the deserialized (loaded) model manually to see if that throws
 any visible exceptions.

 Sent while mobile. Pls excuse typos etc.
 On Aug 13, 2014 7:03 AM, lancezhange [hidden email]
 http://user/SendEmail.jtp?type=nodenode=12112i=3 wrote:

 my prediction codes are simple enough as follows:

   *val labelsAndPredsOnGoodData = goodDataPoints.map { point =
   val prediction = model.predict(point.features)
   (point.label, prediction)
   }*

 when model is the loaded one, above code just can't work. Can you
 catch the
 error?
 Thanks.

 PS. i use spark-shell under standalone mode, version 1.0.0




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12035.html
 Sent from the Apache Spark User List mailing list archive at
 Nabble.com.

 -
 To unsubscribe, e-mail: [hidden email]
 http://user/SendEmail.jtp?type=nodenode=12112i=4
 For additional commands, e-mail: [hidden email]
 http://user/SendEmail.jtp?type=nodenode=12112i=5




 --
 Thu.








 --
  -- 张喜升

 --
 View this message in context: Re: How to save mllib model to hdfs and
 reload it
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12114.html

 Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to save mllib model to hdfs and reload it

2014-08-14 Thread lancezhange
I finally solved the problem with the following code:

var m: org.apache.spark.mllib.classification.LogisticRegressionModel = null

m = newModel   // newModel is the loaded one, see above post of mine

val labelsAndPredsOnGoodData = goodDataPoints.map { point =>
  val prediction = m.predict(point.features)
  (point.label, prediction)
}  // this works!

Thanks to Shixiong for his heuristic code, which led me to this solution.

BTW, according to this git commit
https://www.mail-archive.com/commits@spark.apache.org/msg01988.html ,
private[mllib] will be removed from the linear models' constructors. This is
part of SPARK-2495, which allows users to construct linear models manually.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12154.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to save mllib model to hdfs and reload it

2014-08-13 Thread lancezhange
Let's say you have a model of class
org.apache.spark.mllib.classification.LogisticRegressionModel.
You can save the model to disk as follows:

  import java.io.FileOutputStream
  import java.io.ObjectOutputStream
  val fos = new FileOutputStream("e:/model.obj")
  val oos = new ObjectOutputStream(fos)
  oos.writeObject(model)
  oos.close()

and load it back in:

  import java.io.FileInputStream
  import java.io.ObjectInputStream
  val fis = new FileInputStream("e:/model.obj")
  val ois = new ObjectInputStream(fis)
  val newModel =
    ois.readObject().asInstanceOf[org.apache.spark.mllib.classification.LogisticRegressionModel]

You can check that newModel.weights gives you the weights, implying that
newModel is loaded successfully.
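
If the target is HDFS rather than a local path, the same
ObjectOutputStream/ObjectInputStream approach can be pointed at Hadoop's
FileSystem API. A rough sketch for spark-shell, assuming sc and model are in
scope as above (the hdfs:///tmp path is only a placeholder):

import java.io.{ObjectInputStream, ObjectOutputStream}
import org.apache.hadoop.fs.{FileSystem, Path}

val hdfsPath = new Path("hdfs:///tmp/model.obj")   // placeholder location
val fs = FileSystem.get(sc.hadoopConfiguration)

// save the model object to HDFS
val out = new ObjectOutputStream(fs.create(hdfsPath))
out.writeObject(model)
out.close()

// load it back on the driver
val in = new ObjectInputStream(fs.open(hdfsPath))
val reloaded = in.readObject()
  .asInstanceOf[org.apache.spark.mllib.classification.LogisticRegressionModel]
in.close()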

There remains, however, another problem, which confuses me badly: when I use
the loaded newModel to predict on LabeledPoints, there is always a "Task not
serializable" exception! Detailed logs:
INFO DAGScheduler: Failed to run count at console:49
org.apache.spark.SparkException: Job aborted due to stage failure: Task not
serializable: java.io.NotSeri
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndInd
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissing
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(D
at
org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:697)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGSch
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219) in 2646 ms on
localhost (progress: 345/345)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)ed in
528.389 s
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)ed,
from pool
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Any help here?
PS. Does anyone know the constructor of the model, assuming you have the
weights and intercept?
  
 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12030.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to save mllib model to hdfs and reload it

2014-08-13 Thread Jaideep Dhok
Hi,
I have faced a similar issue when trying to run a map function with
predict. In my case I had some non-serializable fields in my calling class.
After making those fields transient, the error went away.


On Wed, Aug 13, 2014 at 6:39 PM, lancezhange lancezha...@gmail.com wrote:

 let's say you have a model which is of class
 org.apache.spark.mllib.classification.LogisticRegressionModel
 you can save model to disk as following:

   /import java.io.FileOutputStream
   import java.io.ObjectOutputStream
   val fos = new FileOutputStream(e:/model.obj)
   val oos = new ObjectOutputStream(fos)
   oos.writeObject(model)
   oos.close/

 and load it in:
   /import java.io.FileInputStream
   import java.io.ObjectInputStream
   val fos = new FileInputStream(e:/model.obj)
   val oos = new ObjectInputStream(fos)
   val newModel =

 oos.readObject().asInstanceOf[org.apache.spark.mllib.classification.LogisticRegressionModel]/

 you can check that '/newModel.weights/' gives you the weights, implying
 that
 newModel is loaded successfully.

 There remains, however, another problem, which confuses me badly: when i
 use
 the loaded newModel to predict on LabeledPoints, there is always a Task
 not
 serializable exception! Detailed logs:
 INFO DAGScheduler: Failed to run count at console:49
 org.apache.spark.SparkException: Job aborted due to stage failure: Task not
 serializable: java.io.NotSeri
 at
 org.apache.spark.scheduler.DAGScheduler.org
 $apache$spark$scheduler$DAGScheduler$$failJobAndInd
 at

 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
 at

 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
 at

 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at
 scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
 at
 org.apache.spark.scheduler.DAGScheduler.org
 $apache$spark$scheduler$DAGScheduler$$submitMissing
 at
 org.apache.spark.scheduler.DAGScheduler.org
 $apache$spark$scheduler$DAGScheduler$$submitStage(D
 at

 org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:697)
 at

 org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGSch
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
 at akka.actor.ActorCell.invoke(ActorCell.scala:456)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
 at akka.dispatch.Mailbox.run(Mailbox.scala:219) in 2646 ms on
 localhost (progress: 345/345)
 at

 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
 at
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)ed in
 528.389 s
 at

 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)ed,
 from pool
 at
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at

 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

 Any help here?
  PS. any one knows the *constructor function* of the model assuming you
 have
 weights and intercept?





 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12030.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Re: How to save mllib model to hdfs and reload it

2014-08-13 Thread Sean Owen
PS I think that solving not serializable exceptions by adding
'transient' is usually a mistake. It's a band-aid on a design problem.

transient causes the default serialization mechanism to not serialize
the field when the object is serialized. When deserialized, this field
will be null, which often compromises the class's assumptions about
state. This keyword is only appropriate when the field can safely be
recreated at any time -- things like cached values.

In Java, this commonly comes up when declaring anonymous (therefore
non-static) inner classes, which have an invisible reference to the
containing instance, which can easily cause it to serialize the
enclosing class when it's not necessary at all.

Inner classes should be static in this case, if possible. Passing
values as constructor params takes more code but lets you tightly
control what the function references.
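
In Scala/Spark the equivalent pattern is a small named function class that is
explicitly Serializable and receives exactly the state it needs through its
constructor, instead of an anonymous closure over the enclosing object. A
sketch (the names are illustrative only):

// Only offset is shipped to the executors, never the enclosing class.
class AddOffset(offset: Double) extends (Double => Double) with Serializable {
  def apply(x: Double): Double = x + offset
}

// usage, assuming rdd is an RDD[Double]:
//   val shifted = rdd.map(new AddOffset(1.0))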

On Wed, Aug 13, 2014 at 2:47 PM, Jaideep Dhok jaideep.d...@inmobi.com wrote:
 Hi,
 I have faced a similar issue when trying to run a map function with predict.
 In my case I had some non-serializable fields in my calling class. After
 making those fields transient, the error went away.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to save mllib model to hdfs and reload it

2014-08-13 Thread lancezhange
my prediction code is simple enough, as follows:

  val labelsAndPredsOnGoodData = goodDataPoints.map { point =>
    val prediction = model.predict(point.features)
    (point.label, prediction)
  }

when the model is the loaded one, the above code just doesn't work. Can you
see what causes the error?
Thanks.

PS. I use spark-shell in standalone mode, version 1.0.0




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12035.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to save mllib model to hdfs and reload it

2014-08-13 Thread Christopher Nguyen
+1 what Sean said. And if there are too many state/argument parameters for
your taste, you can always create a dedicated (serializable) class to
encapsulate them.

Sent while mobile. Pls excuse typos etc.
On Aug 13, 2014 6:58 AM, Sean Owen so...@cloudera.com wrote:

 PS I think that solving not serializable exceptions by adding
 'transient' is usually a mistake. It's a band-aid on a design problem.

 transient causes the default serialization mechanism to not serialize
 the field when the object is serialized. When deserialized, this field
 will be null, which often compromises the class's assumptions about
 state. This keyword is only appropriate when the field can safely be
 recreated at any time -- things like cached values.

 In Java, this commonly comes up when declaring anonymous (therefore
 non-static) inner classes, which have an invisible reference to the
 containing instance, which can easily cause it to serialize the
 enclosing class when it's not necessary at all.

 Inner classes should be static in this case, if possible. Passing
 values as constructor params takes more code but let you tightly
 control what the function references.

 On Wed, Aug 13, 2014 at 2:47 PM, Jaideep Dhok jaideep.d...@inmobi.com
 wrote:
  Hi,
  I have faced a similar issue when trying to run a map function with
 predict.
  In my case I had some non-serializable fields in my calling class. After
  making those fields transient, the error went away.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: How to save mllib model to hdfs and reload it

2014-08-13 Thread Christopher Nguyen
Lance, some debugging ideas: you might try model.predict(RDD[Vector]) to
isolate the cause to serialization of the loaded model. And also try to
serialize the deserialized (loaded) model manually to see if that throws
any visible exceptions.
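
A minimal sketch of that second check, assuming newModel is the deserialized
model from the earlier post; if anything reachable from the model is not
serializable, the exception (with the offending class name) should surface
right here on the driver:

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

val buffer = new ByteArrayOutputStream()
val oos = new ObjectOutputStream(buffer)
try {
  // throws NotSerializableException if anything the model references is not serializable
  oos.writeObject(newModel)
  println(s"model serialized OK (${buffer.size} bytes)")
} finally {
  oos.close()
}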

Sent while mobile. Pls excuse typos etc.
On Aug 13, 2014 7:03 AM, lancezhange lancezha...@gmail.com wrote:

 my prediction codes are simple enough as follows:

   *val labelsAndPredsOnGoodData = goodDataPoints.map { point =
   val prediction = model.predict(point.features)
   (point.label, prediction)
   }*

 when model is the loaded one, above code just can't work. Can you catch the
 error?
 Thanks.

 PS. i use spark-shell under standalone mode, version 1.0.0




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12035.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: How to save mllib model to hdfs and reload it

2014-08-12 Thread Xiangrui Meng
For linear models, the constructors are now public. You can save the
weights to HDFS, then load the weights back and use the constructor to
create the model. -Xiangrui
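
A sketch of that weights-to-HDFS round trip for LogisticRegressionModel,
assuming a Spark version in which the (weights, intercept) constructor is
public (per SPARK-2495; it was still private[mllib] in 1.0), and with a
placeholder HDFS path:

import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors

val path = "hdfs:///tmp/lr-model-weights"   // placeholder

// save: a single line holding the intercept followed by the weight values
sc.parallelize(Seq((model.intercept +: model.weights.toArray).mkString(",")), 1)
  .saveAsTextFile(path)

// load: parse the numbers back and rebuild the model via the public constructor
val parts = sc.textFile(path).first().split(",").map(_.toDouble)
val restored = new LogisticRegressionModel(Vectors.dense(parts.tail), parts.head)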

On Mon, Aug 11, 2014 at 10:27 PM, XiaoQinyu xiaoqinyu_sp...@outlook.com wrote:
 hello:

 I want to know: if I use historical data to train a model and I want to use
 this model in another app, what should I do?

 Should I save the model to disk, and then load it from disk when I want to
 use it? But I don't know how to save an mllib model and reload it.

 I would be very pleased if anyone could give some tips.

 Thanks

 XiaoQinyu



 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org