Re: Serialize mllib's MatrixFactorizationModel
The thing about MatrixFactorizationModel, compared to other models, is that it is huge. It's not just a few coefficients, but whole RDDs of coefficients. I think you could save these RDDs of user/product factors to persistent storage, load them, then recreate the MatrixFactorizationModel that way. It's a bit manual, but works.

This is probably why there is no standard PMML representation for this type of model. It is different from classic regression/classification models, and too big for XML. So efforts to export/import PMML are not relevant IMHO.

On Mon, Dec 15, 2014 at 5:09 PM, Albert Manyà wrote:
> In that case, what is the strategy to train a model in some background
> batch process and make recommendations for some other service in real
> time? Run both processes in the same spark cluster?
>
> Thanks.
>
> --
> Albert Manyà
> alber...@eml.cc

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
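A rough sketch of the manual approach described in this reply, assuming a Spark 1.x application with MLlib on the classpath. `FactorStore`, its method names, and the `baseDir` layout are hypothetical names chosen for illustration; only `userFeatures`, `productFeatures`, `saveAsObjectFile`, `objectFile`, and `lookup` come from Spark itself. Because the model's constructor is reported as private elsewhere in this thread, the sketch does not rebuild a MatrixFactorizationModel. Instead it scores a (user, product) pair as the dot product of the two factor vectors, which is an assumption about what is acceptable for the use case.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair-RDD implicits (needed for lookup on older 1.x releases)
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.rdd.RDD

// Hypothetical helper, not part of MLlib.
object FactorStore {
  type Factors = RDD[(Int, Array[Double])]

  // Write both factor RDDs under baseDir (the target directories must not exist yet).
  def save(model: MatrixFactorizationModel, baseDir: String): Unit = {
    model.userFeatures.saveAsObjectFile(s"$baseDir/userFeatures")
    model.productFeatures.saveAsObjectFile(s"$baseDir/productFeatures")
  }

  // Read the factor RDDs back, possibly from a different Spark application.
  def load(sc: SparkContext, baseDir: String): (Factors, Factors) = {
    val users = sc.objectFile[(Int, Array[Double])](s"$baseDir/userFeatures").cache()
    val products = sc.objectFile[(Int, Array[Double])](s"$baseDir/productFeatures").cache()
    (users, products)
  }

  // Dot product of two factor vectors of equal rank.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "factor vectors must have the same rank")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  // Predicted score for one (user, product) pair; None if either id is unknown.
  def predict(users: Factors, products: Factors, user: Int, product: Int): Option[Double] =
    for {
      u <- users.lookup(user).headOption
      p <- products.lookup(product).headOption
    } yield dot(u, p)
}
```

In this sketch the batch job would call `FactorStore.save(model, baseDir)` after training, and the other job would call `FactorStore.load(sc, baseDir)` once and then `FactorStore.predict(...)` per request. Each `lookup` runs a Spark job, so this is a starting point rather than a low-latency serving design.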
Re: Serialize mllib's MatrixFactorizationModel
Hi Albert,

There is some discussion going on here:
http://apache-spark-user-list.1001560.n3.nabble.com/MLLIB-model-export-PMML-vs-MLLIB-serialization-tc20324.html#a20674

I am also looking for a solution to this. But it looks like, until MLlib PMML export is ready, there is no foolproof way to export an MLlib-trained model to a different system.

Thanks,
Sourabh

On Mon, Dec 15, 2014 at 10:39 PM, Albert Manyà wrote:
> In that case, what is the strategy to train a model in some background
> batch process and make recommendations for some other service in real
> time? Run both processes in the same spark cluster?
>
> Thanks.
>
> --
> Albert Manyà
> alber...@eml.cc
Re: Serialize mllib's MatrixFactorizationModel
In that case, what is the strategy to train a model in some background batch process and make recommendations for some other service in real time? Run both processes in the same spark cluster?

Thanks.

--
Albert Manyà
alber...@eml.cc

On Mon, Dec 15, 2014, at 05:58 PM, Sean Owen wrote:
> This class is not going to be serializable, as it contains huge RDDs.
> Even if the right constructor existed the RDDs inside would not
> serialize.
Re: Serialize mllib's MatrixFactorizationModel
This class is not going to be serializable, as it contains huge RDDs. Even if the right constructor existed the RDDs inside would not serialize.

On Mon, Dec 15, 2014 at 4:33 PM, Albert Manyà wrote:
> I'm willing to serialize and later load a model trained using mllib's
> ALS.
> [...]
> I've also tried to serialize MatrixFactorizationModel's both RDDs
> (products and users) and later create the MatrixFactorizationModel by
> hand passing the RDDs by constructor but I get an error cause its
> private:
> [...]
Serialize mllib's MatrixFactorizationModel
Hi all.

I'd like to serialize and later load a model trained using mllib's ALS.

I've tried using Java serialization with something like:

    val model = ALS.trainImplicit(training, rank, numIter, lambda, 1)
    val fos = new FileOutputStream("model.bin")
    val oos = new ObjectOutputStream(fos)
    oos.writeObject(model)

But when I try to deserialize it using:

    val fis = new FileInputStream("model.bin")
    val ois = new ObjectInputStream(fis)
    val model = ois.readObject().asInstanceOf[MatrixFactorizationModel]

I get the error:

    Exception in thread "main" java.io.IOException: PARSING_ERROR(2)

I've also tried to serialize both of MatrixFactorizationModel's RDDs (products and users) and later create the MatrixFactorizationModel by hand, passing the RDDs to the constructor, but I get an error because it is private:

    Error:(58, 17) constructor MatrixFactorizationModel in class
    MatrixFactorizationModel cannot be accessed in object RecommendALS
        val model = new MatrixFactorizationModel (8, userFeatures, productFeatures)

Any ideas?

Thanks!

--
Albert Manyà
alber...@eml.cc