Re: word2vec: how to save an mllib model and reload it?

2016-06-10 Thread sharad82
I am having a problem serializing an ML word2vec model. Am I doing something wrong? http://stackoverflow.com/questions/37723308/spark-ml-word2vec-serialization-issues

Re: word2vec: how to save an mllib model and reload it?

2015-02-05 Thread Carsten Schnober
As a Spark newbie, I've come across this thread. I'm playing with Word2Vec in our Hadoop cluster and here's my issue with classic Java serialization of the model: I don't have SSH access to the cluster master node. Here's my code for computing the model: val input =

Re: word2vec: how to save an mllib model and reload it?

2014-11-07 Thread Duy Huynh
you're right, serialization works. what is your suggestion on saving a distributed model? so part of the model is in one cluster, and some other parts of the model are in other clusters. during runtime, these sub-models run independently in their own clusters (load, train, save). and at some

Re: word2vec: how to save an mllib model and reload it?

2014-11-07 Thread Nick Pentreath
Currently I see the word2vec model is collected onto the master, so the model itself is not distributed. I guess the question is why do you need a distributed model? Is the vocab size so large that it's necessary? For model serving in general, unless the model is truly massive (i.e. cannot

Re: word2vec: how to save an mllib model and reload it?

2014-11-07 Thread Evan R. Sparks
There are a few examples where this is the case. Let's take ALS, where the result is a MatrixFactorizationModel, which is assumed to be big - the model consists of two matrices, one (users x k) and one (k x products). These are represented as RDDs. You can save these RDDs out to disk by doing

Re: word2vec: how to save an mllib model and reload it?

2014-11-07 Thread Nick Pentreath
For ALS if you want real time recs (and usually this is order 10s to a few 100s ms response), then Spark is not the way to go - a serving layer like Oryx, or prediction.io is what you want. (At graphflow we've built our own). You hold the factor matrices in memory and do the dot product in
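The in-memory scoring described here can be sketched in plain Scala. This is only an illustration under stated assumptions: `FactorScoring`, `dot`, and `topN` are hypothetical names, the factor vectors are assumed to have already been pulled out of the trained model into ordinary arrays, and this is not how Oryx or prediction.io actually implement serving.

```scala
// Hypothetical sketch: score items for one user from factor vectors held in memory.
object FactorScoring {
  // Dot product of a user factor vector and an item factor vector.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "factor vectors must have the same rank")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  // Score every item against the user vector and keep the n highest scores.
  def topN(user: Array[Double],
           items: Map[Int, Array[Double]],
           n: Int): Seq[(Int, Double)] =
    items.toSeq
      .map { case (id, factors) => (id, dot(user, factors)) }
      .sortBy { case (_, score) => -score }
      .take(n)
}
```

A call such as `FactorScoring.topN(userVector, itemFactors, 10)` would return the ten highest-scoring item ids with their scores. Scanning every item is linear in the catalogue size, so a real serving layer would likely add indexing or caching on top.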

Re: word2vec: how to save an mllib model and reload it?

2014-11-07 Thread Duy Huynh
hi nick.. sorry about the confusion. originally i had a question specifically about word2vec, but my follow up question on distributed models is a more general question about saving different types of models. on distributed models, i was hoping to implement model parallelism, so that different

Re: word2vec: how to save an mllib model and reload it?

2014-11-07 Thread Duy Huynh
yep, but that's only if they are already represented as RDDs, which is much more convenient for saving and loading. my question is for the use case where they are not represented as RDDs yet. then, do you think it makes sense to convert them into RDDs, just for the convenience of saving and

Re: word2vec: how to save an mllib model and reload it?

2014-11-07 Thread Duy Huynh
thanks nick. i'll take a look at oryx and prediction.io. re: private val model in word2vec ;) yes, i couldn't wait so i just changed it in the word2vec source code. but i'm running into some compilation issues now. hopefully i can fix them soon, to get things going.

Re: word2vec: how to save an mllib model and reload it?

2014-11-07 Thread Simon Chan
Just want to elaborate more on Duy's suggestion on using PredictionIO. PredictionIO will store the model automatically if you return it in the training function. An example using CF: def train(data: PreparedData): PersistentMatrixFactorizationModel = { val m = ALS.train(data.ratings,

word2vec: how to save an mllib model and reload it?

2014-11-06 Thread ll
what is the best way to save an mllib model that you just trained and reload it in the future? specifically, i'm using the mllib word2vec model... thanks.

Re: word2vec: how to save an mllib model and reload it?

2014-11-06 Thread Evan R. Sparks
Plain old Java serialization is one straightforward approach if you're in Java/Scala.
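A minimal sketch of that approach, assuming the trained model (or the part of it you want to keep) is `java.io.Serializable`. The `ModelIO` object and its `save`/`load` helpers are hypothetical names for illustration, not part of MLlib, and a plain word-to-vector `Map` stands in for the real model in the usage note.

```scala
import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical helpers: write any serializable object to a local file and read it back.
object ModelIO {
  def save(model: AnyRef, path: String): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(path))
    try out.writeObject(model) finally out.close()
  }

  // The caller states the expected type; a wrong type fails when the value is used.
  def load[T](path: String): T = {
    val in = new ObjectInputStream(new FileInputStream(path))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

Usage would look like `ModelIO.save(vectors, "/tmp/vectors.bin")` followed later by `ModelIO.load[Map[String, Array[Float]]]("/tmp/vectors.bin")`. Note that this writes to the local filesystem of whichever JVM runs it, and Java serialization is sensitive to class changes between the version that wrote the file and the version that reads it.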

Re: word2vec: how to save an mllib model and reload it?

2014-11-06 Thread Duy Huynh
that works. is there a better way in spark? this seems like the most common feature for any machine learning work - to be able to save your model after training it and load it later.

Re: word2vec: how to save an mllib model and reload it?

2014-11-06 Thread Evan R. Sparks
There's some work going on to support PMML - https://issues.apache.org/jira/browse/SPARK-1406 - but it's not yet been merged into master. What are you used to doing in other environments? In R I'm used to running save(), same with MATLAB. In Python either pickling things or dumping to JSON seems