Currently I see the word2vec model is collected onto the master, so the model itself is not distributed.
I guess the question is why do you need a distributed model? Is the vocab size so large that it's necessary? For model serving in general, unless the model is truly massive (i.e. cannot fit into memory on a modern high-end box with 64 or 128 GB of RAM), a single instance is way faster and simpler (using a cluster of machines is more for load balancing / fault tolerance).

What is your use case for model serving?

On Fri, Nov 7, 2014 at 5:47 PM, Duy Huynh <duy.huynh....@gmail.com> wrote:

> you're right, serialization works.
>
> what is your suggestion on saving a "distributed" model? so part of the
> model is in one cluster, and some other parts of the model are in other
> clusters. during runtime, these sub-models run independently in their own
> clusters (load, train, save). and at some point during run time these
> sub-models merge into the master model, which also loads, trains, and saves
> at the master level.
>
> much appreciated.
>
> On Fri, Nov 7, 2014 at 2:53 AM, Evan R. Sparks <evan.spa...@gmail.com>
> wrote:
>
>> There's some work going on to support PMML -
>> https://issues.apache.org/jira/browse/SPARK-1406 - but it's not yet been
>> merged into master.
>>
>> What are you used to doing in other environments? In R I'm used to running
>> save(), same with matlab. In python either pickling things or dumping to
>> json seems pretty common. (even the scikit-learn docs recommend pickling -
>> http://scikit-learn.org/stable/modules/model_persistence.html). These all
>> seem basically equivalent to java serialization to me.
>>
>> Would some helper functions (in, say, mllib.util.modelpersistence or
>> something) make sense to add?
>>
>> On Thu, Nov 6, 2014 at 11:36 PM, Duy Huynh <duy.huynh....@gmail.com>
>> wrote:
>>
>>> that works. is there a better way in spark? this seems like the most
>>> common feature for any machine learning work - to be able to save your
>>> model after training it and load it later.
>>>
>>> On Fri, Nov 7, 2014 at 2:30 AM, Evan R. Sparks <evan.spa...@gmail.com>
>>> wrote:
>>>
>>>> Plain old java serialization is one straightforward approach if you're
>>>> in java/scala.
>>>>
>>>> On Thu, Nov 6, 2014 at 11:26 PM, ll <duy.huynh....@gmail.com> wrote:
>>>>
>>>>> what is the best way to save an mllib model that you just trained and
>>>>> reload
>>>>> it in the future? specifically, i'm using the mllib word2vec model...
>>>>> thanks.
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/word2vec-how-to-save-an-mllib-model-and-reload-it-tp18329.html
>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
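
For illustration only, here is a minimal sketch of the save-and-reload pattern discussed in this thread, written with Python pickling (which the thread names as roughly equivalent to Java serialization). It is not MLlib code: the `model` dict is a hypothetical stand-in for a trained word2vec model, and `save_model`, `load_model`, and `approx_size_gb` are made-up helper names. The size estimate assumes 4 bytes per vector component, which is an assumption rather than something stated in the thread.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained word2vec model: word -> vector.
# A real MLlib model lives on the JVM; this only illustrates the pattern.
model = {
    "spark": [0.1, 0.3, -0.2],
    "hadoop": [0.0, 0.4, -0.1],
}


def save_model(obj, path):
    """Serialize the whole model object to a single file."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Read a previously saved model back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)


def approx_size_gb(vocab_size, dim, bytes_per_value=4):
    """Rough size of the raw vectors only (assumes 4-byte components)."""
    return vocab_size * dim * bytes_per_value / 1e9


# Round trip: save after "training", reload later.
path = os.path.join(tempfile.mkdtemp(), "word2vec.pkl")
save_model(model, path)
restored = load_model(path)
```

Under these assumptions, `approx_size_gb(1000000, 300)` comes to about 1.2 GB of raw vectors for a one-million-word vocabulary at 300 dimensions, which is the kind of size the reply above suggests serving from a single instance rather than a cluster.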