The latest version of PredictionIO, which is now under the Apache 2 license, supports the deployment of MLlib models in production.
The "engine" you build will include a few components, such as:

- Data - includes Data Source and Data Preparator
- Algorithm(s)
- Serving

I believe that you can do the feature-vector creation inside the Data Preparator component. Currently, the package comes with two templates:

1) Collaborative Filtering Engine Template - with MLlib ALS;
2) Classification Engine Template - with MLlib Naive Bayes.

The latter may be useful to you. And you can customize the Algorithm component, too.

I have just created a doc: http://docs.prediction.io/0.8.1/templates/

Love to hear your feedback!

Regards,
Simon

On Mon, Oct 27, 2014 at 11:03 AM, chirag lakhani <chirag.lakh...@gmail.com> wrote:

> Would pipelining include model export? I didn't see that in the
> documentation.
>
> Are there ways that this is being done currently?
>
> On Mon, Oct 27, 2014 at 12:39 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
>> We are working on the pipeline features, which would make this
>> procedure much easier in MLlib. This is still a WIP and the main JIRA
>> is at:
>>
>> https://issues.apache.org/jira/browse/SPARK-1856
>>
>> Best,
>> Xiangrui
>>
>> On Mon, Oct 27, 2014 at 8:56 AM, chirag lakhani
>> <chirag.lakh...@gmail.com> wrote:
>> > Hello,
>> >
>> > I have been prototyping a text classification model that my company
>> > would like to eventually put into production. Our technology stack is
>> > currently Java based, but we would like to be able to build our models
>> > in Spark/MLlib and then export something like a PMML file, which can be
>> > used for model scoring in real time.
>> >
>> > I have been using scikit-learn, where I am able to take the training
>> > data, convert the text data into a sparse data format, and then take
>> > the other features and use the dictionary vectorizer to do one-hot
>> > encoding for the other categorical variables. All of those things seem
>> > to be possible in MLlib, but I am still puzzled about how that can be
>> > packaged in such a way that the incoming data can first be made into
>> > feature vectors and then evaluated as well.
>> >
>> > Are there any best practices for this type of thing in Spark? I hope
>> > this is clear, but if there are any confusions then please let me know.
>> >
>> > Thanks,
>> >
>> > Chirag
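For readers following the thread: the feature-vectorization step Chirag describes (scikit-learn's dictionary vectorizer, i.e. one-hot encoding of categorical fields into a fixed-width vector) is the same logic that would live inside PredictionIO's Data Preparator. A minimal sketch in plain Python, with no Spark or scikit-learn dependency - all function and field names here are illustrative, not part of either library's API:

```python
def build_vocabulary(records):
    """Map every (field, value) pair seen in training to a column index,
    in first-seen order - this fixes the width and layout of the vectors."""
    vocab = {}
    for record in records:
        for field, value in record.items():
            key = (field, value)
            if key not in vocab:
                vocab[key] = len(vocab)
    return vocab

def to_feature_vector(record, vocab):
    """One-hot encode a single record against a fixed vocabulary.
    Values never seen at training time are silently dropped, which is
    what you want at scoring time: the vector width must not change."""
    vector = [0.0] * len(vocab)
    for field, value in record.items():
        idx = vocab.get((field, value))
        if idx is not None:
            vector[idx] = 1.0
    return vector

# Fit on training records, then reuse the same vocabulary for scoring.
train = [{"color": "red", "size": "S"},
         {"color": "blue", "size": "M"}]
vocab = build_vocabulary(train)
vec = to_feature_vector({"color": "red", "size": "M"}, vocab)
```

The key design point, and the reason the Data Preparator (or the MLlib pipeline work in SPARK-1856) matters, is that the vocabulary built at training time must be persisted and reused at scoring time, so that incoming data maps onto exactly the same columns the model was trained on.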