The latest version of PredictionIO, which is now under the Apache 2 license,
supports deploying MLlib models in production.

The "engine" you build will include a few components:
- Data - includes Data Source and Data Preparator
- Algorithm(s)
- Serving
I believe that you can do the feature vector creation inside the Data
Preparator component.
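To make that division of labour concrete, here is a minimal plain-Python sketch of the four stages. The function names (`data_source`, `data_preparator`, `train_algorithm`, `predict`, `serving`) and the toy scoring rule are hypothetical; they are not PredictionIO's actual API, which is written in Scala. The point it illustrates is that the vocabulary built while preparing the training data is kept and reused when a query is turned into a feature vector, so training and scoring see the same features.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical stand-ins for the engine stages; NOT the PredictionIO API.

def data_source():
    """Read raw labelled records (hard-coded here instead of an event store)."""
    return [("free money now", "spam"), ("meeting at noon", "ham"),
            ("win money free", "spam"), ("lunch meeting today", "ham")]

@dataclass
class PreparedData:
    vocab: dict     # token -> feature index
    vectors: list   # sparse {index: count} vectors
    labels: list

def vectorize(text, vocab):
    """Sparse bag-of-words; tokens unseen at training time are dropped."""
    return dict(Counter(vocab[t] for t in text.split() if t in vocab))

def data_preparator(records):
    """Build the vocabulary and turn each text into a feature vector."""
    vocab = {}
    for text, _ in records:
        for tok in text.split():
            vocab.setdefault(tok, len(vocab))
    vectors = [vectorize(text, vocab) for text, _ in records]
    return PreparedData(vocab, vectors, [label for _, label in records])

def train_algorithm(prepared):
    """Toy algorithm: sum each label's vectors into a per-label profile."""
    profiles = {}
    for vec, label in zip(prepared.vectors, prepared.labels):
        profiles.setdefault(label, Counter()).update(vec)
    return profiles

def predict(profiles, vec):
    """Pick the label whose profile has the largest dot product with vec."""
    return max(profiles,
               key=lambda lab: sum(profiles[lab][i] * c for i, c in vec.items()))

def serving(predictions):
    """Combine predictions from one or more algorithms by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Training path: data source -> preparator -> algorithm.
prepared = data_preparator(data_source())
model = train_algorithm(prepared)

# Query path: reuse the same vocabulary so the features line up with training.
query_vec = vectorize("free money", prepared.vocab)
print(serving([predict(model, query_vec)]))  # spam
```

The majority-vote serving step only matters when more than one algorithm is deployed; with a single algorithm it simply passes the prediction through.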

Currently, the package comes with two templates: 1) Collaborative
Filtering Engine Template - with MLlib ALS; 2) Classification Engine
Template - with MLlib Naive Bayes. The latter may be useful to you, and
you can customize the Algorithm component, too.
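For text classification, the model behind the second template is Naive Bayes. The following is a small plain-Python sketch of multinomial Naive Bayes with additive (Laplace) smoothing over sparse `{feature_index: count}` vectors, the same family of model that MLlib's NaiveBayes trains. It is not MLlib's code; `train_nb`, `predict_nb` and the `lam` parameter name are illustrative assumptions, and scores are kept in log space to avoid underflow on long documents.

```python
import math
from collections import defaultdict

def train_nb(vectors, labels, num_features, lam=1.0):
    """Multinomial Naive Bayes with additive smoothing `lam`.
    vectors: list of sparse {feature_index: count} dicts."""
    n = len(labels)
    class_count = defaultdict(int)
    feat_count = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    for vec, y in zip(vectors, labels):
        class_count[y] += 1
        for j, c in vec.items():
            feat_count[y][j] += c
            total[y] += c
    log_prior = {y: math.log(class_count[y] / n) for y in class_count}
    # log P(feature j | class y) = log((count_yj + lam) / (total_y + lam * num_features))
    log_theta = {
        y: [math.log((feat_count[y][j] + lam) / (total[y] + lam * num_features))
            for j in range(num_features)]
        for y in class_count
    }
    return log_prior, log_theta

def predict_nb(model, vec):
    """Return the class with the highest log posterior for a sparse vector."""
    log_prior, log_theta = model
    def score(y):
        return log_prior[y] + sum(c * log_theta[y][j] for j, c in vec.items())
    return max(log_prior, key=score)

# Features: 0 = "free", 1 = "money", 2 = "meeting", 3 = "noon"
X = [{0: 1, 1: 2}, {2: 1, 3: 1}, {0: 2}, {2: 2}]
y = ["spam", "ham", "spam", "ham"]
model = train_nb(X, y, num_features=4)
print(predict_nb(model, {1: 1}))  # spam
print(predict_nb(model, {3: 1}))  # ham
```

Smoothing keeps a feature that never appeared with a class from driving that class's probability to zero.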

I have just created a doc: http://docs.prediction.io/0.8.1/templates/
I'd love to hear your feedback!

Regards,
Simon



On Mon, Oct 27, 2014 at 11:03 AM, chirag lakhani <chirag.lakh...@gmail.com>
wrote:

> Would pipelining include model export?  I didn't see that in the
> documentation.
>
> Are there ways that this is being done currently?
>
>
>
> On Mon, Oct 27, 2014 at 12:39 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
>> We are working on the pipeline features, which would make this
>> procedure much easier in MLlib. This is still a WIP and the main JIRA
>> is at:
>>
>> https://issues.apache.org/jira/browse/SPARK-1856
>>
>> Best,
>> Xiangrui
>>
>> On Mon, Oct 27, 2014 at 8:56 AM, chirag lakhani
>> <chirag.lakh...@gmail.com> wrote:
>> > Hello,
>> >
>> > I have been prototyping a text classification model that my company would
>> > like to eventually put into production.  Our technology stack is currently
>> > Java based but we would like to be able to build our models in Spark/MLlib
>> > and then export something like a PMML file which can be used for model
>> > scoring in real-time.
>> >
>> > I have been using scikit-learn, where I am able to take the training data,
>> > convert the text data into a sparse data format, and then use the
>> > dictionary vectorizer to one-hot encode the other categorical variables.
>> > All of those things seem to be possible in MLlib, but I am still puzzled
>> > about how that can be packaged in such a way that the incoming data can
>> > first be made into feature vectors and then evaluated as well.
>> >
>> > Are there any best practices for this type of thing in Spark?  I hope this
>> > is clear, but if anything is confusing, please let me know.
>> >
>> > Thanks,
>> >
>> > Chirag
>>
>
>
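
On the packaging question in the original message: one common approach is to fit the featurization once, at training time, and store its learned state next to the model, so the scoring service applies exactly the same mapping to incoming records. Below is a minimal plain-Python sketch under that assumption. `RecordVectorizer` is a hypothetical class (bag-of-words for one text field plus one-hot encoding for categorical fields, similar in spirit to scikit-learn's DictVectorizer), and JSON is used here only as an illustrative export format, not PMML.

```python
import json

class RecordVectorizer:
    """Hypothetical featurizer: bag-of-words for a text field plus one-hot
    encoding for categorical fields. fit() learns the feature index once;
    transform() reuses it unchanged for every incoming record."""

    def __init__(self, text_field, categorical_fields):
        self.text_field = text_field
        self.categorical_fields = categorical_fields
        self.index = {}  # feature name -> column number

    def _feature_names(self, record):
        names = ["word=" + tok for tok in record[self.text_field].lower().split()]
        names += [f"{f}={record[f]}" for f in self.categorical_fields]
        return names

    def fit(self, records):
        for rec in records:
            for name in self._feature_names(rec):
                self.index.setdefault(name, len(self.index))
        return self

    def transform(self, record):
        """Return a sparse {column: value} vector; unseen features are ignored."""
        vec = {}
        for name in self._feature_names(record):
            col = self.index.get(name)
            if col is not None:
                vec[col] = vec.get(col, 0) + 1
        return vec

    def to_json(self):
        """Serialize the learned state so a separate scoring service can rebuild it."""
        return json.dumps({"text_field": self.text_field,
                           "categorical_fields": self.categorical_fields,
                           "index": self.index})

    @classmethod
    def from_json(cls, s):
        d = json.loads(s)
        v = cls(d["text_field"], d["categorical_fields"])
        v.index = d["index"]
        return v

train = [{"text": "Cheap flights now", "country": "US"},
         {"text": "Quarterly report attached", "country": "DE"}]
vec = RecordVectorizer("text", ["country"]).fit(train)
restored = RecordVectorizer.from_json(vec.to_json())
print(restored.transform({"text": "cheap report", "country": "DE"}))  # {0: 1, 5: 1, 7: 1}
```

MLlib's HashingTF takes the hashing-trick route instead, which avoids storing an index at all, but the scoring side must then reproduce the same hash function and number of features.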
