Re: deploying a model built in mllib

2014-11-07 Thread chirag lakhani
Thanks for letting me know about this; it looks pretty interesting.  From
reading the documentation, it seems that the server must be built on a Spark
cluster, is that correct?  Is it possible to deploy it on a Java server?
That is how we are currently running our web app.





Re: deploying a model built in mllib

2014-11-07 Thread Donald Szeto
Hi Chirag,

Could you please provide more information on your Java server environment?

Regards,
Donald


-- 
Donald Szeto
PredictionIO


Re: deploying a model built in mllib

2014-11-04 Thread Simon Chan
The latest version of PredictionIO, which is now under the Apache 2 license,
supports deploying MLlib models in production.

The engine you build will include a few components, such as:
- Data - includes Data Source and Data Preparator
- Algorithm(s)
- Serving
I believe that you can do the feature vector creation inside the Data
Preparator component.
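
For a concrete flavor, it might look something like this (a rough sketch
only: the TrainingData and PreparedData classes are placeholders, and the
exact PPreparator signature you extend depends on the PredictionIO version):

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    // Hypothetical engine data classes, for illustration only.
    case class TrainingData(records: RDD[(Double, String)]) // (label, raw text)
    case class PreparedData(points: RDD[LabeledPoint])

    class Preparator /* extends PPreparator[TrainingData, PreparedData] */ {
      def prepare(sc: SparkContext, td: TrainingData): PreparedData = {
        // Hash raw tokens into a fixed-width term-frequency vector.
        val tf = new HashingTF(numFeatures = 10000)
        val points = td.records.map { case (label, text) =>
          LabeledPoint(label, tf.transform(text.split(" ").toSeq))
        }
        PreparedData(points)
      }
    }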

Currently, the package comes with two templates: 1) Collaborative
Filtering Engine Template - with MLlib ALS; 2) Classification Engine
Template - with MLlib Naive Bayes. The latter may be useful to you, and
you can customize the Algorithm component, too.

I have just created a doc: http://docs.prediction.io/0.8.1/templates/
Love to hear your feedback!

Regards,
Simon




deploying a model built in mllib

2014-10-27 Thread chirag lakhani
Hello,

I have been prototyping a text classification model that my company would
like to eventually put into production.  Our technology stack is currently
Java-based, but we would like to be able to build our models in Spark/MLlib
and then export something like a PMML file, which can be used for model
scoring in real time.
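
Roughly, the training-side export we are hoping for would look like this
(a hypothetical sketch: the toPMML call comes from the PMMLExportable
trait, which only exists from Spark 1.4 on and covers linear models and
k-means rather than Naive Bayes, so a logistic regression stands in here):

    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    def exportModel(training: RDD[LabeledPoint]): Unit = {
      // Train a simple linear classifier on already-featurized data.
      val model = LogisticRegressionWithSGD.train(training, numIterations = 100)
      // Write a PMML description of the model to a local file (example path)
      // for a Java-side scoring engine to load.
      model.toPMML("/tmp/model.pmml")
    }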

I have been using scikit-learn, where I am able to take the training data,
convert the text data into a sparse format, and then take the other
features and use the DictVectorizer to do one-hot encoding for the other
categorical variables.  All of those things seem to be possible in MLlib,
but I am still puzzled about how that can be packaged in such a way that
the incoming data can be first made into feature vectors and then
evaluated as well.
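
For example, I picture the per-record featurization looking roughly like
this (a sketch with a made-up Record class, hashing in place of a fitted
text vocabulary, and a dense output vector purely for clarity):

    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    case class Record(label: Double, text: String, category: String)

    // Known vocabulary for the categorical field, fixed up front.
    val categories = Seq("red", "green", "blue")
    val tf = new HashingTF(numFeatures = 10000)

    def toLabeledPoint(r: Record): LabeledPoint = {
      // Hash the text into term-frequency features.
      val textFeatures = tf.transform(r.text.split(" ").toSeq).toArray
      // Manual one-hot encoding of the categorical variable.
      val oneHot = categories.map(c => if (c == r.category) 1.0 else 0.0)
      // Use Vectors.sparse instead for large feature spaces.
      LabeledPoint(r.label, Vectors.dense(textFeatures ++ oneHot))
    }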

Are there any best practices for this type of thing in Spark?  I hope this
is clear, but if anything is confusing, please let me know.

Thanks,

Chirag


Re: deploying a model built in mllib

2014-10-27 Thread Xiangrui Meng
We are working on the pipeline feature, which should make this procedure
much easier in MLlib. It is still a WIP, and the main JIRA is at:

https://issues.apache.org/jira/browse/SPARK-1856
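
For a flavor of the direction, the API sketched in that JIRA would let you
chain featurization and a classifier into one unit, roughly like this
(illustrative only; names and types may change before it ships, and
`training` is assumed to be a table of (label, text) rows):

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

    // Each stage is fit on training data and then re-applied to incoming
    // data, so featurization and scoring stay packaged together.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
    val model = pipeline.fit(training)
    // model.transform(newData) re-applies the same feature steps, then scores.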

Best,
Xiangrui

