Re: deploying a model built in mllib
Thanks for letting me know about this; it looks pretty interesting. From reading the documentation, it seems that the server must be built on a Spark cluster. Is that correct? Is it possible to deploy it on a Java server? That is how we are currently running our web app.

On Tue, Nov 4, 2014 at 7:57 PM, Simon Chan simonc...@gmail.com wrote: [...]
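On the Java-server question, it is worth noting that once a classifier's parameters have been exported (via PMML or any other format), scoring needs no Spark at all. The sketch below shows this for multinomial Naive Bayes, where prediction is just log-prior plus a dot product followed by an argmax; the numbers are made-up stand-ins for exported model parameters, and Python is used only for brevity (the same arithmetic ports directly to Java).

```python
# Scoring a multinomial Naive Bayes model from exported parameters,
# with no dependence on Spark. All parameter values below are invented
# placeholders, not real model output.
import math

log_prior = {"spam": math.log(0.4), "ham": math.log(0.6)}
log_lik = {                       # per-class log P(feature | class)
    "spam": [math.log(0.7), math.log(0.3)],
    "ham":  [math.log(0.2), math.log(0.8)],
}

def predict(counts):
    # Score each class as log-prior + sum of count-weighted log-likelihoods,
    # then return the argmax.
    scores = {
        c: log_prior[c] + sum(w * x for w, x in zip(log_lik[c], counts))
        for c in log_prior
    }
    return max(scores, key=scores.get)

print(predict([3.0, 0.0]))  # "spam": feature 0 is far more likely under spam
```

A lightweight scorer like this can live inside an existing Java web app; the heavy lifting (training) stays on the Spark cluster.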
Re: deploying a model built in mllib
Hi Chirag,

Could you please provide more information on your Java server environment?

Regards,
Donald

On Fri, Nov 7, 2014 at 9:57 AM, chirag lakhani chirag.lakh...@gmail.com wrote: [...]

-- Donald Szeto PredictionIO
Re: deploying a model built in mllib
The latest version of PredictionIO, which is now under Apache 2 license, supports the deployment of MLlib models in production. The engine you build will include a few components, such as:

- Data - includes Data Source and Data Preparator
- Algorithm(s)
- Serving

I believe that you can do the feature vector creation inside the Data Preparator component. Currently, the package comes with two templates: 1) Collaborative Filtering Engine Template - with MLlib ALS; 2) Classification Engine Template - with MLlib Naive Bayes. The latter one may be useful to you. And you can customize the Algorithm component, too.

I have just created a doc: http://docs.prediction.io/0.8.1/templates/

I'd love to hear your feedback!

Regards,
Simon

On Mon, Oct 27, 2014 at 11:03 AM, chirag lakhani chirag.lakh...@gmail.com wrote: Would pipelining include model export? I didn't see that in the documentation. Are there ways that this is being done currently?

On Mon, Oct 27, 2014 at 12:39 PM, Xiangrui Meng men...@gmail.com wrote: [...]
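For text features specifically, a common approach in Spark is the hashing trick, the idea behind MLlib's HashingTF: each token is hashed into a fixed number of buckets, so no vocabulary needs to be stored or shipped alongside the model. Here is a stand-alone sketch of that idea in Python; it illustrates the technique only and is not PredictionIO's actual Data Preparator API.

```python
# Hashing-trick term-frequency featurization: map a token list to a
# fixed-width count vector without keeping a vocabulary.
import hashlib

def hashing_tf(tokens, num_features=16):
    vec = [0.0] * num_features
    for tok in tokens:
        # Use a stable hash; Python's built-in hash() is salted per process.
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        vec[h % num_features] += 1.0
    return vec

v = hashing_tf("spam spam ham".split())
print(sum(v))  # 3.0: one count per token, "spam" lands twice in the same bucket
```

Because the mapping is a pure function of the token, the exact same featurization can be reproduced at serving time in any language, which is what makes it attractive for real-time scoring.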
deploying a model built in mllib
Hello,

I have been prototyping a text classification model that my company would like to eventually put into production. Our technology stack is currently Java-based, but we would like to be able to build our models in Spark/MLlib and then export something like a PMML file, which can be used for model scoring in real time.

I have been using scikit-learn, where I am able to take the training data, convert the text data into a sparse format, and then use the dictionary vectorizer to do one-hot encoding for the other categorical variables. All of those things seem to be possible in MLlib, but I am still puzzled about how they can be packaged in such a way that incoming data can first be made into feature vectors and then evaluated as well. Are there any best practices for this type of thing in Spark?

I hope this is clear, but if anything is confusing please let me know.

Thanks,
Chirag
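The dictionary-vectorizer step described above can be sketched in a few lines of plain Python; the key point for deployment is that the fitted feature mapping must be saved alongside the model so that incoming records are vectorized identically at scoring time. This is a simplified stand-in for scikit-learn's DictVectorizer, not its actual implementation.

```python
# Minimal DictVectorizer-style one-hot encoder: numeric values keep their
# key as the feature name; categorical values become "key=value" indicators.
class DictVectorizer:
    def fit(self, records):
        # Build a stable, sorted feature index from the training records.
        names = set()
        for rec in records:
            for key, val in rec.items():
                names.add(key if isinstance(val, (int, float)) else f"{key}={val}")
        self.feature_names_ = sorted(names)
        self.index_ = {name: i for i, name in enumerate(self.feature_names_)}
        return self

    def transform(self, records):
        rows = []
        for rec in records:
            row = [0.0] * len(self.feature_names_)
            for key, val in rec.items():
                name = key if isinstance(val, (int, float)) else f"{key}={val}"
                i = self.index_.get(name)   # features unseen at fit time are dropped
                if i is not None:
                    row[i] = float(val) if isinstance(val, (int, float)) else 1.0
            rows.append(row)
        return rows

train = [{"color": "red", "weight": 2.0}, {"color": "blue", "weight": 1.5}]
vec = DictVectorizer().fit(train)
print(vec.feature_names_)                                # ['color=blue', 'color=red', 'weight']
print(vec.transform([{"color": "red", "weight": 3.0}]))  # [[0.0, 1.0, 3.0]]
```

Packaging this fitted mapping together with the trained model, so both travel as one artifact, is exactly the gap the pipeline work discussed below in the thread aims to close.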
Re: deploying a model built in mllib
We are working on the pipeline features, which would make this procedure much easier in MLlib. This is still a WIP and the main JIRA is at: https://issues.apache.org/jira/browse/SPARK-1856

Best,
Xiangrui

On Mon, Oct 27, 2014 at 8:56 AM, chirag lakhani chirag.lakh...@gmail.com wrote: [...]
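The core idea behind the pipeline work referenced above can be sketched concisely: chain the featurization stages and the estimator into one object, so that fitting and scoring always apply the same transformations in the same order. The sketch below is illustrative only; it is not the eventual spark.ml Pipeline API, and the Scale stage is a toy example.

```python
# A minimal pipeline abstraction: each stage exposes fit(X, y) and
# transform(X), and the pipeline threads data through them in order.
class Pipeline:
    def __init__(self, stages):
        self.stages = stages

    def fit(self, X, y):
        for stage in self.stages:
            stage.fit(X, y)
            X = stage.transform(X)   # feed each stage the previous stage's output
        return self

    def transform(self, X):
        for stage in self.stages:
            X = stage.transform(X)
        return X

class Scale:
    """Toy stage that multiplies every feature by a constant."""
    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        return self

    def transform(self, X):
        return [[v * self.k for v in row] for row in X]

pipe = Pipeline([Scale(2.0), Scale(0.5)]).fit([[1.0, 2.0]], [0])
print(pipe.transform([[1.0, 2.0]]))  # [[1.0, 2.0]]: the 2x and 0.5x stages cancel
```

Bundling the stages this way is what lets a single saved artifact turn raw incoming records into feature vectors and then score them, the packaging concern raised in the original message.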