I have a similar issue. I want to load a model saved by a Spark machine 
learning job in a web application.

        // save the trained model from the Spark job
        model.save(jsc.sc(), "myModelPath");

        // load it in the web application -- note the SparkContext argument
        LogisticRegressionModel model =
                LogisticRegressionModel.load(jsc.sc(), "myModelPath");

When I do that, I need to pass a SparkContext to load the model. The 
model is small and can be saved to the local file system, so is there any way to 
use it without a SparkContext? Creating a SparkContext looks like an expensive 
step that internally starts a Jetty server, and I do not want to start one more 
web server inside a web application.

One suggestion I received (pasted below) was to export the model to a 
generic format such as PMML. I haven't tried it yet; I am hoping to find a way 
to use the model without adding many more dependencies and much more code to the project.


On Oct 30, 2015, at 2:11 PM, Stefano Baghino 
<stefano.bagh...@radicalbit.io> wrote:
One possibility would be to export the model as PMML (Predictive Model Markup 
Language, an XML-based standard for describing predictive models) and then use it 
in your web app (using something like JPMML, https://github.com/jpmml, for 
example). You can directly export (some) models (including linear regression) since 
Spark 1.4: https://databricks.com/blog/2015/07/02/pmml-support-in-spark-mllib.html

For more info on PMML support in MLlib (including which models are supported): 
https://spark.apache.org/docs/latest/mllib-pmml-model-export.html
For more info on the PMML standard: 
http://dmg.org/pmml/v4-2-1/GeneralStructure.html
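
For reference, I believe the Spark-side export would look roughly like the 
following (untested on my end, and "myModel.pmml" is just a placeholder path):

        // untested sketch: export the trained mllib model to PMML
        model.toPMML("myModel.pmml");                        // local file system
        // or, via the SparkContext, to a distributed store:
        // model.toPMML(jsc.sc(), "hdfs:///models/myModel.pmml");

The resulting XML file could then be evaluated in the web app with a JPMML 
evaluator, without any Spark dependency at serving time.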


Thanks
Viju





From: Andy Davidson [mailto:a...@santacruzintegration.com]
Sent: Tuesday, November 10, 2015 1:32 PM
To: user @spark
Subject: thought experiment: use spark ML to real time prediction

Let's say I have used Spark ML to train a linear model. I know I can save the 
model to disk and load it back. I am not sure how I can use the model in a real-time 
environment. For example, I do not think I can easily return a "prediction" to the 
client using Spark Streaming. Also, for some applications the extra latency 
created by the batch process might not be acceptable.

If I were not using Spark, I would re-implement the model I trained in my batch 
environment in a language like Java and implement a REST service that uses the 
model to compute a prediction and return it to the client. Many models make 
predictions using linear algebra, so implementing prediction is relatively easy 
if you have a good vectorized linear algebra package. Is there a way to use a 
model I trained using Spark ML outside of Spark?
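
Something along these lines, for example (plain Java, no Spark at runtime; the 
weights and intercept here are hypothetical values exported by the batch training job):

    // Minimal scorer for a linear model, independent of Spark.
    // The weights and intercept would come from the trained model,
    // e.g. written to a file by the batch job and read at startup.
    public final class LinearScorer {
        private final double[] weights;
        private final double intercept;

        public LinearScorer(double[] weights, double intercept) {
            this.weights = weights;
            this.intercept = intercept;
        }

        // linear prediction: w . x + b
        public double predict(double[] features) {
            double margin = intercept;
            for (int i = 0; i < weights.length; i++) {
                margin += weights[i] * features[i];
            }
            return margin;
        }

        // for logistic regression, squash the margin through a sigmoid
        public double predictProbability(double[] features) {
            return 1.0 / (1.0 + Math.exp(-predict(features)));
        }
    }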

As a motivating example: even if it is possible to return data to the client 
using Spark Streaming, I think the mini-batch latency would not be acceptable 
for a high-frequency stock trading system.

Kind regards

Andy

P.S. The examples I have seen so far use Spark Streaming to "preprocess" 
predictions. For example, a recommender system might use what current users are 
watching to calculate "trending recommendations". These are stored on disk and 
served up to users when they use the "movie guide". If a recommendation were a 
couple of minutes old, it would not affect the end user's experience.

