I have a similar issue: I want to load a model saved by a Spark machine learning job in a web application.
    model.save(jsc.sc(), "myModelPath");
    LogisticRegressionModel model =
        LogisticRegressionModel.load(jsc.sc(), "myModelPath");

When I do that, I need to pass a SparkContext to load the model. The model is small and can be saved to the local file system, so is there any way to use it without the SparkContext? Creating a SparkContext looks like an expensive step that internally starts a Jetty server, and I do not want to start one more web server inside a web application.

A solution that I received (pasted below) was to export the model into a generic format such as PMML. I haven't tried it, and I am hoping to find a way to use the model without adding a lot more dependencies and code to the project.

On Oct 30, 2015, at 2:11 PM, Stefano Baghino <stefano.bagh...@radicalbit.io> wrote:

One possibility would be to export the model as PMML (Predictive Model Markup Language, an XML-based standard for describing predictive models) and then use it in your web app (using something like JPMML <https://github.com/jpmml>, for example). You can directly export (some) models (including linear regression) since Spark 1.4:
https://databricks.com/blog/2015/07/02/pmml-support-in-spark-mllib.html

For more info on PMML support in MLlib (including which models are supported):
https://spark.apache.org/docs/latest/mllib-pmml-model-export.html

For more info on the PMML standard:
http://dmg.org/pmml/v4-2-1/GeneralStructure.html

Thanks
Viju

From: Andy Davidson [mailto:a...@santacruzintegration.com]
Sent: Tuesday, November 10, 2015 1:32 PM
To: user @spark
Subject: thought experiment: use spark ML to real time prediction

Let's say I have used Spark ML to train a linear model. I know I can save and load the model to disk, but I am not sure how I can use the model in a real-time environment. For example, I do not think I can easily return a "prediction" to the client using Spark Streaming. Also, for some applications the extra latency created by the batch process might not be acceptable.
If I were not using Spark, I would re-implement the model I trained in my batch environment in a language like Java and implement a REST service that uses the model to create a prediction and return it to the client. Many models make predictions using linear algebra, and implementing predictions is relatively easy if you have a good vectorized linear-algebra package.

Is there a way to use a model I trained with Spark ML outside of Spark?

As a motivating example: even if it's possible to return data to the client using Spark Streaming, I think the mini-batch latency would not be acceptable for a high-frequency stock trading system.

Kind regards

Andy

P.S. The examples I have seen so far use Spark Streaming to "preprocess" predictions. For example, a recommender system might use what current users are watching to calculate "trending recommendations". These are stored on disk and served up to users when they use the "movie guide". If a recommendation were a couple of minutes old, it would not affect the end user's experience.
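Both emails converge on the same idea: for a small linear model, the prediction step is just a dot product plus a sigmoid, which is easy to re-implement in plain Java with no SparkContext. A minimal sketch of that approach, assuming the coefficients have been extracted from the trained model; every class and method name here is hypothetical (this is neither Spark nor JPMML API), and the one-value-per-line file format is made up for illustration:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch: persist a small logistic regression model's
// intercept and weights to the local file system, then score requests
// in a web app without a SparkContext.
public class LocalLogisticModel {

    private final double intercept;
    private final double[] weights;

    public LocalLogisticModel(double intercept, double[] weights) {
        this.intercept = intercept;
        this.weights = weights;
    }

    // score = sigmoid(intercept + weights . features)
    public double predict(double[] features) {
        double margin = intercept;
        for (int i = 0; i < weights.length; i++) {
            margin += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-margin));
    }

    // One value per line: intercept first, then the weights.
    public void save(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(Double.toString(intercept));
        for (double w : weights) {
            lines.add(Double.toString(w));
        }
        Files.write(path, lines);
    }

    public static LocalLogisticModel load(Path path) throws IOException {
        List<String> lines = Files.readAllLines(path);
        double[] w = new double[lines.size() - 1];
        for (int i = 1; i < lines.size(); i++) {
            w[i - 1] = Double.parseDouble(lines.get(i));
        }
        return new LocalLogisticModel(Double.parseDouble(lines.get(0)), w);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("weights", ".txt");
        new LocalLogisticModel(0.0, new double[] {2.0, -1.0}).save(tmp);
        LocalLogisticModel model = LocalLogisticModel.load(tmp);
        System.out.println(model.predict(new double[] {0.0, 0.0})); // 0.5
        Files.delete(tmp);
    }
}
```

The trade-off is the one Andy notes: you must keep the hand-rolled scoring code in sync with whatever preprocessing and feature encoding the Spark training job used, which is exactly the bookkeeping that a standard interchange format like PMML is meant to remove.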