Hi All,

I am training models with Spark MLlib inside our Hadoop cluster, but prediction happens in a separate real-time, synchronous system (a web application). I am currently exploring options for exporting trained MLlib models from Spark.

1. *Export model as PMML:* The projects under JPMML, the Java PMML API <https://github.com/jpmml>, look quite interesting. The idea is to use JPMML <https://github.com/jpmml/jpmml> to convert the MLlib model entity to PMML, then use the PMML evaluator <https://github.com/jpmml/jpmml-evaluator> for prediction in the other system. Alternatively, the Openscoring REST API <https://github.com/jpmml/openscoring> could handle model deployment and prediction. This could be the standard approach when models need to be ported across different systems. However, converting non-linear MLlib models to PMML might be a complex task, and I would need to keep updating my MLlib-to-PMML conversion code for every new MLlib model and every change to the MLlib entities. I have not evaluated any of these JPMML projects personally, and I see that they have only a single contributor, so I am wondering whether enough people have started using them. Please share any experience you have with this.

2. *Export MLlib model in serialized form:* MLlib models can be serialized using Kryo serialization <http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAFRXrqdpkfCX41=JyTSmmtt8aNWrSdpJvxE3FmYVZ=uuepe...@mail.gmail.com%3E> or plain Java serialization <http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-td11953.html>. The same model can then be deserialized by other standalone applications, which use the MLlib entity for prediction. This blog post <http://blog.knoldus.com/2014/07/21/play-with-spark-building-spark-mllib-in-a-play-spark-application/> shows an example of using Spark MLlib inside a Play web application. I expect I can use Spark MLlib in any other JVM-based web application in the same way(?). Please share if anyone has experience with this.

Advantages:
-> No recurring effort to support new models or changes to MLlib model entities in future versions.
-> Fewer dependencies on other tools.

Disadvantages:
-> The model cannot be ported to a non-JVM system.
-> A model serialized with one version of an MLlib entity may not be deserializable with a different version of that entity(?).

I think this is quite a common problem. I am really interested to hear how you are solving it, which approaches you use, and what their pros and cons are.

Thanks,
Sourabh

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLLIB-model-export-PMML-vs-MLLIB-serialization-tp20324.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
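
Addendum on option 2 (serialized export): to make the round trip concrete, here is a minimal sketch of plain Java serialization using only the JDK. The `LinearModel` class, its fields, and the `ModelRoundTrip` helper are hypothetical stand-ins so the example runs without Spark on the classpath; they are not MLlib classes. With a real MLlib model the same `ObjectOutputStream` / `ObjectInputStream` calls should apply, provided the model class implements `Serializable` and a compatible MLlib jar is available to the web application.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class ModelRoundTrip {

    // Hypothetical stand-in for a trained model; NOT an MLlib class.
    static final class LinearModel implements Serializable {
        // An explicit serialVersionUID avoids relying on the compiler-derived
        // default, which can change when the class is recompiled or edited.
        private static final long serialVersionUID = 1L;

        private final double[] weights;
        private final double intercept;

        LinearModel(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }

        // Linear prediction: intercept + dot(weights, features).
        double predict(double[] features) {
            if (features.length != weights.length) {
                throw new IllegalArgumentException("feature count mismatch");
            }
            double sum = intercept;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * features[i];
            }
            return sum;
        }
    }

    // Training side: turn the model into bytes (these could be written to HDFS or a file).
    static byte[] serialize(LinearModel model) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Serving side: rebuild the model; the model class must be on the classpath.
    static LinearModel deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (LinearModel) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("model class missing from classpath", e);
        }
    }

    public static void main(String[] args) {
        LinearModel trained = new LinearModel(new double[] {0.5, -1.0}, 2.0);
        byte[] exported = serialize(trained);
        LinearModel restored = deserialize(exported);
        System.out.println(restored.predict(new double[] {2.0, 1.0}));
    }
}
```

The sketch also shows where the version concern comes from: deserialization needs the same class, in a compatible form, on the serving side. If the class definition changes incompatibly between the training and serving applications, `readObject` fails instead of returning a model.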