Hi All,
I am training models with Spark MLlib inside our Hadoop cluster, but
prediction happens in a separate real-time, synchronous system (a web
application). I am currently exploring options for exporting the trained
MLlib models from Spark.

   1. *Export model as PMML:* The projects under JPMML: Java PMML API
<https://github.com/jpmml> look quite interesting. The idea is to use JPMML
<https://github.com/jpmml/jpmml> to convert the MLlib model object to PMML,
and then use the PMML evaluator <https://github.com/jpmml/jpmml-evaluator>
for prediction in the other system. Alternatively, the openscoring REST API
<https://github.com/jpmml/openscoring> could be explored for model
deployment and prediction.

This could be the standard approach if we need to port models across
different systems. But converting non-linear MLlib models to PMML might be a
complex task. On top of that, I would need to keep updating my MLlib-to-PMML
conversion code for every new MLlib model and every change to the MLlib
model classes.
I have not evaluated any of these JPMML projects personally, and I see that
they have only a single contributor, so I am wondering whether many people
are using them yet. Please share any experience you have with them.
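
To make the conversion effort concrete, here is a minimal sketch of what
hand-written conversion code could look like for the simplest case, a linear
regression model. It does not use JPMML or MLlib: the class LinearPmmlSketch,
the method toPmml, and the feature and target names are all hypothetical, and
the PMML 4.2 RegressionModel layout is written from memory and has not been
validated against the schema.

```java
// Hypothetical helper: renders a linear model (weights + intercept) as a PMML string.
final class LinearPmmlSketch {

    // Assumes feature names need no XML escaping; real conversion code would have to escape them.
    static String toPmml(String[] featureNames, double[] weights, double intercept) {
        StringBuilder sb = new StringBuilder();
        sb.append("<PMML version=\"4.2\" xmlns=\"http://www.dmg.org/PMML-4_2\">\n");
        sb.append("  <Header description=\"linear model sketch\"/>\n");

        // Declare every input field plus the target field.
        sb.append("  <DataDictionary>\n");
        for (String name : featureNames) {
            sb.append("    <DataField name=\"").append(name)
              .append("\" optype=\"continuous\" dataType=\"double\"/>\n");
        }
        sb.append("    <DataField name=\"target\" optype=\"continuous\" dataType=\"double\"/>\n");
        sb.append("  </DataDictionary>\n");

        sb.append("  <RegressionModel functionName=\"regression\">\n");
        sb.append("    <MiningSchema>\n");
        for (String name : featureNames) {
            sb.append("      <MiningField name=\"").append(name).append("\"/>\n");
        }
        sb.append("      <MiningField name=\"target\" usageType=\"target\"/>\n");
        sb.append("    </MiningSchema>\n");

        // One NumericPredictor per weight; the intercept sits on the table itself.
        sb.append("    <RegressionTable intercept=\"").append(intercept).append("\">\n");
        for (int i = 0; i < featureNames.length; i++) {
            sb.append("      <NumericPredictor name=\"").append(featureNames[i])
              .append("\" coefficient=\"").append(weights[i]).append("\"/>\n");
        }
        sb.append("    </RegressionTable>\n");
        sb.append("  </RegressionModel>\n");
        sb.append("</PMML>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up weights, only to show the output shape.
        System.out.print(toPmml(new String[] {"x1", "x2"}, new double[] {0.5, -1.0}, 2.0));
    }
}
```

A linear model maps to a single RegressionTable. Tree ensembles or other
non-linear models would each need their own mapping code, which is the
maintenance cost I am worried about.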

   2. *Export MLlib model in serialized form:* MLlib models can be
serialized using Kryo serialization
<http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAFRXrqdpkfCX41=JyTSmmtt8aNWrSdpJvxE3FmYVZ=uuepe...@mail.gmail.com%3E>
or plain Java serialization
<http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-td11953.html>.
The serialized model can then be deserialized by other standalone
applications, which use the MLlib model object directly for prediction. This
blog
<http://blog.knoldus.com/2014/07/21/play-with-spark-building-spark-mllib-in-a-play-spark-application/>
shows an example of using Spark MLlib inside a Play web application. I
expect that I can use Spark MLlib in any other JVM-based web application in
the same way(?). Please share if anyone has experience with this.
  Advantages of this approach:
     -> No recurring effort to support new models or changes to the MLlib
model classes in future versions.
     -> Fewer dependencies on other tools.
  Disadvantages:
     -> The model cannot be ported to a non-JVM system.
     -> A model serialized with one version of an MLlib class may not be
deserializable with a different version of that class(?).
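
The round trip I have in mind for the plain Java serialization variant looks
roughly like the sketch below. ToyLinearModel is a hypothetical stand-in for
a trained MLlib model (a real MLlib model class would take its place), and
ModelRoundTrip is just an illustrative name. The bytes are kept in memory
here; in practice they would be written to HDFS or a file by the training job
and read by the web application.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a trained model object.
final class ToyLinearModel implements Serializable {
    // Declared explicitly; without it the JVM derives the UID from the class structure.
    private static final long serialVersionUID = 1L;

    private final double[] weights;
    private final double intercept;

    ToyLinearModel(double[] weights, double intercept) {
        this.weights = weights.clone();
        this.intercept = intercept;
    }

    // Dot product of weights and features, plus the intercept.
    double predict(double[] features) {
        double sum = intercept;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }
}

final class ModelRoundTrip {

    // Training side: turn the model object into bytes.
    static byte[] serialize(Serializable model) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        }
        return buffer.toByteArray();
    }

    // Serving side: rebuild the object; the model class must be on the classpath.
    static ToyLinearModel deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (ToyLinearModel) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ToyLinearModel trained = new ToyLinearModel(new double[] {0.5, -1.0}, 2.0);
        ToyLinearModel restored = deserialize(serialize(trained));
        System.out.println(restored.predict(new double[] {4.0, 1.0}));
    }
}
```

This also shows where the version concern comes from: Java serialization
checks the serialVersionUID of the stored class against the class on the
reader's classpath and throws InvalidClassException on a mismatch. Whether
MLlib model classes stay compatible across releases is exactly what I am
unsure about.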

I think this is quite a common problem. I am really interested to hear how
you are solving it, which approaches you use, and what their pros and cons
are.

Thanks
Sourabh




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/MLLIB-model-export-PMML-vs-MLLIB-serialization-tp20324.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
