Re: MLLIB model export: PMML vs MLLIB serialization
Thanks Vincenzo. Are you trying out all the models implemented in MLlib? I don't see decision trees there; sorry if I missed them. When are you planning to merge this into the Spark branch?

Thanks,
Sourabh

On Sun, Dec 14, 2014 at 5:54 PM, selvinsource [via Apache Spark User List] wrote:
> Hi Sourabh, have a look at https://issues.apache.org/jira/browse/SPARK-1406 . I am looking into exporting models in PMML using JPMML.
> Regards, Vincenzo

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLLIB-model-export-PMML-vs-MLLIB-serialization-tp20324p20688.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: MLLIB model export: PMML vs MLLIB serialization
I am going to try to export decision trees next; so far I have focused on linear models and k-means.

Regards,
Vincenzo

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: MLLIB model export: PMML vs MLLIB serialization
Hi Sourabh, have a look at https://issues.apache.org/jira/browse/SPARK-1406 . I am looking into exporting models to PMML using JPMML.

Regards,
Vincenzo
Re: MLLIB model export: PMML vs MLLIB serialization
Hi Sourabh, I came across the same problem as you. One workable solution for me was to serialize the parts of the model that are needed to recreate it. I save the RDDs in my model to HDFS using saveAsObjectFile, under a path with a timestamp attached. My other Spark application reads the most recently stored directory from HDFS using sc.objectFile and recreates the most recently trained model for prediction. I don't think this is the best solution, but it worked for me. I am also looking for more efficient approaches to this kind of problem, where a model has to be exported to another application.
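The "read from the latest stored dir" step above could be sketched roughly as follows. This is only an illustration under stated assumptions: the directory naming scheme `model-<epochMillis>` and the class `LatestModelDir` are hypothetical, and the sketch works on a plain list of directory names rather than on an HDFS listing. In a real Spark application the names would come from listing the HDFS parent directory, and the chosen path would then be passed to sc.objectFile.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.OptionalLong;

final class LatestModelDir {
    // Hypothetical naming convention: "model-<epochMillis>".
    static final String PREFIX = "model-";

    /** Parses the timestamp suffix; empty if the name does not follow the convention. */
    static OptionalLong timestampOf(String dirName) {
        if (!dirName.startsWith(PREFIX)) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(dirName.substring(PREFIX.length())));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /** Returns the directory name with the largest timestamp, ignoring non-matching names. */
    static Optional<String> latest(Collection<String> dirNames) {
        return dirNames.stream()
                .filter(n -> timestampOf(n).isPresent())
                .max(Comparator.comparingLong((String n) -> timestampOf(n).getAsLong()));
    }

    public static void main(String[] args) {
        // Assumed listing of the parent directory; "_tmp" stands for an unrelated entry.
        List<String> names = Arrays.asList("model-1418500000000", "model-1418600000000", "_tmp");
        System.out.println(latest(names).orElse("<none>")); // prints model-1418600000000
    }
}
```

Comparing the parsed numbers rather than the raw strings matters once timestamps differ in length: as strings, "model-9" sorts after "model-10".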
MLLIB model export: PMML vs MLLIB serialization
Hi All,

I am training models with Spark MLlib inside our Hadoop cluster, but prediction happens in a different real-time, synchronous system (a web application). I am currently exploring options for exporting the trained MLlib models from Spark.

1. *Export the model as PMML:*
I found the projects under JPMML (Java PMML API, https://github.com/jpmml ) quite interesting. The idea is to use JPMML ( https://github.com/jpmml/jpmml ) to convert the MLlib model entity to PMML, and the PMML evaluator ( https://github.com/jpmml/jpmml-evaluator ) for prediction in the other system. We could also explore the Openscoring REST API ( https://github.com/jpmml/openscoring ) for model deployment and prediction. This could be the standard approach if we need to port models across different systems. But converting non-linear MLlib models to PMML might be a complex task. Apart from that, I would need to keep updating my MLlib-to-PMML conversion code for every new MLlib model and every change to the MLlib entities. I have not evaluated any of these JPMML projects personally, and I see only a single contributor for them. I am wondering whether enough people have already started using these projects. Please share any thoughts you have on this.

2. *Export the MLlib model in serialized form:*
MLlib models can be serialized using Kryo serialization ( http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAFRXrqdpkfCX41=JyTSmmtt8aNWrSdpJvxE3FmYVZ=uuepe...@mail.gmail.com%3E ) or normal Java serialization ( http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-td11953.html ). The same model can then be deserialized by other standalone applications, which use the MLlib entity for prediction. This blog post ( http://blog.knoldus.com/2014/07/21/play-with-spark-building-spark-mllib-in-a-play-spark-application/ ) shows an example of how Spark MLlib can be used inside a Play web application. I expect I can use Spark MLlib in any other JVM-based web application in the same way(?). Please share if anyone has experience with this.

Advantages of this approach:
- No recurring effort to support new models or changes to MLlib model entities in future versions.
- Less dependency on other tools.

Disadvantages:
- The model cannot be ported to a non-JVM system.
- A model serialized with one version of an MLlib entity may not be deserializable with a different version of that entity(?).

I think this is quite a common problem. I am really interested to hear how you are solving it, which approaches you use, and what their pros and cons are.

Thanks,
Sourabh
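As a rough illustration of option 2 (plain Java serialization), the round trip could look like the sketch below. `LinearModel` is a hypothetical stand-in written for this example; it is not an MLlib class, and a real MLlib model would additionally need the matching Spark/MLlib jars on the classpath of the consuming application. The byte array stands in for whatever transport is used (a file, HDFS, a database blob).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationSketch {

    /** Hypothetical stand-in for a trained linear model; not an MLlib class. */
    static final class LinearModel implements Serializable {
        // Declaring the UID explicitly: if it is left out, the JVM derives one from the
        // class structure, and a changed class typically fails to deserialize old data
        // with InvalidClassException (the version concern raised above).
        private static final long serialVersionUID = 1L;

        private final double[] weights;
        private final double intercept;

        LinearModel(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }

        /** Computes intercept + sum(weights[i] * features[i]). */
        double predict(double[] features) {
            if (features.length != weights.length) {
                throw new IllegalArgumentException("expected " + weights.length + " features");
            }
            double sum = intercept;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * features[i];
            }
            return sum;
        }
    }

    /** Training side: serialize the model to bytes. */
    static byte[] toBytes(Serializable model) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        }
        return buffer.toByteArray();
    }

    /** Serving side: restore the model; its class must be on the classpath. */
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        LinearModel trained = new LinearModel(new double[] {2.0, -1.0}, 0.5);
        LinearModel restored = (LinearModel) fromBytes(toBytes(trained));
        System.out.println(restored.predict(new double[] {3.0, 4.0})); // prints 2.5
    }
}
```

The sketch also makes both listed disadvantages concrete: the bytes are only readable by a JVM that has the same class available, and compatibility across versions depends on how the serialized class evolves.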