Re: MLLIB model export: PMML vs MLLIB serialization

2014-12-15 Thread sourabh
Thanks Vincenzo.
Are you covering all the models implemented in MLlib? I don't see decision
trees there; sorry if I missed them. When are you planning to merge this
into the Spark branch?

Thanks
Sourabh

On Sun, Dec 14, 2014 at 5:54 PM, selvinsource [via Apache Spark User List] 
ml-node+s1001560n20674...@n3.nabble.com wrote:

 Hi Sourabh,

 have a look at https://issues.apache.org/jira/browse/SPARK-1406, I am
 looking into exporting models in PMML using JPMML.

 Regards,
 Vincenzo

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/MLLIB-model-export-PMML-vs-MLLIB-serialization-tp20324p20688.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: MLLIB model export: PMML vs MLLIB serialization

2014-12-15 Thread selvinsource
I am going to try exporting decision trees next; so far I have focused on
linear models and k-means.

Regards,
Vincenzo





sourabh wrote
 Thanks Vincenzo.
 Are you trying out all the models implemented in mllib? Actually I don't
 see decision tree there. Sorry if I missed it. When are you planning to
 merge this to spark branch?
 
 Thanks
 Sourabh





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/MLLIB-model-export-PMML-vs-MLLIB-serialization-tp20324p20693.html

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: MLLIB model export: PMML vs MLLIB serialization

2014-12-14 Thread selvinsource
Hi Sourabh,

Have a look at https://issues.apache.org/jira/browse/SPARK-1406; I am
looking into exporting models to PMML using JPMML.

Regards,
Vincenzo



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/MLLIB-model-export-PMML-vs-MLLIB-serialization-tp20324p20674.html



Re: MLLIB model export: PMML vs MLLIB serialization

2014-12-04 Thread manish_k
Hi Sourabh,

I came across the same problem as you. One workable solution for me was to
serialize the parts of the model that are needed to recreate it. I save the
RDDs in my model to HDFS using saveAsObjectFile, with a timestamp attached
to the output path. My other Spark application reads the most recently
stored directory from HDFS using sc.objectFile and recreates the latest
trained model for prediction.
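
The "read from the latest stored dir" step could look roughly like the sketch below. It is only an illustration under stated assumptions: the class LatestModelDir, the "prefix-epochMillis" naming scheme, and the use of the local filesystem (java.nio) in place of HDFS are all hypothetical and not taken from the message above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper: class name, naming scheme, and local-filesystem
// access are assumptions for illustration (the message above uses HDFS).
class LatestModelDir {

    // Builds a directory name such as "model-1417651200000".
    static String dirName(String prefix, long epochMillis) {
        return prefix + "-" + epochMillis;
    }

    // Parses the timestamp suffix; returns -1 if the name does not match.
    static long timestampOf(String name, String prefix) {
        String head = prefix + "-";
        if (!name.startsWith(head)) {
            return -1L;
        }
        try {
            return Long.parseLong(name.substring(head.length()));
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    // Returns the sub-directory of base with the largest timestamp, if any.
    static Optional<Path> latest(Path base, String prefix) throws IOException {
        try (Stream<Path> children = Files.list(base)) {
            return children
                .filter(Files::isDirectory)
                .filter(p -> timestampOf(p.getFileName().toString(), prefix) >= 0)
                .max(Comparator.comparingLong(
                    p -> timestampOf(p.getFileName().toString(), prefix)));
        }
    }
}
```

The training job would write each run under dirName(prefix, System.currentTimeMillis()), and the prediction application would call latest(...) before loading. On HDFS the same selection logic would apply to a directory listing obtained through the Hadoop filesystem API instead of java.nio.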

I don't think this is the best solution, but it worked for me. I am also
looking for more efficient approaches to this problem, where a model has to
be exported to another application.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/MLLIB-model-export-PMML-vs-MLLIB-serialization-tp20324p20348.html



MLLIB model export: PMML vs MLLIB serialization

2014-12-03 Thread sourabh
Hi All,
I am training models using Spark MLlib inside our Hadoop cluster, but
prediction happens in a different real-time, synchronous system (a web
application). I am currently exploring options for exporting the trained
MLlib models from Spark.

   1. *Export the model as PMML:* The projects under JPMML, the Java PMML API
(https://github.com/jpmml), look quite interesting. We could use JPMML
(https://github.com/jpmml/jpmml) to convert the MLlib model entity to PMML,
and the PMML evaluator (https://github.com/jpmml/jpmml-evaluator) for
prediction in a different system. We could also explore the Openscoring REST
API (https://github.com/jpmml/openscoring) for model deployment and
prediction.

This could be the standard approach if we need to port models across
different systems. But converting non-linear MLlib models to PMML might be a
complex task. Apart from that, I would need to keep updating my MLlib-to-PMML
conversion code for every new MLlib model and every change to the MLlib
entities.
I have not evaluated any of these JPMML projects personally, and I see there
is only a single contributor to them. I am wondering whether enough people
have already started using these projects. Please share any thoughts you
have on this.
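
To make the per-model conversion effort concrete, here is a minimal hand-written sketch that emits a PMML RegressionModel for a linear model, using only the Java standard library. It does not use JPMML or MLlib: the class ToyPmmlWriter, its method, and the field names are hypothetical, the output has not been validated against the PMML schema, and field names are assumed to contain no XML special characters.

```java
// Hypothetical writer for illustration only; not part of JPMML or MLlib.
class ToyPmmlWriter {

    // Renders weights and intercept as a PMML 4.2 style RegressionModel.
    // Assumes field names need no XML escaping.
    static String linearRegression(String target, String[] features,
                                   double[] weights, double intercept) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("one weight per feature is required");
        }
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<PMML version=\"4.2\" xmlns=\"http://www.dmg.org/PMML-4_2\">\n");
        xml.append("  <Header description=\"toy linear regression\"/>\n");

        // Declare every input field plus the target.
        xml.append("  <DataDictionary numberOfFields=\"")
           .append(features.length + 1).append("\">\n");
        for (String f : features) {
            xml.append("    <DataField name=\"").append(f)
               .append("\" optype=\"continuous\" dataType=\"double\"/>\n");
        }
        xml.append("    <DataField name=\"").append(target)
           .append("\" optype=\"continuous\" dataType=\"double\"/>\n");
        xml.append("  </DataDictionary>\n");

        xml.append("  <RegressionModel functionName=\"regression\">\n");
        xml.append("    <MiningSchema>\n");
        for (String f : features) {
            xml.append("      <MiningField name=\"").append(f).append("\"/>\n");
        }
        xml.append("      <MiningField name=\"").append(target)
           .append("\" usageType=\"target\"/>\n");
        xml.append("    </MiningSchema>\n");

        // One NumericPredictor per (feature, weight) pair.
        xml.append("    <RegressionTable intercept=\"").append(intercept).append("\">\n");
        for (int i = 0; i < features.length; i++) {
            xml.append("      <NumericPredictor name=\"").append(features[i])
               .append("\" coefficient=\"").append(weights[i]).append("\"/>\n");
        }
        xml.append("    </RegressionTable>\n");
        xml.append("  </RegressionModel>\n");
        xml.append("</PMML>\n");
        return xml.toString();
    }
}
```

A linear model maps onto PMML almost one-to-one, as above. Trees, ensembles, and clustering models each need their own element mapping, which is where the maintenance cost mentioned in this message would come from.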

   2. *Export the MLlib model in serialized form:* MLlib models can be
serialized using Kryo serialization
(http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAFRXrqdpkfCX41=JyTSmmtt8aNWrSdpJvxE3FmYVZ=uuepe...@mail.gmail.com%3E)
or normal Java serialization
(http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-td11953.html).
The same model can then be deserialized by other standalone applications,
which use the MLlib entity for prediction. This blog post
(http://blog.knoldus.com/2014/07/21/play-with-spark-building-spark-mllib-in-a-play-spark-application/)
shows an example of how Spark MLlib can be used inside a Play web
application. I expect I can use Spark MLlib in any other JVM-based web
application in the same way(?). Please share if anyone has experience with
this.
  Advantages of this approach:
 - No recurring effort to support new models or changes to the MLlib
model entities in future versions.
 - Fewer dependencies on other tools.
  Disadvantages:
 - The model cannot be ported to a non-JVM system.
 - A model serialized with one version of an MLlib entity may not be
deserializable with a different version of that entity(?).
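
The plain Java serialization route, and the version concern in the last bullet, can be sketched with a stand-in class. ToyLinearModel and ModelSerde below are hypothetical and are not MLlib classes; the sketch only shows the round trip through ObjectOutputStream and ObjectInputStream. Java serialization compares the serialVersionUID of the stored stream with that of the loaded class. If a class does not declare one, the JVM derives it from the class structure, so a changed class can fail to deserialize with an InvalidClassException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a trained model; not an MLlib class.
class ToyLinearModel implements Serializable {
    // Declaring the UID explicitly keeps it stable when the class is recompiled.
    private static final long serialVersionUID = 1L;

    final double[] weights;
    final double intercept;

    ToyLinearModel(double[] weights, double intercept) {
        this.weights = weights.clone();
        this.intercept = intercept;
    }

    // Dot product of weights and features, plus the intercept.
    double predict(double[] features) {
        double sum = intercept;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }
}

// Hypothetical helper wrapping standard Java serialization.
class ModelSerde {
    // Training side: turn the model into bytes (to be written to a file or HDFS).
    static byte[] toBytes(Serializable model) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        }
        return buffer.toByteArray();
    }

    // Prediction side: needs the same class on its classpath.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

The consuming application must have a compatible version of the model class on its classpath, which is why this route ties both sides to the JVM and, in practice, to matching library versions.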

I think this is quite a common problem. I am really interested to hear how
you are solving it, which approaches you use, and their pros and cons.

Thanks
Sourabh




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/MLLIB-model-export-PMML-vs-MLLIB-serialization-tp20324.html