This will bring in all of Spark's dependencies, which may break the web app.
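As a rough illustration of the alternative, here is a minimal sketch of scoring without any Spark jars on the classpath. It assumes the trained model is a plain logistic regression whose weights and intercept have been exported as ordinary numbers. The function name and the coefficient values are made up for illustration, and this is not how mllib itself serializes or scores models.

```python
import math


def predict_probability(weights, intercept, features):
    """Score one example with a logistic regression exported as plain numbers.

    Hypothetical helper for illustration only; it has no Spark dependency.
    """
    if len(weights) != len(features):
        raise ValueError("feature vector length does not match model weights")

    # Linear part of the model: intercept + w . x
    margin = intercept + sum(w * x for w, x in zip(weights, features))

    # Sigmoid, written so that exp() never receives a large positive argument.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


# Hypothetical coefficients, standing in for values exported after training.
weights = [0.5, -0.25]
intercept = 0.0
probability = predict_probability(weights, intercept, [1.0, 2.0])
```

Something this small covers only a single linear model. A full ML pipeline with feature transformers would need each transformer reimplemented the same way, which is the duplicated work mentioned in the thread below.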
Sincerely,

DB Tsai
----------------------------------------------------------
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D

On Thu, Nov 12, 2015 at 8:15 PM, Nirmal Fernando <nir...@wso2.com> wrote:
>
> On Fri, Nov 13, 2015 at 2:04 AM, darren <dar...@ontrenet.com> wrote:
>
>> I agree 100%. Making the model requires large data and many CPUs.
>>
>> Using it does not.
>>
>> This is a very useful side effect of ML models.
>>
>> If mllib can't use models outside Spark that's a real shame.
>>
>
> Well, you can, as mentioned earlier. You don't need the Spark runtime for
> predictions; save the serialized model and deserialize it to use. (You need
> the Spark jars in the classpath though.)
>
>>
>> Sent from my Verizon Wireless 4G LTE smartphone
>>
>> -------- Original message --------
>> From: "Kothuvatiparambil, Viju" <viju.kothuvatiparam...@bankofamerica.com>
>> Date: 11/12/2015 3:09 PM (GMT-05:00)
>> To: DB Tsai <dbt...@dbtsai.com>, Sean Owen <so...@cloudera.com>
>> Cc: Felix Cheung <felixcheun...@hotmail.com>, Nirmal Fernando <
>> nir...@wso2.com>, Andy Davidson <a...@santacruzintegration.com>, Adrian
>> Tanase <atan...@adobe.com>, "user @spark" <user@spark.apache.org>,
>> Xiangrui Meng <men...@gmail.com>, hol...@pigscanfly.ca
>> Subject: RE: thought experiment: use spark ML to real time prediction
>>
>> I am glad to see DB’s comments; they make me feel I am not the only one
>> facing these issues. If we were able to use MLlib to load the model in web
>> applications (outside the Spark cluster), that would solve the issue. I
>> understand Spark is mainly for processing big data in a distributed mode.
>> But there is no purpose in training a model using MLlib if we are not
>> able to use it in the applications that need to access the model.
>>
>> Thanks
>>
>> Viju
>>
>> *From:* DB Tsai [mailto:dbt...@dbtsai.com]
>> *Sent:* Thursday, November 12, 2015 11:04 AM
>> *To:* Sean Owen
>> *Cc:* Felix Cheung; Nirmal Fernando; Andy Davidson; Adrian Tanase; user
>> @spark; Xiangrui Meng; hol...@pigscanfly.ca
>> *Subject:* Re: thought experiment: use spark ML to real time prediction
>>
>> I think the use case can be quite different from PMML.
>>
>> Having a Spark-platform-independent ML jar would empower users to do
>> the following:
>>
>> 1) PMML doesn't contain all the models we have in mllib. Also, for an ML
>> pipeline trained by Spark, most of the time PMML is not expressive enough
>> to do all the transformations we have in Spark ML. As a result, if we are
>> able to serialize the entire Spark ML pipeline after training, and then
>> load it back in an app without any Spark platform for production scoring,
>> this will be very useful for production deployment of Spark ML models. The
>> only issue is that if a transformer involves a shuffle, we need to figure
>> out a way to handle it. When I chatted with Xiangrui about this, he
>> suggested that we may tag whether a transformer is shuffle ready.
>> Currently, at Netflix, we are not able to use ML pipelines because of
>> those issues, and we have to write our own scorers for production, which
>> is quite a lot of duplicated work.
>>
>> 2) If users can use Spark's linear algebra code, like vector or matrix, in
>> their applications, this will be very useful. It can help share code
>> between the Spark training pipeline and production deployment. Also, lots
>> of good stuff in Spark's mllib doesn't depend on the Spark platform, and
>> people could use it in their applications without pulling in lots of
>> dependencies. In fact, in my project, I have to copy & paste code from
>> mllib into my project to use those goodies in apps.
>>
>> 3) Currently, mllib depends on graphx which means in graphx, there is no
>> way to use mllib's vector or matrix. And
>>
>
> --
>
> Thanks & regards,
> Nirmal
>
> Team Lead - WSO2 Machine Learner
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
>