Re: Status of MLLib exporting models to PMML

2014-11-28 Thread selvinsource
Hi,

Just so you know, I added PMML export for linear models (linear, ridge and
lasso), as suggested by Xiangrui.

I will be looking at SVMs and logistic regression next.

Vincenzo
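
For readers wondering what consuming such an export might look like, here is a minimal Python sketch of scoring a linear model from PMML. It is not the code from the PR and not jpmml-evaluator: the PMML fragment is hand-written for illustration (it omits the DataDictionary and MiningSchema a valid document would carry), and the field names x1/x2 and all numbers are made up. Linear, ridge and lasso differ only in how the coefficients are fit, so they would all be scored the same way, as an intercept plus a weighted sum.

```python
import xml.etree.ElementTree as ET

# Hand-written PMML fragment for illustration only; field names and
# numbers are hypothetical, and required elements are omitted for brevity.
PMML_DOC = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# Namespace map so that find/findall can address the PMML 4.2 elements.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def score(pmml_text: str, row: dict) -> float:
    """Evaluate intercept + sum(coefficient * value) from a RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    total = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        total += float(predictor.get("coefficient")) * row[predictor.get("name")]
    return total


print(score(PMML_DOC, {"x1": 3.0, "x2": 4.0}))  # prints 5.0
```

The point of the sketch is only that the scoring side needs nothing from Spark: once the coefficients are in the file, any environment that can parse XML can apply them.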



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Status-of-MLLib-exporting-models-to-PMML-tp18514p20005.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Status of MLLib exporting models to PMML

2014-11-18 Thread Charles Earl
Yes, the case for PMML with Oryx is convincing. I will also investigate
the parameter server.
Cheers,
Charles




-- 
- Charles


Re: Status of MLLib exporting models to PMML

2014-11-17 Thread Manish Amde
Hi Charles,

I am not aware of other storage formats. Perhaps Sean or Sandy can
elaborate, given their experience with Oryx.

There is work by Smola et al. at Google on large-scale model update and
deployment:
https://www.usenix.org/conference/osdi14/technical-sessions/presentation/li_mu

-Manish




Re: Status of MLLib exporting models to PMML

2014-11-17 Thread Sean Owen
I'm just using PMML. I haven't hit any limitation of its
expressiveness for the model types it supports. I don't think there
is a point in defining a new format for models, except that PMML
can get very big. Still, just compressing the XML gets it down to a
manageable size for just about any realistic model.*

I can imagine some kind of translation from PMML-in-XML to
PMML-in-something-else that is more compact. I've not seen anyone do
this.

* There still aren't formats for factored matrices, and there probably
never will be, since they're just too large for a file format.
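
To give a rough feel for the compression point, here is a minimal Python sketch that builds a synthetic, PMML-like XML document with many near-identical elements and gzips it. This is an illustration only: the element names borrow PMML's RegressionTable/NumericPredictor vocabulary, but the document is a toy (not schema-valid PMML), and the feature count and coefficient values are invented. Actual ratios depend entirely on the model.

```python
import gzip
import xml.etree.ElementTree as ET


def build_toy_pmml(num_features: int) -> bytes:
    """Build a toy, PMML-like XML document (illustrative, not schema-valid)."""
    root = ET.Element("PMML", version="4.2")
    model = ET.SubElement(root, "RegressionModel", functionName="regression")
    table = ET.SubElement(model, "RegressionTable", intercept="0.5")
    for i in range(num_features):
        # Made-up names and coefficients; real ones come from a trained model.
        ET.SubElement(
            table,
            "NumericPredictor",
            name=f"feature_{i}",
            coefficient=str(0.001 * i),
        )
    return ET.tostring(root, encoding="utf-8")


def compress_xml(xml_bytes: bytes) -> bytes:
    """Gzip the serialized XML; gzip.decompress() restores it byte for byte."""
    return gzip.compress(xml_bytes)


raw = build_toy_pmml(10_000)
packed = compress_xml(raw)
print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes")
```

Because the markup repeats the same tag and attribute names on every element, a general-purpose compressor removes most of that overhead without any change to the format itself.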




Re: Status of MLLib exporting models to PMML

2014-11-16 Thread Charles Earl
Manish and others,
A follow-up question on my mind is whether there are protobuf (or other
binary format) frameworks in the vein of PMML. Perhaps scientific data
storage frameworks like NetCDF or ROOT are also possible.
I like the comprehensiveness of PMML but, as you mention, the complexity of
managing large models is a concern.
Cheers





-- 
- Charles


Re: Status of MLLib exporting models to PMML

2014-11-13 Thread Manish Amde
@Aris, we are closely following the PMML work that is going on and, as
Xiangrui mentioned, it might be easier to migrate models such as logistic
regression first and then migrate trees. Some of the models get fairly
large (as pointed out by Sung Chung) with deep trees as building blocks, and
we might have to consider a distributed storage and prediction strategy.






Re: Status of MLLib exporting models to PMML

2014-11-12 Thread Villu Ruusmann
Hi DB,


DB Tsai wrote
 I also worry that the author of JPMML changed the license of
 jpmml-evaluator in the interest of his commercial business, and that he
 might change the license of jpmml-model in the future.

I am the principal author of the said Java PMML API projects and I want to
assure you that I have no plans to change the license of the JPMML-Model
project now or in the future. In fact, most of the codebase is copyrighted
by the University of Tartu, so I could not do it even if I wanted to.

I would also like to clarify the licensing of the JPMML-Evaluator project.
This is a fork of the legacy JPMML project (https://github.com/jpmml/jpmml),
which was started in early 2014 in order to provide support for the PMML
specification version 4.2, implement missing functionality and make other
enhancements. The project was started under the AGPLv3 license; there have
been no unexpected license changes.

Developing Java PMML APIs is full-time work for me. If you (or anybody
else) can suggest how I can support myself doing this under some license
other than (A)GPLv3, I'd be interested to find out more. So far, I have
learned that the BSD 3-Clause License doesn't work: I've yet to receive a
single thank-you message for my previous work in this field, and many
other fields.


VR



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Status-of-MLLib-exporting-models-to-PMML-tp18514p18729.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Status of MLLib exporting models to PMML

2014-11-11 Thread Xiangrui Meng
Vincenzo sent a PR and included k-means as an example. Sean is helping
review it. The PMML standard is quite large, so we may start with simple
model export, like linear methods, then move on to tree-based models.
-Xiangrui
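
As an illustration of why linear methods are a natural starting point, here is a minimal Python sketch of writing a linear model out as a PMML RegressionModel. It is a hand-rolled example using the standard library, not the exporter from the PR: the function name linear_model_to_pmml, the generated field names (field_0, field_1, ...) and the target name are all hypothetical. The whole model reduces to one intercept and one coefficient per input field.

```python
import xml.etree.ElementTree as ET


def linear_model_to_pmml(weights, intercept, target="target"):
    """Return a PMML 4.2 RegressionModel document as a string.

    Field names and the target name are hypothetical placeholders; a real
    exporter would take them from the training data's schema.
    """
    names = [f"field_{i}" for i in range(len(weights))]
    pmml = ET.Element("PMML", version="4.2", xmlns="http://www.dmg.org/PMML-4_2")
    ET.SubElement(pmml, "Header", description="linear regression (sketch)")

    # The DataDictionary declares every field the model reads or predicts.
    dictionary = ET.SubElement(
        pmml, "DataDictionary", numberOfFields=str(len(names) + 1)
    )
    for name in names + [target]:
        ET.SubElement(
            dictionary, "DataField", name=name, optype="continuous", dataType="double"
        )

    model = ET.SubElement(pmml, "RegressionModel", functionName="regression")

    # The MiningSchema marks which fields are inputs and which is the target.
    schema = ET.SubElement(model, "MiningSchema")
    for name in names:
        ET.SubElement(schema, "MiningField", name=name, usageType="active")
    ET.SubElement(schema, "MiningField", name=target, usageType="target")

    # One NumericPredictor per weight; the prediction is intercept + sum(w * x).
    table = ET.SubElement(model, "RegressionTable", intercept=str(intercept))
    for name, weight in zip(names, weights):
        ET.SubElement(table, "NumericPredictor", name=name, coefficient=str(weight))

    return ET.tostring(pmml, encoding="unicode")


print(linear_model_to_pmml([0.5, -1.25], 2.0))
```

A tree ensemble, by contrast, needs one element per node per tree, which is where the size concerns raised elsewhere in this thread come from.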

On Mon, Nov 10, 2014 at 11:27 AM, Aris arisofala...@gmail.com wrote:
 Hello Spark and MLLib folks,

 A common problem in real-world machine learning is that some data
 analysts use tools like R, while more of the data engineers out there
 use more advanced systems like Spark MLLib or even Python Scikit Learn.

 In the real world, I want to have a system where multiple different
 modeling environments can learn from data / build models, represent the
 models in a common language, and then have a layer which just takes the
 model and runs model.predict() all day long; in other words, it scores
 the models.

 It looks like openscoring.io and jpmml-evaluator are some amazing
 systems for this, but they fundamentally use PMML as the model
 representation.

 I have read in some JIRA tickets that Xiangrui Meng is interested in
 getting PMML export implemented for MLLib models. Is that happening?
 Further, would something like Manish Amde's boosted ensemble tree methods
 be representable in PMML?

 Thank you!!
 Aris




Re: Status of MLLib exporting models to PMML

2014-11-11 Thread DB Tsai
JPMML-Evaluator just changed its license to AGPL or a commercial
license, and I think AGPL is not compatible with Apache projects. Any
advice?

https://github.com/jpmml/jpmml-evaluator

Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai





Re: Status of MLLib exporting models to PMML

2014-11-11 Thread Sean Owen
Yes, jpmml-evaluator is AGPL, but things like jpmml-model are not; they're
3-clause BSD:

https://github.com/jpmml/jpmml-model

So some of the scoring components are off-limits for an AL2 (Apache
License 2.0) project, but the core model components are OK.




Re: Status of MLLib exporting models to PMML

2014-11-11 Thread DB Tsai
I also worry that the author of JPMML changed the license of
jpmml-evaluator in the interest of his commercial business, and that he
might change the license of jpmml-model in the future.

Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai





Re: Status of MLLib exporting models to PMML

2014-11-11 Thread Sean Owen
Yes, although I think this difference is on purpose, as part of that
commercial strategy. If future versions change license, it would still be
possible to not upgrade, or to fork / recreate the bean classes. I'm not
so worried, but it is a good point.