Re: Contributing to MLlib on GLM

2014-07-07 Thread xwei
Hi Gang,

No admin has looked at our patch yet :( Do you have any suggestions for
getting it noticed?

Best regards,

Xiaokai


On Mon, Jun 30, 2014 at 8:18 PM, Gang Bai [via Apache Spark Developers
List] ml-node+s1001551n713...@n3.nabble.com wrote:

 Thanks Xiaokai,

 I’ve created a pull request to merge the features from my PR into your repo.
 Please review it here: https://github.com/xwei-datageek/spark/pull/2 .

 As for GLMs, here at Sina we are working on predicting the
 number of visitors who read a particular news article or watch an online
 sports live stream in a given period. I’m trying to improve the
 prediction results by tuning features and incorporating new models, so I’ll
 try Gamma regression later. Thanks for the implementation.

 Cheers,
 -Gang


--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Contributing-to-MLlib-on-GLM-tp7033p7197.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Contributing to MLlib on GLM

2014-06-30 Thread Gang Bai
Thanks Xiaokai,

I’ve created a pull request to merge the features from my PR into your repo.
Please review it here: https://github.com/xwei-datageek/spark/pull/2 .

As for GLMs, here at Sina we are working on predicting the number of
visitors who read a particular news article or watch an online sports live
stream in a given period. I’m trying to improve the prediction results by
tuning features and incorporating new models, so I’ll try Gamma regression
later. Thanks for the implementation.

Cheers,
-Gang

On Jun 29, 2014, at 8:17 AM, xwei weixiao...@gmail.com wrote:

 Hi Gang,
 
 No worries! 
 
 I agree that LBFGS would converge faster and that your test suite is more
 comprehensive. I'd like to merge my branch with yours.
 
 I also agree with your point about redundancy. Different GLMs usually
 differ only in their gradient calculation, while their regression.scala
 files are quite similar. For example, linearRegressionSGD,
 logisticRegressionSGD, RidgeRegressionSGD, and poissonRegressionSGD all share
 quite a bit of common code in their class implementations. Since this
 redundancy already exists in the legacy code, simply merging Poisson and
 Gamma does not seem to help much. So I suggest we leave them as separate
 classes for the time being.
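
The observation that the regression classes differ only in their gradient can be illustrated with a small sketch. This is plain Python, not MLlib's Scala code, and every name in it (train_gd, linear_gradient, poisson_gradient) is hypothetical. As a simplification it uses deterministic full-batch gradient descent rather than MLlib's mini-batch SGD, and it assumes a log link for the Poisson case.

```python
import math

def dot(weights, features):
    """Linear predictor w.x."""
    return sum(w * x for w, x in zip(weights, features))

def linear_gradient(features, label, weights):
    """Squared-error gradient (identity link); illustrative only."""
    return [(dot(weights, features) - label) * x for x in features]

def poisson_gradient(features, label, weights):
    """Poisson negative log-likelihood gradient, assuming a log link."""
    return [(math.exp(dot(weights, features)) - label) * x for x in features]

def train_gd(data, gradient, step_size=0.1, iterations=200):
    """One shared training loop; only the gradient function varies per GLM."""
    weights = [0.0] * len(data[0][0])
    for _ in range(iterations):
        total = [0.0] * len(weights)
        for features, label in data:
            g = gradient(features, label, weights)
            total = [t + gi for t, gi in zip(total, g)]
        # Step along the negative average gradient over the whole data set.
        weights = [w - step_size * t / len(data) for w, t in zip(weights, total)]
    return weights

linear_weights = train_gd([([1.0], 2.0), ([2.0], 4.0)], linear_gradient)
poisson_weights = train_gd([([1.0], math.exp(0.5))], poisson_gradient)
print(linear_weights, poisson_weights)
```

Both calls reuse the same loop, which is the shared code the paragraph above refers to; a new GLM would only need a new gradient function.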
 
 
 Best regards,
 
 Xiaokai
 


Re: Contributing to MLlib on GLM

2014-06-27 Thread Gang Bai (白刚)
Hi Xiaokai,

My bad. I didn't notice this before I created another PR for Poisson
regression. The emails had been moved to my junk folder by the corporate mail
filter. Also, thanks for considering my comments and advice in your PR.

Adding my two cents here:

* PoissonRegressionModel and GammaRegressionModel have the same fields and
prediction method. Shall we use one class instead of two redundant ones? Say, a
LogLinearModel.
* The LBFGS optimizer takes fewer iterations and converges better than SGD.
I implemented two GeneralizedLinearAlgorithm classes, using LBFGS and SGD
respectively. You may want to take a look. If it's OK with you, I'd be happy
to send a PR to your branch.
* In addition to the generated test data, we could use some real-world data for
testing. In my implementation, I added test data from
https://onlinecourses.science.psu.edu/stat504/node/223. Please check my test
suite.
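
As a rough illustration of the first point, here is a minimal sketch of what a shared LogLinearModel could look like. It is plain Python rather than MLlib's Scala, and it assumes that both the Poisson and the Gamma model use a log link, so that the prediction is exp(w.x + b) in both cases. The field names and the predict method are assumptions, not the API from either PR.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class LogLinearModel:
    """Hypothetical shared model: the same fields and prediction rule would
    serve both Poisson and Gamma regression under a log link."""
    weights: List[float]
    intercept: float

    def predict(self, features: List[float]) -> float:
        # The linear predictor is mapped through the inverse log link (exp).
        eta = self.intercept + sum(w * x for w, x in zip(self.weights, features))
        return math.exp(eta)

# The same class holds weights fitted under either likelihood.
poisson_model = LogLinearModel(weights=[0.5, -0.25], intercept=0.0)
gamma_model = LogLinearModel(weights=[0.0, 0.0], intercept=math.log(2.0))
print(poisson_model.predict([2.0, 4.0]), gamma_model.predict([3.0, 7.0]))
```

Only the training objective would differ between the two regressions; the fitted model object is identical in shape.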

-Gang
Sent from my iPad

 On June 27, 2014, at 6:03 PM, xwei weixiao...@gmail.com wrote:
 
 
 Yes, that's what we did: we added two gradient functions to Gradient.scala and
 created PoissonRegression and GammaRegression using these gradients. We made
 a PR for this.
 
 
 


Re: Contributing to MLlib on GLM

2014-06-26 Thread xwei
Yes, that's what we did: we added two gradient functions to Gradient.scala and
created PoissonRegression and GammaRegression using these gradients. We made
a PR for this.
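
For readers unfamiliar with the Gamma case, a gradient function of this kind might compute something like the following. This is a hedged sketch in plain Python, not the code from the PR: it assumes a log link (the canonical Gamma link is the inverse, so this is a modeling choice) and drops the shape parameter, which only rescales the loss. The name gamma_gradient is hypothetical.

```python
import math

def gamma_gradient(features, label, weights):
    """Per-example gradient and loss for Gamma regression, assuming a log link.

    Loss is y * exp(-w.x) + w.x, the negative log-likelihood divided by the
    shape parameter, with terms that do not depend on the weights dropped.
    """
    eta = sum(w * x for w, x in zip(weights, features))  # linear predictor w.x
    scaled = label * math.exp(-eta)                      # y / mu, with mu = exp(eta)
    gradient = [(1.0 - scaled) * x for x in features]    # d(loss)/dw
    loss = scaled + eta
    return gradient, loss

grad, loss = gamma_gradient([1.0, 2.0], 2.0, [0.0, 0.0])
print(grad, loss)
```

The gradient is zero exactly when the predicted mean exp(w.x) equals the label, which is the expected behavior for a mean-matching loss.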



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Contributing-to-MLlib-on-GLM-tp7033p7088.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: Contributing to MLlib on GLM

2014-06-25 Thread Sung Hwan Chung
Well, as you said, MLlib already supports GLMs in a sense, except that it only
supports two link functions: identity (linear regression) and logit
(logistic regression). It should not be too hard to add other link
functions, since all you have to do is add a different gradient function for
Poisson, Gamma, etc. Take a look at Gradient.scala in mllib.
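
To make the suggestion concrete, here is a minimal sketch of what a Poisson gradient function could compute. It is written in plain Python for brevity and is not the interface of Gradient.scala. The name poisson_gradient and the (gradient, loss) return convention are assumptions, and a log link is assumed.

```python
import math

def poisson_gradient(features, label, weights):
    """Per-example gradient and loss for Poisson regression with a log link.

    Loss is exp(w.x) - y * (w.x), the negative log-likelihood without the
    constant log(y!) term.
    """
    eta = sum(w * x for w, x in zip(weights, features))  # linear predictor w.x
    mu = math.exp(eta)                                   # predicted mean count
    gradient = [(mu - label) * x for x in features]      # d(loss)/dw
    loss = mu - label * eta
    return gradient, loss

grad, loss = poisson_gradient([1.0, 2.0], 3.0, [0.0, 0.0])
print(grad, loss)
```

Compared with the least-squares gradient (w.x - y) * x, the only change is that the linear predictor passes through exp before the label is subtracted.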


On Tue, Jun 17, 2014 at 5:00 PM, Xiaokai Wei x...@palantir.com wrote:

 Hi,

 I am an intern at PalantirTech, and we are building some things on top of
 MLlib. In particular, GLMs are of great interest to us. Although
 GeneralizedLinearModel in MLlib 1.0.0 covers some important GLMs such as
 Logistic Regression and Linear Regression, other important GLMs like
 Poisson Regression are still missing.

 I am curious whether anyone is already working on other GLMs (e.g.
 Poisson, Gamma). If not, we would like to contribute them to MLlib. Is
 adding more GLMs on the MLlib roadmap?


 Sincerely,

 Xiaokai



Re: Contributing to MLlib on GLM

2014-06-17 Thread Sandy Ryza
Hi Xiaokai,

I think MLlib is definitely interested in supporting additional GLMs.  I'm
not aware of anybody working on this at the moment.

-Sandy


On Tue, Jun 17, 2014 at 5:00 PM, Xiaokai Wei x...@palantir.com wrote:

 Hi,

 I am an intern at PalantirTech, and we are building some things on top of
 MLlib. In particular, GLMs are of great interest to us. Although
 GeneralizedLinearModel in MLlib 1.0.0 covers some important GLMs such as
 Logistic Regression and Linear Regression, other important GLMs like
 Poisson Regression are still missing.

 I am curious whether anyone is already working on other GLMs (e.g.
 Poisson, Gamma). If not, we would like to contribute them to MLlib. Is
 adding more GLMs on the MLlib roadmap?


 Sincerely,

 Xiaokai



Re: Contributing to MLlib on GLM

2014-06-17 Thread Andrew Ash
Hi Xiaokai,

Also take a look through Xiangrui's slides from HadoopSummit a few weeks
back: http://www.slideshare.net/xrmeng/m-llib-hadoopsummit  The roadmap
starting at slide 51 will probably be interesting to you.

Andrew


On Tue, Jun 17, 2014 at 7:37 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Hi Xiaokai,

 I think MLlib is definitely interested in supporting additional GLMs.  I'm
 not aware of anybody working on this at the moment.

 -Sandy

