Re: Contributing to MLlib on GLM
Hi Gang,

No admin is looking at our patch :( Do you have any suggestions so that our patch can get noticed by an admin?

Best regards,
Xiaokai

On Mon, Jun 30, 2014 at 8:18 PM, Gang Bai [via Apache Spark Developers List] wrote:

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Contributing-to-MLlib-on-GLM-tp7033p7197.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Re: Contributing to MLlib on GLM
Thanks Xiaokai,

I've created a pull request to merge the features in my PR into your repo. Please review it here: https://github.com/xwei-datageek/spark/pull/2

As for GLMs, here at Sina we are solving the problem of predicting the number of visitors who read a particular news article or watch an online sports live stream in a given period. I'm trying to improve the prediction results by tuning features and incorporating new models, so I'll try Gamma regression later. Thanks for the implementation.

Cheers,
-Gang

On Jun 29, 2014, at 8:17 AM, xwei weixiao...@gmail.com wrote:

Hi Gang,

No worries! I agree LBFGS would converge faster, and your test suite is more comprehensive. I'd like to merge my branch with yours.

I also agree with your viewpoint on the redundancy issue. Different GLMs usually differ only in the gradient calculation, but the regression.scala files are quite similar. For example, linearRegressionSGD, logisticRegressionSGD, RidgeRegressionSGD, and poissonRegressionSGD all share quite a bit of common code in their class implementations. Since such redundancy is already there in the legacy code, simply merging Poisson and Gamma does not seem to help much, so I suggest we leave them as separate classes for the time being.

Best regards,
Xiaokai

On Jun 27, 2014, at 6:45 PM, Gang Bai [via Apache Spark Developers List] wrote:
Re: Contributing to MLlib on GLM
Hi Xiaokai,

My bad. I didn't notice this before I created another PR for Poisson regression; the mails were buried in junk by the corp mail master. Also, thanks for considering my comments and advice in your PR. Adding my two cents here:

* PoissonRegressionModel and GammaRegressionModel have the same fields and prediction method. Shall we use one class instead of two redundant ones? Say, a LogLinearModel.
* The LBFGS optimizer takes fewer iterations and converges better than SGD. I implemented two GeneralizedLinearAlgorithm classes, using LBFGS and SGD respectively. You may take a look into it. If it's OK with you, I'd be happy to send a PR to your branch.
* In addition to the generated test data, we may use some real-world data for testing. In my implementation, I added the test data from https://onlinecourses.science.psu.edu/stat504/node/223. Please check my test suite.

-Gang

Sent from my iPad

On June 27, 2014, at 6:03 PM, xwei [hidden email] wrote:
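Gang's LogLinearModel idea can be sketched roughly as below. This is a hypothetical illustration, not MLlib code: plain arrays stand in for MLlib's Vector type, and the class and method names are made up for the example. The point is that Poisson and Gamma regression with a log link make identical point predictions, exp(w . x + intercept), so a single model class could serve both.

```scala
// Hypothetical sketch of the shared model class proposed above.
// Poisson and Gamma regression with a log link predict identically,
// exp(w . x + intercept), so one class can back both. Plain arrays
// stand in for MLlib's Vector type; names are illustrative only.
class LogLinearModel(val weights: Array[Double], val intercept: Double) {
  def predict(features: Array[Double]): Double = {
    // Linear part: w . x + intercept
    val margin = features.zip(weights).map { case (f, w) => f * w }.sum + intercept
    // Apply the inverse of the log link
    math.exp(margin)
  }
}
```

The concrete Poisson and Gamma model classes would then only differ in how their weights are trained, not in how they predict.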
Re: Contributing to MLlib on GLM
Yes, that's what we did: we added two gradient functions to Gradient.scala and created PoissonRegression and GammaRegression using those gradients. We made a PR for this.
Re: Contributing to MLlib on GLM
Well, as you said, MLlib already supports GLMs in a sense, except that it only supports two link functions: identity (linear regression) and logit (logistic regression). It should not be too hard to add other link functions; all you have to do is add a different gradient function for Poisson, Gamma, etc. Look at Gradient.scala in mllib.

On Tue, Jun 17, 2014 at 5:00 PM, Xiaokai Wei x...@palantir.com wrote:

Hi,

I am an intern at PalantirTech and we are building some stuff on top of MLlib. In particular, GLM is of great interest to us. Though GeneralizedLinearModel in MLlib 1.0.0 covers some important GLMs such as logistic regression and linear regression, other important GLMs like Poisson regression are still missing. I am curious whether anyone is already working on other GLMs (e.g. Poisson, Gamma). If not, we would like to contribute them to MLlib. Is adding more GLMs on the MLlib roadmap?

Sincerely,
Xiaokai
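For the Poisson case, the gradient in question is easy to state. The following is a minimal standalone sketch, not actual Gradient.scala code (MLlib's Gradient API has its own Vector types and compute signature, which this does not reproduce): with a log link, the per-example negative log-likelihood, dropping the constant log(y!) term, is L(w) = exp(w.x) - y * (w.x), and its gradient with respect to w is (exp(w.x) - y) * x.

```scala
// Standalone sketch of a Poisson regression gradient with a log link.
// The object and method names are illustrative, not MLlib's Gradient API.
// Per-example loss (up to the constant log(y!)):  exp(w.x) - y * (w.x)
// Gradient with respect to w:                     (exp(w.x) - y) * x
object PoissonGradient {
  // Returns (loss, gradient) for a single labeled example.
  def compute(x: Array[Double], y: Double, w: Array[Double]): (Double, Array[Double]) = {
    val margin = x.zip(w).map { case (xi, wi) => xi * wi }.sum  // w.x
    val mu = math.exp(margin)                                   // predicted mean
    val loss = mu - y * margin
    val grad = x.map(_ * (mu - y))
    (loss, grad)
  }
}
```

A Gamma gradient would follow the same pattern with its own likelihood; only this per-example computation differs between the two regressions.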
Re: Contributing to MLlib on GLM
Hi Xiaokai,

I think MLlib is definitely interested in supporting additional GLMs. I'm not aware of anybody working on this at the moment.

-Sandy

On Tue, Jun 17, 2014 at 5:00 PM, Xiaokai Wei x...@palantir.com wrote:
Re: Contributing to MLlib on GLM
Hi Xiaokai,

Also take a look through Xiangrui's slides from Hadoop Summit a few weeks back: http://www.slideshare.net/xrmeng/m-llib-hadoopsummit

The roadmap starting at slide 51 will probably be interesting to you.

Andrew

On Tue, Jun 17, 2014 at 7:37 PM, Sandy Ryza sandy.r...@cloudera.com wrote: