Re: Contributing to MLlib on GLM

Gang Bai Mon, 07 Jul 2014 19:33:33 -0700

Poisson and Gamma regressions for modeling count data are definitely important 
in spark.mllib.regression. So don’t worry. Let’s change the updater to 
SquaredL2Updater as we discussed in the PR. Then we can ask Jenkins to run the 
test.
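
For readers following the thread, here is a minimal sketch of the two pieces being discussed: a Poisson regression gradient (log link) and an L2-regularized update step in the spirit of SquaredL2Updater. It is plain Python for illustration, not the MLlib code or the code in the PR. The function names `poisson_gradient` and `squared_l2_step`, and the step size decay of `step_size / sqrt(iteration)`, are assumptions made for this example.

```python
import math


def poisson_gradient(weights, features, label):
    """Gradient and loss of the Poisson negative log-likelihood for one example.

    Assumes a log link: the predicted mean is exp(w . x).
    """
    margin = sum(w * x for w, x in zip(weights, features))
    mean = math.exp(margin)
    # d/dw [exp(w . x) - y * (w . x)] = (exp(w . x) - y) * x
    grad = [(mean - label) * x for x in features]
    # The constant term log(y!) does not depend on w and is omitted.
    loss = mean - label * margin
    return grad, loss


def squared_l2_step(weights, gradient, step_size, iteration, reg_param):
    """One gradient step with an L2 penalty of 0.5 * reg_param * ||w||^2.

    w' = (1 - eta * reg_param) * w - eta * g, with eta = step_size / sqrt(iteration)
    (the decay schedule is an assumption of this sketch).
    """
    eta = step_size / math.sqrt(iteration)
    new_weights = [(1.0 - eta * reg_param) * w - eta * g
                   for w, g in zip(weights, gradient)]
    reg_value = 0.5 * reg_param * sum(w * w for w in new_weights)
    return new_weights, reg_value


# Hypothetical single example: two features, observed count 3.
grad, loss = poisson_gradient([0.0, 0.0], [1.0, 2.0], 3.0)
new_w, reg = squared_l2_step([0.0, 0.0], grad, step_size=0.1,
                             iteration=1, reg_param=0.5)
```

Only the gradient function is specific to the Poisson model. The update step is model-independent, which is why swapping the updater does not require touching the regression classes themselves.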


On Jul 8, 2014, at 3:00 AM, xwei <[email protected]> wrote:

> Hi Gang,
> 
> No admin is looking at our patch :( Do you have any suggestions for getting
> our patch noticed by an admin?
> 
> Best regards,
> 
> Xiaokai
> 
> 
> On Mon, Jun 30, 2014 at 8:18 PM, Gang Bai [via Apache Spark Developers
> List] <[email protected]> wrote:
> 
>> Thanks Xiaokai,
>> 
>> I’ve created a pull request to merge the features from my PR into your repo.
>> Please review it here: https://github.com/xwei-datageek/spark/pull/2 .
>> 
>> As for GLMs, here at Sina we are working on predicting the number of
>> visitors who read a particular news article or watch an online sports
>> live stream in a given period. I’m trying to improve the prediction
>> results by tuning features and incorporating new models, so I’ll try
>> Gamma regression later. Thanks for the implementation.
>> 
>> Cheers,
>> -Gang
>> 
>> On Jun 29, 2014, at 8:17 AM, xwei <[hidden email]> wrote:
>> 
>>> Hi Gang,
>>> 
>>> No worries!
>>> 
>>> I agree LBFGS would converge faster and your test suite is more
>>> comprehensive. I'd like to merge my branch with yours.
>>> 
>>> I also agree with your point about redundancy. Different GLMs usually
>>> differ only in their gradient calculation, while the ****regression.scala
>>> files are quite similar. For example, linearRegressionSGD,
>>> logisticRegressionSGD, RidgeRegressionSGD, and poissonRegressionSGD all
>>> share quite a bit of common code in their class implementations. Since
>>> this redundancy already exists in the legacy code, merging only Poisson
>>> and Gamma would not help much. So I suggest we leave them as separate
>>> classes for the time being.
>>> 
>>> 
>>> Best regards,
>>> 
>>> Xiaokai
>>> 
>>> On Jun 27, 2014, at 6:45 PM, Gang Bai [via Apache Spark Developers List] wrote:
>>> 
>>>> Hi Xiaokai,
>>>> 
>>>> My bad. I didn't notice this before I created another PR for Poisson
>>>> regression; the mails were sent to junk by the corporate mail system.
>>>> Also, thanks for considering my comments and advice in your PR.
>>>> 
>>>> Adding my two cents here:
>>>> 
>>>> * PoissonRegressionModel and GammaRegressionModel have the same fields
>>>>   and prediction method. Shall we use one class instead of two redundant
>>>>   ones? Say, a LogLinearModel.
>>>> * The LBFGS optimizer takes fewer iterations and converges better than
>>>>   SGD. I implemented two GeneralizedLinearAlgorithm classes, using LBFGS
>>>>   and SGD respectively. You may want to take a look. If it's OK with you,
>>>>   I'd be happy to send a PR to your branch.
>>>> * In addition to the generated test data, we may use some real-world
>>>>   data for testing. In my implementation, I added the test data from
>>>>   https://onlinecourses.science.psu.edu/stat504/node/223. Please check
>>>>   my test suite.
>>>> 
>>>> -Gang
>>>> Sent from my iPad
>>>> 
>>>>> On Jun 27, 2014, at 6:03 PM, "xwei" <[hidden email]> wrote:
>>>>> 
>>>>> 
>>>>> Yes, that's what we did: we added two gradient functions to
>>>>> Gradient.scala and created PoissonRegression and GammaRegression using
>>>>> these gradients. We made a PR for this.
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-developers-list.1001551.n3.nabble.com/Contributing-to-MLlib-on-GLM-tp7033p7088.html
>>>>> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>>>> 
>>>> 
>>>> If you reply to this email, your message will be added to the discussion below:
>>>> http://apache-spark-developers-list.1001551.n3.nabble.com/Contributing-to-MLlib-on-GLM-tp7033p7107.html
>>> 
>> 
>> 
> 
