GitHub user actuaryzhang reopened a pull request:

    https://github.com/apache/spark/pull/16344

    [SPARK-18929][ML] Add Tweedie distribution in GLM

    ## What changes were proposed in this pull request?
    I propose to add the full Tweedie family to the 
GeneralizedLinearRegression model. The Tweedie family is characterized by a 
power variance function. The currently supported Gaussian, Poisson and Gamma 
families are special cases of the Tweedie family 
(https://en.wikipedia.org/wiki/Tweedie_distribution).
    
    @yanboliang @srowen @sethah 
    
    I propose to add support for the other distributions:
    - compound Poisson: 1 < varPower < 2. This one is widely used to model 
zero-inflated continuous distributions, e.g., in insurance, finance, ecology, 
meteorology, advertising etc.
    - positive stable: varPower > 2 and varPower != 3. Used to model extreme 
values.
    - inverse Gaussian: varPower = 3.
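
For illustration only (this is not code from the PR), here is a minimal Scala sketch of the power variance function V(mu) = mu^varPower and of how the varPower values listed above map to named members of the family. The names `TweedieVarianceSketch`, `variance` and `memberName` are hypothetical.

```scala
object TweedieVarianceSketch {
  // Power variance function: V(mu) = mu^varPower.
  def variance(mu: Double, varPower: Double): Double = math.pow(mu, varPower)

  // Hypothetical helper: name the family member for a given varPower,
  // following the ranges given in the description above.
  def memberName(varPower: Double): String = varPower match {
    case 0.0 => "Gaussian"
    case 1.0 => "Poisson"
    case 2.0 => "Gamma"
    case 3.0 => "inverse Gaussian"
    case p if p > 1.0 && p < 2.0 => "compound Poisson"
    case p if p > 2.0 => "positive stable"
    case _ => "not covered here"
  }
}
```

For example, `TweedieVarianceSketch.memberName(1.5)` gives `"compound Poisson"`, which is the case for the proposed default `varPower` of 1.5.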
    
    The Tweedie family is supported in most statistical packages such as R 
(statmod), SAS, h2o etc.
    
    Changes made:
    - Allow `tweedie` in family. Only `identity` and `log` links are allowed 
for now. 
    - Add `varPower` to `GeneralizedLinearRegressionBase`, which takes values 
in (1, 2) and (2, infinity). Also set the default value to 1.5 and add a getter 
method.
    - Add `Tweedie` class
    - Add tests for tweedie GLM
    
    Note:
    - In computing deviance, use `math.max(y, 0.1)` to avoid taking the 
inverse of 0. This is the same as in R: `tweedie()$dev.res`
    - `aic` is not supported in this PR because evaluating the [Tweedie 
density](http://www.statsci.org/smyth/pubs/tweediepdf-series-preprint.pdf) in 
these cases is non-trivial. I will implement the density approximation method 
in a future PR. R returns `null` (see `tweedie()$aic`).
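
To make the first note concrete, here is a rough Scala sketch of the standard Tweedie unit deviance for varPower p not in {1, 2}, with the `math.max(y, 0.1)` floor applied where a negative power of y is taken. It is an illustration under stated assumptions, not the code in this PR: `TweedieDevianceSketch`, `delta` and `deviance` are hypothetical names, and it assumes mu > 0 (and y > 0 when p > 2).

```scala
object TweedieDevianceSketch {
  // Floor applied to y before taking a negative power (0.1, as in the note).
  val delta: Double = 0.1

  // Weighted unit deviance 2 * w * (y * theta - kappa) for p not in {1, 2}.
  def deviance(y: Double, mu: Double, weight: Double, p: Double): Double = {
    require(p != 1.0 && p != 2.0, "sketch covers only p not in {1, 2}")
    val y1 = math.max(y, delta) // avoids 0^(1 - p) = Infinity when y = 0
    val theta = (math.pow(y1, 1.0 - p) - math.pow(mu, 1.0 - p)) / (1.0 - p)
    val kappa = (math.pow(y, 2.0 - p) - math.pow(mu, 2.0 - p)) / (2.0 - p)
    2.0 * weight * (y * theta - kappa)
  }
}
```

With the floor in place, `TweedieDevianceSketch.deviance(0.0, 1.0, 1.0, 1.5)` evaluates to 4.0. Without it, the term `0 * theta` would be `0 * -Infinity`, which is NaN.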


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/actuaryzhang/spark tweedie

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16344.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16344
    
----
commit 952887e485fb0d5fa669b3b4c9289b8069ee7769
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-16T00:50:51Z

    Add Tweedie family to GLM

commit 4f184ec458f5ed7d70bc5b8165481425f911d2a3
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-19T22:50:02Z

    Fix calculation in dev resid; Add test for different var power

commit 7fe39106332663d3671b94a8ffac48ca61c48470
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-19T23:14:37Z

    Merge test into GLR

commit bfcc4fb08d54156efc66b90d14c62ea7ff172afa
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-20T22:59:05Z

    Use Tweedie class instead of global object Tweedie; change variancePower to varPower

commit a8feea7d8095170c1b5f18b7887f16a6d763e42c
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-21T23:42:40Z

    Allow Family to use GLRBase object directly

commit 233e2d338be8d36a74eaf578bfea804ae3617d4e
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-22T01:56:34Z

    Add TweedieFamily and implement specific distn within Tweedie

commit 17c55816c914bc96a8b5141356e3c117f343f303
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-22T04:39:54Z

    Clean up doc

commit 0b41825e99020976a34d8fe9c983f26de6c8c40f
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-22T17:52:01Z

    Move defaultLink and name to subclass of TweedieFamily

commit 6e8e60771afb4abe43e47c7fe186bad1541a8fac
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-22T18:10:51Z

    Change style for AIC

commit 8d7d34e258f9c7c03c80754d837ce847fcb0526e
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-23T19:10:20Z

    Rename Family methods and restore methods for tweedie subclasses

commit 6da7e3068e2c45a0faf7ff35c10b2750784d765e
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-23T19:12:25Z

    Update test

commit 9a71e89f629260c775922901a04c989f36ea4946
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-27T17:16:40Z

    Clean up doc

commit f461c09e65360f695ad3092b41bc26e0c61bbd95
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-27T22:18:39Z

    Put delta in Tweedie companion object

commit a839c4631dd17c4f3d0a0cc99e1b0af81419dda4
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2016-12-27T22:23:57Z

    Clean up doc

commit fab265278109eede4cce7ee506e8b29d481c4549
Author: actuaryzhang <actuaryzhan...@gmail.com>
Date:   2017-01-05T19:32:06Z

    Allow more link functions in tweedie

----

