[ https://issues.apache.org/jira/browse/SPARK-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gang Bai closed SPARK-2303.
---------------------------

    Resolution: Fixed

> Poisson regression model for count data
> ---------------------------------------
>
>                 Key: SPARK-2303
>                 URL: https://issues.apache.org/jira/browse/SPARK-2303
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Gang Bai
>
> Modeling count data is of great importance in solving real-world statistical
> problems. Currently mllib.regression provides models mostly for numeric data,
> i.e., fitting curves with various kinds of regularization on the resulting
> weights, but it still lacks support for count data modeling.
> A very basic model for this is Poisson regression. Following the patterns
> in mllib and reusing its components, we address parameter estimation for
> Poisson regression by maximum likelihood. In detail, to add Poisson
> regression to mllib.regression, we need to:
> # Add the gradient of the negative log-likelihood to
> mllib/optimization/Gradients.scala.
> # Add the implementation of PoissonRegressionModel, which extends
> GeneralizedLinearModel with RegressionModel. Here we need to implement
> the predict method.
> # Add the implementations of the generalized linear algorithm class. Here we
> can use either LBFGS or GradientDescent as the optimizer, so we implement
> both, as class PoissonRegressionWithSGD and class PoissonRegressionWithLBFGS
> respectively.
> # Add the companion objects PoissonRegressionWithSGD and
> PoissonRegressionWithLBFGS as drivers.
> # Test suites
> ## Test the gradient computation.
> ## Test the regression method using generated data, which requires a
> PoissonRegressionDataGenerator.
> ## Test the regression method using a real-world data set.
> # Add the documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
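
For readers who want to see the math behind the first steps of the issue description, here is a minimal sketch in plain Python. It is not the MLlib code and does not use any Spark API. It assumes the canonical log link, so the predicted mean is exp(w . x), and it drops the constant log(y!) term from the negative log-likelihood. The per-example loss is then exp(w . x) - y * (w . x), with gradient (exp(w . x) - y) * x. All function names here (predict, loss_and_gradient, fit, sample_poisson, generate_data) are hypothetical and chosen only for illustration, and the optimizer is plain full-batch gradient descent rather than MLlib's GradientDescent or LBFGS.

```python
import math
import random


def predict(weights, features):
    """Predicted mean count under the assumed log link: exp(w . x)."""
    return math.exp(sum(w * x for w, x in zip(weights, features)))


def loss_and_gradient(weights, features, label):
    """Per-example negative log-likelihood (without log(y!)) and its gradient."""
    margin = sum(w * x for w, x in zip(weights, features))
    mean = math.exp(margin)
    loss = mean - label * margin
    gradient = [(mean - label) * x for x in features]
    return loss, gradient


def fit(data, num_features, step_size=0.1, iterations=300):
    """Full-batch gradient descent on the averaged negative log-likelihood."""
    weights = [0.0] * num_features
    for _ in range(iterations):
        total = [0.0] * num_features
        for features, label in data:
            _, grad = loss_and_gradient(weights, features, label)
            total = [t + g for t, g in zip(total, grad)]
        weights = [w - step_size * t / len(data) for w, t in zip(weights, total)]
    return weights


def sample_poisson(rng, mean):
    """Draw one Poisson sample with Knuth's method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def generate_data(rng, true_weights, n):
    """Synthetic data: an intercept feature of 1.0 plus uniform features in [-1, 1]."""
    data = []
    for _ in range(n):
        features = [1.0] + [rng.uniform(-1.0, 1.0) for _ in true_weights[1:]]
        label = sample_poisson(rng, predict(true_weights, features))
        data.append((features, label))
    return data


if __name__ == "__main__":
    rng = random.Random(42)
    true_weights = [0.5, 1.0, -0.5]
    data = generate_data(rng, true_weights, 1000)
    print("estimated weights:", fit(data, len(true_weights)))
```

The generate_data helper plays the role that the issue assigns to a PoissonRegressionDataGenerator: it produces counts from known weights so that the fitted weights can be compared against them. The step size and iteration count are illustrative values that happen to suit bounded features; they are not tuned recommendations.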