Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13139#discussion_r64328639

    --- Diff: docs/ml-classification-regression.md ---
    @@ -374,6 +374,154 @@ regression model and extracting model summary statistics.
     </div>
    +## Generalized linear regression
    +
    +Contrasted with linear regression where the output is assumed to follow a Gaussian
    +distribution, [generalized linear models](https://en.wikipedia.org/wiki/Generalized_linear_model) (GLMs) are specifications of linear models where the response variable $Y_i$ may take on _any_
    +distribution from the [exponential family of distributions](https://en.wikipedia.org/wiki/Exponential_family).
    +Spark's `GeneralizedLinearRegression` interface
    +allows for flexible specification of GLMs which can be used for various types of
    +prediction problems including linear regression, Poisson regression, logistic regression, and others.
    +Currently in `spark.ml`, only a subset of the exponential family distributions are supported and they are listed
    +[below](#available-families).
    +
    +**NOTE**: Spark currently only supports up to 4096 features for GLM models, and will throw an exception if this
    +constraint is exceeded. See the [optimization section](#optimization) for more details.
    +
    +In a GLM the resonse variable $Y_i$ is assumed to be drawn from an exponential family distribution:
    +
    +$$
    +Y_i \sim f\left(\cdot|\theta_i, \phi, w_i\right)
    --- End diff --

    Same for any other notation
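
To make the notation $f\left(\cdot|\theta_i, \phi, w_i\right)$ in the quoted diff concrete, here is a minimal sketch in plain Python (not Spark code) of one exponential family member, the Poisson distribution, written in natural-parameter form. The function name `poisson_log_density` and the parameterisation are assumptions for illustration only: it takes $\theta = \log \mu$, log-partition $A(\theta) = e^{\theta}$, base measure $h(y) = 1/y!$, and fixes the dispersion $\phi = 1$ and weight $w_i = 1$. The diff is cut off before it defines $f$, so this may not match the form the PR ends up using.

```python
import math


def poisson_log_density(y, theta):
    """Log-density of a Poisson response in natural exponential family form.

    Assumed convention (not taken from the PR):
        log f(y | theta) = y * theta - A(theta) + log h(y)
    with A(theta) = exp(theta) and h(y) = 1 / y!,
    and with dispersion phi = 1 and weight w = 1.
    """
    log_partition = math.exp(theta)      # A(theta)
    log_base = -math.lgamma(y + 1)       # log h(y) = -log(y!)
    return y * theta - log_partition + log_base


if __name__ == "__main__":
    mu = 2.0                 # mean of the response
    theta = math.log(mu)     # natural parameter under the canonical log link
    y = 3
    # Density recovered from the exponential family form.
    print(math.exp(poisson_log_density(y, theta)))
    # Textbook Poisson pmf for comparison.
    print(mu ** y * math.exp(-mu) / math.factorial(y))
```

Both printed values should agree up to floating-point error, which shows that $\theta_i$ is just a reparameterisation of the mean. Spelling out each symbol this way is the kind of definition the comment asks for.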