Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15365#discussion_r83121575

    --- Diff: R/pkg/R/mllib.R ---
    @@ -647,6 +654,195 @@ setMethod("predict", signature(object = "KMeansModel"),
         predict_internal(object, newData)
       })
     
    +#' Logistic Regression Model
    +#'
    +#' Fits an logistic regression model against a Spark DataFrame. It supports "binomial": Binary logistic regression
    +#' with pivoting; "multinomial": Multinomial logistic (softmax) regression without pivoting, similar to glmnet.
    +#' Users can print, make predictions on the produced model and save the model to the input path.
    +#'
    +#' @param data SparkDataFrame for training
    +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
    +#' operators are supported, including '~', '.', ':', '+', and '-'.
    +#' @param regParam the regularization parameter. Default is 0.0.
    +#' @param elasticNetParam the ElasticNet mixing parameter. For alpha = 0, the penalty is an L2 penalty.
    +#' For alpha = 1, it is an L1 penalty. For 0 < alpha < 1, the penalty is a combination
    +#' of L1 and L2. Default is 0.0 which is an L2 penalty.
    +#' @param maxIter maximum iteration number.
    +#' @param tol convergence tolerance of iterations.
    +#' @param fitIntercept whether to fit an intercept term. Default is TRUE.
    +#' @param family the name of family which is a description of the label distribution to be used in the model.
    +#' Supported options:
    +#' - "auto": Automatically select the family based on the number of classes:
    +#' If numClasses == 1 || numClasses == 2, set to "binomial".
    +#' Else, set to "multinomial".
    +#' - "binomial": Binary logistic regression with pivoting.
    +#' - "multinomial": Multinomial logistic (softmax) regression without pivoting.
    +#' Default is "auto".
    +#' @param standardization whether to standardize the training features before fitting the model. The coefficients
    +#' of models will be always returned on the original scale, so it will be transparent for
    +#' users. Note that with/without standardization, the models should be always converged
    +#' to the same solution when no regularization is applied. Default is TRUE, same as glmnet.
    +#' @param threshold in binary classification, in range [0, 1]. If the estimated probability of class label 1
    +#' is > threshold, then predict 1, else 0. A high threshold encourages the model to predict 0
    +#' more often; a low threshold encourages the model to predict 1 more often. Note: Setting this with
    +#' threshold p is equivalent to setting thresholds (Array(1-p, p)). When threshold is set, any user-set
    +#' value for thresholds will be cleared. If both threshold and thresholds are set, then they must be
    +#' equivalent. Default is 0.5.
    +#' @param thresholds in multiclass (or binary) classification to adjust the probability of predicting each class.
    +#' Array must have length equal to the number of classes, with values > 0, excepting that at most one
    --- End diff --

    Modified to `c(p, 1-p)`
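
For reference, here is a minimal base-R sketch of why the ordering inside the vector matters. It is not SparkR code: `predict_with_threshold` and `predict_with_thresholds` are hypothetical helpers written only for this illustration. The per-class rule (pick the class with the largest probability / threshold ratio, with ties going to the lower class index) is an assumption about how `thresholds` is applied. Under that assumption, the ordering quoted in the doc comment, `c(1 - p, p)`, reproduces the single-`threshold` rule, while `c(p, 1 - p)` would behave like a threshold of `1 - p`.

```r
# Rule quoted in the diff: predict 1 when P(label = 1) > threshold, else 0.
predict_with_threshold <- function(prob1, threshold) {
  as.integer(prob1 > threshold)
}

# Assumed per-class rule: choose the class with the largest prob / threshold
# ratio. `thresholds` is ordered by class index: c(t_class0, t_class1).
predict_with_thresholds <- function(prob1, thresholds) {
  ratios <- cbind((1 - prob1) / thresholds[1], prob1 / thresholds[2])
  # max.col() returns a 1-based column index; subtract 1 to get the label.
  max.col(ratios, ties.method = "first") - 1L
}

p <- 0.7
prob1 <- c(0.1, 0.25, 0.69, 0.71, 0.9)  # illustrative P(label = 1) values

by_threshold  <- predict_with_threshold(prob1, p)
by_thresholds <- predict_with_thresholds(prob1, c(1 - p, p))

# Both rules agree on every input when the vector is c(1 - p, p).
stopifnot(all(by_threshold == by_thresholds))
```

The example probabilities deliberately avoid the exact decision boundary, where floating-point rounding could make the two rules disagree.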