GitHub user jkbradley commented on a diff in the pull request:
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -69,19 +69,57 @@ private[regression] trait LinearRegressionParams extends PredictorParams
         "The solver algorithm for optimization. Supported options: " +
           s"${supportedSolvers.mkString(", ")}. (Default auto)",
    +  /**
    +   * The loss function to be optimized.
    +   * Supported options: "leastSquares" and "huber".
    +   * Default: "leastSquares"
    +   *
    +   * @group param
    +   */
    +  @Since("2.3.0")
    +  final override val loss: Param[String] = new Param[String](this, "loss", "The loss function to" +
    +    s" be optimized. Supported options: ${supportedLosses.mkString(", ")}. (Default leastSquares)",
    +    ParamValidators.inArray[String](supportedLosses))
    +  /**
    +   * The shape parameter to control the amount of robustness. Must be > 
    +   * At larger values of epsilon, the huber criterion becomes more similar to least squares
    +   * regression; for small values of epsilon, the criterion is more similar to L1 regression.
    +   * Default is 1.35 to get as much robustness as possible while retaining
    +   * 95% statistical efficiency for normally distributed data.
    +   * Only valid when "loss" is "huber".
    +   */
    +  @Since("2.3.0")
    +  final val epsilon = new DoubleParam(this, "epsilon", "The shape parameter to control the " +
    +    "amount of robustness. Must be > 1.0.",
    +  /** @group getParam */
    +  @Since("2.3.0")
    +  def getEpsilon: Double = $(epsilon)
    +  override protected def validateAndTransformSchema(
    +      schema: StructType,
    +      fitting: Boolean,
    +      featuresDataType: DataType): StructType = {
    +    if ($(loss) == Huber) {
    +      require($(solver)!= Normal, "LinearRegression with huber loss doesn't support " +
    +        "normal solver, please change solver to auto or l-bfgs.")
    +      require($(elasticNetParam) == 0.0, "LinearRegression with huber loss only supports " +
    +        s"L2 regularization, but got elasticNetParam = 
    +    }
    +    super.validateAndTransformSchema(schema, fitting, featuresDataType)
    +  }
      * Linear regression.
    - * The learning objective is to minimize the squared error, with 
    - * The specific squared error loss function used is:
    - *
    - * <blockquote>
    - *    $$
    - *    L = 1/2n ||A coefficients - y||^2^
    - *    $$
    - * </blockquote>
    + * The learning objective is to minimize the specified loss function, with 
    + * This supports two loss functions:
    + *  - leastSquares (a.k.a squared loss)
    --- End diff --
    Let's keep exact specifications of the losses being used.  This is one of my
    big annoyances with many ML libraries: It's hard to tell exactly what loss is
    being used, which makes it hard to compare/validate results across different
    ML libraries.
    It'd also be nice to make it clear what we mean by "huber," in particular
    that we estimate the scale parameter from data.
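
    As a rough illustration of the level of detail meant here, the two losses
    could be written out along the following lines. This is only a sketch, not
    text from the PR: it assumes the concomitant-scale form of the Huber
    criterion (the one that matches "estimate the scale parameter from data"),
    it reuses the 1/2n scaling of the removed least-squares formula for the
    Huber case as an assumption, and it omits the regularization terms. The
    symbols a_i (the i-th row of A), w (the coefficients), and sigma (the
    scale) are introduced for illustration.

    ```latex
    % Least squares, as in the doc text removed by the diff:
    L_{\mathrm{ls}}(w) = \frac{1}{2n} \lVert A w - y \rVert_2^2

    % Huber with the scale \sigma > 0 estimated jointly with w (assumed form):
    L_{\mathrm{huber}}(w, \sigma)
      = \frac{1}{2n} \sum_{i=1}^{n}
        \left( \sigma + H_{\epsilon}\!\left( \frac{a_i^{\top} w - y_i}{\sigma} \right) \sigma \right)

    % Per-residual Huber function with shape parameter \epsilon > 1:
    H_{\epsilon}(z) =
      \begin{cases}
        z^2,                          & \text{if } |z| < \epsilon, \\
        2 \epsilon |z| - \epsilon^2,  & \text{otherwise}
      \end{cases}
    ```

    Written this way, the behavior described in the epsilon doc is visible
    directly: for large epsilon almost every residual falls in the quadratic
    branch, and for small epsilon most fall in the linear (L1-like) branch.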

