Weichen Xu created SPARK-16638:
----------------------------------

             Summary: The L2 regularization of LinearRegression seems wrong 
when standardization is false
                 Key: SPARK-16638
                 URL: https://issues.apache.org/jira/browse/SPARK-16638
             Project: Spark
          Issue Type: Bug
          Components: ML, Optimizer
            Reporter: Weichen Xu


The standard L2 penalty is
0.5 * effectiveL2regParam * sigma( wi^2 )
(wi are the coefficients being trained)

In the LinearRegression code, when standardization == false, the L2 penalty is
modified to:

0.5 * effectiveL2regParam * sigma( ( wi / featuresStd(i) )^2 )

I believe this is wrong.

Judging from the intent stated in the author's code comment, the modification to
the L2 penalty should instead be:

0.5 * effectiveL2regParam * sigma( ( wi * featuresStd(i) )^2 )

That is, wi should be multiplied by featuresStd(i), not divided by it.

We can see the problem with a simple thought experiment:

Suppose the training data has a dimension k with a very large std (featuresStd(k)
is very large). To keep numerical stability, we want the trained coefficient w(k)
to be small, so the L2 penalty on that dimension should be made larger. That
requires the penalty term to be w(k) * featuresStd(k), not w(k) / featuresStd(k):
dividing by a large featuresStd(k) shrinks the penalty and lets w(k) grow.
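To make the effect concrete, here is a minimal numeric sketch (plain NumPy, not the actual Spark code; the coefficient and std values are made up for illustration) comparing the penalty the code computes against the penalty the comment describes, for a feature with a very large std:

```python
import numpy as np

# Hypothetical coefficients and per-feature standard deviations:
# dimension 1 has a very large std.
w = np.array([1.0, 1.0])
featuresStd = np.array([0.1, 100.0])
reg = 0.5 * 1.0  # 0.5 * effectiveL2regParam, with regParam = 1.0 for illustration

# What the code computes when standardization == false (divide by std).
penalty_current = reg * np.sum((w / featuresStd) ** 2)

# What the code comment's intent suggests (multiply by std).
penalty_expected = reg * np.sum((w * featuresStd) ** 2)

# Dividing by the std makes the contribution of the large-std dimension
# tiny (1/100^2), while multiplying makes it large (100^2) -- and a large
# penalty is what actually discourages a big coefficient on that dimension.
print(penalty_current)   # 50.00005
print(penalty_expected)  # 5000.005
```

Under the "divide" version, the large-std dimension contributes almost nothing to the penalty, so nothing discourages w(k) from growing there.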




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
