[ 
https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127437#comment-15127437
 ] 

Imran Younus edited comment on SPARK-11918 at 2/2/16 2:12 AM:
--------------------------------------------------------------

Several columns in the given dataset contain only zeros. In this case, the data 
matrix is not full rank. Therefore the Gramian matrix is singular and hence not 
invertible. The Cholesky decomposition will fail in this case.

This will also happen if the standard deviation of more than one column is zero 
(even if the values are not zero).
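
To make the argument above concrete, here is a minimal pure-Python sketch (not 
Spark's code). The helper names (gramian, cholesky, cholesky_fails), the toy 
matrices and the pivot tolerance are all assumptions made for illustration. It 
builds the Gramian A^T A and runs a textbook Cholesky factorization, which 
hits a non-positive pivot both for an all-zero column and for two constant 
non-zero columns.

```python
import math


def gramian(rows):
    """Return A^T A for a matrix given as a list of rows."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]


def cholesky(m, tol=1e-12):
    """Textbook Cholesky factorization.

    Raises ValueError when a pivot is not (numerically) positive.
    The tolerance is an assumption chosen for this toy example.
    """
    n = len(m)
    lower = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = m[j][j] - sum(lower[j][k] ** 2 for k in range(j))
        if pivot <= tol:
            raise ValueError(f"not positive definite: pivot {j} is {pivot}")
        lower[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = (m[i][j] - partial) / lower[j][j]
    return lower


def cholesky_fails(rows):
    """True if Cholesky of the Gramian of `rows` breaks down."""
    try:
        cholesky(gramian(rows))
        return False
    except ValueError:
        return True


# Hypothetical toy data, not taken from sample_libsvm_data.txt.
full_rank = [[1.0, 2.0], [3.0, 1.0], [2.0, 5.0]]
zero_column = [[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]]
two_constant = [[2.0, 5.0, 1.0], [2.0, 5.0, 3.0], [2.0, 5.0, 2.0]]

print(cholesky_fails(full_rank))
print(cholesky_fails(zero_column))
print(cholesky_fails(two_constant))
```

In the third matrix the two constant columns are proportional to each other, 
so the data matrix loses rank even though no value is zero.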

I think we should catch this error in the code and exit with a warning message. 
Or we can drop columns with zero variance and continue with the algorithm.
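
A rough sketch of the second option, again in plain Python rather than Spark 
code: column_variances, drop_constant_columns and the tolerance are 
hypothetical names and values used only for illustration. It assumes the 
intercept is handled separately, so every constant column can be removed, and 
it returns the kept indices so that coefficients can be mapped back to the 
original feature positions.

```python
def column_variances(rows):
    """Population variance of each column of a row-major matrix."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    return [sum((x - m) ** 2 for x in col) / n for col, m in zip(columns, means)]


def drop_constant_columns(rows, tol=1e-12):
    """Remove columns whose variance is at most `tol`.

    Returns the reduced matrix and the indices of the columns that were kept.
    The tolerance is an assumption, not a value used by Spark.
    """
    variances = column_variances(rows)
    kept = [j for j, v in enumerate(variances) if v > tol]
    reduced = [[row[j] for j in kept] for row in rows]
    return reduced, kept


# Hypothetical toy data: column 0 is all zeros, column 2 is constant.
data = [[0.0, 1.0, 5.0], [0.0, 3.0, 5.0], [0.0, 2.0, 5.0]]
reduced, kept = drop_constant_columns(data)
print(reduced, kept)
```

The dropped features would get a coefficient of zero in the final model, 
since a constant column carries no information beyond the intercept.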



> WLS can not resolve some kinds of equation
> ------------------------------------------
>
>                 Key: SPARK-11918
>                 URL: https://issues.apache.org/jira/browse/SPARK-11918
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Yanbo Liang
>            Priority: Minor
>              Labels: starter
>         Attachments: R_GLM_output
>
>
> Weighted Least Squares (WLS) is one of the optimization methods for solving 
> Linear Regression (when #features < 4096). But if the dataset is very 
> ill-conditioned (such as a 0-1 based label used for classification with an 
> underdetermined equation), WLS fails (although "l-bfgs" can train and get the 
> model). The failure is caused by the underlying LAPACK library returning an 
> error value during Cholesky decomposition.
> This issue is easy to reproduce: train a LinearRegressionModel with the 
> "normal" solver on the example dataset 
> (https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt).
> The following is the exception:
> {code}
> assertion failed: lapack.dpotrs returned 1.
> java.lang.AssertionError: assertion failed: lapack.dpotrs returned 1.
>       at scala.Predef$.assert(Predef.scala:179)
>       at org.apache.spark.mllib.linalg.CholeskyDecomposition$.solve(CholeskyDecomposition.scala:42)
>       at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:117)
>       at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:180)
>       at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:67)
>       at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
> {code}


