[ https://issues.apache.org/jira/browse/SPARK-19184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-19184: ------------------------------------ Assignee: Apache Spark > Improve numerical stability for method tallSkinnyQR. > ---------------------------------------------------- > > Key: SPARK-19184 > URL: https://issues.apache.org/jira/browse/SPARK-19184 > Project: Spark > Issue Type: Improvement > Components: MLlib > Affects Versions: 2.2.0 > Reporter: Huamin Li > Assignee: Apache Spark > Priority: Minor > Labels: None > > In method tallSkinnyQR, the final Q is calculated by A * inv(R) ([Github > Link|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L562]). > When the upper triangular matrix R is ill-conditioned, computing the inverse > of R can result in catastrophic cancellation. Instead, we should consider > using a forward solve for solving Q such that Q * R = A. > I first create a 4 by 4 RowMatrix A = > (1,1,1,1;0,1E-5,0,0;0,0,1E-10,1;0,0,0,1E-14), and then I apply method > tallSkinnyQR to A to find RowMatrix Q and Matrix R such that A = Q*R. In this > case, A is ill-conditioned and so is R. > See codes in Spark Shell: > {code:none} > import org.apache.spark.mllib.linalg.{Matrices, Vector, Vectors} > import org.apache.spark.mllib.linalg.distributed.RowMatrix > // Create RowMatrix A. > val mat = Seq(Vectors.dense(1,1,1,1), Vectors.dense(0, 1E-5, 1,1), > Vectors.dense(0,0,1E-10,1), Vectors.dense(0,0,0,1E-14)) > val denseMat = new RowMatrix(sc.parallelize(mat, 2)) > // Apply tallSkinnyQR to A. > val result = denseMat.tallSkinnyQR(true) > // Print the calculated Q and R. > result.Q.rows.collect.foreach(println) > result.R > // Calculate Q*R. Ideally, this should be close to A. > val reconstruct = result.Q.multiply(result.R) > reconstruct.rows.collect.foreach(println) > // Calculate Q'*Q. Ideally, this should be close to the identity matrix. > result.Q.computeGramianMatrix() > System.exit(0) > {code} > it will output the following results: > {code:none} > scala> result.Q.rows.collect.foreach(println) > [1.0,0.0,0.0,1.5416524685312E13] > [0.0,0.9999999999999999,0.0,8011776.0] > [0.0,0.0,1.0,0.0] > [0.0,0.0,0.0,1.0] > scala> result.R > 1.0 1.0 1.0 1.0 > 0.0 1.0E-5 1.0 1.0 > 0.0 0.0 1.0E-10 1.0 > 0.0 0.0 0.0 1.0E-14 > scala> reconstruct.rows.collect.foreach(println) > [1.0,1.0,1.0,1.15416524685312] > [0.0,9.999999999999999E-6,0.9999999999999999,1.00000008011776] > [0.0,0.0,1.0E-10,1.0] > [0.0,0.0,0.0,1.0E-14] > scala> result.Q.computeGramianMatrix() > 1.0 0.0 0.0 1.5416524685312E13 > 0.0 0.9999999999999998 0.0 8011775.999999999 > 0.0 0.0 1.0 0.0 > 1.5416524685312E13 8011775.999999999 0.0 2.3766923337289844E26 > {code} > With forward solve for solving Q such that Q * R = A rather than computing > the inverse of R, it will output the following results instead: > {code:none} > scala> result.Q.rows.collect.foreach(println) > [1.0,0.0,0.0,0.0] > [0.0,1.0,0.0,0.0] > [0.0,0.0,1.0,0.0] > [0.0,0.0,0.0,1.0] > scala> result.R > 1.0 1.0 1.0 1.0 > 0.0 1.0E-5 1.0 1.0 > 0.0 0.0 1.0E-10 1.0 > 0.0 0.0 0.0 1.0E-14 > scala> reconstruct.rows.collect.foreach(println) > [1.0,1.0,1.0,1.0] > [0.0,1.0E-5,1.0,1.0] > [0.0,0.0,1.0E-10,1.0] > [0.0,0.0,0.0,1.0E-14] > scala> result.Q.computeGramianMatrix() > 1.0 0.0 0.0 0.0 > 0.0 1.0 0.0 0.0 > 0.0 0.0 1.0 0.0 > 0.0 0.0 0.0 1.0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org