[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r116614596 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -148,7 +154,8 @@ sealed trait Matrix extends Serializable { * and column indices respectively with the type `Int`, and the final parameter is the * corresponding value in the matrix with type `Double`. */ - private[spark] def foreachActive(f: (Int, Int, Double) => Unit) + @Since("2.2.0") + def foreachActive(f: (Int, Int, Double) => Unit): Unit --- End diff -- That's a good point. I guess we can leave it until someone complains & add a Java-friendly one as needed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r116603167 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -148,7 +154,8 @@ sealed trait Matrix extends Serializable { * and column indices respectively with the type `Int`, and the final parameter is the * corresponding value in the matrix with type `Double`. */ - private[spark] def foreachActive(f: (Int, Int, Double) => Unit) + @Since("2.2.0") + def foreachActive(f: (Int, Int, Double) => Unit): Unit --- End diff -- I think we should do the same for `foreachActive` in Vector, which has been a public API for a long time.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r116596231 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -148,7 +154,8 @@ sealed trait Matrix extends Serializable { * and column indices respectively with the type `Int`, and the final parameter is the * corresponding value in the matrix with type `Double`. */ - private[spark] def foreachActive(f: (Int, Int, Double) => Unit) + @Since("2.2.0") + def foreachActive(f: (Int, Int, Double) => Unit): Unit --- End diff -- @sethah @dbtsai Hi all, just saw this during QA. This method is not very Java-friendly. I'm OK with adding it as long as we document the fact that it's not Java-friendly. We could also consider adding a Java-friendly version, perhaps using https://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/function/Function2.html
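To make the Java-friendliness concern above concrete, here is a minimal standalone sketch (plain Java, not the actual Spark API): `MatrixEntryConsumer` is a hypothetical functional interface standing in for the Function2-style helper being proposed, so a Java caller can pass a lambda instead of hand-implementing `scala.Function3`.

```java
public class ForeachActiveSketch {
    // Hypothetical Java-friendly functional interface; the Scala signature
    // (Int, Int, Double) => Unit would instead require a scala.Function3
    // implementation from Java callers.
    @FunctionalInterface
    interface MatrixEntryConsumer {
        void accept(int i, int j, double value);
    }

    // Iterate all stored entries of a column-major dense matrix, mirroring
    // what a dense foreachActive has to do.
    static void foreachActive(int numRows, int numCols, double[] values,
                              MatrixEntryConsumer f) {
        for (int j = 0; j < numCols; j++) {
            for (int i = 0; i < numRows; i++) {
                f.accept(i, j, values[j * numRows + i]);
            }
        }
    }

    static double sumActives(int numRows, int numCols, double[] values) {
        double[] sum = {0.0};
        // A lambda binds directly to the functional interface.
        foreachActive(numRows, numCols, values, (i, j, v) -> sum[0] += v);
        return sum[0];
    }

    public static void main(String[] args) {
        // 2x2 column-major values for {{1.0, 2.0}, {0.0, 3.0}}
        System.out.println(sumActives(2, 2, new double[]{1.0, 0.0, 2.0, 3.0}));
    }
}
```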
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r109616018 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +396,60 @@ class DenseMatrix @Since("2.0.0") ( override def numActives: Int = values.length /** - * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from the given `DenseMatrix`. + * + * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order. */ - @Since("2.0.0") - def toSparse: SparseMatrix = { -val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble -val colPtrs: Array[Int] = new Array[Int](numCols + 1) -val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt -var nnz = 0 -var j = 0 -while (j < numCols) { - var i = 0 - while (i < numRows) { -val v = values(index(i, j)) -if (v != 0.0) { - rowIndices += i - spVals += v - nnz += 1 + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose +else { + val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble + val colPtrs: Array[Int] = new Array[Int](numCols + 1) + val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt + var nnz = 0 + var j = 0 + while (j < numCols) { +var i = 0 +while (i < numRows) { + val v = values(index(i, j)) + if (v != 0.0) { +rowIndices += i +spVals += v +nnz += 1 + } + i += 1 } -i += 1 +j += 1 +colPtrs(j) = nnz } - j += 1 - colPtrs(j) = nnz + new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) +} + } + + /** + * Generate a `DenseMatrix` from this `DenseMatrix`. + * + * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order. 
+ */ + private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = { +if (!(isTransposed ^ colMajor)) { + val newValues = new Array[Double](numCols * numRows) + var j = 0 --- End diff -- I see. `DenseMatrix.update()` modifies the contents of `values`.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15628
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107972466 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -1079,4 +1267,15 @@ object Matrices { SparseMatrix.fromCOO(numRows, numCols, entries) } } + + private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = { +// 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 12 + 12 + 12 + 1 +12L * numActives + 4L * numPtrs + 37L --- End diff -- Nice that we can get `37L` using Java APIs to ensure portability.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107963752 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -1079,4 +1267,15 @@ object Matrices { SparseMatrix.fromCOO(numRows, numCols, entries) } } + + private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = { +// 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 12 + 12 + 12 + 1 +12L * numActives + 4L * numPtrs + 37L + } + + private[ml] def getDenseSize(numCols: Long, numRows: Long): Long = { +// 8 * values.length + 12 + 1 --- End diff -- Can you document what the magic number `12 + 1` is? Also, can we make it `java.lang.Double.BYTES * numCols * numRows + 13L`, since the size of a primitive type can depend on the JVM implementation?
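As a sketch of what the suggested rewrite could look like, the size formulas from the inline comments can be restated with `Double.BYTES`/`Integer.BYTES` (Java 8+). The 12-byte per-array overhead and the trailing `+ 1` are taken from the PR's own comments, not measured here:

```java
public class MatrixSizeSketch {
    // Per the inline comments in the diff: three arrays at 12 bytes of
    // overhead each, plus 1, gives the 37L constant; one array plus 1
    // gives the 13L constant. These overheads are the PR's assumptions.
    static long getSparseSize(long numActives, long numPtrs) {
        // 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length
        //   + 12 + 12 + 12 + 1
        return (Double.BYTES + Integer.BYTES) * numActives
                + (long) Integer.BYTES * numPtrs + 37L;
    }

    static long getDenseSize(long numCols, long numRows) {
        // 8 * values.length + 12 + 1
        return (long) Double.BYTES * numCols * numRows + 13L;
    }

    public static void main(String[] args) {
        // A 2x3 matrix with 3 nonzeros stored CSC (numPtrs = numCols + 1 = 4):
        System.out.println(getSparseSize(3, 4)); // 12*3 + 4*4 + 37
        System.out.println(getDenseSize(3, 2));  // 8*6 + 13
    }
}
```

Since `Double.BYTES` is 8 and `Integer.BYTES` is 4 on every conforming JVM, this yields the same values as the hard-coded constants while making the derivation explicit.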
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107961128 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +168,116 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix while maintaining the layout of the current matrix. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = !isTransposed) --- End diff -- `def toSparse: SparseMatrix = toSparseMatrix(colMajor = isColMajor)`
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107966467 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -1079,4 +1267,15 @@ object Matrices { SparseMatrix.fromCOO(numRows, numCols, entries) } } + + private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = { +// 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 12 + 12 + 12 + 1 +12L * numActives + 4L * numPtrs + 37L --- End diff -- `(java.lang.Double.BYTES + java.lang.Integer.BYTES) * numActives + java.lang.Integer.BYTES * numPtrs + 37L`
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107961223 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +168,116 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix while maintaining the layout of the current matrix. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = !isTransposed) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix while maintaining the layout of the current matrix. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = !isTransposed) --- End diff -- `def toDense: DenseMatrix = toDenseMatrix(colMajor = isColMajor)`
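The layout relationship behind the `colMajor = isColMajor` suggestion can be sketched outside Spark. This sketch assumes `isColMajor` is simply shorthand for `!isTransposed`, and shows the value reshuffling a row-major conversion must perform (the expected output matches `dm4` in the test diff quoted later in the thread):

```java
public class LayoutSketch {
    // Convert a column-major values array to row-major for a
    // numRows x numCols dense matrix -- the work a toDenseRowMajor
    // call has to do when the source is column major.
    static double[] colMajorToRowMajor(int numRows, int numCols, double[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < numRows; i++) {
            for (int j = 0; j < numCols; j++) {
                out[i * numCols + j] = values[j * numRows + i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 2x3 column-major values for:
        //    4.0  2.0 -8.0
        //   -1.0  7.0  4.0
        double[] colMajor = {4.0, -1.0, 2.0, 7.0, -8.0, 4.0};
        System.out.println(java.util.Arrays.toString(
                colMajorToRowMajor(2, 3, colMajor)));
        // [4.0, 2.0, -8.0, -1.0, 7.0, 4.0]
    }
}
```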
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107954866 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala --- @@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite { assert(sparseMat.values(2) === 10.0) } - test("toSparse, toDense") { -val m = 3 -val n = 2 -val values = Array(1.0, 2.0, 4.0, 5.0) -val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0) -val colPtrs = Array(0, 2, 4) -val rowIndices = Array(0, 1, 1, 2) + test("dense to dense") { +/* + dm1 = 4.0 2.0 -8.0 +-1.0 7.0 4.0 + + dm2 = 5.0 -9.0 4.0 +1.0 -3.0 -8.0 + */ +val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0)) +val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true) + +val dm3 = dm1.toDense +assert(dm3 === dm1) +assert(!dm3.isTransposed) +assert(dm3.values.equals(dm1.values)) + +val dm4 = dm1.toDenseRowMajor +assert(dm4 === dm1) +assert(dm4.isTransposed) +assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)) + +val dm5 = dm2.toDenseColMajor +assert(dm5 === dm2) +assert(!dm5.isTransposed) +assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0)) + +val dm6 = dm2.toDenseRowMajor +assert(dm6 === dm2) +assert(dm6.isTransposed) +assert(dm6.values.equals(dm2.values)) + +val dm7 = dm1.toDenseRowMajor +assert(dm7 === dm1) +assert(dm7.isTransposed) +assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)) + +val dm8 = dm1.toDenseColMajor +assert(dm8 === dm1) +assert(!dm8.isTransposed) +assert(dm8.values.equals(dm1.values)) + +val dm9 = dm2.toDense +assert(dm9 === dm2) +assert(dm9.isTransposed) +assert(dm9.values.equals(dm2.values)) + } -val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values) -val deMat1 = new DenseMatrix(m, n, allValues) + test("dense to sparse") { +/* + dm1 = 0.0 4.0 5.0 +0.0 2.0 0.0 + + dm2 = 0.0 4.0 5.0 +0.0 2.0 0.0 -val spMat2 = deMat1.toSparse -val deMat2 = spMat1.toDense + dm3 = 0.0 0.0 0.0 +0.0 
0.0 0.0 + */ +val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0)) +val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true) +val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)) + +val sm1 = dm1.toSparseColMajor +assert(sm1 === dm1) +assert(!sm1.isTransposed) +assert(sm1.values === Array(4.0, 2.0, 5.0)) + +val sm2 = dm1.toSparseRowMajor +assert(sm2 === dm1) +assert(sm2.isTransposed) +assert(sm2.values === Array(4.0, 5.0, 2.0)) + +val sm3 = dm2.toSparseColMajor +assert(sm3 === dm2) +assert(!sm3.isTransposed) +assert(sm3.values === Array(4.0, 2.0, 5.0)) + +val sm4 = dm2.toSparseRowMajor +assert(sm4 === dm2) +assert(sm4.isTransposed) +assert(sm4.values === Array(4.0, 5.0, 2.0)) + +val sm5 = dm3.toSparseColMajor +assert(sm5 === dm3) +assert(sm5.values === Array.empty[Double]) +assert(!sm5.isTransposed) + +val sm6 = dm3.toSparseRowMajor +assert(sm6 === dm3) +assert(sm6.values === Array.empty[Double]) +assert(sm6.isTransposed) + +val sm7 = dm1.toSparse +assert(sm7 === dm1) +assert(sm7.values === Array(4.0, 2.0, 5.0)) +assert(!sm7.isTransposed) + +val sm8 = dm1.toSparseColMajor +assert(sm8 === dm1) +assert(sm8.values === Array(4.0, 2.0, 5.0)) +assert(!sm8.isTransposed) + +val sm9 = dm2.toSparseRowMajor +assert(sm9 === dm2) +assert(sm9.values === Array(4.0, 5.0, 2.0)) +assert(sm9.isTransposed) + +val sm10 = dm2.toSparse +assert(sm10 === dm2) +assert(sm10.values === Array(4.0, 5.0, 2.0)) +assert(sm10.isTransposed) + } + + test("sparse to sparse") { +/* + sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0 + 0.0 2.0 0.0 + smZeros = 0.0 0.0 0.0 +0.0 0.0 0.0 + */ +val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0)) +val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), Array(4.0, 5.0, 2.0), + isTransposed = true) +val sm3 = new SparseMatrix(2, 3, Array(0, 0, 2, 4), Array(0,
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107844935 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala --- @@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite { assert(sparseMat.values(2) === 10.0) } - test("toSparse, toDense") { -val m = 3 -val n = 2 -val values = Array(1.0, 2.0, 4.0, 5.0) -val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0) -val colPtrs = Array(0, 2, 4) -val rowIndices = Array(0, 1, 1, 2) + test("dense to dense") { +/* + dm1 = 4.0 2.0 -8.0 +-1.0 7.0 4.0 + + dm2 = 5.0 -9.0 4.0 +1.0 -3.0 -8.0 + */ +val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0)) +val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true) + +val dm3 = dm1.toDense +assert(dm3 === dm1) +assert(!dm3.isTransposed) +assert(dm3.values.equals(dm1.values)) + +val dm4 = dm1.toDenseRowMajor +assert(dm4 === dm1) +assert(dm4.isTransposed) +assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)) + +val dm5 = dm2.toDenseColMajor +assert(dm5 === dm2) +assert(!dm5.isTransposed) +assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0)) + +val dm6 = dm2.toDenseRowMajor +assert(dm6 === dm2) +assert(dm6.isTransposed) +assert(dm6.values.equals(dm2.values)) + +val dm7 = dm1.toDenseRowMajor +assert(dm7 === dm1) +assert(dm7.isTransposed) +assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)) + +val dm8 = dm1.toDenseColMajor +assert(dm8 === dm1) +assert(!dm8.isTransposed) +assert(dm8.values.equals(dm1.values)) + +val dm9 = dm2.toDense +assert(dm9 === dm2) +assert(dm9.isTransposed) +assert(dm9.values.equals(dm2.values)) + } -val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values) -val deMat1 = new DenseMatrix(m, n, allValues) + test("dense to sparse") { +/* + dm1 = 0.0 4.0 5.0 +0.0 2.0 0.0 + + dm2 = 0.0 4.0 5.0 +0.0 2.0 0.0 -val spMat2 = deMat1.toSparse -val deMat2 = spMat1.toDense + dm3 = 0.0 0.0 0.0 +0.0 
0.0 0.0 + */ +val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0)) +val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true) +val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)) + +val sm1 = dm1.toSparseColMajor +assert(sm1 === dm1) +assert(!sm1.isTransposed) +assert(sm1.values === Array(4.0, 2.0, 5.0)) + +val sm2 = dm1.toSparseRowMajor +assert(sm2 === dm1) +assert(sm2.isTransposed) +assert(sm2.values === Array(4.0, 5.0, 2.0)) + +val sm3 = dm2.toSparseColMajor +assert(sm3 === dm2) +assert(!sm3.isTransposed) +assert(sm3.values === Array(4.0, 2.0, 5.0)) + +val sm4 = dm2.toSparseRowMajor +assert(sm4 === dm2) +assert(sm4.isTransposed) +assert(sm4.values === Array(4.0, 5.0, 2.0)) + +val sm5 = dm3.toSparseColMajor +assert(sm5 === dm3) +assert(sm5.values === Array.empty[Double]) +assert(!sm5.isTransposed) + +val sm6 = dm3.toSparseRowMajor +assert(sm6 === dm3) +assert(sm6.values === Array.empty[Double]) +assert(sm6.isTransposed) + +val sm7 = dm1.toSparse +assert(sm7 === dm1) +assert(sm7.values === Array(4.0, 2.0, 5.0)) +assert(!sm7.isTransposed) + +val sm8 = dm1.toSparseColMajor +assert(sm8 === dm1) +assert(sm8.values === Array(4.0, 2.0, 5.0)) +assert(!sm8.isTransposed) + +val sm9 = dm2.toSparseRowMajor +assert(sm9 === dm2) +assert(sm9.values === Array(4.0, 5.0, 2.0)) +assert(sm9.isTransposed) + +val sm10 = dm2.toSparse +assert(sm10 === dm2) +assert(sm10.values === Array(4.0, 5.0, 2.0)) +assert(sm10.isTransposed) + } + + test("sparse to sparse") { +/* + sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0 + 0.0 2.0 0.0 + smZeros = 0.0 0.0 0.0 +0.0 0.0 0.0 + */ +val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0)) +val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), Array(4.0, 5.0, 2.0), + isTransposed = true) +val sm3 = new SparseMatrix(2, 3, Array(0, 0, 2, 4), Array(0,
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107842023 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala --- @@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite { assert(sparseMat.values(2) === 10.0) } - test("toSparse, toDense") { -val m = 3 -val n = 2 -val values = Array(1.0, 2.0, 4.0, 5.0) -val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0) -val colPtrs = Array(0, 2, 4) -val rowIndices = Array(0, 1, 1, 2) + test("dense to dense") { +/* + dm1 = 4.0 2.0 -8.0 +-1.0 7.0 4.0 + + dm2 = 5.0 -9.0 4.0 +1.0 -3.0 -8.0 + */ +val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0)) +val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true) + +val dm3 = dm1.toDense +assert(dm3 === dm1) +assert(!dm3.isTransposed) +assert(dm3.values.equals(dm1.values)) + +val dm4 = dm1.toDenseRowMajor +assert(dm4 === dm1) +assert(dm4.isTransposed) +assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)) + +val dm5 = dm2.toDenseColMajor +assert(dm5 === dm2) +assert(!dm5.isTransposed) +assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0)) + +val dm6 = dm2.toDenseRowMajor +assert(dm6 === dm2) +assert(dm6.isTransposed) +assert(dm6.values.equals(dm2.values)) + +val dm7 = dm1.toDenseRowMajor +assert(dm7 === dm1) +assert(dm7.isTransposed) +assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)) + +val dm8 = dm1.toDenseColMajor +assert(dm8 === dm1) +assert(!dm8.isTransposed) +assert(dm8.values.equals(dm1.values)) + +val dm9 = dm2.toDense +assert(dm9 === dm2) +assert(dm9.isTransposed) +assert(dm9.values.equals(dm2.values)) + } -val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values) -val deMat1 = new DenseMatrix(m, n, allValues) + test("dense to sparse") { +/* + dm1 = 0.0 4.0 5.0 +0.0 2.0 0.0 + + dm2 = 0.0 4.0 5.0 +0.0 2.0 0.0 -val spMat2 = deMat1.toSparse -val deMat2 = spMat1.toDense + dm3 = 0.0 0.0 0.0 +0.0 
0.0 0.0 + */ +val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0)) +val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true) +val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)) + +val sm1 = dm1.toSparseColMajor +assert(sm1 === dm1) +assert(!sm1.isTransposed) +assert(sm1.values === Array(4.0, 2.0, 5.0)) + +val sm2 = dm1.toSparseRowMajor +assert(sm2 === dm1) +assert(sm2.isTransposed) +assert(sm2.values === Array(4.0, 5.0, 2.0)) + +val sm3 = dm2.toSparseColMajor +assert(sm3 === dm2) +assert(!sm3.isTransposed) +assert(sm3.values === Array(4.0, 2.0, 5.0)) + +val sm4 = dm2.toSparseRowMajor +assert(sm4 === dm2) +assert(sm4.isTransposed) +assert(sm4.values === Array(4.0, 5.0, 2.0)) + +val sm5 = dm3.toSparseColMajor +assert(sm5 === dm3) +assert(sm5.values === Array.empty[Double]) +assert(!sm5.isTransposed) + +val sm6 = dm3.toSparseRowMajor +assert(sm6 === dm3) +assert(sm6.values === Array.empty[Double]) +assert(sm6.isTransposed) + +val sm7 = dm1.toSparse +assert(sm7 === dm1) +assert(sm7.values === Array(4.0, 2.0, 5.0)) +assert(!sm7.isTransposed) + +val sm8 = dm1.toSparseColMajor +assert(sm8 === dm1) +assert(sm8.values === Array(4.0, 2.0, 5.0)) +assert(!sm8.isTransposed) + +val sm9 = dm2.toSparseRowMajor +assert(sm9 === dm2) +assert(sm9.values === Array(4.0, 5.0, 2.0)) +assert(sm9.isTransposed) + +val sm10 = dm2.toSparse +assert(sm10 === dm2) +assert(sm10.values === Array(4.0, 5.0, 2.0)) +assert(sm10.isTransposed) + } + + test("sparse to sparse") { +/* + sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0 + 0.0 2.0 0.0 + smZeros = 0.0 0.0 0.0 +0.0 0.0 0.0 + */ +val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0)) +val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), Array(4.0, 5.0, 2.0), + isTransposed = true) +val sm3 = new SparseMatrix(2, 3, Array(0, 0, 2, 4), Array(0,
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107831775 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +404,49 @@ class DenseMatrix @Since("2.0.0") ( override def numActives: Int = values.length /** - * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from the given `DenseMatrix`. + * + * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order. */ - @Since("2.0.0") - def toSparse: SparseMatrix = { -val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble -val colPtrs: Array[Int] = new Array[Int](numCols + 1) -val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt -var nnz = 0 -var j = 0 -while (j < numCols) { - var i = 0 - while (i < numRows) { -val v = values(index(i, j)) -if (v != 0.0) { - rowIndices += i - spVals += v - nnz += 1 + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose +else { + val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble + val colPtrs: Array[Int] = new Array[Int](numCols + 1) + val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt + var nnz = 0 + var j = 0 + while (j < numCols) { +var i = 0 +while (i < numRows) { + val v = values(index(i, j)) + if (v != 0.0) { +rowIndices += i +spVals += v +nnz += 1 + } + i += 1 } -i += 1 +j += 1 +colPtrs(j) = nnz } - j += 1 - colPtrs(j) = nnz + new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) +} + } + + /** + * Generate a `DenseMatrix` from this `DenseMatrix`. + * + * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order. 
+ */ + private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = { +if (isTransposed && colMajor) { + new DenseMatrix(numRows, numCols, toArray, isTransposed = false) +} else if (!isTransposed && !colMajor) { + new DenseMatrix(numRows, numCols, transpose.toArray, isTransposed = true) +} else { + this } -new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) } --- End diff -- Sounds good. Let's do it in another PR. Thanks.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107832496 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix in row major order. + */ + @Since("2.2.0") + def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false) + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Returns a matrix in dense or sparse column major format, whichever uses less storage. 
+ */ + @Since("2.2.0") + def compressedColMajor: Matrix = { +if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) { + toDenseColMajor +} else { + toSparseColMajor +} + } + + /** + * Returns a matrix in dense or sparse row major format, whichever uses less storage. + */ + @Since("2.2.0") + def compressedRowMajor: Matrix = { +if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = false)) { + toDenseRowMajor +} else { + toSparseRowMajor +} + } + + /** + * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column + * major format, whichever uses less storage. When dense representation is optimal, it maintains + * the current layout order. + */ + @Since("2.2.0") + def compressed: Matrix = { +val cscSize = getSparseSizeInBytes(colMajor = true) +val csrSize = getSparseSizeInBytes(colMajor = false) +if (getDenseSizeInBytes < math.min(cscSize, csrSize)) { + // dense matrix size is the same for column major and row major, so maintain current layout + toDenseMatrix(!isTransposed) +} else { + if (cscSize <= csrSize) { +toSparseMatrix(colMajor = true) + } else { +toSparseMatrix(colMajor = false) --- End diff -- ditto
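The format-selection logic of `compressed` shown in the diff above can be sketched standalone. It reuses the size formulas from the review comments (the byte counts are the PR's estimates, not JVM measurements); `compressedFormat` is a hypothetical helper that only reports which representation the three-way comparison would pick:

```java
public class CompressedSketch {
    static long denseSize(long numRows, long numCols) {
        return 8L * numRows * numCols + 13L;          // per the PR's getDenseSize
    }

    static long sparseSize(long numActives, long numPtrs) {
        return 12L * numActives + 4L * numPtrs + 37L; // per the PR's getSparseSize
    }

    // Pick the cheapest of dense, CSC, and CSR, preferring dense and then
    // CSC on ties, mirroring the comparisons in `compressed` above.
    static String compressedFormat(long numRows, long numCols, long numActives) {
        long dense = denseSize(numRows, numCols);
        long csc = sparseSize(numActives, numCols + 1); // column pointers
        long csr = sparseSize(numActives, numRows + 1); // row pointers
        if (dense < Math.min(csc, csr)) return "dense";
        return csc <= csr ? "sparseColMajor" : "sparseRowMajor";
    }

    public static void main(String[] args) {
        // 2x3 with 3 nonzeros: dense = 61 bytes beats CSC (89) and CSR (85).
        System.out.println(compressedFormat(2, 3, 3));
        // 100x100 with 10 nonzeros: sparse wins; CSC and CSR tie, so CSC.
        System.out.println(compressedFormat(100, 100, 10));
    }
}
```

This also illustrates why `compressed` checks both sparse layouts: for non-square matrices, CSC and CSR differ only in the length of the pointer array, so the cheaper one depends on whether `numRows` or `numCols` is smaller.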
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107836197

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
     assert(sparseMat.values(2) === 10.0)
   }

-  test("toSparse, toDense") {
-    val m = 3
-    val n = 2
-    val values = Array(1.0, 2.0, 4.0, 5.0)
-    val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-    val colPtrs = Array(0, 2, 4)
-    val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+    /*
+      dm1 =  4.0 2.0 -8.0
+            -1.0 7.0  4.0
+
+      dm2 = 5.0 -9.0  4.0
+            1.0 -3.0 -8.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+    val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
+
+    val dm3 = dm1.toDense
+    assert(dm3 === dm1)
+    assert(!dm3.isTransposed)
+    assert(dm3.values.equals(dm1.values))
+
+    val dm4 = dm1.toDenseRowMajor
+    assert(dm4 === dm1)
+    assert(dm4.isTransposed)
+    assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+    val dm5 = dm2.toDenseColMajor
+    assert(dm5 === dm2)
+    assert(!dm5.isTransposed)
+    assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0))
+
+    val dm6 = dm2.toDenseRowMajor
+    assert(dm6 === dm2)
+    assert(dm6.isTransposed)
+    assert(dm6.values.equals(dm2.values))
+
+    val dm7 = dm1.toDenseRowMajor
+    assert(dm7 === dm1)
+    assert(dm7.isTransposed)
+    assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+    val dm8 = dm1.toDenseColMajor
+    assert(dm8 === dm1)
+    assert(!dm8.isTransposed)
+    assert(dm8.values.equals(dm1.values))
+
+    val dm9 = dm2.toDense
+    assert(dm9 === dm2)
+    assert(dm9.isTransposed)
+    assert(dm9.values.equals(dm2.values))
+  }

-    val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values)
-    val deMat1 = new DenseMatrix(m, n, allValues)
+  test("dense to sparse") {
+    /*
+      dm1 = 0.0 4.0 5.0
+            0.0 2.0 0.0
+
+      dm2 = 0.0 4.0 5.0
+            0.0 2.0 0.0

-    val spMat2 = deMat1.toSparse
-    val deMat2 = spMat1.toDense
+      dm3 = 0.0 0.0 0.0
+            0.0 0.0 0.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0))
+    val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true)
+    val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
+
+    val sm1 = dm1.toSparseColMajor
+    assert(sm1 === dm1)
+    assert(!sm1.isTransposed)
+    assert(sm1.values === Array(4.0, 2.0, 5.0))
+
+    val sm2 = dm1.toSparseRowMajor
+    assert(sm2 === dm1)
+    assert(sm2.isTransposed)
+    assert(sm2.values === Array(4.0, 5.0, 2.0))
+
+    val sm3 = dm2.toSparseColMajor
+    assert(sm3 === dm2)
+    assert(!sm3.isTransposed)
+    assert(sm3.values === Array(4.0, 2.0, 5.0))
+
+    val sm4 = dm2.toSparseRowMajor
+    assert(sm4 === dm2)
+    assert(sm4.isTransposed)
+    assert(sm4.values === Array(4.0, 5.0, 2.0))
+
+    val sm5 = dm3.toSparseColMajor
+    assert(sm5 === dm3)
+    assert(sm5.values === Array.empty[Double])
+    assert(!sm5.isTransposed)
+
+    val sm6 = dm3.toSparseRowMajor
+    assert(sm6 === dm3)
+    assert(sm6.values === Array.empty[Double])
+    assert(sm6.isTransposed)
+
+    val sm7 = dm1.toSparse
+    assert(sm7 === dm1)
+    assert(sm7.values === Array(4.0, 2.0, 5.0))
+    assert(!sm7.isTransposed)
+
+    val sm8 = dm1.toSparseColMajor
+    assert(sm8 === dm1)
+    assert(sm8.values === Array(4.0, 2.0, 5.0))
+    assert(!sm8.isTransposed)
+
+    val sm9 = dm2.toSparseRowMajor
+    assert(sm9 === dm2)
+    assert(sm9.values === Array(4.0, 5.0, 2.0))
+    assert(sm9.isTransposed)
+
+    val sm10 = dm2.toSparse
+    assert(sm10 === dm2)
+    assert(sm10.values === Array(4.0, 5.0, 2.0))
+    assert(sm10.isTransposed)
+  }
+
+  test("sparse to sparse") {
+    /*
+      sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0
+                              0.0 2.0 0.0
+      smZeros = 0.0 0.0 0.0
+                0.0 0.0 0.0
+     */
+    val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0))
+    val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), Array(4.0, 5.0, 2.0),
+      isTransposed = true)
+    val sm3 = new SparseMatrix(2, 3, Array(0, 0, 2, 4), Array(0,
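The `sm1.values === Array(4.0, 2.0, 5.0)` versus `sm2.values === Array(4.0, 5.0, 2.0)` expectations above come from the scan order: CSC collects nonzeros column by column, CSR row by row. A plain-Python illustration of that (not Spark code):

```python
def dense_to_sparse_values(rows, cols, entry, col_major):
    """Collect the nonzero entries of a dense matrix in column-major (CSC)
    or row-major (CSR) scan order; entry(i, j) returns the (i, j) element."""
    if col_major:
        scan = [(i, j) for j in range(cols) for i in range(rows)]
    else:
        scan = [(i, j) for i in range(rows) for j in range(cols)]
    return [entry(i, j) for i, j in scan if entry(i, j) != 0.0]

# dm1 from the test above:  0.0 4.0 5.0
#                           0.0 2.0 0.0
dm1 = [[0.0, 4.0, 5.0],
       [0.0, 2.0, 0.0]]
entry = lambda i, j: dm1[i][j]

csc_values = dense_to_sparse_values(2, 3, entry, col_major=True)   # [4.0, 2.0, 5.0]
csr_values = dense_to_sparse_values(2, 3, entry, col_major=False)  # [4.0, 5.0, 2.0]
```

The same nonzeros appear in both results; only their order differs with the layout.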
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107836216 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107837259 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107837491 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107834704

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
     assert(sparseMat.values(2) === 10.0)
   }

-  test("toSparse, toDense") {
-    val m = 3
-    val n = 2
-    val values = Array(1.0, 2.0, 4.0, 5.0)
-    val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-    val colPtrs = Array(0, 2, 4)
-    val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+    /*
+      dm1 =  4.0 2.0 -8.0
+            -1.0 7.0  4.0
+
+      dm2 = 5.0 -9.0  4.0
+            1.0 -3.0 -8.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+    val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
+
+    val dm3 = dm1.toDense
+    assert(dm3 === dm1)
+    assert(!dm3.isTransposed)
+    assert(dm3.values.equals(dm1.values))
--- End diff --

Can you group the tests either by `dm1` and `dm2` or by the same methods?
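The `dm3.values.equals(dm1.values)` assertion quoted here is Java reference equality on the backing array: a layout-preserving conversion should hand back the same array (no copy), while a real relayout must allocate. A plain-Python sketch of that contract (function name and shapes are mine, for illustration):

```python
def to_layout(values, rows, cols, transposed, want_row_major):
    """Return the backing values list in the requested layout, handing back
    the very same list when it is already laid out that way (no copy)."""
    if transposed == want_row_major:
        return values                       # layout already matches: share the array
    if want_row_major:                      # column major -> row major
        return [values[j * rows + i] for i in range(rows) for j in range(cols)]
    return [values[i * cols + j] for j in range(cols) for i in range(rows)]

# dm1 from the quoted test (2x3, column major):  4.0 2.0 -8.0
#                                               -1.0 7.0  4.0
dm1 = [4.0, -1.0, 2.0, 7.0, -8.0, 4.0]
same = to_layout(dm1, 2, 3, transposed=False, want_row_major=False)
moved = to_layout(dm1, 2, 3, transposed=False, want_row_major=True)
```

`same is dm1` holds (no copy), while `moved` is a fresh row-major list, `[4.0, 2.0, -8.0, -1.0, 7.0, 4.0]`, matching the `dm4.values` expectation in the quoted test.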
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107832751

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -587,18 +720,67 @@ class SparseMatrix @Since("2.0.0") (
     }
   }

+  override def numNonzeros: Int = values.count(_ != 0)
+
+  override def numActives: Int = values.length
+
   /**
-   * Generate a `DenseMatrix` from the given `SparseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit zero values if they
+   * exist.
+   *
+   * @param colMajor Whether or not the resulting `SparseMatrix` values are in column major
+   *                 order.
    */
-  @Since("2.0.0")
-  def toDense: DenseMatrix = {
-    new DenseMatrix(numRows, numCols, toArray)
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+    if (!isTransposed && !colMajor) {
+      // it is row major and we want col major, use breeze to remove explicit zeros
+      val breezeTransposed = asBreeze.asInstanceOf[BSM[Double]].t
+      Matrices.fromBreeze(breezeTransposed).transpose.asInstanceOf[SparseMatrix]
+    } else if (isTransposed && colMajor) {
+      // it is col major and we want row major, use breeze to remove explicit zeros
+      val breezeTransposed = asBreeze.asInstanceOf[BSM[Double]]
+      Matrices.fromBreeze(breezeTransposed).asInstanceOf[SparseMatrix]
+    } else {
+      val nnz = numNonzeros
+      if (nnz != numActives) {
+        // remove explicit zeros
+        val rr = new Array[Int](nnz)
+        val vv = new Array[Double](nnz)
+        val numPtrs = if (isTransposed) numRows else numCols
+        val cc = new Array[Int](numPtrs + 1)
+        var nzIdx = 0
+        var j = 0
+        while (j < numPtrs) {
+          var idx = colPtrs(j)
+          val idxEnd = colPtrs(j + 1)
+          cc(j) = nzIdx
+          while (idx < idxEnd) {
+            if (values(idx) != 0.0) {
+              vv(nzIdx) = values(idx)
+              rr(nzIdx) = rowIndices(idx)
+              nzIdx += 1
+            }
+            idx += 1
+          }
+          j += 1
+        }
+        cc(j) = nnz
+        new SparseMatrix(numRows, numCols, cc, rr, vv, isTransposed = isTransposed)
+      } else {
+        this
+      }
+    }
   }

-  override def numNonzeros: Int = values.count(_ != 0)
-
-  override def numActives: Int = values.length
+  /**
+   * Generate a `DenseMatrix` from the given `SparseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values are in column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+    if (colMajor) new DenseMatrix(numRows, numCols, toArray)
--- End diff --

`new DenseMatrix(numRows, numCols, this.toArray, isTransposed = false)` to make the style consistent.
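The pointer-walking loop in the diff above compacts the CSC/CSR arrays when explicit zeros are stored (`nnz != numActives`). The same compaction can be sketched in plain Python (illustrative, not Spark's code):

```python
def drop_explicit_zeros(col_ptrs, row_indices, values):
    """Compact CSC (or, read column-wise as rows, CSR) arrays by dropping
    stored zero values, mirroring the while-loop in the diff above."""
    cc = [0]          # new pointer array, one entry per column plus the end
    rr, vv = [], []   # new index and value arrays
    for j in range(len(col_ptrs) - 1):
        for idx in range(col_ptrs[j], col_ptrs[j + 1]):
            if values[idx] != 0.0:
                rr.append(row_indices[idx])
                vv.append(values[idx])
        cc.append(len(vv))
    return cc, rr, vv
```

For a 2x3 CSC matrix with an explicit zero stored in the last column, `drop_explicit_zeros([0, 0, 2, 4], [0, 1, 0, 1], [4.0, 2.0, 5.0, 0.0])` compacts to `([0, 0, 2, 3], [0, 1, 0], [4.0, 2.0, 5.0])`; arrays with no stored zeros pass through unchanged.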
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107836205 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107835539 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107832900

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = !isTransposed)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = !isTransposed)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+      toDenseColMajor
--- End diff --

nit, with `this`
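Note that in this revision of the diff, `toSparse` and `toDense` pass `colMajor = !isTransposed`, preserving the matrix's current layout, whereas the revision quoted earlier in the thread always forces column major. The two policies can be sketched side by side (names are mine, for illustration):

```python
def to_sparse_layout(is_transposed, preserve_layout):
    """Layout chosen by `toSparse` under the two definitions discussed in the
    review: keep the matrix's current layout (colMajor = !isTransposed), or
    always force column major (colMajor = true)."""
    if preserve_layout:
        return "row-major" if is_transposed else "col-major"
    return "col-major"
```

A transposed (row-major) matrix is the only case where the two definitions differ; for a non-transposed matrix both yield column major.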
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107837448 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
     assert(sparseMat.values(2) === 10.0)
   }

-  test("toSparse, toDense") {
-    val m = 3
-    val n = 2
-    val values = Array(1.0, 2.0, 4.0, 5.0)
-    val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-    val colPtrs = Array(0, 2, 4)
-    val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+    /*
+      dm1 =  4.0  2.0 -8.0
+            -1.0  7.0  4.0
+
+      dm2 =  5.0 -9.0  4.0
+             1.0 -3.0 -8.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+    val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
+
+    val dm3 = dm1.toDense
+    assert(dm3 === dm1)
+    assert(!dm3.isTransposed)
+    assert(dm3.values.equals(dm1.values))
+
+    val dm4 = dm1.toDenseRowMajor
+    assert(dm4 === dm1)
+    assert(dm4.isTransposed)
+    assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+    val dm5 = dm2.toDenseColMajor
+    assert(dm5 === dm2)
+    assert(!dm5.isTransposed)
+    assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0))
+
+    val dm6 = dm2.toDenseRowMajor
+    assert(dm6 === dm2)
+    assert(dm6.isTransposed)
+    assert(dm6.values.equals(dm2.values))
+
+    val dm7 = dm1.toDenseRowMajor
+    assert(dm7 === dm1)
+    assert(dm7.isTransposed)
+    assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+    val dm8 = dm1.toDenseColMajor
+    assert(dm8 === dm1)
+    assert(!dm8.isTransposed)
+    assert(dm8.values.equals(dm1.values))
+
+    val dm9 = dm2.toDense
+    assert(dm9 === dm2)
+    assert(dm9.isTransposed)
+    assert(dm9.values.equals(dm2.values))
+  }

-    val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values)
-    val deMat1 = new DenseMatrix(m, n, allValues)
+  test("dense to sparse") {
+    /*
+      dm1 = 0.0 4.0 5.0
+            0.0 2.0 0.0
+
+      dm2 = 0.0 4.0 5.0
+            0.0 2.0 0.0

-    val spMat2 = deMat1.toSparse
-    val deMat2 = spMat1.toDense
+      dm3 = 0.0 0.0 0.0
+            0.0 0.0 0.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0))
+    val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true)
+    val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
+
+    val sm1 = dm1.toSparseColMajor
+    assert(sm1 === dm1)
+    assert(!sm1.isTransposed)
+    assert(sm1.values === Array(4.0, 2.0, 5.0))
+
+    val sm2 = dm1.toSparseRowMajor
+    assert(sm2 === dm1)
+    assert(sm2.isTransposed)
+    assert(sm2.values === Array(4.0, 5.0, 2.0))
+
+    val sm3 = dm2.toSparseColMajor
+    assert(sm3 === dm2)
+    assert(!sm3.isTransposed)
+    assert(sm3.values === Array(4.0, 2.0, 5.0))
+
+    val sm4 = dm2.toSparseRowMajor
+    assert(sm4 === dm2)
+    assert(sm4.isTransposed)
+    assert(sm4.values === Array(4.0, 5.0, 2.0))
+
+    val sm5 = dm3.toSparseColMajor
+    assert(sm5 === dm3)
+    assert(sm5.values === Array.empty[Double])
+    assert(!sm5.isTransposed)
+
+    val sm6 = dm3.toSparseRowMajor
+    assert(sm6 === dm3)
+    assert(sm6.values === Array.empty[Double])
+    assert(sm6.isTransposed)
+
+    val sm7 = dm1.toSparse
+    assert(sm7 === dm1)
+    assert(sm7.values === Array(4.0, 2.0, 5.0))
+    assert(!sm7.isTransposed)
+
+    val sm8 = dm1.toSparseColMajor
+    assert(sm8 === dm1)
+    assert(sm8.values === Array(4.0, 2.0, 5.0))
+    assert(!sm8.isTransposed)
+
+    val sm9 = dm2.toSparseRowMajor
+    assert(sm9 === dm2)
+    assert(sm9.values === Array(4.0, 5.0, 2.0))
+    assert(sm9.isTransposed)
+
+    val sm10 = dm2.toSparse
+    assert(sm10 === dm2)
+    assert(sm10.values === Array(4.0, 5.0, 2.0))
+    assert(sm10.isTransposed)
+  }
+
+  test("sparse to sparse") {
+    /*
+      sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0
+                              0.0 2.0 0.0
+      smZeros = 0.0 0.0 0.0
+                0.0 0.0 0.0
+     */
+    val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0))
+    val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), Array(4.0, 5.0, 2.0),
+      isTransposed = true)
+    val sm3 = new SparseMatrix(2, 3, Array(0, 0, 2, 4), Array(0,
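The `dense to sparse` assertions above hinge on scan order: `toSparseColMajor` collects nonzeros column by column (CSC), while `toSparseRowMajor` collects them row by row (CSR), which is why the same matrix yields values `(4.0, 2.0, 5.0)` in one layout and `(4.0, 5.0, 2.0)` in the other. A plain-Python sketch of that ordering; the helper names are illustrative, not Spark's API:

```python
# The 2x3 test matrix from the "dense to sparse" test above:
#     0.0 4.0 5.0
#     0.0 2.0 0.0
dense = [[0.0, 4.0, 5.0],
         [0.0, 2.0, 0.0]]

def csc_values(m):
    """Nonzero values scanned column by column (the toSparseColMajor order)."""
    rows, cols = len(m), len(m[0])
    return [m[i][j] for j in range(cols) for i in range(rows) if m[i][j] != 0.0]

def csr_values(m):
    """Nonzero values scanned row by row (the toSparseRowMajor order)."""
    return [v for row in m for v in row if v != 0.0]

print(csc_values(dense))  # [4.0, 2.0, 5.0] -- matches sm1.values in the test
print(csr_values(dense))  # [4.0, 5.0, 2.0] -- matches sm2.values in the test
```

The two orderings contain the same nonzeros; only the traversal differs, which is also why every assertion pairs a `values` check with an `isTransposed` check.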
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on the same MatricesSuite.scala diff in several further review threads, each quoting the diff above verbatim (the comment bodies are truncated in this archive):
https://github.com/apache/spark/pull/15628#discussion_r107835746
https://github.com/apache/spark/pull/15628#discussion_r107837113
https://github.com/apache/spark/pull/15628#discussion_r107836138
https://github.com/apache/spark/pull/15628#discussion_r107836155
https://github.com/apache/spark/pull/15628#discussion_r107835989
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107835326 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala --- [quotes the same diff as above, up to `assert(dm3.values.equals(dm1.values))`] --- End diff --

`val dm4 = dm1.toDenseRowMajor` and `val dm7 = dm1.toDenseRowMajor` are the same.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107835213 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala --- [quotes the same diff as above, up to `val sm1 = dm1.toSparseColMajor`] --- End diff --

You tested `dm1.toSparseColMajor` twice. Will be nice to group them like

```scala
val sm1 = dm1.toSparseColMajor
val sm2 = dm2.toSparseColMajor
val sm3 = dm3.toSparseColMajor

val sm4 = dm1.toSparseRowMajor
val sm5 = dm2.toSparseRowMajor
val sm6 = dm3.toSparseRowMajor

val sm7 = dm1.toSparse
val sm8 = dm2.toSparse
val sm9 = dm3.toSparse
```
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107832481 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+      toDenseColMajor
+    } else {
+      toSparseColMajor
+    }
+  }
+
+  /**
+   * Returns a matrix in dense or sparse row major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedRowMajor: Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = false)) {
+      toDenseRowMajor
+    } else {
+      toSparseRowMajor
+    }
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
+    val cscSize = getSparseSizeInBytes(colMajor = true)
+    val csrSize = getSparseSizeInBytes(colMajor = false)
+    if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
+      // dense matrix size is the same for column major and row major, so maintain current layout
+      toDenseMatrix(!isTransposed)
+    } else {
+      if (cscSize <= csrSize) {
+        toSparseMatrix(colMajor = true)
--- End diff --

ditto.
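The `compressed` family in the diff above simply compares estimated storage sizes and keeps the cheaper layout, preserving the current dense layout on ties with sparse since dense size is layout-independent. A plain-Python sketch of that branch structure; the byte costs (8 B per double, 4 B per int index, plus the pointer array) are an assumed cost model, not Spark's exact `getDenseSizeInBytes`/`getSparseSizeInBytes` accounting:

```python
def dense_size(rows, cols):
    # one 8-byte double per entry; identical for col-major and row-major
    return 8 * rows * cols

def sparse_size(rows, cols, nnz, col_major):
    # 8-byte value + 4-byte index per nonzero, plus the colPtrs/rowPtrs array
    ptrs = (cols if col_major else rows) + 1
    return 12 * nnz + 4 * ptrs

def choose_format(rows, cols, nnz, is_transposed):
    """Mirror the branch structure of Matrix.compressed in the diff above."""
    csc = sparse_size(rows, cols, nnz, col_major=True)
    csr = sparse_size(rows, cols, nnz, col_major=False)
    if dense_size(rows, cols) < min(csc, csr):
        # dense size is the same either way, so maintain the current layout
        return "dense row-major" if is_transposed else "dense col-major"
    return "sparse col-major" if csc <= csr else "sparse row-major"

print(choose_format(2, 3, nnz=6, is_transposed=False))        # dense col-major
print(choose_format(1000, 1000, nnz=10, is_transposed=False)) # sparse col-major
```

A fully dense 2x3 matrix stays dense (48 B beats ~84 B sparse), while a 1000x1000 matrix with 10 nonzeros compresses to sparse; the `cscSize <= csrSize` tie-break matches the Scala code's preference for column major.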
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107837410

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
     assert(sparseMat.values(2) === 10.0)
   }

-  test("toSparse, toDense") {
-    val m = 3
-    val n = 2
-    val values = Array(1.0, 2.0, 4.0, 5.0)
-    val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-    val colPtrs = Array(0, 2, 4)
-    val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+    /*
+      dm1 =  4.0  2.0 -8.0
+            -1.0  7.0  4.0
+
+      dm2 =  5.0 -9.0  4.0
+             1.0 -3.0 -8.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+    val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
+
+    val dm3 = dm1.toDense
+    assert(dm3 === dm1)
+    assert(!dm3.isTransposed)
+    assert(dm3.values.equals(dm1.values))
+
+    val dm4 = dm1.toDenseRowMajor
+    assert(dm4 === dm1)
+    assert(dm4.isTransposed)
+    assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+    val dm5 = dm2.toDenseColMajor
+    assert(dm5 === dm2)
+    assert(!dm5.isTransposed)
+    assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0))
+
+    val dm6 = dm2.toDenseRowMajor
+    assert(dm6 === dm2)
+    assert(dm6.isTransposed)
+    assert(dm6.values.equals(dm2.values))
+
+    val dm7 = dm1.toDenseRowMajor
+    assert(dm7 === dm1)
+    assert(dm7.isTransposed)
+    assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+    val dm8 = dm1.toDenseColMajor
+    assert(dm8 === dm1)
+    assert(!dm8.isTransposed)
+    assert(dm8.values.equals(dm1.values))
+
+    val dm9 = dm2.toDense
+    assert(dm9 === dm2)
+    assert(dm9.isTransposed)
+    assert(dm9.values.equals(dm2.values))
+  }

-    val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values)
-    val deMat1 = new DenseMatrix(m, n, allValues)
+  test("dense to sparse") {
+    /*
+      dm1 = 0.0 4.0 5.0
+            0.0 2.0 0.0
+
+      dm2 = 0.0 4.0 5.0
+            0.0 2.0 0.0

-    val spMat2 = deMat1.toSparse
-    val deMat2 = spMat1.toDense
+      dm3 = 0.0 0.0 0.0
+            0.0 0.0 0.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0))
+    val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true)
+    val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
+
+    val sm1 = dm1.toSparseColMajor
+    assert(sm1 === dm1)
+    assert(!sm1.isTransposed)
+    assert(sm1.values === Array(4.0, 2.0, 5.0))
+
+    val sm2 = dm1.toSparseRowMajor
+    assert(sm2 === dm1)
+    assert(sm2.isTransposed)
+    assert(sm2.values === Array(4.0, 5.0, 2.0))
+
+    val sm3 = dm2.toSparseColMajor
+    assert(sm3 === dm2)
+    assert(!sm3.isTransposed)
+    assert(sm3.values === Array(4.0, 2.0, 5.0))
+
+    val sm4 = dm2.toSparseRowMajor
+    assert(sm4 === dm2)
+    assert(sm4.isTransposed)
+    assert(sm4.values === Array(4.0, 5.0, 2.0))
+
+    val sm5 = dm3.toSparseColMajor
+    assert(sm5 === dm3)
+    assert(sm5.values === Array.empty[Double])
+    assert(!sm5.isTransposed)
+
+    val sm6 = dm3.toSparseRowMajor
+    assert(sm6 === dm3)
+    assert(sm6.values === Array.empty[Double])
+    assert(sm6.isTransposed)
+
+    val sm7 = dm1.toSparse
+    assert(sm7 === dm1)
+    assert(sm7.values === Array(4.0, 2.0, 5.0))
+    assert(!sm7.isTransposed)
+
+    val sm8 = dm1.toSparseColMajor
+    assert(sm8 === dm1)
+    assert(sm8.values === Array(4.0, 2.0, 5.0))
+    assert(!sm8.isTransposed)
+
+    val sm9 = dm2.toSparseRowMajor
+    assert(sm9 === dm2)
+    assert(sm9.values === Array(4.0, 5.0, 2.0))
+    assert(sm9.isTransposed)
+
+    val sm10 = dm2.toSparse
+    assert(sm10 === dm2)
+    assert(sm10.values === Array(4.0, 5.0, 2.0))
+    assert(sm10.isTransposed)
+  }
+
+  test("sparse to sparse") {
+    /*
+      sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0
+                              0.0 2.0 0.0
+      smZeros = 0.0 0.0 0.0
+                0.0 0.0 0.0
+     */
+    val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0))
+    val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), Array(4.0, 5.0, 2.0),
+      isTransposed = true)
+    val sm3 = new SparseMatrix(2, 3, Array(0, 0, 2, 4), Array(0,

[diff truncated in the archived message; the reviewer's comment is missing]
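The ordering assertions in the new "dense to sparse" test can be checked by hand: a column-major (CSC) conversion collects nonzeros column by column, while a row-major (CSR) conversion collects them row by row. A minimal sketch of that scan order, in Python rather than the Scala under review:

```python
def nonzeros(rows, col_major):
    """Collect the nonzero values of a dense matrix (given as a list of rows)
    in column-major or row-major scan order."""
    nrows, ncols = len(rows), len(rows[0])
    if col_major:
        # column by column, top to bottom within each column
        return [rows[i][j] for j in range(ncols) for i in range(nrows) if rows[i][j] != 0.0]
    # row by row, left to right within each row
    return [rows[i][j] for i in range(nrows) for j in range(ncols) if rows[i][j] != 0.0]

# dm1 from the test suite:
#   0.0  4.0  5.0
#   0.0  2.0  0.0
dm1 = [[0.0, 4.0, 5.0],
       [0.0, 2.0, 0.0]]
print(nonzeros(dm1, col_major=True))   # [4.0, 2.0, 5.0]  (sm1.values)
print(nonzeros(dm1, col_major=False))  # [4.0, 5.0, 2.0]  (sm2.values)
```

This reproduces why `toSparseColMajor` yields values `(4.0, 2.0, 5.0)` but `toSparseRowMajor` yields `(4.0, 5.0, 2.0)` for the same matrix.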
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107832469

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+      toDenseColMajor
+    } else {
+      toSparseColMajor
+    }
+  }
+
+  /**
+   * Returns a matrix in dense or sparse row major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedRowMajor: Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = false)) {
+      toDenseRowMajor
+    } else {
+      toSparseRowMajor
+    }
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
+    val cscSize = getSparseSizeInBytes(colMajor = true)
+    val csrSize = getSparseSizeInBytes(colMajor = false)
+    if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
+      // dense matrix size is the same for column major and row major, so maintain current layout
+      toDenseMatrix(!isTransposed)

--- End diff --

I don't see the change here.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
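The `compressed` method quoted above picks whichever of dense, CSC, or CSR storage is smallest, and keeps the current layout when dense wins (`toDenseMatrix(!isTransposed)`). The selection logic can be sketched as follows, in Python. The byte accounting here is an assumption for illustration (8 bytes per double value, 4 bytes per int index, array headers ignored); the bodies of `getDenseSizeInBytes` and `getSparseSizeInBytes` are not shown in the quoted diff, so Spark's exact accounting may differ:

```python
def dense_size(nrows, ncols):
    # one 8-byte double per entry; array/object headers ignored in this sketch
    return 8 * nrows * ncols

def sparse_size(nrows, ncols, nnz, col_major):
    # per nonzero: 8-byte value + 4-byte row/col index; plus the pointer array
    ptrs = (ncols if col_major else nrows) + 1
    return 12 * nnz + 4 * ptrs

def compressed_format(nrows, ncols, nnz, is_transposed):
    """Mirror of the branching in Matrix.compressed: pick the cheapest of
    dense (keeping the current layout), CSC, or CSR."""
    csc = sparse_size(nrows, ncols, nnz, col_major=True)
    csr = sparse_size(nrows, ncols, nnz, col_major=False)
    if dense_size(nrows, ncols) < min(csc, csr):
        # dense size is layout-independent, so keep the current layout
        return "dense-row-major" if is_transposed else "dense-col-major"
    return "sparse-col-major" if csc <= csr else "sparse-row-major"

# mostly nonzero -> dense wins; mostly zero -> sparse wins
print(compressed_format(100, 100, nnz=9000, is_transposed=False))  # dense-col-major
print(compressed_format(100, 100, nnz=100, is_transposed=False))   # sparse-col-major
```

With this accounting, sparse beats dense roughly when fewer than about two thirds of the entries are nonzero, which matches the intuition behind exposing `compressed` at all.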
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107835655

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

[quotes the same MatricesSuite diff ("dense to dense", "dense to sparse", "sparse to sparse" tests) as discussion_r107837410 above; the archived message is truncated before the reviewer's comment]
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107832867

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

[quotes the same Matrix conversion/compression diff as discussion_r107832469 above, anchored at the line `toDenseMatrix(!isTransposed)` in `compressed`]

--- End diff --

`this.toDense`
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107832826

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -291,31 +404,49 @@ class DenseMatrix @Since("2.0.0") (
   override def numActives: Int = values.length

   /**
-   * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from the given `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order.
    */
-  @Since("2.0.0")
-  def toSparse: SparseMatrix = {
-    val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
-    val colPtrs: Array[Int] = new Array[Int](numCols + 1)
-    val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
-    var nnz = 0
-    var j = 0
-    while (j < numCols) {
-      var i = 0
-      while (i < numRows) {
-        val v = values(index(i, j))
-        if (v != 0.0) {
-          rowIndices += i
-          spVals += v
-          nnz += 1
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+    if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose
+    else {
+      val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
+      val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+      val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
+      var nnz = 0
+      var j = 0
+      while (j < numCols) {
+        var i = 0
+        while (i < numRows) {
+          val v = values(index(i, j))
+          if (v != 0.0) {
+            rowIndices += i
+            spVals += v
+            nnz += 1
+          }
+          i += 1
         }
-        i += 1
+        j += 1
+        colPtrs(j) = nnz
       }
-      j += 1
-      colPtrs(j) = nnz
+      new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
+    }
+  }
+
+  /**
+   * Generate a `DenseMatrix` from this `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+    if (isTransposed && colMajor) {
+      new DenseMatrix(numRows, numCols, toArray, isTransposed = false)
+    } else if (!isTransposed && !colMajor) {
+      new DenseMatrix(numRows, numCols, transpose.toArray, isTransposed = true)

--- End diff --

I'll call `this.toArray` and `this.transpose.toArray`, as you did in other places, to make it explicit.
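The nested while-loops in `toSparseMatrix(colMajor = true)` above build the three CSC arrays (column pointers, row indices, values) in a single column-by-column pass. A Python mirror of that loop, as a sketch rather than the Scala itself:

```python
def dense_to_csc(values, nrows, ncols):
    """Build CSC arrays (col_ptrs, row_indices, sp_values) from a
    column-major dense value array, mirroring the while-loops in
    DenseMatrix.toSparseMatrix(colMajor = true)."""
    sp_values, row_indices = [], []
    col_ptrs = [0] * (ncols + 1)
    nnz = 0
    for j in range(ncols):
        for i in range(nrows):
            v = values[j * nrows + i]   # index(i, j) for a column-major layout
            if v != 0.0:
                row_indices.append(i)
                sp_values.append(v)
                nnz += 1
        # col_ptrs(j + 1) records how many nonzeros precede column j + 1
        col_ptrs[j + 1] = nnz
    return col_ptrs, row_indices, sp_values

# dm1 from the test suite: 2 x 3, column-major values for
#   0.0  4.0  5.0
#   0.0  2.0  0.0
col_ptrs, row_indices, sp_values = dense_to_csc([0.0, 0.0, 4.0, 2.0, 5.0, 0.0], 2, 3)
print(col_ptrs)     # [0, 0, 2, 3]
print(row_indices)  # [0, 1, 0]
print(sp_values)    # [4.0, 2.0, 5.0]
```

The printed triple matches `sm1` in the "sparse to sparse" test, `SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0))`, and the row-major case reduces to this one via the `transpose ... transpose` trick in the first branch.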
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107832982

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

[quotes the same `DenseMatrix.toSparseMatrix`/`toDenseMatrix` diff as discussion_r107832826 above, anchored at the `toDenseMatrix` scaladoc]

--- End diff --

Ok, I added these methods. I updated the test suites to use them instead of `isTransposed`.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107803088

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

[quotes the same `DenseMatrix.toSparseMatrix`/`toDenseMatrix` diff as discussion_r107832826 above, anchored at the end of `toDenseMatrix`, whose final branch returns `this` unchanged when the requested layout already matches]

--- End diff --

I have to say I'm a bit surprised - I thought that's what `toArray` already did! Yeah, that seems like a good change, but I'd prefer to do it in another PR, since we'd need to make sure it doesn't adversely affect other places that use `toArray`, as well as add unit tests. If that sounds ok, I'll make a JIRA for it.
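The `toDenseMatrix` branches quoted above show that switching a dense matrix between column-major and row-major is just a re-ordering of the backing value array (a transpose of the storage, not of the matrix). A sketch of that reshuffle, in Python rather than the Scala under review:

```python
def to_row_major(values, nrows, ncols):
    """Re-order a column-major dense value array into row-major order:
    the reshuffle behind DenseMatrix.toDenseMatrix(colMajor = false)."""
    return [values[j * nrows + i] for i in range(nrows) for j in range(ncols)]

def to_col_major(values, nrows, ncols):
    """Inverse re-ordering: row-major back to column-major."""
    return [values[i * ncols + j] for j in range(ncols) for i in range(nrows)]

# dm1 from the "dense to dense" test: 2 x 3
#    4.0  2.0 -8.0
#   -1.0  7.0  4.0
col_major = [4.0, -1.0, 2.0, 7.0, -8.0, 4.0]
row_major = to_row_major(col_major, 2, 3)
print(row_major)  # [4.0, 2.0, -8.0, -1.0, 7.0, 4.0]  (dm4.values)

# the two re-orderings are inverses, so round-tripping is lossless
assert to_col_major(row_major, 2, 3) == col_major
```

This reproduces the `dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)` assertion in the test suite, and makes concrete why the remaining branches of `toDenseMatrix` can return `this` without copying.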
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107801091

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

[quotes the same Matrix conversion/compression diff as discussion_r107832469 above, anchored at `toSparseMatrix(colMajor = true)` in `compressed`]

--- End diff --

Done
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107801133

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

[quotes the same Matrix conversion/compression diff as discussion_r107832469 above, anchored at `toSparseMatrix(colMajor = false)` in `compressed`]

--- End diff --

Done
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107801115

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

[quotes the same Matrix conversion/compression diff as discussion_r107832469 above, anchored at `toDenseMatrix(!isTransposed)` in `compressed`]

--- End diff --

Done
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107784952 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (same hunk as quoted in the first message above; the comment is on the `toSparseMatrix(colMajor = false)` call in `compressed`) --- End diff -- `toSparseRowMajor`
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107785054 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (same hunk as quoted in the first message above; the comment is on the `toDenseMatrix(!isTransposed)` call in `compressed`) --- End diff -- Call `toDense` if we decide to make `toDense` and `toSparse` output the same layout.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107784897 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (same hunk as quoted in the first message above; the comment is on the `toSparseMatrix(colMajor = true)` call in `compressed`) --- End diff -- `toSparseColMajor`
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107794035 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +404,49 @@ class DenseMatrix @Since("2.0.0") ( override def numActives: Int = values.length /** - * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from the given `DenseMatrix`. + * + * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order. */ - @Since("2.0.0") - def toSparse: SparseMatrix = { -val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble -val colPtrs: Array[Int] = new Array[Int](numCols + 1) -val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt -var nnz = 0 -var j = 0 -while (j < numCols) { - var i = 0 - while (i < numRows) { -val v = values(index(i, j)) -if (v != 0.0) { - rowIndices += i - spVals += v - nnz += 1 + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose +else { + val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble + val colPtrs: Array[Int] = new Array[Int](numCols + 1) + val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt + var nnz = 0 + var j = 0 + while (j < numCols) { +var i = 0 +while (i < numRows) { + val v = values(index(i, j)) + if (v != 0.0) { +rowIndices += i +spVals += v +nnz += 1 + } + i += 1 } -i += 1 +j += 1 +colPtrs(j) = nnz } - j += 1 - colPtrs(j) = nnz + new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) +} + } + + /** + * Generate a `DenseMatrix` from this `DenseMatrix`. + * + * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order. 
+ */ + private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = { +if (isTransposed && colMajor) { + new DenseMatrix(numRows, numCols, toArray, isTransposed = false) +} else if (!isTransposed && !colMajor) { + new DenseMatrix(numRows, numCols, transpose.toArray, isTransposed = true) +} else { + this } -new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) } --- End diff -- Could we override the `toArray` in DenseMatrix so when `this` is column major, we just return `this.values`? Otherwise, it's very expensive to create a new array.
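The suggestion above — return the backing array directly when it is already in the requested layout — can be sketched with a hypothetical, much-simplified matrix class (not Spark's `DenseMatrix`). Note the trade-off the thread is circling: sharing `values` avoids the copy but aliases the internal array, so callers must treat the result as read-only:

```scala
// Hypothetical minimal dense matrix to illustrate the no-copy idea.
// `values` is column major when isTransposed == false, row major otherwise.
final class SimpleDense(
    val numRows: Int,
    val numCols: Int,
    val values: Array[Double],
    val isTransposed: Boolean) {

  // Column-major contents. When already column major, return the backing
  // array itself (no allocation); otherwise materialize a transposed copy.
  def toArray: Array[Double] =
    if (!isTransposed) {
      values
    } else {
      val out = new Array[Double](numRows * numCols)
      var i = 0
      while (i < numRows) {
        var j = 0
        while (j < numCols) {
          out(j * numRows + i) = values(i * numCols + j) // transpose copy
          j += 1
        }
        i += 1
      }
      out
    }
}
```

With this shape, a layout-preserving `toDenseMatrix` can reuse `toArray` and only pays for a copy when the layout actually changes.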
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107791878 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (same hunk as quoted in the previous message; the comment is on the `toDenseMatrix` javadoc) --- End diff -- Minor, can we have

```scala
protected def isColMajor = !isTransposed
protected def isRowMajor = isTransposed
```

so the code is easier to understand?
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107781333 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (an earlier revision of the conversion hunk quoted in the first message, in which the sparse conversions were named `toCSCMatrix`/`toCSRMatrix`; the comment follows `def toDense: DenseMatrix = toDenseMatrix(colMajor = true)`) --- End diff -- Ditto. Should we consider maintaining the same layout?
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107779234 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (an earlier revision of the conversion hunk quoted in the first message, in which the sparse conversions were named `toCSC`/`toCSR`; the comment is on `def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)`) --- End diff -- But I thought this is a new API being added, so we can make it maintain the same layout.
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107689306 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +396,60 @@ class DenseMatrix @Since("2.0.0") ( override def numActives: Int = values.length /** - * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from the given `DenseMatrix`. + * + * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order. */ - @Since("2.0.0") - def toSparse: SparseMatrix = { -val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble -val colPtrs: Array[Int] = new Array[Int](numCols + 1) -val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt -var nnz = 0 -var j = 0 -while (j < numCols) { - var i = 0 - while (i < numRows) { -val v = values(index(i, j)) -if (v != 0.0) { - rowIndices += i - spVals += v - nnz += 1 + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose +else { + val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble + val colPtrs: Array[Int] = new Array[Int](numCols + 1) + val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt + var nnz = 0 + var j = 0 + while (j < numCols) { +var i = 0 +while (i < numRows) { + val v = values(index(i, j)) + if (v != 0.0) { +rowIndices += i +spVals += v +nnz += 1 + } + i += 1 } -i += 1 +j += 1 +colPtrs(j) = nnz } - j += 1 - colPtrs(j) = nnz + new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) +} + } + + /** + * Generate a `DenseMatrix` from this `DenseMatrix`. + * + * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order. 
+ */ + private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = { +if (!(isTransposed ^ colMajor)) { + val newValues = new Array[Double](numCols * numRows) + var j = 0 --- End diff -- I don't see how there is an unnecessary copy. `toArray` copies the elements of the current matrix to a new Array, then uses that as the backing array of a new `DenseMatrix`. We cannot modify the original matrix values.
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107609351 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (same hunk as quoted in the previous message; the comment is on the start of `toDenseMatrix`) --- End diff -- I have one question regarding performance. Using [`toArray`](https://github.com/sethah/spark/blob/4746ec0d97c002241be344494a6d2ddee3a7c2d5/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala#L49-L55) introduces the allocation of a temporary array and a data copy. Can we avoid that allocation and copy by passing the original array and an access function to `new DenseMatrix`?
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107606305 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (same hunk as quoted above; the comment is on the start of `toDenseMatrix`) --- End diff -- It is fine with me since `isTransposed` is checked beforehand.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107557490 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +395,60 @@ class DenseMatrix @Since("2.0.0") ( override def numActives: Int = values.length /** - * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from the given `DenseMatrix`. + * + * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order. */ - @Since("2.0.0") - def toSparse: SparseMatrix = { -val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble -val colPtrs: Array[Int] = new Array[Int](numCols + 1) -val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt -var nnz = 0 -var j = 0 -while (j < numCols) { - var i = 0 - while (i < numRows) { -val v = values(index(i, j)) -if (v != 0.0) { - rowIndices += i - spVals += v - nnz += 1 + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose +else { + val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble + val colPtrs: Array[Int] = new Array[Int](numCols + 1) + val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt + var nnz = 0 + var j = 0 + while (j < numCols) { +var i = 0 +while (i < numRows) { + val v = values(index(i, j)) + if (v != 0.0) { +rowIndices += i +spVals += v +nnz += 1 + } + i += 1 } -i += 1 +j += 1 +colPtrs(j) = nnz } - j += 1 - colPtrs(j) = nnz + new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) +} + } + + /** + * Generate a `DenseMatrix` from this `DenseMatrix`. + * + * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order. 
+ */ + private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = { +if (!(isTransposed ^ colMajor)) { + val newValues = new Array[Double](numCols * numRows) --- End diff -- This looks great to me!
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107556503 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -587,18 +722,69 @@ class SparseMatrix @Since("2.0.0") ( } } + override def numNonzeros: Int = values.count(_ != 0) + + override def numActives: Int = values.length + /** - * Generate a `DenseMatrix` from the given `SparseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit zero values if they + * exist. + * + * @param colMajor Whether or not the resulting `SparseMatrix` values are in column major + *order. */ - @Since("2.0.0") - def toDense: DenseMatrix = { -new DenseMatrix(numRows, numCols, toArray) + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!(colMajor ^ isTransposed)) { + // breeze transpose rearranges values in column major and removes explicit zeros --- End diff -- This is not a blocker.
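The comment above concerns the Breeze transpose path, which rearranges values into column major order and drops explicit zeros as a side effect. That cleanup step can be sketched directly on raw CSC arrays; the helper below is a hypothetical standalone illustration, not the Breeze-based code in the PR:

```scala
import scala.collection.mutable.ArrayBuffer

// Rebuild CSC arrays keeping only entries whose stored value is nonzero.
// Returns (colPtrs, rowIndices, values) with explicit zeros removed.
object CscCleanup {
  def dropExplicitZeros(
      numCols: Int,
      colPtrs: Array[Int],
      rowIndices: Array[Int],
      values: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    val newPtrs = new Array[Int](numCols + 1)
    val newIdx = ArrayBuffer.empty[Int]
    val newVals = ArrayBuffer.empty[Double]
    var j = 0
    while (j < numCols) {
      var k = colPtrs(j)
      while (k < colPtrs(j + 1)) {
        if (values(k) != 0.0) {        // skip explicitly stored zeros
          newIdx += rowIndices(k)
          newVals += values(k)
        }
        k += 1
      }
      newPtrs(j + 1) = newVals.length  // pointer past the kept entries of column j
      j += 1
    }
    (newPtrs, newIdx.toArray, newVals.toArray)
  }
}
```

For example, a 2-column matrix stored as `colPtrs = [0, 2, 3]`, `rowIndices = [0, 1, 0]`, `values = [1.0, 0.0, 2.0]` carries one explicit zero; after cleanup the arrays shrink to two genuine nonzeros and the column pointers are rebuilt accordingly.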
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107550502

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -291,31 +396,60 @@ class DenseMatrix @Since("2.0.0") (
   override def numActives: Int = values.length

   /**
-   * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from the given `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order.
    */
-  @Since("2.0.0")
-  def toSparse: SparseMatrix = {
-    val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
-    val colPtrs: Array[Int] = new Array[Int](numCols + 1)
-    val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
-    var nnz = 0
-    var j = 0
-    while (j < numCols) {
-      var i = 0
-      while (i < numRows) {
-        val v = values(index(i, j))
-        if (v != 0.0) {
-          rowIndices += i
-          spVals += v
-          nnz += 1
-        }
-        i += 1
-      }
-      j += 1
-      colPtrs(j) = nnz
-    }
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+    if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose
+    else {
+      val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
+      val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+      val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
+      var nnz = 0
+      var j = 0
+      while (j < numCols) {
+        var i = 0
+        while (i < numRows) {
+          val v = values(index(i, j))
+          if (v != 0.0) {
+            rowIndices += i
+            spVals += v
+            nnz += 1
+          }
+          i += 1
+        }
+        j += 1
+        colPtrs(j) = nnz
+      }
+      new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
+    }
+  }
+
+  /**
+   * Generate a `DenseMatrix` from this `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+    if (!(isTransposed ^ colMajor)) {
+      val newValues = new Array[Double](numCols * numRows)
+      var j = 0
--- End diff --

See above discussion, I am going to change this to use `toArray`.
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107550356

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,110 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toCSR: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
--- End diff --

Well, this would change the behavior of external facing code. Before, if I called `toSparse` on a row major matrix, I'd get a column major matrix. If we maintain the layout, then I'd now get something different (row major). Otherwise, I'd agree it is best to maintain the layout.
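The CSC/CSR distinction behind `toCSC`, `toCSR`, and the layout question above can be sketched with a small standalone encoder. This is an illustration only: `SparseLayouts`, `denseToCSC`, and `denseToCSR` are hypothetical helpers, not Spark API.

```scala
// Sketch of the two sparse layouts under discussion, on plain Scala arrays.
object SparseLayouts {
  // `dense(i)(j)` is row i, column j. CSC walks columns: colPtrs has numCols + 1
  // entries, and colPtrs(j + 1) - colPtrs(j) is the number of nonzeros in column j.
  def denseToCSC(dense: Array[Array[Double]]): (Array[Int], Array[Int], Array[Double]) = {
    val numRows = dense.length
    val numCols = dense.head.length
    val colPtrs = new Array[Int](numCols + 1)
    val rowIdx  = scala.collection.mutable.ArrayBuffer.empty[Int]
    val vals    = scala.collection.mutable.ArrayBuffer.empty[Double]
    var j = 0
    while (j < numCols) {
      var i = 0
      while (i < numRows) {
        val v = dense(i)(j)
        if (v != 0.0) { rowIdx += i; vals += v }   // keep only structural nonzeros
        i += 1
      }
      colPtrs(j + 1) = vals.length
      j += 1
    }
    (colPtrs, rowIdx.toArray, vals.toArray)
  }

  // CSR is just CSC of the transpose: row pointers plus column indices.
  def denseToCSR(dense: Array[Array[Double]]): (Array[Int], Array[Int], Array[Double]) =
    denseToCSC(dense.transpose)
}
```

For the 2x3 matrix [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]], `denseToCSC` yields colPtrs [0, 1, 2, 3], rowIndices [0, 1, 0], and values [1.0, 3.0, 2.0]. The same nonzeros appear in both layouts; only the traversal order of the stored arrays differs, which is exactly the observable difference sethah describes.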
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107550135

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,385 @@ class MatricesSuite extends SparkMLFunSuite {
     assert(sparseMat.values(2) === 10.0)
   }

-  test("toSparse, toDense") {
-    val m = 3
-    val n = 2
-    val values = Array(1.0, 2.0, 4.0, 5.0)
-    val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-    val colPtrs = Array(0, 2, 4)
-    val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+    /*
+      dm1 =  4.0 2.0 -8.0
+            -1.0 7.0  4.0
+
+      dm2 = 5.0 -9.0  4.0
+            1.0 -3.0 -8.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+    val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
--- End diff --

I'm not sure I understand your meaning here. These are made to be two entirely different matrices anyway.
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107549650

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `SparseMatrix.toSparseMatrix` hunk as quoted above, ending at the line "// breeze transpose rearranges values in column major and removes explicit zeros")
--- End diff --

I pretty much agree with you, but this is non-trivial code if we want to do it efficiently. Breeze has a pretty well-optimized implementation to do this. I would leave it as a follow-up JIRA, or do it when/if we ever remove the Breeze dependency. Or do you think this is a blocker for this PR?
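What "removes explicit zeros" amounts to for CSC storage can be sketched independently of Breeze. This is an illustration only: `DropExplicitZeros.compact` is a hypothetical helper, not the Breeze or Spark implementation.

```scala
// Rebuild (colPtrs, rowIndices, values) keeping only entries whose stored value
// is nonzero. Explicit zeros arise when a stored slot holds 0.0.
object DropExplicitZeros {
  def compact(colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double])
      : (Array[Int], Array[Int], Array[Double]) = {
    val numCols = colPtrs.length - 1
    val newPtrs = new Array[Int](numCols + 1)
    val idx  = scala.collection.mutable.ArrayBuffer.empty[Int]
    val vals = scala.collection.mutable.ArrayBuffer.empty[Double]
    var j = 0
    while (j < numCols) {
      var k = colPtrs(j)
      while (k < colPtrs(j + 1)) {
        if (values(k) != 0.0) { idx += rowIndices(k); vals += values(k) }
        k += 1
      }
      newPtrs(j + 1) = vals.length   // pointer reflects entries kept so far
      j += 1
    }
    (newPtrs, idx.toArray, vals.toArray)
  }
}
```

Compacting colPtrs [0, 2, 3], rowIndices [0, 1, 0], values [1.0, 0.0, 2.0] drops the stored 0.0 and yields [0, 1, 2], [0, 0], [1.0, 2.0] — the behavior the Breeze round-trip is being relied on for.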
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107549283

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `DenseMatrix` hunk adding `toSparseMatrix`/`toDenseMatrix` as quoted above, at the line "val newValues = new Array[Double](numCols * numRows)")
--- End diff --

Hm, I don't think this solution is better. The entire point of abstract methods is to allow subclasses to implement a method differently. Since we need different implementations depending on the subclass, we should just implement them in the subclasses. We can do this with the following:

```scala
trait Matrix {
  def toDenseMatrix(colMajor: Boolean): Matrix
}

class DenseMatrix extends Matrix {
  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
    if (isTransposed && colMajor) {
      new DenseMatrix(numRows, numCols, toArray, isTransposed = false)
    } else if (!isTransposed && !colMajor) {
      new DenseMatrix(numRows, numCols, transpose.toArray, isTransposed = true)
    } else {
      this
    }
  }
}

class SparseMatrix extends Matrix {
  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
    if (colMajor) new DenseMatrix(numRows, numCols, toArray)
    else new DenseMatrix(numRows, numCols, this.transpose.toArray, isTransposed = true)
  }
}
```

Which is less verbose than the previous code. I'm going to put that in the next commit. Let me know what you think.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107519619

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `Matrix` trait hunk as quoted above, at the line "def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)")
--- End diff --

After thinking about it again, let's have it as `toSparseColumnMajor` to make the APIs consistent with the dense ones, if you don't mind?
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107518109

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
    */
--- End diff --

BTW, it's nice to have the return type on a public method. Can you add `Unit` as the return type?
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107519221

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `DenseMatrix` hunk adding `toSparseMatrix`/`toDenseMatrix` as quoted above, at the line "val newValues = new Array[Double](numCols * numRows)")
--- End diff --

The following could work, and we only need one implementation in the trait. Thanks.

```scala
trait Matrix {
  var isTransposed: Boolean = true
  var numCols: Int = 0
  var numRows: Int = 0

  def foreachActive(f: (Int, Int, Double) => Unit): Unit

  def toDenseMatrix(colMajor: Boolean): Matrix = {
    this match {
      case _: DenseMatrix if this.isTransposed != colMajor => this
      case _: SparseMatrix | _: DenseMatrix if this.isTransposed == colMajor =>
        val newValues = new Array[Double](numCols * numRows)
        this.foreachActive { case (row, col, value) =>
          // filling the newValues
        }
        new DenseMatrix(numRows, numCols, newValues, isTransposed = !colMajor)
      case _ => throw new IllegalArgumentException("")
    }
  }
}

class DenseMatrix extends Matrix {
  def foreachActive(f: (Int, Int, Double) => Unit): Unit = { }
}

class SparseMatrix extends Matrix {
  def foreachActive(f: (Int, Int, Double) => Unit): Unit = { }
}
```
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107442966

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `DenseMatrix` hunk adding `toSparseMatrix`/`toDenseMatrix` as quoted above, at the line "val newValues = new Array[Double](numCols * numRows)")
--- End diff --

I don't think we can put this in the trait, since when it is called on a dense matrix we would like to return `this` in some cases (when no layout change is needed). But, yes, I think it's simpler to use `toArray`, which calls `foreachActive`. Thanks!
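The `toArray`-via-`foreachActive` idea agreed on here can be sketched on a toy matrix type. This is an illustration only: `CooMatrix` is a hypothetical stand-in exposing a `foreachActive`, not the Spark class.

```scala
// Densify any (row, col, value) stream into a column-major array by letting
// foreachActive drive the writes, the way toArray does in the discussion above.
final case class CooMatrix(numRows: Int, numCols: Int, entries: Seq[(Int, Int, Double)]) {
  def foreachActive(f: (Int, Int, Double) => Unit): Unit =
    entries.foreach { case (i, j, v) => f(i, j, v) }

  // Column-major densification: entry (i, j) lands at offset i + j * numRows.
  def toArray: Array[Double] = {
    val out = new Array[Double](numRows * numCols)
    foreachActive { (i, j, v) => out(i + j * numRows) = v }
    out
  }
}
```

For a 2x3 matrix with active entries (0,0)->1.0, (1,1)->3.0, (0,2)->2.0, `toArray` gives the column-major array [1.0, 0.0, 0.0, 3.0, 2.0, 0.0]. The traversal is layout-agnostic, which is why it sidesteps the trait-vs-subclass question for the copy itself.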
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107436372

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `Matrix` trait hunk as quoted above, at the line "def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)")
--- End diff --

I'm not sure I have a preference. I don't mind leaving them as CSC and CSR.
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107352903

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `DenseMatrix` hunk adding `toSparseMatrix`/`toDenseMatrix` as quoted above, at the line "var j = 0")
--- End diff --

Can we move `if (isTransposed) {` out of the loop like the following, since it is loop-invariant and we want to remove div/mod operations?

```
if (isTransposed) {
  // it is row major and we want column major
  var j = 0
  var col = 0
  while (col < numCols) {
    var row = 0
    while (row < numRows) {
      ...
    }
    col += 1
  }
} else {
  // it is column major and we want row major
  var j = 0
  var row = 0
  while (row < numRows) {
    var col = 0
    while (col < numCols) {
      ...
    }
    row += 1
  }
}
```
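The hoisting kiszk suggests — one loop-invariant branch outside, two specialized loop nests inside — looks like this in a standalone form. Illustration only: `HoistSketch.relayout` is a hypothetical helper showing the pattern, not the PR's code.

```scala
// Copy a flat matrix into the opposite layout. Testing `isTransposed` once,
// outside the loops, lets each nest use straight-line index math with no
// per-element branch and no div/mod to recover (row, col) from a flat index.
object HoistSketch {
  def relayout(values: Array[Double], numRows: Int, numCols: Int,
               isTransposed: Boolean): Array[Double] = {
    val out = new Array[Double](values.length)
    var k = 0
    if (isTransposed) {
      // source is row major; emit column major
      var col = 0
      while (col < numCols) {
        var row = 0
        while (row < numRows) {
          out(k) = values(row * numCols + col); k += 1
          row += 1
        }
        col += 1
      }
    } else {
      // source is column major; emit row major
      var row = 0
      while (row < numRows) {
        var col = 0
        while (col < numCols) {
          out(k) = values(col * numRows + row); k += 1
          col += 1
        }
        row += 1
      }
    }
    out
  }
}
```

For a 2x3 row-major array [1, 2, 3, 4, 5, 6], the column-major result is [1, 4, 2, 5, 3, 6]; applying the other branch to that result recovers the original, and neither inner loop ever re-tests `isTransposed`.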
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107313343

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
(same "dense to dense" test hunk as quoted above, at the `dm2` line)
--- End diff --

Why not just make `dm2` the transpose of `dm1`, but explicitly assign the values? That way, you don't need to type the values into the array for the comparison.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107312629

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
(same "dense to dense" test hunk as quoted above, continuing with:)
+    val dm3 = dm1.toDense
+    assert(dm3 === dm1)
+    assert(!dm3.isTransposed)
+    assert(dm3.values.equals(dm1.values))
+
+    val dm4 = dm1.toDenseMatrix(false)
--- End diff --

I would like to make `toDenseMatrix` private, and we test against `toDenseRowMajor`, which is more explicit.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107312905

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `Matrix` trait hunk as quoted above, at the line "def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)")
--- End diff --

I'm debating whether we should keep the same layout ordering when we call `toSparse` or `toDense`.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107306663

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `Matrix` trait hunk as quoted above, at the line "def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)")
--- End diff --

I'm not good at naming, but since we use `toDenseRowMajor` for the dense conversions, should we use `toSparseColumnMajor`? Many packages use `toCSC`, but I think we can make them consistent. Just my 2 cents.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107312194 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -587,18 +722,69 @@ class SparseMatrix @Since("2.0.0") ( } } + override def numNonzeros: Int = values.count(_ != 0) + + override def numActives: Int = values.length + /** - * Generate a `DenseMatrix` from the given `SparseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit zero values if they + * exist. + * + * @param colMajor Whether or not the resulting `SparseMatrix` values are in column major + *order. */ - @Since("2.0.0") - def toDense: DenseMatrix = { -new DenseMatrix(numRows, numCols, toArray) + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!(colMajor ^ isTransposed)) { + // breeze transpose rearranges values in column major and removes explicit zeros + if (!isTransposed) { +// it is row major and we want col major +val breezeTransposed = asBreeze.asInstanceOf[BSM[Double]].t + Matrices.fromBreeze(breezeTransposed).transpose.asInstanceOf[SparseMatrix] + } else { +// it is col major and we want row major +val breezeTransposed = asBreeze.asInstanceOf[BSM[Double]] +Matrices.fromBreeze(breezeTransposed).asInstanceOf[SparseMatrix] + } +} else { --- End diff -- Can we document here that it's when the layout of this and colMajor is different? Easier read than `(colMajor ^ isTranspose)` condition here. Even more readable to use pattern matching with exact boolean on both variables. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
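The pattern-matching suggestion above can be sketched roughly as follows. This is a hypothetical standalone helper, not Spark's actual code; it assumes `isTransposed == true` means the values are stored in row major order:

```scala
// Hypothetical sketch of the review suggestion: replace the XOR test with a
// pattern match on both booleans, so each layout combination is explicit.
// `layoutDiffers` is true exactly when `!(colMajor ^ isTransposed)` is true,
// i.e. when the stored layout differs from the requested one.
def layoutDiffers(colMajor: Boolean, isTransposed: Boolean): Boolean =
  (colMajor, isTransposed) match {
    case (true, true)   => true  // want col major, stored row major
    case (false, false) => true  // want row major, stored col major
    case _              => false // stored layout already matches the request
  }
```

Each case carries its own comment, which is the readability gain the reviewer is asking for.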
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107307120 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +162,109 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSC: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSR: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix in row major order. + */ + @Since("2.2.0") + def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false) + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Returns a matrix in either dense or sparse format, whichever uses less storage. 
+   *
+   * @param colMajor Whether the values of the resulting matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  @Since("2.2.0")
+  def compressed(colMajor: Boolean): Matrix = {
--- End diff --
Let's make this private and follow the previous style. We should also add `compressedRowMajor` and `compressedColumnMajor`, since the result can be a dense matrix in certain situations.
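For context, the `compressed` selection discussed above boils down to a storage-size comparison. A rough standalone sketch follows; the helper names and size formulas are assumptions (8-byte values, 4-byte indices, JVM object overhead ignored), not Spark's exact accounting:

```scala
// Approximate storage cost of a dense representation: one double per entry.
def denseSizeInBytes(numRows: Int, numCols: Int): Long =
  8L * numRows * numCols

// Approximate storage cost of a CSC/CSR representation:
// values (8B) + indices (4B) per active entry, plus the pointer array.
def sparseSizeInBytes(majorDim: Int, nnz: Int): Long =
  12L * nnz + 4L * (majorDim + 1)

// Pick whichever representation is smaller, mirroring the compressed(colMajor) logic.
def pickDense(numRows: Int, numCols: Int, nnz: Int, colMajor: Boolean): Boolean = {
  val majorDim = if (colMajor) numCols else numRows
  denseSizeInBytes(numRows, numCols) < sparseSizeInBytes(majorDim, nnz)
}
```

With these formulas, a fully populated matrix always favors dense, while a mostly-zero matrix favors sparse.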
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107309786 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +395,60 @@ class DenseMatrix @Since("2.0.0") ( override def numActives: Int = values.length /** - * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from the given `DenseMatrix`. + * + * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order. */ - @Since("2.0.0") - def toSparse: SparseMatrix = { -val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble -val colPtrs: Array[Int] = new Array[Int](numCols + 1) -val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt -var nnz = 0 -var j = 0 -while (j < numCols) { - var i = 0 - while (i < numRows) { -val v = values(index(i, j)) -if (v != 0.0) { - rowIndices += i - spVals += v - nnz += 1 + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose +else { + val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble + val colPtrs: Array[Int] = new Array[Int](numCols + 1) + val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt + var nnz = 0 + var j = 0 + while (j < numCols) { +var i = 0 +while (i < numRows) { + val v = values(index(i, j)) + if (v != 0.0) { +rowIndices += i +spVals += v +nnz += 1 + } + i += 1 } -i += 1 +j += 1 +colPtrs(j) = nnz } - j += 1 - colPtrs(j) = nnz + new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) +} + } + + /** + * Generate a `DenseMatrix` from this `DenseMatrix`. + * + * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order. 
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+    if (!(isTransposed ^ colMajor)) {
+      val newValues = new Array[Double](numCols * numRows)
--- End diff --
Would it be simpler to use `foreachActive`? With it, the sparse and dense `toDenseMatrix` could share a single implementation in the trait.
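The `foreachActive` idea above can be sketched with a minimal stand-in; this is not Spark's API, only an illustration that one loop over active entries can serve both the sparse and dense cases:

```scala
// Build a column-major dense value array from any "iterate active (i, j, value)
// entries" callback, the shared implementation the review suggests putting in
// the trait. The callback shape mirrors foreachActive(f: (Int, Int, Double) => Unit).
def denseColMajorValues(
    numRows: Int,
    numCols: Int,
    foreachActive: ((Int, Int, Double) => Unit) => Unit): Array[Double] = {
  val out = new Array[Double](numRows * numCols) // initialized to 0.0
  foreachActive((i, j, v) => out(j * numRows + i) = v)
  out
}
```

A sparse matrix would drive the callback from its CSC arrays and a dense one from its value array, but the array-building code stays identical.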
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107311742
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -587,18 +722,69 @@ class SparseMatrix @Since("2.0.0") (
     }
   }
 
+  override def numNonzeros: Int = values.count(_ != 0)
+
+  override def numActives: Int = values.length
+
   /**
-   * Generate a `DenseMatrix` from the given `SparseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit zero values if they
+   * exist.
+   *
+   * @param colMajor Whether or not the resulting `SparseMatrix` values are in column major
+   *                 order.
    */
-  @Since("2.0.0")
-  def toDense: DenseMatrix = {
-    new DenseMatrix(numRows, numCols, toArray)
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+    if (!(colMajor ^ isTransposed)) {
+      // breeze transpose rearranges values in column major and removes explicit zeros
--- End diff --
I think it's hacky to rely on breeze's transpose behavior to remove zeros in sparse matrices. Can we have our own implementation, given that we may eventually remove breeze?
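A breeze-free alternative, as requested above, can be sketched directly on CSC-style arrays. This is a standalone sketch on raw arrays under assumed conventions, not Spark's `SparseMatrix` API:

```scala
// Remove explicit zeros from CSC-style arrays without going through breeze.
// Returns rebuilt (colPtrs, rowIndices, values) keeping only nonzero entries.
def pruneExplicitZeros(
    colPtrs: Array[Int],
    rowIndices: Array[Int],
    values: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
  val newColPtrs = new Array[Int](colPtrs.length)
  val newRowIndices = Array.newBuilder[Int]
  val newValues = Array.newBuilder[Double]
  var nnz = 0
  var j = 0
  while (j < colPtrs.length - 1) {
    var k = colPtrs(j)
    while (k < colPtrs(j + 1)) {          // walk the active entries of column j
      if (values(k) != 0.0) {
        newRowIndices += rowIndices(k)
        newValues += values(k)
        nnz += 1
      }
      k += 1
    }
    j += 1
    newColPtrs(j) = nnz                   // column pointer reflects kept entries only
  }
  (newColPtrs, newRowIndices.result(), newValues.result())
}
```

This is a single pass over the active entries, so it avoids both the breeze dependency and the cost of a transpose round-trip.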
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107306774 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +162,109 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSC: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSR: SparseMatrix = toSparseMatrix(colMajor = false) --- End diff -- Same question. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107303875 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Returns a matrix in either dense or sparse format, whichever uses less storage. + * + * @param colMajor Whether the values of the resulting matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. 
+   */
+  @Since("2.2.0")
+  def compressed(colMajor: Boolean): Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor)) {
+      toDenseMatrix(colMajor)
+    } else {
+      toSparseMatrix(colMajor)
+    }
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
--- End diff --
+1 on the latter one.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107303720
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
--- End diff --
Fair enough.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105810508 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + --- End diff -- Added. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105807171
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
--- End diff --
You mean `@inline private[ml] def ...`? Do we expect this to be called often enough for that to make a difference?
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105806817 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Returns a matrix in either dense or sparse format, whichever uses less storage. + * + * @param colMajor Whether the values of the resulting matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. 
+   */
+  @Since("2.2.0")
+  def compressed(colMajor: Boolean): Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor)) {
+      toDenseMatrix(colMajor)
+    } else {
+      toSparseMatrix(colMajor)
+    }
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
--- End diff --
It's only a problem if we override or implement it in a subclass. Since it's contained wholly in the trait, it will be fine. I think this is ok to leave, though we could make it final. Alternatively, we could add three methods: `compressed`, `compressedCSC`, and `compressedCSR`. I think the latter is a good solution; thoughts?
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105272926
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
    */
--- End diff --
Should be fine. Small enough change :)
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105274636 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix --- End diff -- we may `inline` this as well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105272811 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Returns a matrix in either dense or sparse format, whichever uses less storage. + * + * @param colMajor Whether the values of the resulting matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. 
+   */
+  @Since("2.2.0")
+  def compressed(colMajor: Boolean): Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor)) {
+      toDenseMatrix(colMajor)
+    } else {
+      toSparseMatrix(colMajor)
+    }
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
--- End diff --
Won't `compressed(colMajor: Boolean)` and `compressed` cause an overload ambiguity issue?
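On the overloading question raised above: Scala itself accepts a parameterless method alongside a one-argument overload of the same name, with call sites disambiguated by arity. A toy sketch with hypothetical names, unrelated to Spark's actual API:

```scala
// A parameterless `compressed` and a Boolean-taking overload coexist in Scala;
// Java callers would see them as compressed() and compressed(boolean).
object LayoutDemo {
  def compressed: String = "auto"
  def compressed(colMajor: Boolean): String = if (colMajor) "csc" else "csr"
}
```

The residual concern is mostly ergonomic (e.g. eta-expansion or Java interop can get confusing), rather than a compile error.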
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105273751 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) --- End diff -- `toCSR`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105273579
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true)
--- End diff --
How about we follow [scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html) and call it `toCSC`?
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105273397
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
--- End diff --
Maybe we can `inline` this?
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105274573 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + --- End diff -- Could we add `toColumnMajorDense` and `toRowMajorDense`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105268787
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
    */
--- End diff --
I made the change. Not sure if we should do this in a separate PR though.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105261250

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */
  @Since("2.0.0")
  def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix

--- End diff --

Should we just leave `toDenseMatrix` private and have `toDense` always use `colMajor = true`? I think that's ok to do for now.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105032243

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */

--- End diff --

Can you make `foreachActive(f: (Int, Int, Double) => Unit)` public? It is already public for `Vector`. I believe it will be very useful, and I think it's stable enough to make public.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105068149

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */
  @Since("2.0.0")
  def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix

--- End diff --

Yeah, this is very hacky in my opinion too! The problem is that overloading a parameterless method creates ambiguity: a bare invocation could refer either to the parameterless method itself or to the eta-expanded function from the overload that takes a parameter. The following example demonstrates the issue.

In my opinion, I would like to call it `toSparse(columnMajor: Boolean)` and `toSparse() = toSparse(true)`, but the Vector API already uses the version without parentheses, so that would make the API design inconsistent. I think exposing the ability to convert to `columnMajor` or `rowMajor` is very useful; as a result, we can expose `toCSRMatrix`, `toCSCMatrix`, and `toSparse`, where `toSparse` converts the matrix to the representation with the smallest storage.

```scala
scala> trait A {
     |   def foo(b: Boolean): String
     |   def foo: String = foo(true)
     | }
defined trait A

scala> class B extends A {
     |   def foo(b: Boolean): String = b.toString
     | }
defined class B

scala> val b = new B
b: B = B@67b6d4ae

scala> b.foo
<console>:18: error: ambiguous reference to overloaded definition,
both method foo in class B of type (b: Boolean)String
and  method foo in trait A of type => String
match expected type ?
       b.foo
         ^

scala> val x: String = b.foo
x: String = true

scala> val y: Boolean => String = b.foo
y: Boolean => String = <function1>
```
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r87668326

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */
  @Since("2.0.0")
  def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toSparse: SparseMatrix = toSparseMatrix(columnMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param columnMajor Whether the values of the resulting dense matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private [ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toDense: DenseMatrix = toDenseMatrix(columnMajor = true)

--- End diff --

Nit: since we're already using `numCols`, should we call it `colMajor`? I've seen a couple of packages using `colMajor` as the variable name.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r87669463

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */
  @Since("2.0.0")
  def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toSparse: SparseMatrix = toSparseMatrix(columnMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param columnMajor Whether the values of the resulting dense matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private [ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toDense: DenseMatrix = toDenseMatrix(columnMajor = true)
+
+  /**
+   * Returns a matrix in either dense or sparse format, whichever uses less storage.
+   *
+   * @param columnMajor Whether the values of the resulting matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  @Since("2.1.0")
+  def compressed(columnMajor: Boolean): Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(columnMajor)) {
+      toDenseMatrix(columnMajor)
+    } else {
+      toSparseMatrix(columnMajor)
+    }
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.1.0")
+  def compressed: Matrix = {
+    val cscSize = getSparseSizeInBytes(columnMajor = true)
+    val csrSize = getSparseSizeInBytes(columnMajor = false)
+    val minSparseSize = cscSize.min(csrSize)
+    if (getDenseSizeInBytes < minSparseSize) {
+      // size is the same either way, so maintain current layout

--- End diff --

```scala
if (getDenseSizeInBytes < math.min(cscSize, csrSize)) ...
...
if (cscSize < csrSize) ...
```

could be easier to read. Also, can you elaborate the comment, like:

```scala
// sizes for dense matrix in row major or column major are the same, so maintain current layout
```
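The branching discussed in this review can be sketched standalone. Everything below is illustrative: `CompressedSketch`, its method names, and the simplified size formulas are assumptions for this example, not the actual `Matrix` trait members (which compute sizes from the real backing arrays).

```scala
object CompressedSketch {
  // Approximate storage cost of a dense matrix: 8 bytes per double plus a
  // small fixed overhead for the object header and Int fields (assumed here).
  def denseSizeInBytes(numRows: Long, numCols: Long): Long =
    8L * numRows * numCols + 17L

  // Approximate storage cost of a sparse matrix, matching the byte counts in
  // the PR comment: 8B per value + 4B per index + 4B per pointer + overhead.
  def sparseSizeInBytes(numActives: Long, numPtrs: Long): Long =
    12L * numActives + 4L * numPtrs + 17L

  // Mirrors the branching in `compressed`: pick dense if it is strictly
  // smaller than the best sparse layout; otherwise pick the smaller of
  // CSC (numCols + 1 pointers) and CSR (numRows + 1 pointers).
  def chooseFormat(numRows: Long, numCols: Long, numActives: Long): String = {
    val cscSize = sparseSizeInBytes(numActives, numCols + 1)
    val csrSize = sparseSizeInBytes(numActives, numRows + 1)
    if (denseSizeInBytes(numRows, numCols) < math.min(cscSize, csrSize)) "dense"
    else if (cscSize <= csrSize) "csc"
    else "csr"
  }
}
```

For example, a fully populated 3x3 matrix comes out dense, while a 1000x1000 matrix with 10 actives comes out sparse (CSC, since both sparse layouts tie for a square matrix).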
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r86360982

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -1076,4 +1240,15 @@ object Matrices {
      SparseMatrix.fromCOO(numRows, numCols, entries)
    }
  }
+
+  private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = {
+    // 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 8 + 8 + 1
+    12L * numActives + 4L * numPtrs + 17L

--- End diff --

ah right - no that's fine.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r86358529

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -1076,4 +1240,15 @@ object Matrices {
      SparseMatrix.fromCOO(numRows, numCols, entries)
    }
  }
+
+  private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = {
+    // 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 8 + 8 + 1
+    12L * numActives + 4L * numPtrs + 17L

--- End diff --

I wondered how confusing this comment might be. Since `values.length == rowIndices.length == numActives`:

`8 * values.length + 4 * rowIndices.length = 8 * numActives + 4 * numActives = 12 * numActives`

The comment is meant to show where each number comes from, and the implementation is just a condensed computation. But please let me know if you think it's too confusing.
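The equivalence sethah describes can be checked directly. This is a standalone sketch; the function names below are hypothetical stand-ins for the private `getSparseSize` helper, written out both ways using the byte counts from its comment.

```scala
// Per-array form, as written in the comment: 8B per value, 4B per row index,
// 4B per column pointer, plus 8 + 8 + 1 bytes of fixed overhead.
def sizeFromComment(numActives: Long, numPtrs: Long): Long =
  8L * numActives + 4L * numActives + 4L * numPtrs + 8L + 8L + 1L

// Condensed form, as implemented: values and rowIndices have the same length
// (numActives), so their costs fold into a single 12-byte-per-active term,
// and the fixed overhead folds into 17.
def sizeAsImplemented(numActives: Long, numPtrs: Long): Long =
  12L * numActives + 4L * numPtrs + 17L
```

Both forms agree for all inputs, e.g. 10 actives and 5 pointers give 120 + 20 + 17 = 157 bytes either way.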
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r86305786

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -1076,4 +1240,15 @@ object Matrices {
      SparseMatrix.fromCOO(numRows, numCols, entries)
    }
  }
+
+  private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = {
+    // 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 8 + 8 + 1
+    12L * numActives + 4L * numPtrs + 17L

--- End diff --

The comment says 8 * values while this is 12? Seems like a mistype?
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r85799898

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */
  @Since("2.0.0")
  def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix

--- End diff --

A bit of explanation: I made this a private method with a different name because `toDense: DenseMatrix` and `toSparse: SparseMatrix` should be implemented in the trait, not in the subclasses. But we can't just put them here and use overloading, because we would get ambiguous reference compile errors. So we implement them here and make this private with a different name to avoid that. I'd appreciate feedback on this approach - it feels a bit awkward.
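The workaround sethah describes - keep the parameterized conversion under a separate private-style name, and implement the parameterless public method once in the trait - can be sketched in isolation. The names `Mat`, `DenseMat`, and `toSparseImpl` are illustrative, not the actual Spark classes, and `protected` stands in for `private[ml]` so the sketch is self-contained.

```scala
// Overloading `toSparse` directly (one version taking a Boolean, one without
// parentheses) would make bare references like `m.toSparse` ambiguous, as the
// REPL session earlier in the thread shows. Giving the parameterized version
// a different name sidesteps the problem.
trait Mat {
  protected def toSparseImpl(colMajor: Boolean): String
  // Implemented once in the trait; no ambiguity because the name is unique.
  def toSparse: String = toSparseImpl(colMajor = true)
}

class DenseMat extends Mat {
  protected def toSparseImpl(colMajor: Boolean): String =
    if (colMajor) "sparse-csc" else "sparse-csr"
}
```

With this layout, `(new DenseMat).toSparse` resolves without a compile error, which is exactly what the overloaded version could not guarantee.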
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r85799040

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */
  @Since("2.0.0")
  def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toSparse: SparseMatrix = toSparseMatrix(columnMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param columnMajor Whether the values of the resulting dense matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private [ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix

--- End diff --

minor: space
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/15628

[SPARK-17471][ML] Add compressed method to ML matrices

## What changes were proposed in this pull request?

This patch adds a `compressed` method to the ML `Matrix` class, which returns the minimal storage representation of the matrix - either sparse or dense. Because the space occupied by a sparse matrix depends on its layout (i.e. column major or row major), this method must consider both cases. It may also be useful to force the layout to be column or row major beforehand, so an overload is added which takes a `columnMajor: Boolean` parameter.

The compressed implementation relies upon two new abstract methods, `toDense(columnMajor: Boolean)` and `toSparse(columnMajor: Boolean)`, similar to the compressed method implemented in the `Vector` class. These methods also allow the layout of the resulting matrix to be specified via the `columnMajor` parameter. More detail on the new methods is given below.

## How was this patch tested?

Added many new unit tests.

## New methods (summary, not exhaustive list)

**Matrix trait**

* `def toDense(columnMajor: Boolean): DenseMatrix` (abstract) - converts the matrix (either sparse or dense) to dense format
* `def toSparse(columnMajor: Boolean): SparseMatrix` (abstract) - converts the matrix (either sparse or dense) to sparse format
* `def compressed: Matrix` - finds the minimum space representation of this matrix, considering both column and row major layouts, and converts it
* `def compressed(columnMajor: Boolean): Matrix` - finds the minimum space representation of this matrix considering only column OR row major, and converts it

**DenseMatrix class**

* `def toDense(columnMajor: Boolean): DenseMatrix` - converts the dense matrix to a dense matrix, optionally changing the layout (data is NOT duplicated if the layouts are the same)
* `def toSparse(columnMajor: Boolean): SparseMatrix` - converts the dense matrix to a sparse matrix, using the specified layout

**SparseMatrix class**

* `def toDense(columnMajor: Boolean): DenseMatrix` - converts the sparse matrix to a dense matrix, using the specified layout
* `def toSparse(columnMajor: Boolean): SparseMatrix` - converts the sparse matrix to a sparse matrix. If the sparse matrix contains any explicit zeros, they are removed. If the layout requested does not match the current layout, data is copied to a new representation. If the layouts match and no explicit zeros exist, the current matrix is returned.
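The "explicit zeros are removed" behaviour described for `SparseMatrix.toSparse` can be illustrated with a toy sketch. The function below is hypothetical - it operates on bare value/index arrays rather than a real CSC matrix - and shows only the filtering idea, not the actual Spark implementation.

```scala
// Drop entries whose stored value is an explicit zero, keeping the value and
// index arrays aligned. In the real SparseMatrix these would be the `values`
// and `rowIndices` arrays, with the column pointers rebuilt afterwards.
def dropExplicitZeros(
    values: Array[Double],
    indices: Array[Int]): (Array[Double], Array[Int]) = {
  val kept = values.zip(indices).filter { case (v, _) => v != 0.0 }
  (kept.map(_._1), kept.map(_._2))
}
```

For example, `dropExplicitZeros(Array(1.0, 0.0, 3.0), Array(0, 2, 4))` keeps only the two nonzero entries and their indices.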
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sethah/spark matrix_compress

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15628.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15628

commit 5a29a4513b9a917c05c117cd03efe79a2dd2875a
Author: sethah
Date: 2016-09-08T21:52:42Z

    first commit

commit d2abb730f6a152f43d9afbee416e36fc2d4e16b2
Author: sethah
Date: 2016-09-22T14:48:02Z

    start to add tests

commit ee8ca60096f54beabf8cce9f348bda6f78fdfbd2
Author: sethah
Date: 2016-09-23T22:55:20Z

    sparse to sparse stuff

commit 68fc20e3cf9087e855edcbd12177183a77c3c36b
Author: sethah
Date: 2016-10-25T17:25:47Z

    improve test cases and cleanup

commit 011b6019d78eb73e39a0de51d6a4d905a43fb2ad
Author: sethah
Date: 2016-10-25T18:22:23Z

    adding some helper methods and shoring up test cases

commit d00926efbe637133b0f2d27dbfba14ddd97f9e57
Author: sethah
Date: 2016-10-25T19:34:01Z

    cleanup

commit a51e2173089cf79781b0d9a37492a4c4b4080881
Author: sethah
Date: 2016-10-25T19:51:07Z

    minor cleanup