[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r116614596 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -148,7 +154,8 @@ sealed trait Matrix extends Serializable { * and column indices respectively with the type `Int`, and the final parameter is the * corresponding value in the matrix with type `Double`. */ - private[spark] def foreachActive(f: (Int, Int, Double) => Unit) + @Since("2.2.0") + def foreachActive(f: (Int, Int, Double) => Unit): Unit --- End diff -- That's a good point. I guess we can leave it until someone complains & add a Java-friendly one as needed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r116603167 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -148,7 +154,8 @@ sealed trait Matrix extends Serializable { * and column indices respectively with the type `Int`, and the final parameter is the * corresponding value in the matrix with type `Double`. */ - private[spark] def foreachActive(f: (Int, Int, Double) => Unit) + @Since("2.2.0") + def foreachActive(f: (Int, Int, Double) => Unit): Unit --- End diff -- I think we should do the same for `foreachActive` in Vector, which has been a public API for a long time.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r116596231 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -148,7 +154,8 @@ sealed trait Matrix extends Serializable { * and column indices respectively with the type `Int`, and the final parameter is the * corresponding value in the matrix with type `Double`. */ - private[spark] def foreachActive(f: (Int, Int, Double) => Unit) + @Since("2.2.0") + def foreachActive(f: (Int, Int, Double) => Unit): Unit --- End diff -- @sethah @dbtsai Hi all, just saw this during QA. This method is not very Java-friendly. I'm OK with adding it as long as we document the fact that it's not Java-friendly. We could also consider adding a Java-friendly version, perhaps using https://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/function/Function2.html
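To make the Java-friendliness concern above concrete, here is a minimal standalone sketch (plain Java, not the actual Spark API): `MatrixEntryConsumer` is a hypothetical functional interface standing in for the Function2-style helper being proposed, so a Java caller can pass a lambda instead of hand-implementing `scala.Function3`.

```java
public class ForeachActiveSketch {
    // Hypothetical Java-friendly functional interface; the Scala signature
    // (Int, Int, Double) => Unit would instead require a scala.Function3
    // implementation from Java callers.
    @FunctionalInterface
    interface MatrixEntryConsumer {
        void accept(int i, int j, double value);
    }

    // Iterate all stored entries of a column-major dense matrix, mirroring
    // what a dense foreachActive has to do.
    static void foreachActive(int numRows, int numCols, double[] values,
                              MatrixEntryConsumer f) {
        for (int j = 0; j < numCols; j++) {
            for (int i = 0; i < numRows; i++) {
                f.accept(i, j, values[j * numRows + i]);
            }
        }
    }

    static double sumActives(int numRows, int numCols, double[] values) {
        double[] sum = {0.0};
        // A lambda binds directly to the functional interface.
        foreachActive(numRows, numCols, values, (i, j, v) -> sum[0] += v);
        return sum[0];
    }

    public static void main(String[] args) {
        // 2x2 column-major values for {{1.0, 2.0}, {0.0, 3.0}}
        System.out.println(sumActives(2, 2, new double[]{1.0, 0.0, 2.0, 3.0}));
    }
}
```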
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r109616018 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +396,60 @@ class DenseMatrix @Since("2.0.0") ( override def numActives: Int = values.length /** - * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from the given `DenseMatrix`. + * + * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order. */ - @Since("2.0.0") - def toSparse: SparseMatrix = { -val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble -val colPtrs: Array[Int] = new Array[Int](numCols + 1) -val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt -var nnz = 0 -var j = 0 -while (j < numCols) { - var i = 0 - while (i < numRows) { -val v = values(index(i, j)) -if (v != 0.0) { - rowIndices += i - spVals += v - nnz += 1 + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose +else { + val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble + val colPtrs: Array[Int] = new Array[Int](numCols + 1) + val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt + var nnz = 0 + var j = 0 + while (j < numCols) { +var i = 0 +while (i < numRows) { + val v = values(index(i, j)) + if (v != 0.0) { +rowIndices += i +spVals += v +nnz += 1 + } + i += 1 } -i += 1 +j += 1 +colPtrs(j) = nnz } - j += 1 - colPtrs(j) = nnz + new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) +} + } + + /** + * Generate a `DenseMatrix` from this `DenseMatrix`. + * + * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order. 
+ */ + private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = { +if (!(isTransposed ^ colMajor)) { + val newValues = new Array[Double](numCols * numRows) + var j = 0 --- End diff -- I see. `DenseMatrix.update()` modifies the contents of `values`.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15628
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107972466 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -1079,4 +1267,15 @@ object Matrices { SparseMatrix.fromCOO(numRows, numCols, entries) } } + + private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = { +// 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 12 + 12 + 12 + 1 +12L * numActives + 4L * numPtrs + 37L --- End diff -- Nice that we can get `37L` using Java APIs to ensure portability.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107963752 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -1079,4 +1267,15 @@ object Matrices { SparseMatrix.fromCOO(numRows, numCols, entries) } } + + private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = { +// 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 12 + 12 + 12 + 1 +12L * numActives + 4L * numPtrs + 37L + } + + private[ml] def getDenseSize(numCols: Long, numRows: Long): Long = { +// 8 * values.length + 12 + 1 --- End diff -- Can you document what the magic number `12 + 1` is? Also, can we make it `java.lang.Double.BYTES * numCols * numRows + 13L`, since the size of a primitive type can depend on the JVM implementation?
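As a sketch of what the suggested rewrite could look like, the size formulas from the inline comments can be restated with `Double.BYTES`/`Integer.BYTES` (Java 8+). The 12-byte per-array overhead and the trailing `+ 1` are taken from the PR's own comments, not measured here:

```java
public class MatrixSizeSketch {
    // Per the inline comments in the diff: three arrays at 12 bytes of
    // overhead each, plus 1, gives the 37L constant; one array plus 1
    // gives the 13L constant. These overheads are the PR's assumptions.
    static long getSparseSize(long numActives, long numPtrs) {
        // 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length
        //   + 12 + 12 + 12 + 1
        return (Double.BYTES + Integer.BYTES) * numActives
                + (long) Integer.BYTES * numPtrs + 37L;
    }

    static long getDenseSize(long numCols, long numRows) {
        // 8 * values.length + 12 + 1
        return (long) Double.BYTES * numCols * numRows + 13L;
    }

    public static void main(String[] args) {
        // A 2x3 matrix with 3 nonzeros stored CSC (numPtrs = numCols + 1 = 4):
        System.out.println(getSparseSize(3, 4)); // 12*3 + 4*4 + 37
        System.out.println(getDenseSize(3, 2));  // 8*6 + 13
    }
}
```

Since `Double.BYTES` is 8 and `Integer.BYTES` is 4 on every conforming JVM, this yields the same values as the hard-coded constants while making the derivation explicit.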
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107961128 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +168,116 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix while maintaining the layout of the current matrix. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = !isTransposed) --- End diff -- `def toSparse: SparseMatrix = toSparseMatrix(colMajor = isColMajor)`
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107966467 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -1079,4 +1267,15 @@ object Matrices { SparseMatrix.fromCOO(numRows, numCols, entries) } } + + private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = { +// 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 12 + 12 + 12 + 1 +12L * numActives + 4L * numPtrs + 37L --- End diff -- `(java.lang.Double.BYTES + java.lang.Integer.BYTES) * numActives + java.lang.Integer.BYTES * numPtrs + 37L`
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107961223 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +168,116 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix while maintaining the layout of the current matrix. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = !isTransposed) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix while maintaining the layout of the current matrix. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = !isTransposed) --- End diff -- `def toDense: DenseMatrix = toDenseMatrix(colMajor = isColMajor)`
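The layout relationship behind the `colMajor = isColMajor` suggestion can be sketched outside Spark. This sketch assumes `isColMajor` is simply shorthand for `!isTransposed`, and shows the value reshuffling a row-major conversion must perform (the expected output matches `dm4` in the test diff quoted later in the thread):

```java
public class LayoutSketch {
    // Convert a column-major values array to row-major for a
    // numRows x numCols dense matrix -- the work a toDenseRowMajor
    // call has to do when the source is column major.
    static double[] colMajorToRowMajor(int numRows, int numCols, double[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < numRows; i++) {
            for (int j = 0; j < numCols; j++) {
                out[i * numCols + j] = values[j * numRows + i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 2x3 column-major values for:
        //    4.0  2.0 -8.0
        //   -1.0  7.0  4.0
        double[] colMajor = {4.0, -1.0, 2.0, 7.0, -8.0, 4.0};
        System.out.println(java.util.Arrays.toString(
                colMajorToRowMajor(2, 3, colMajor)));
        // [4.0, 2.0, -8.0, -1.0, 7.0, 4.0]
    }
}
```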
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107954866 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala --- @@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite { assert(sparseMat.values(2) === 10.0) } - test("toSparse, toDense") { -val m = 3 -val n = 2 -val values = Array(1.0, 2.0, 4.0, 5.0) -val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0) -val colPtrs = Array(0, 2, 4) -val rowIndices = Array(0, 1, 1, 2) + test("dense to dense") { +/* + dm1 = 4.0 2.0 -8.0 +-1.0 7.0 4.0 + + dm2 = 5.0 -9.0 4.0 +1.0 -3.0 -8.0 + */ +val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0)) +val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true) + +val dm3 = dm1.toDense +assert(dm3 === dm1) +assert(!dm3.isTransposed) +assert(dm3.values.equals(dm1.values)) + +val dm4 = dm1.toDenseRowMajor +assert(dm4 === dm1) +assert(dm4.isTransposed) +assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)) + +val dm5 = dm2.toDenseColMajor +assert(dm5 === dm2) +assert(!dm5.isTransposed) +assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0)) + +val dm6 = dm2.toDenseRowMajor +assert(dm6 === dm2) +assert(dm6.isTransposed) +assert(dm6.values.equals(dm2.values)) + +val dm7 = dm1.toDenseRowMajor +assert(dm7 === dm1) +assert(dm7.isTransposed) +assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)) + +val dm8 = dm1.toDenseColMajor +assert(dm8 === dm1) +assert(!dm8.isTransposed) +assert(dm8.values.equals(dm1.values)) + +val dm9 = dm2.toDense +assert(dm9 === dm2) +assert(dm9.isTransposed) +assert(dm9.values.equals(dm2.values)) + } -val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values) -val deMat1 = new DenseMatrix(m, n, allValues) + test("dense to sparse") { +/* + dm1 = 0.0 4.0 5.0 +0.0 2.0 0.0 + + dm2 = 0.0 4.0 5.0 +0.0 2.0 0.0 -val spMat2 = deMat1.toSparse -val deMat2 = spMat1.toDense + dm3 = 0.0 0.0 0.0 +0.0 
0.0 0.0 + */ +val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0)) +val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true) +val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)) + +val sm1 = dm1.toSparseColMajor +assert(sm1 === dm1) +assert(!sm1.isTransposed) +assert(sm1.values === Array(4.0, 2.0, 5.0)) + +val sm2 = dm1.toSparseRowMajor +assert(sm2 === dm1) +assert(sm2.isTransposed) +assert(sm2.values === Array(4.0, 5.0, 2.0)) + +val sm3 = dm2.toSparseColMajor +assert(sm3 === dm2) +assert(!sm3.isTransposed) +assert(sm3.values === Array(4.0, 2.0, 5.0)) + +val sm4 = dm2.toSparseRowMajor +assert(sm4 === dm2) +assert(sm4.isTransposed) +assert(sm4.values === Array(4.0, 5.0, 2.0)) + +val sm5 = dm3.toSparseColMajor +assert(sm5 === dm3) +assert(sm5.values === Array.empty[Double]) +assert(!sm5.isTransposed) + +val sm6 = dm3.toSparseRowMajor +assert(sm6 === dm3) +assert(sm6.values === Array.empty[Double]) +assert(sm6.isTransposed) + +val sm7 = dm1.toSparse +assert(sm7 === dm1) +assert(sm7.values === Array(4.0, 2.0, 5.0)) +assert(!sm7.isTransposed) + +val sm8 = dm1.toSparseColMajor +assert(sm8 === dm1) +assert(sm8.values === Array(4.0, 2.0, 5.0)) +assert(!sm8.isTransposed) + +val sm9 = dm2.toSparseRowMajor +assert(sm9 === dm2) +assert(sm9.values === Array(4.0, 5.0, 2.0)) +assert(sm9.isTransposed) + +val sm10 = dm2.toSparse +assert(sm10 === dm2) +assert(sm10.values === Array(4.0, 5.0, 2.0)) +assert(sm10.isTransposed) + } + + test("sparse to sparse") { +/* + sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0 + 0.0 2.0 0.0 + smZeros = 0.0 0.0 0.0 +0.0 0.0 0.0 + */ +val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0)) +val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), Array(4.0, 5.0, 2.0), + isTransposed = true) +val sm3 = new SparseMatrix(2, 3, Array(0, 0, 2, 4), Array(0,
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107844935 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala --- @@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite { assert(sparseMat.values(2) === 10.0) } - test("toSparse, toDense") { -val m = 3 -val n = 2 -val values = Array(1.0, 2.0, 4.0, 5.0) -val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0) -val colPtrs = Array(0, 2, 4) -val rowIndices = Array(0, 1, 1, 2) + test("dense to dense") { +/* + dm1 = 4.0 2.0 -8.0 +-1.0 7.0 4.0 + + dm2 = 5.0 -9.0 4.0 +1.0 -3.0 -8.0 + */ +val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0)) +val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true) + +val dm3 = dm1.toDense +assert(dm3 === dm1) +assert(!dm3.isTransposed) +assert(dm3.values.equals(dm1.values)) + +val dm4 = dm1.toDenseRowMajor +assert(dm4 === dm1) +assert(dm4.isTransposed) +assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)) + +val dm5 = dm2.toDenseColMajor +assert(dm5 === dm2) +assert(!dm5.isTransposed) +assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0)) + +val dm6 = dm2.toDenseRowMajor +assert(dm6 === dm2) +assert(dm6.isTransposed) +assert(dm6.values.equals(dm2.values)) + +val dm7 = dm1.toDenseRowMajor +assert(dm7 === dm1) +assert(dm7.isTransposed) +assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)) + +val dm8 = dm1.toDenseColMajor +assert(dm8 === dm1) +assert(!dm8.isTransposed) +assert(dm8.values.equals(dm1.values)) + +val dm9 = dm2.toDense +assert(dm9 === dm2) +assert(dm9.isTransposed) +assert(dm9.values.equals(dm2.values)) + } -val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values) -val deMat1 = new DenseMatrix(m, n, allValues) + test("dense to sparse") { +/* + dm1 = 0.0 4.0 5.0 +0.0 2.0 0.0 + + dm2 = 0.0 4.0 5.0 +0.0 2.0 0.0 -val spMat2 = deMat1.toSparse -val deMat2 = spMat1.toDense + dm3 = 0.0 0.0 0.0 +0.0 
0.0 0.0 + */ +val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0)) +val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true) +val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)) + +val sm1 = dm1.toSparseColMajor +assert(sm1 === dm1) +assert(!sm1.isTransposed) +assert(sm1.values === Array(4.0, 2.0, 5.0)) + +val sm2 = dm1.toSparseRowMajor +assert(sm2 === dm1) +assert(sm2.isTransposed) +assert(sm2.values === Array(4.0, 5.0, 2.0)) + +val sm3 = dm2.toSparseColMajor +assert(sm3 === dm2) +assert(!sm3.isTransposed) +assert(sm3.values === Array(4.0, 2.0, 5.0)) + +val sm4 = dm2.toSparseRowMajor +assert(sm4 === dm2) +assert(sm4.isTransposed) +assert(sm4.values === Array(4.0, 5.0, 2.0)) + +val sm5 = dm3.toSparseColMajor +assert(sm5 === dm3) +assert(sm5.values === Array.empty[Double]) +assert(!sm5.isTransposed) + +val sm6 = dm3.toSparseRowMajor +assert(sm6 === dm3) +assert(sm6.values === Array.empty[Double]) +assert(sm6.isTransposed) + +val sm7 = dm1.toSparse +assert(sm7 === dm1) +assert(sm7.values === Array(4.0, 2.0, 5.0)) +assert(!sm7.isTransposed) + +val sm8 = dm1.toSparseColMajor +assert(sm8 === dm1) +assert(sm8.values === Array(4.0, 2.0, 5.0)) +assert(!sm8.isTransposed) + +val sm9 = dm2.toSparseRowMajor +assert(sm9 === dm2) +assert(sm9.values === Array(4.0, 5.0, 2.0)) +assert(sm9.isTransposed) + +val sm10 = dm2.toSparse +assert(sm10 === dm2) +assert(sm10.values === Array(4.0, 5.0, 2.0)) +assert(sm10.isTransposed) + } + + test("sparse to sparse") { +/* + sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0 + 0.0 2.0 0.0 + smZeros = 0.0 0.0 0.0 +0.0 0.0 0.0 + */ +val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0)) +val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), Array(4.0, 5.0, 2.0), + isTransposed = true) +val sm3 = new SparseMatrix(2, 3, Array(0, 0, 2, 4), Array(0,
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107842023 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala --- @@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite { assert(sparseMat.values(2) === 10.0) } - test("toSparse, toDense") { -val m = 3 -val n = 2 -val values = Array(1.0, 2.0, 4.0, 5.0) -val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0) -val colPtrs = Array(0, 2, 4) -val rowIndices = Array(0, 1, 1, 2) + test("dense to dense") { +/* + dm1 = 4.0 2.0 -8.0 +-1.0 7.0 4.0 + + dm2 = 5.0 -9.0 4.0 +1.0 -3.0 -8.0 + */ +val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0)) +val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true) + +val dm3 = dm1.toDense +assert(dm3 === dm1) +assert(!dm3.isTransposed) +assert(dm3.values.equals(dm1.values)) + +val dm4 = dm1.toDenseRowMajor +assert(dm4 === dm1) +assert(dm4.isTransposed) +assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)) + +val dm5 = dm2.toDenseColMajor +assert(dm5 === dm2) +assert(!dm5.isTransposed) +assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0)) + +val dm6 = dm2.toDenseRowMajor +assert(dm6 === dm2) +assert(dm6.isTransposed) +assert(dm6.values.equals(dm2.values)) + +val dm7 = dm1.toDenseRowMajor +assert(dm7 === dm1) +assert(dm7.isTransposed) +assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)) + +val dm8 = dm1.toDenseColMajor +assert(dm8 === dm1) +assert(!dm8.isTransposed) +assert(dm8.values.equals(dm1.values)) + +val dm9 = dm2.toDense +assert(dm9 === dm2) +assert(dm9.isTransposed) +assert(dm9.values.equals(dm2.values)) + } -val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values) -val deMat1 = new DenseMatrix(m, n, allValues) + test("dense to sparse") { +/* + dm1 = 0.0 4.0 5.0 +0.0 2.0 0.0 + + dm2 = 0.0 4.0 5.0 +0.0 2.0 0.0 -val spMat2 = deMat1.toSparse -val deMat2 = spMat1.toDense + dm3 = 0.0 0.0 0.0 +0.0 
0.0 0.0 + */ +val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0)) +val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true) +val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)) + +val sm1 = dm1.toSparseColMajor +assert(sm1 === dm1) +assert(!sm1.isTransposed) +assert(sm1.values === Array(4.0, 2.0, 5.0)) + +val sm2 = dm1.toSparseRowMajor +assert(sm2 === dm1) +assert(sm2.isTransposed) +assert(sm2.values === Array(4.0, 5.0, 2.0)) + +val sm3 = dm2.toSparseColMajor +assert(sm3 === dm2) +assert(!sm3.isTransposed) +assert(sm3.values === Array(4.0, 2.0, 5.0)) + +val sm4 = dm2.toSparseRowMajor +assert(sm4 === dm2) +assert(sm4.isTransposed) +assert(sm4.values === Array(4.0, 5.0, 2.0)) + +val sm5 = dm3.toSparseColMajor +assert(sm5 === dm3) +assert(sm5.values === Array.empty[Double]) +assert(!sm5.isTransposed) + +val sm6 = dm3.toSparseRowMajor +assert(sm6 === dm3) +assert(sm6.values === Array.empty[Double]) +assert(sm6.isTransposed) + +val sm7 = dm1.toSparse +assert(sm7 === dm1) +assert(sm7.values === Array(4.0, 2.0, 5.0)) +assert(!sm7.isTransposed) + +val sm8 = dm1.toSparseColMajor +assert(sm8 === dm1) +assert(sm8.values === Array(4.0, 2.0, 5.0)) +assert(!sm8.isTransposed) + +val sm9 = dm2.toSparseRowMajor +assert(sm9 === dm2) +assert(sm9.values === Array(4.0, 5.0, 2.0)) +assert(sm9.isTransposed) + +val sm10 = dm2.toSparse +assert(sm10 === dm2) +assert(sm10.values === Array(4.0, 5.0, 2.0)) +assert(sm10.isTransposed) + } + + test("sparse to sparse") { +/* + sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0 + 0.0 2.0 0.0 + smZeros = 0.0 0.0 0.0 +0.0 0.0 0.0 + */ +val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0)) +val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), Array(4.0, 5.0, 2.0), + isTransposed = true) +val sm3 = new SparseMatrix(2, 3, Array(0, 0, 2, 4), Array(0,
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107831775 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +404,49 @@ class DenseMatrix @Since("2.0.0") ( override def numActives: Int = values.length /** - * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from the given `DenseMatrix`. + * + * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order. */ - @Since("2.0.0") - def toSparse: SparseMatrix = { -val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble -val colPtrs: Array[Int] = new Array[Int](numCols + 1) -val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt -var nnz = 0 -var j = 0 -while (j < numCols) { - var i = 0 - while (i < numRows) { -val v = values(index(i, j)) -if (v != 0.0) { - rowIndices += i - spVals += v - nnz += 1 + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose +else { + val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble + val colPtrs: Array[Int] = new Array[Int](numCols + 1) + val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt + var nnz = 0 + var j = 0 + while (j < numCols) { +var i = 0 +while (i < numRows) { + val v = values(index(i, j)) + if (v != 0.0) { +rowIndices += i +spVals += v +nnz += 1 + } + i += 1 } -i += 1 +j += 1 +colPtrs(j) = nnz } - j += 1 - colPtrs(j) = nnz + new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) +} + } + + /** + * Generate a `DenseMatrix` from this `DenseMatrix`. + * + * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order. 
+ */ + private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = { +if (isTransposed && colMajor) { + new DenseMatrix(numRows, numCols, toArray, isTransposed = false) +} else if (!isTransposed && !colMajor) { + new DenseMatrix(numRows, numCols, transpose.toArray, isTransposed = true) +} else { + this } -new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) } --- End diff -- Sounds good. Let's do it in another PR. Thanks.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107832496 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix in row major order. + */ + @Since("2.2.0") + def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false) + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Returns a matrix in dense or sparse column major format, whichever uses less storage. 
+ */ + @Since("2.2.0") + def compressedColMajor: Matrix = { +if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) { + toDenseColMajor +} else { + toSparseColMajor +} + } + + /** + * Returns a matrix in dense or sparse row major format, whichever uses less storage. + */ + @Since("2.2.0") + def compressedRowMajor: Matrix = { +if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = false)) { + toDenseRowMajor +} else { + toSparseRowMajor +} + } + + /** + * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column + * major format, whichever uses less storage. When dense representation is optimal, it maintains + * the current layout order. + */ + @Since("2.2.0") + def compressed: Matrix = { +val cscSize = getSparseSizeInBytes(colMajor = true) +val csrSize = getSparseSizeInBytes(colMajor = false) +if (getDenseSizeInBytes < math.min(cscSize, csrSize)) { + // dense matrix size is the same for column major and row major, so maintain current layout + toDenseMatrix(!isTransposed) +} else { + if (cscSize <= csrSize) { +toSparseMatrix(colMajor = true) + } else { +toSparseMatrix(colMajor = false) --- End diff -- ditto
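The format-selection logic of `compressed` shown in the diff above can be sketched standalone. It reuses the size formulas from the review comments (the byte counts are the PR's estimates, not JVM measurements); `compressedFormat` is a hypothetical helper that only reports which representation the three-way comparison would pick:

```java
public class CompressedSketch {
    static long denseSize(long numRows, long numCols) {
        return 8L * numRows * numCols + 13L;          // per the PR's getDenseSize
    }

    static long sparseSize(long numActives, long numPtrs) {
        return 12L * numActives + 4L * numPtrs + 37L; // per the PR's getSparseSize
    }

    // Pick the cheapest of dense, CSC, and CSR, preferring dense and then
    // CSC on ties, mirroring the comparisons in `compressed` above.
    static String compressedFormat(long numRows, long numCols, long numActives) {
        long dense = denseSize(numRows, numCols);
        long csc = sparseSize(numActives, numCols + 1); // column pointers
        long csr = sparseSize(numActives, numRows + 1); // row pointers
        if (dense < Math.min(csc, csr)) return "dense";
        return csc <= csr ? "sparseColMajor" : "sparseRowMajor";
    }

    public static void main(String[] args) {
        // 2x3 with 3 nonzeros: dense = 61 bytes beats CSC (89) and CSR (85).
        System.out.println(compressedFormat(2, 3, 3));
        // 100x100 with 10 nonzeros: sparse wins; CSC and CSR tie, so CSC.
        System.out.println(compressedFormat(100, 100, 10));
    }
}
```

This also illustrates why `compressed` checks both sparse layouts: for non-square matrices, CSC and CSR differ only in the length of the pointer array, so the cheaper one depends on whether `numRows` or `numCols` is smaller.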
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107836197

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
     assert(sparseMat.values(2) === 10.0)
   }

-  test("toSparse, toDense") {
-    val m = 3
-    val n = 2
-    val values = Array(1.0, 2.0, 4.0, 5.0)
-    val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-    val colPtrs = Array(0, 2, 4)
-    val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+    /*
+      dm1 =  4.0 2.0 -8.0
+            -1.0 7.0  4.0
+
+      dm2 = 5.0 -9.0  4.0
+            1.0 -3.0 -8.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+    val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
+
+    val dm3 = dm1.toDense
+    assert(dm3 === dm1)
+    assert(!dm3.isTransposed)
+    assert(dm3.values.equals(dm1.values))
+
+    val dm4 = dm1.toDenseRowMajor
+    assert(dm4 === dm1)
+    assert(dm4.isTransposed)
+    assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+    val dm5 = dm2.toDenseColMajor
+    assert(dm5 === dm2)
+    assert(!dm5.isTransposed)
+    assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0))
+
+    val dm6 = dm2.toDenseRowMajor
+    assert(dm6 === dm2)
+    assert(dm6.isTransposed)
+    assert(dm6.values.equals(dm2.values))
+
+    val dm7 = dm1.toDenseRowMajor
+    assert(dm7 === dm1)
+    assert(dm7.isTransposed)
+    assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+    val dm8 = dm1.toDenseColMajor
+    assert(dm8 === dm1)
+    assert(!dm8.isTransposed)
+    assert(dm8.values.equals(dm1.values))
+
+    val dm9 = dm2.toDense
+    assert(dm9 === dm2)
+    assert(dm9.isTransposed)
+    assert(dm9.values.equals(dm2.values))
+  }

-    val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values)
-    val deMat1 = new DenseMatrix(m, n, allValues)
+  test("dense to sparse") {
+    /*
+      dm1 = 0.0 4.0 5.0
+            0.0 2.0 0.0
+
+      dm2 = 0.0 4.0 5.0
+            0.0 2.0 0.0

-    val spMat2 = deMat1.toSparse
-    val deMat2 = spMat1.toDense
+      dm3 = 0.0 0.0 0.0
+            0.0 0.0 0.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0))
+    val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true)
+    val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
+
+    val sm1 = dm1.toSparseColMajor
+    assert(sm1 === dm1)
+    assert(!sm1.isTransposed)
+    assert(sm1.values === Array(4.0, 2.0, 5.0))
+
+    val sm2 = dm1.toSparseRowMajor
+    assert(sm2 === dm1)
+    assert(sm2.isTransposed)
+    assert(sm2.values === Array(4.0, 5.0, 2.0))
+
+    val sm3 = dm2.toSparseColMajor
+    assert(sm3 === dm2)
+    assert(!sm3.isTransposed)
+    assert(sm3.values === Array(4.0, 2.0, 5.0))
+
+    val sm4 = dm2.toSparseRowMajor
+    assert(sm4 === dm2)
+    assert(sm4.isTransposed)
+    assert(sm4.values === Array(4.0, 5.0, 2.0))
+
+    val sm5 = dm3.toSparseColMajor
+    assert(sm5 === dm3)
+    assert(sm5.values === Array.empty[Double])
+    assert(!sm5.isTransposed)
+
+    val sm6 = dm3.toSparseRowMajor
+    assert(sm6 === dm3)
+    assert(sm6.values === Array.empty[Double])
+    assert(sm6.isTransposed)
+
+    val sm7 = dm1.toSparse
+    assert(sm7 === dm1)
+    assert(sm7.values === Array(4.0, 2.0, 5.0))
+    assert(!sm7.isTransposed)
+
+    val sm8 = dm1.toSparseColMajor
+    assert(sm8 === dm1)
+    assert(sm8.values === Array(4.0, 2.0, 5.0))
+    assert(!sm8.isTransposed)
+
+    val sm9 = dm2.toSparseRowMajor
+    assert(sm9 === dm2)
+    assert(sm9.values === Array(4.0, 5.0, 2.0))
+    assert(sm9.isTransposed)
+
+    val sm10 = dm2.toSparse
+    assert(sm10 === dm2)
+    assert(sm10.values === Array(4.0, 5.0, 2.0))
+    assert(sm10.isTransposed)
+  }
+
+  test("sparse to sparse") {
+    /*
+      sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0
+                              0.0 2.0 0.0
+      smZeros = 0.0 0.0 0.0
+                0.0 0.0 0.0
+     */
+    val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0))
+    val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), Array(4.0, 5.0, 2.0),
+      isTransposed = true)
+    val sm3 = new SparseMatrix(2, 3, Array(0, 0, 2, 4), Array(0,
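The `sm1.values === Array(4.0, 2.0, 5.0)` versus `sm2.values === Array(4.0, 5.0, 2.0)` expectations above come from the scan order: CSC collects nonzeros column by column, CSR row by row. A plain-Python illustration of that (not Spark code):

```python
def dense_to_sparse_values(rows, cols, entry, col_major):
    """Collect the nonzero entries of a dense matrix in column-major (CSC)
    or row-major (CSR) scan order; entry(i, j) returns the (i, j) element."""
    if col_major:
        scan = [(i, j) for j in range(cols) for i in range(rows)]
    else:
        scan = [(i, j) for i in range(rows) for j in range(cols)]
    return [entry(i, j) for i, j in scan if entry(i, j) != 0.0]

# dm1 from the test above:  0.0 4.0 5.0
#                           0.0 2.0 0.0
dm1 = [[0.0, 4.0, 5.0],
       [0.0, 2.0, 0.0]]
entry = lambda i, j: dm1[i][j]

csc_values = dense_to_sparse_values(2, 3, entry, col_major=True)   # [4.0, 2.0, 5.0]
csr_values = dense_to_sparse_values(2, 3, entry, col_major=False)  # [4.0, 5.0, 2.0]
```

The same nonzeros appear in both results; only their order differs with the layout.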
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107836216 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107837259 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107837491 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107834704

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
     assert(sparseMat.values(2) === 10.0)
   }

-  test("toSparse, toDense") {
-    val m = 3
-    val n = 2
-    val values = Array(1.0, 2.0, 4.0, 5.0)
-    val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-    val colPtrs = Array(0, 2, 4)
-    val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+    /*
+      dm1 =  4.0 2.0 -8.0
+            -1.0 7.0  4.0
+
+      dm2 = 5.0 -9.0  4.0
+            1.0 -3.0 -8.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+    val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
+
+    val dm3 = dm1.toDense
+    assert(dm3 === dm1)
+    assert(!dm3.isTransposed)
+    assert(dm3.values.equals(dm1.values))
--- End diff --

Can you group the tests either by `dm1` and `dm2` or by the same methods?
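The `dm3.values.equals(dm1.values)` assertion quoted here is Java reference equality on the backing array: a layout-preserving conversion should hand back the same array (no copy), while a real relayout must allocate. A plain-Python sketch of that contract (function name and shapes are mine, for illustration):

```python
def to_layout(values, rows, cols, transposed, want_row_major):
    """Return the backing values list in the requested layout, handing back
    the very same list when it is already laid out that way (no copy)."""
    if transposed == want_row_major:
        return values                       # layout already matches: share the array
    if want_row_major:                      # column major -> row major
        return [values[j * rows + i] for i in range(rows) for j in range(cols)]
    return [values[i * cols + j] for j in range(cols) for i in range(rows)]

# dm1 from the quoted test (2x3, column major):  4.0 2.0 -8.0
#                                               -1.0 7.0  4.0
dm1 = [4.0, -1.0, 2.0, 7.0, -8.0, 4.0]
same = to_layout(dm1, 2, 3, transposed=False, want_row_major=False)
moved = to_layout(dm1, 2, 3, transposed=False, want_row_major=True)
```

`same is dm1` holds (no copy), while `moved` is a fresh row-major list, `[4.0, 2.0, -8.0, -1.0, 7.0, 4.0]`, matching the `dm4.values` expectation in the quoted test.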
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107832751

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -587,18 +720,67 @@ class SparseMatrix @Since("2.0.0") (
     }
   }

+  override def numNonzeros: Int = values.count(_ != 0)
+
+  override def numActives: Int = values.length
+
   /**
-   * Generate a `DenseMatrix` from the given `SparseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit zero values if they
+   * exist.
+   *
+   * @param colMajor Whether or not the resulting `SparseMatrix` values are in column major
+   *                 order.
    */
-  @Since("2.0.0")
-  def toDense: DenseMatrix = {
-    new DenseMatrix(numRows, numCols, toArray)
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+    if (!isTransposed && !colMajor) {
+      // it is row major and we want col major, use breeze to remove explicit zeros
+      val breezeTransposed = asBreeze.asInstanceOf[BSM[Double]].t
+      Matrices.fromBreeze(breezeTransposed).transpose.asInstanceOf[SparseMatrix]
+    } else if (isTransposed && colMajor) {
+      // it is col major and we want row major, use breeze to remove explicit zeros
+      val breezeTransposed = asBreeze.asInstanceOf[BSM[Double]]
+      Matrices.fromBreeze(breezeTransposed).asInstanceOf[SparseMatrix]
+    } else {
+      val nnz = numNonzeros
+      if (nnz != numActives) {
+        // remove explicit zeros
+        val rr = new Array[Int](nnz)
+        val vv = new Array[Double](nnz)
+        val numPtrs = if (isTransposed) numRows else numCols
+        val cc = new Array[Int](numPtrs + 1)
+        var nzIdx = 0
+        var j = 0
+        while (j < numPtrs) {
+          var idx = colPtrs(j)
+          val idxEnd = colPtrs(j + 1)
+          cc(j) = nzIdx
+          while (idx < idxEnd) {
+            if (values(idx) != 0.0) {
+              vv(nzIdx) = values(idx)
+              rr(nzIdx) = rowIndices(idx)
+              nzIdx += 1
+            }
+            idx += 1
+          }
+          j += 1
+        }
+        cc(j) = nnz
+        new SparseMatrix(numRows, numCols, cc, rr, vv, isTransposed = isTransposed)
+      } else {
+        this
+      }
+    }
   }

-  override def numNonzeros: Int = values.count(_ != 0)
-
-  override def numActives: Int = values.length
+  /**
+   * Generate a `DenseMatrix` from the given `SparseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values are in column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+    if (colMajor) new DenseMatrix(numRows, numCols, toArray)
--- End diff --

`new DenseMatrix(numRows, numCols, this.toArray, isTransposed = false)` to make the style consistent.
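The pointer-walking loop in the diff above compacts the CSC/CSR arrays when explicit zeros are stored (`nnz != numActives`). The same compaction can be sketched in plain Python (illustrative, not Spark's code):

```python
def drop_explicit_zeros(col_ptrs, row_indices, values):
    """Compact CSC (or, read column-wise as rows, CSR) arrays by dropping
    stored zero values, mirroring the while-loop in the diff above."""
    cc = [0]          # new pointer array, one entry per column plus the end
    rr, vv = [], []   # new index and value arrays
    for j in range(len(col_ptrs) - 1):
        for idx in range(col_ptrs[j], col_ptrs[j + 1]):
            if values[idx] != 0.0:
                rr.append(row_indices[idx])
                vv.append(values[idx])
        cc.append(len(vv))
    return cc, rr, vv
```

For a 2x3 CSC matrix with an explicit zero stored in the last column, `drop_explicit_zeros([0, 0, 2, 4], [0, 1, 0, 1], [4.0, 2.0, 5.0, 0.0])` compacts to `([0, 0, 2, 3], [0, 1, 0], [4.0, 2.0, 5.0])`; arrays with no stored zeros pass through unchanged.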
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107836205 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107835539 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107832900

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = !isTransposed)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = !isTransposed)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+      toDenseColMajor
--- End diff --

nit, with `this`
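Note that in this revision of the diff, `toSparse` and `toDense` pass `colMajor = !isTransposed`, preserving the matrix's current layout, whereas the revision quoted earlier in the thread always forces column major. The two policies can be sketched side by side (names are mine, for illustration):

```python
def to_sparse_layout(is_transposed, preserve_layout):
    """Layout chosen by `toSparse` under the two definitions discussed in the
    review: keep the matrix's current layout (colMajor = !isTransposed), or
    always force column major (colMajor = true)."""
    if preserve_layout:
        return "row-major" if is_transposed else "col-major"
    return "col-major"
```

A transposed (row-major) matrix is the only case where the two definitions differ; for a non-transposed matrix both yield column major.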
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107837448 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
     assert(sparseMat.values(2) === 10.0)
   }

-  test("toSparse, toDense") {
-    val m = 3
-    val n = 2
-    val values = Array(1.0, 2.0, 4.0, 5.0)
-    val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-    val colPtrs = Array(0, 2, 4)
-    val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+    /*
+      dm1 =  4.0  2.0 -8.0
+            -1.0  7.0  4.0
+
+      dm2 =  5.0 -9.0  4.0
+             1.0 -3.0 -8.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+    val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
+
+    val dm3 = dm1.toDense
+    assert(dm3 === dm1)
+    assert(!dm3.isTransposed)
+    assert(dm3.values.equals(dm1.values))
+
+    val dm4 = dm1.toDenseRowMajor
+    assert(dm4 === dm1)
+    assert(dm4.isTransposed)
+    assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+    val dm5 = dm2.toDenseColMajor
+    assert(dm5 === dm2)
+    assert(!dm5.isTransposed)
+    assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0))
+
+    val dm6 = dm2.toDenseRowMajor
+    assert(dm6 === dm2)
+    assert(dm6.isTransposed)
+    assert(dm6.values.equals(dm2.values))
+
+    val dm7 = dm1.toDenseRowMajor
+    assert(dm7 === dm1)
+    assert(dm7.isTransposed)
+    assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+    val dm8 = dm1.toDenseColMajor
+    assert(dm8 === dm1)
+    assert(!dm8.isTransposed)
+    assert(dm8.values.equals(dm1.values))
+
+    val dm9 = dm2.toDense
+    assert(dm9 === dm2)
+    assert(dm9.isTransposed)
+    assert(dm9.values.equals(dm2.values))
+  }

-    val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values)
-    val deMat1 = new DenseMatrix(m, n, allValues)
+  test("dense to sparse") {
+    /*
+      dm1 = 0.0 4.0 5.0
+            0.0 2.0 0.0
+
+      dm2 = 0.0 4.0 5.0
+            0.0 2.0 0.0

-    val spMat2 = deMat1.toSparse
-    val deMat2 = spMat1.toDense
+      dm3 = 0.0 0.0 0.0
+            0.0 0.0 0.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0))
+    val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true)
+    val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
+
+    val sm1 = dm1.toSparseColMajor
+    assert(sm1 === dm1)
+    assert(!sm1.isTransposed)
+    assert(sm1.values === Array(4.0, 2.0, 5.0))
+
+    val sm2 = dm1.toSparseRowMajor
+    assert(sm2 === dm1)
+    assert(sm2.isTransposed)
+    assert(sm2.values === Array(4.0, 5.0, 2.0))
+
+    val sm3 = dm2.toSparseColMajor
+    assert(sm3 === dm2)
+    assert(!sm3.isTransposed)
+    assert(sm3.values === Array(4.0, 2.0, 5.0))
+
+    val sm4 = dm2.toSparseRowMajor
+    assert(sm4 === dm2)
+    assert(sm4.isTransposed)
+    assert(sm4.values === Array(4.0, 5.0, 2.0))
+
+    val sm5 = dm3.toSparseColMajor
+    assert(sm5 === dm3)
+    assert(sm5.values === Array.empty[Double])
+    assert(!sm5.isTransposed)
+
+    val sm6 = dm3.toSparseRowMajor
+    assert(sm6 === dm3)
+    assert(sm6.values === Array.empty[Double])
+    assert(sm6.isTransposed)
+
+    val sm7 = dm1.toSparse
+    assert(sm7 === dm1)
+    assert(sm7.values === Array(4.0, 2.0, 5.0))
+    assert(!sm7.isTransposed)
+
+    val sm8 = dm1.toSparseColMajor
+    assert(sm8 === dm1)
+    assert(sm8.values === Array(4.0, 2.0, 5.0))
+    assert(!sm8.isTransposed)
+
+    val sm9 = dm2.toSparseRowMajor
+    assert(sm9 === dm2)
+    assert(sm9.values === Array(4.0, 5.0, 2.0))
+    assert(sm9.isTransposed)
+
+    val sm10 = dm2.toSparse
+    assert(sm10 === dm2)
+    assert(sm10.values === Array(4.0, 5.0, 2.0))
+    assert(sm10.isTransposed)
+  }
+
+  test("sparse to sparse") {
+    /*
+      sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0
+                              0.0 2.0 0.0
+      smZeros = 0.0 0.0 0.0
+                0.0 0.0 0.0
+     */
+    val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0))
+    val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), Array(4.0, 5.0, 2.0),
+      isTransposed = true)
+    val sm3 = new SparseMatrix(2, 3, Array(0, 0, 2, 4), Array(0,
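The `dense to sparse` assertions above hinge on scan order: `toSparseColMajor` collects nonzeros column by column (CSC), while `toSparseRowMajor` collects them row by row (CSR), which is why the same matrix yields values `(4.0, 2.0, 5.0)` in one layout and `(4.0, 5.0, 2.0)` in the other. A plain-Python sketch of that ordering; the helper names are illustrative, not Spark's API:

```python
# The 2x3 test matrix from the "dense to sparse" test above:
#     0.0 4.0 5.0
#     0.0 2.0 0.0
dense = [[0.0, 4.0, 5.0],
         [0.0, 2.0, 0.0]]

def csc_values(m):
    """Nonzero values scanned column by column (the toSparseColMajor order)."""
    rows, cols = len(m), len(m[0])
    return [m[i][j] for j in range(cols) for i in range(rows) if m[i][j] != 0.0]

def csr_values(m):
    """Nonzero values scanned row by row (the toSparseRowMajor order)."""
    return [v for row in m for v in row if v != 0.0]

print(csc_values(dense))  # [4.0, 2.0, 5.0] -- matches sm1.values in the test
print(csr_values(dense))  # [4.0, 5.0, 2.0] -- matches sm2.values in the test
```

The two orderings contain the same nonzeros; only the traversal differs, which is also why every assertion pairs a `values` check with an `isTransposed` check.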
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on the same MatricesSuite.scala diff in several further review threads, each quoting the diff above verbatim (the comment bodies are truncated in this archive):
https://github.com/apache/spark/pull/15628#discussion_r107835746
https://github.com/apache/spark/pull/15628#discussion_r107837113
https://github.com/apache/spark/pull/15628#discussion_r107836138
https://github.com/apache/spark/pull/15628#discussion_r107836155
https://github.com/apache/spark/pull/15628#discussion_r107835989
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107835326 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala --- [quotes the same diff as above, up to `assert(dm3.values.equals(dm1.values))`] --- End diff --

`val dm4 = dm1.toDenseRowMajor` and `val dm7 = dm1.toDenseRowMajor` are the same.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107835213 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala --- [quotes the same diff as above, up to `val sm1 = dm1.toSparseColMajor`] --- End diff --

You tested `dm1.toSparseColMajor` twice. Will be nice to group them like

```scala
val sm1 = dm1.toSparseColMajor
val sm2 = dm2.toSparseColMajor
val sm3 = dm3.toSparseColMajor

val sm4 = dm1.toSparseRowMajor
val sm5 = dm2.toSparseRowMajor
val sm6 = dm3.toSparseRowMajor

val sm7 = dm1.toSparse
val sm8 = dm2.toSparse
val sm9 = dm3.toSparse
```
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107832481 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+      toDenseColMajor
+    } else {
+      toSparseColMajor
+    }
+  }
+
+  /**
+   * Returns a matrix in dense or sparse row major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedRowMajor: Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = false)) {
+      toDenseRowMajor
+    } else {
+      toSparseRowMajor
+    }
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
+    val cscSize = getSparseSizeInBytes(colMajor = true)
+    val csrSize = getSparseSizeInBytes(colMajor = false)
+    if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
+      // dense matrix size is the same for column major and row major, so maintain current layout
+      toDenseMatrix(!isTransposed)
+    } else {
+      if (cscSize <= csrSize) {
+        toSparseMatrix(colMajor = true)
--- End diff --

ditto.
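The `compressed` family in the diff above simply compares estimated storage sizes and keeps the cheaper layout, preserving the current dense layout on ties with sparse since dense size is layout-independent. A plain-Python sketch of that branch structure; the byte costs (8 B per double, 4 B per int index, plus the pointer array) are an assumed cost model, not Spark's exact `getDenseSizeInBytes`/`getSparseSizeInBytes` accounting:

```python
def dense_size(rows, cols):
    # one 8-byte double per entry; identical for col-major and row-major
    return 8 * rows * cols

def sparse_size(rows, cols, nnz, col_major):
    # 8-byte value + 4-byte index per nonzero, plus the colPtrs/rowPtrs array
    ptrs = (cols if col_major else rows) + 1
    return 12 * nnz + 4 * ptrs

def choose_format(rows, cols, nnz, is_transposed):
    """Mirror the branch structure of Matrix.compressed in the diff above."""
    csc = sparse_size(rows, cols, nnz, col_major=True)
    csr = sparse_size(rows, cols, nnz, col_major=False)
    if dense_size(rows, cols) < min(csc, csr):
        # dense size is the same either way, so maintain the current layout
        return "dense row-major" if is_transposed else "dense col-major"
    return "sparse col-major" if csc <= csr else "sparse row-major"

print(choose_format(2, 3, nnz=6, is_transposed=False))        # dense col-major
print(choose_format(1000, 1000, nnz=10, is_transposed=False)) # sparse col-major
```

A fully dense 2x3 matrix stays dense (48 B beats ~84 B sparse), while a 1000x1000 matrix with 10 nonzeros compresses to sparse; the `cscSize <= csrSize` tie-break matches the Scala code's preference for column major.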
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107837410

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
     assert(sparseMat.values(2) === 10.0)
   }

-  test("toSparse, toDense") {
-    val m = 3
-    val n = 2
-    val values = Array(1.0, 2.0, 4.0, 5.0)
-    val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-    val colPtrs = Array(0, 2, 4)
-    val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+    /*
+      dm1 =  4.0  2.0 -8.0
+            -1.0  7.0  4.0
+
+      dm2 =  5.0 -9.0  4.0
+             1.0 -3.0 -8.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+    val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
+
+    val dm3 = dm1.toDense
+    assert(dm3 === dm1)
+    assert(!dm3.isTransposed)
+    assert(dm3.values.equals(dm1.values))
+
+    val dm4 = dm1.toDenseRowMajor
+    assert(dm4 === dm1)
+    assert(dm4.isTransposed)
+    assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+    val dm5 = dm2.toDenseColMajor
+    assert(dm5 === dm2)
+    assert(!dm5.isTransposed)
+    assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0))
+
+    val dm6 = dm2.toDenseRowMajor
+    assert(dm6 === dm2)
+    assert(dm6.isTransposed)
+    assert(dm6.values.equals(dm2.values))
+
+    val dm7 = dm1.toDenseRowMajor
+    assert(dm7 === dm1)
+    assert(dm7.isTransposed)
+    assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+    val dm8 = dm1.toDenseColMajor
+    assert(dm8 === dm1)
+    assert(!dm8.isTransposed)
+    assert(dm8.values.equals(dm1.values))
+
+    val dm9 = dm2.toDense
+    assert(dm9 === dm2)
+    assert(dm9.isTransposed)
+    assert(dm9.values.equals(dm2.values))
+  }

-    val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values)
-    val deMat1 = new DenseMatrix(m, n, allValues)
+  test("dense to sparse") {
+    /*
+      dm1 = 0.0 4.0 5.0
+            0.0 2.0 0.0
+
+      dm2 = 0.0 4.0 5.0
+            0.0 2.0 0.0

-    val spMat2 = deMat1.toSparse
-    val deMat2 = spMat1.toDense
+      dm3 = 0.0 0.0 0.0
+            0.0 0.0 0.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0))
+    val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true)
+    val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
+
+    val sm1 = dm1.toSparseColMajor
+    assert(sm1 === dm1)
+    assert(!sm1.isTransposed)
+    assert(sm1.values === Array(4.0, 2.0, 5.0))
+
+    val sm2 = dm1.toSparseRowMajor
+    assert(sm2 === dm1)
+    assert(sm2.isTransposed)
+    assert(sm2.values === Array(4.0, 5.0, 2.0))
+
+    val sm3 = dm2.toSparseColMajor
+    assert(sm3 === dm2)
+    assert(!sm3.isTransposed)
+    assert(sm3.values === Array(4.0, 2.0, 5.0))
+
+    val sm4 = dm2.toSparseRowMajor
+    assert(sm4 === dm2)
+    assert(sm4.isTransposed)
+    assert(sm4.values === Array(4.0, 5.0, 2.0))
+
+    val sm5 = dm3.toSparseColMajor
+    assert(sm5 === dm3)
+    assert(sm5.values === Array.empty[Double])
+    assert(!sm5.isTransposed)
+
+    val sm6 = dm3.toSparseRowMajor
+    assert(sm6 === dm3)
+    assert(sm6.values === Array.empty[Double])
+    assert(sm6.isTransposed)
+
+    val sm7 = dm1.toSparse
+    assert(sm7 === dm1)
+    assert(sm7.values === Array(4.0, 2.0, 5.0))
+    assert(!sm7.isTransposed)
+
+    val sm8 = dm1.toSparseColMajor
+    assert(sm8 === dm1)
+    assert(sm8.values === Array(4.0, 2.0, 5.0))
+    assert(!sm8.isTransposed)
+
+    val sm9 = dm2.toSparseRowMajor
+    assert(sm9 === dm2)
+    assert(sm9.values === Array(4.0, 5.0, 2.0))
+    assert(sm9.isTransposed)
+
+    val sm10 = dm2.toSparse
+    assert(sm10 === dm2)
+    assert(sm10.values === Array(4.0, 5.0, 2.0))
+    assert(sm10.isTransposed)
+  }
+
+  test("sparse to sparse") {
+    /*
+      sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0
+                              0.0 2.0 0.0
+      smZeros = 0.0 0.0 0.0
+                0.0 0.0 0.0
+     */
+    val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0))
+    val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), Array(4.0, 5.0, 2.0),
+      isTransposed = true)
+    val sm3 = new SparseMatrix(2, 3, Array(0, 0, 2, 4), Array(0,

[diff truncated in the archived message; the reviewer's comment is missing]
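The ordering assertions in the new "dense to sparse" test can be checked by hand: a column-major (CSC) conversion collects nonzeros column by column, while a row-major (CSR) conversion collects them row by row. A minimal sketch of that scan order, in Python rather than the Scala under review:

```python
def nonzeros(rows, col_major):
    """Collect the nonzero values of a dense matrix (given as a list of rows)
    in column-major or row-major scan order."""
    nrows, ncols = len(rows), len(rows[0])
    if col_major:
        # column by column, top to bottom within each column
        return [rows[i][j] for j in range(ncols) for i in range(nrows) if rows[i][j] != 0.0]
    # row by row, left to right within each row
    return [rows[i][j] for i in range(nrows) for j in range(ncols) if rows[i][j] != 0.0]

# dm1 from the test suite:
#   0.0  4.0  5.0
#   0.0  2.0  0.0
dm1 = [[0.0, 4.0, 5.0],
       [0.0, 2.0, 0.0]]
print(nonzeros(dm1, col_major=True))   # [4.0, 2.0, 5.0]  (sm1.values)
print(nonzeros(dm1, col_major=False))  # [4.0, 5.0, 2.0]  (sm2.values)
```

This reproduces why `toSparseColMajor` yields values `(4.0, 2.0, 5.0)` but `toSparseRowMajor` yields `(4.0, 5.0, 2.0)` for the same matrix.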
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107832469

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+      toDenseColMajor
+    } else {
+      toSparseColMajor
+    }
+  }
+
+  /**
+   * Returns a matrix in dense or sparse row major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedRowMajor: Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = false)) {
+      toDenseRowMajor
+    } else {
+      toSparseRowMajor
+    }
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
+    val cscSize = getSparseSizeInBytes(colMajor = true)
+    val csrSize = getSparseSizeInBytes(colMajor = false)
+    if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
+      // dense matrix size is the same for column major and row major, so maintain current layout
+      toDenseMatrix(!isTransposed)

--- End diff --

I don't see the change here.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
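The `compressed` method quoted above picks whichever of dense, CSC, or CSR storage is smallest, and keeps the current layout when dense wins (`toDenseMatrix(!isTransposed)`). The selection logic can be sketched as follows, in Python. The byte accounting here is an assumption for illustration (8 bytes per double value, 4 bytes per int index, array headers ignored); the bodies of `getDenseSizeInBytes` and `getSparseSizeInBytes` are not shown in the quoted diff, so Spark's exact accounting may differ:

```python
def dense_size(nrows, ncols):
    # one 8-byte double per entry; array/object headers ignored in this sketch
    return 8 * nrows * ncols

def sparse_size(nrows, ncols, nnz, col_major):
    # per nonzero: 8-byte value + 4-byte row/col index; plus the pointer array
    ptrs = (ncols if col_major else nrows) + 1
    return 12 * nnz + 4 * ptrs

def compressed_format(nrows, ncols, nnz, is_transposed):
    """Mirror of the branching in Matrix.compressed: pick the cheapest of
    dense (keeping the current layout), CSC, or CSR."""
    csc = sparse_size(nrows, ncols, nnz, col_major=True)
    csr = sparse_size(nrows, ncols, nnz, col_major=False)
    if dense_size(nrows, ncols) < min(csc, csr):
        # dense size is layout-independent, so keep the current layout
        return "dense-row-major" if is_transposed else "dense-col-major"
    return "sparse-col-major" if csc <= csr else "sparse-row-major"

# mostly nonzero -> dense wins; mostly zero -> sparse wins
print(compressed_format(100, 100, nnz=9000, is_transposed=False))  # dense-col-major
print(compressed_format(100, 100, nnz=100, is_transposed=False))   # sparse-col-major
```

With this accounting, sparse beats dense roughly when fewer than about two thirds of the entries are nonzero, which matches the intuition behind exposing `compressed` at all.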
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107835655

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

[quotes the same MatricesSuite diff ("dense to dense", "dense to sparse", "sparse to sparse" tests) as discussion_r107837410 above; the archived message is truncated before the reviewer's comment]
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107832867

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

[quotes the same Matrix conversion/compression diff as discussion_r107832469 above, anchored at the line `toDenseMatrix(!isTransposed)` in `compressed`]

--- End diff --

`this.toDense`
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107832826

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -291,31 +404,49 @@ class DenseMatrix @Since("2.0.0") (
   override def numActives: Int = values.length

   /**
-   * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from the given `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order.
    */
-  @Since("2.0.0")
-  def toSparse: SparseMatrix = {
-    val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
-    val colPtrs: Array[Int] = new Array[Int](numCols + 1)
-    val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
-    var nnz = 0
-    var j = 0
-    while (j < numCols) {
-      var i = 0
-      while (i < numRows) {
-        val v = values(index(i, j))
-        if (v != 0.0) {
-          rowIndices += i
-          spVals += v
-          nnz += 1
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+    if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose
+    else {
+      val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
+      val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+      val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
+      var nnz = 0
+      var j = 0
+      while (j < numCols) {
+        var i = 0
+        while (i < numRows) {
+          val v = values(index(i, j))
+          if (v != 0.0) {
+            rowIndices += i
+            spVals += v
+            nnz += 1
+          }
+          i += 1
         }
-        i += 1
+        j += 1
+        colPtrs(j) = nnz
       }
-      j += 1
-      colPtrs(j) = nnz
+      new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
+    }
+  }
+
+  /**
+   * Generate a `DenseMatrix` from this `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+    if (isTransposed && colMajor) {
+      new DenseMatrix(numRows, numCols, toArray, isTransposed = false)
+    } else if (!isTransposed && !colMajor) {
+      new DenseMatrix(numRows, numCols, transpose.toArray, isTransposed = true)

--- End diff --

I'll call `this.toArray` and `this.transpose.toArray`, as you did in other places, to make it explicit.
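The nested while-loops in `toSparseMatrix(colMajor = true)` above build the three CSC arrays (column pointers, row indices, values) in a single column-by-column pass. A Python mirror of that loop, as a sketch rather than the Scala itself:

```python
def dense_to_csc(values, nrows, ncols):
    """Build CSC arrays (col_ptrs, row_indices, sp_values) from a
    column-major dense value array, mirroring the while-loops in
    DenseMatrix.toSparseMatrix(colMajor = true)."""
    sp_values, row_indices = [], []
    col_ptrs = [0] * (ncols + 1)
    nnz = 0
    for j in range(ncols):
        for i in range(nrows):
            v = values[j * nrows + i]   # index(i, j) for a column-major layout
            if v != 0.0:
                row_indices.append(i)
                sp_values.append(v)
                nnz += 1
        # col_ptrs(j + 1) records how many nonzeros precede column j + 1
        col_ptrs[j + 1] = nnz
    return col_ptrs, row_indices, sp_values

# dm1 from the test suite: 2 x 3, column-major values for
#   0.0  4.0  5.0
#   0.0  2.0  0.0
col_ptrs, row_indices, sp_values = dense_to_csc([0.0, 0.0, 4.0, 2.0, 5.0, 0.0], 2, 3)
print(col_ptrs)     # [0, 0, 2, 3]
print(row_indices)  # [0, 1, 0]
print(sp_values)    # [4.0, 2.0, 5.0]
```

The printed triple matches `sm1` in the "sparse to sparse" test, `SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0))`, and the row-major case reduces to this one via the `transpose ... transpose` trick in the first branch.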
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107832982

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

[quotes the same `DenseMatrix.toSparseMatrix`/`toDenseMatrix` diff as discussion_r107832826 above, anchored at the `toDenseMatrix` scaladoc]

--- End diff --

Ok, I added these methods. I updated the test suites to use them instead of `isTransposed`.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107803088

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

[quotes the same `DenseMatrix.toSparseMatrix`/`toDenseMatrix` diff as discussion_r107832826 above, anchored at the end of `toDenseMatrix`, whose final branch returns `this` unchanged when the requested layout already matches]

--- End diff --

I have to say I'm a bit surprised - I thought that's what `toArray` already did! Yeah, that seems like a good change, but I'd prefer to do it in another PR, since we'd need to make sure it doesn't adversely affect other places that use `toArray`, as well as add unit tests. If that sounds ok, I'll make a JIRA for it.
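The `toDenseMatrix` branches quoted above show that switching a dense matrix between column-major and row-major is just a re-ordering of the backing value array (a transpose of the storage, not of the matrix). A sketch of that reshuffle, in Python rather than the Scala under review:

```python
def to_row_major(values, nrows, ncols):
    """Re-order a column-major dense value array into row-major order:
    the reshuffle behind DenseMatrix.toDenseMatrix(colMajor = false)."""
    return [values[j * nrows + i] for i in range(nrows) for j in range(ncols)]

def to_col_major(values, nrows, ncols):
    """Inverse re-ordering: row-major back to column-major."""
    return [values[i * ncols + j] for j in range(ncols) for i in range(nrows)]

# dm1 from the "dense to dense" test: 2 x 3
#    4.0  2.0 -8.0
#   -1.0  7.0  4.0
col_major = [4.0, -1.0, 2.0, 7.0, -8.0, 4.0]
row_major = to_row_major(col_major, 2, 3)
print(row_major)  # [4.0, 2.0, -8.0, -1.0, 7.0, 4.0]  (dm4.values)

# the two re-orderings are inverses, so round-tripping is lossless
assert to_col_major(row_major, 2, 3) == col_major
```

This reproduces the `dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0)` assertion in the test suite, and makes concrete why the remaining branches of `toDenseMatrix` can return `this` without copying.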
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107801091

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

[quotes the same Matrix conversion/compression diff as discussion_r107832469 above, anchored at `toSparseMatrix(colMajor = true)` in `compressed`]

--- End diff --

Done
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107801133

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

[quotes the same Matrix conversion/compression diff as discussion_r107832469 above, anchored at `toSparseMatrix(colMajor = false)` in `compressed`]

--- End diff --

Done
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107801115

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

[quotes the same Matrix conversion/compression diff as discussion_r107832469 above, anchored at `toDenseMatrix(!isTransposed)` in `compressed`]

--- End diff --

Done
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107784952 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (same hunk as quoted in the first message above; the comment is on the `toSparseMatrix(colMajor = false)` call in `compressed`) --- End diff -- `toSparseRowMajor`
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107785054 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (same hunk as quoted in the first message above; the comment is on the `toDenseMatrix(!isTransposed)` call in `compressed`) --- End diff -- Call `toDense` if we decide to make `toDense` and `toSparse` output the same layout.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107784897 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (same hunk as quoted in the first message above; the comment is on the `toSparseMatrix(colMajor = true)` call in `compressed`) --- End diff -- `toSparseColMajor`
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107794035 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +404,49 @@ class DenseMatrix @Since("2.0.0") ( override def numActives: Int = values.length /** - * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from the given `DenseMatrix`. + * + * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order. */ - @Since("2.0.0") - def toSparse: SparseMatrix = { -val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble -val colPtrs: Array[Int] = new Array[Int](numCols + 1) -val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt -var nnz = 0 -var j = 0 -while (j < numCols) { - var i = 0 - while (i < numRows) { -val v = values(index(i, j)) -if (v != 0.0) { - rowIndices += i - spVals += v - nnz += 1 + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose +else { + val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble + val colPtrs: Array[Int] = new Array[Int](numCols + 1) + val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt + var nnz = 0 + var j = 0 + while (j < numCols) { +var i = 0 +while (i < numRows) { + val v = values(index(i, j)) + if (v != 0.0) { +rowIndices += i +spVals += v +nnz += 1 + } + i += 1 } -i += 1 +j += 1 +colPtrs(j) = nnz } - j += 1 - colPtrs(j) = nnz + new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) +} + } + + /** + * Generate a `DenseMatrix` from this `DenseMatrix`. + * + * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order. 
+ */ + private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = { +if (isTransposed && colMajor) { + new DenseMatrix(numRows, numCols, toArray, isTransposed = false) +} else if (!isTransposed && !colMajor) { + new DenseMatrix(numRows, numCols, transpose.toArray, isTransposed = true) +} else { + this } -new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) } --- End diff -- Could we override the `toArray` in DenseMatrix so when `this` is column major, we just return `this.values`? Otherwise, it's very expensive to create a new array.
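The suggestion above — return the backing array directly when it is already in the requested layout — can be sketched with a hypothetical, much-simplified matrix class (not Spark's `DenseMatrix`). Note the trade-off the thread is circling: sharing `values` avoids the copy but aliases the internal array, so callers must treat the result as read-only:

```scala
// Hypothetical minimal dense matrix to illustrate the no-copy idea.
// `values` is column major when isTransposed == false, row major otherwise.
final class SimpleDense(
    val numRows: Int,
    val numCols: Int,
    val values: Array[Double],
    val isTransposed: Boolean) {

  // Column-major contents. When already column major, return the backing
  // array itself (no allocation); otherwise materialize a transposed copy.
  def toArray: Array[Double] =
    if (!isTransposed) {
      values
    } else {
      val out = new Array[Double](numRows * numCols)
      var i = 0
      while (i < numRows) {
        var j = 0
        while (j < numCols) {
          out(j * numRows + i) = values(i * numCols + j) // transpose copy
          j += 1
        }
        i += 1
      }
      out
    }
}
```

With this shape, a layout-preserving `toDenseMatrix` can reuse `toArray` and only pays for a copy when the layout actually changes.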
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107791878 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (same hunk as quoted in the previous message; the comment is on the `toDenseMatrix` javadoc) --- End diff -- Minor, can we have

```scala
protected def isColMajor = !isTransposed
protected def isRowMajor = isTransposed
```

so the code is easier to understand?
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107781333 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (an earlier revision of the conversion hunk quoted in the first message, in which the sparse conversions were named `toCSCMatrix`/`toCSRMatrix`; the comment follows `def toDense: DenseMatrix = toDenseMatrix(colMajor = true)`) --- End diff -- Ditto. Should we consider maintaining the same layout?
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107779234 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (an earlier revision of the conversion hunk quoted in the first message, in which the sparse conversions were named `toCSC`/`toCSR`; the comment is on `def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)`) --- End diff -- But I thought this is a new API being added, so we can make it maintain the same layout.
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107689306 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +396,60 @@ class DenseMatrix @Since("2.0.0") ( override def numActives: Int = values.length /** - * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from the given `DenseMatrix`. + * + * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order. */ - @Since("2.0.0") - def toSparse: SparseMatrix = { -val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble -val colPtrs: Array[Int] = new Array[Int](numCols + 1) -val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt -var nnz = 0 -var j = 0 -while (j < numCols) { - var i = 0 - while (i < numRows) { -val v = values(index(i, j)) -if (v != 0.0) { - rowIndices += i - spVals += v - nnz += 1 + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose +else { + val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble + val colPtrs: Array[Int] = new Array[Int](numCols + 1) + val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt + var nnz = 0 + var j = 0 + while (j < numCols) { +var i = 0 +while (i < numRows) { + val v = values(index(i, j)) + if (v != 0.0) { +rowIndices += i +spVals += v +nnz += 1 + } + i += 1 } -i += 1 +j += 1 +colPtrs(j) = nnz } - j += 1 - colPtrs(j) = nnz + new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) +} + } + + /** + * Generate a `DenseMatrix` from this `DenseMatrix`. + * + * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order. 
+ */ + private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = { +if (!(isTransposed ^ colMajor)) { + val newValues = new Array[Double](numCols * numRows) + var j = 0 --- End diff -- I don't see how there is an unnecessary copy. `toArray` copies the elements of the current matrix to a new Array, then uses that as the backing array of a new `DenseMatrix`. We cannot modify the original matrix values.
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107609351 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (same hunk as quoted in the previous message; the comment is on the start of `toDenseMatrix`) --- End diff -- I have one question regarding performance. Using [`toArray`](https://github.com/sethah/spark/blob/4746ec0d97c002241be344494a6d2ddee3a7c2d5/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala#L49-L55) introduces the allocation of a temporary array and a data copy. Can we avoid that allocation and copy by passing the original array and an access function to `new DenseMatrix`?
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107606305 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- (same hunk as quoted above; the comment is on the start of `toDenseMatrix`) --- End diff -- It is fine with me since `isTransposed` is checked beforehand.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107557490 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +395,60 @@ class DenseMatrix @Since("2.0.0") ( override def numActives: Int = values.length /** - * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from the given `DenseMatrix`. + * + * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order. */ - @Since("2.0.0") - def toSparse: SparseMatrix = { -val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble -val colPtrs: Array[Int] = new Array[Int](numCols + 1) -val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt -var nnz = 0 -var j = 0 -while (j < numCols) { - var i = 0 - while (i < numRows) { -val v = values(index(i, j)) -if (v != 0.0) { - rowIndices += i - spVals += v - nnz += 1 + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose +else { + val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble + val colPtrs: Array[Int] = new Array[Int](numCols + 1) + val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt + var nnz = 0 + var j = 0 + while (j < numCols) { +var i = 0 +while (i < numRows) { + val v = values(index(i, j)) + if (v != 0.0) { +rowIndices += i +spVals += v +nnz += 1 + } + i += 1 } -i += 1 +j += 1 +colPtrs(j) = nnz } - j += 1 - colPtrs(j) = nnz + new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) +} + } + + /** + * Generate a `DenseMatrix` from this `DenseMatrix`. + * + * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order. 
+ */ + private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = { +if (!(isTransposed ^ colMajor)) { + val newValues = new Array[Double](numCols * numRows) --- End diff -- This looks great to me!
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107556503 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -587,18 +722,69 @@ class SparseMatrix @Since("2.0.0") ( } } + override def numNonzeros: Int = values.count(_ != 0) + + override def numActives: Int = values.length + /** - * Generate a `DenseMatrix` from the given `SparseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit zero values if they + * exist. + * + * @param colMajor Whether or not the resulting `SparseMatrix` values are in column major + *order. */ - @Since("2.0.0") - def toDense: DenseMatrix = { -new DenseMatrix(numRows, numCols, toArray) + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!(colMajor ^ isTransposed)) { + // breeze transpose rearranges values in column major and removes explicit zeros --- End diff -- This is not a blocker.
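The comment above concerns the Breeze transpose path, which rearranges values into column major order and drops explicit zeros as a side effect. That cleanup step can be sketched directly on raw CSC arrays; the helper below is a hypothetical standalone illustration, not the Breeze-based code in the PR:

```scala
import scala.collection.mutable.ArrayBuffer

// Rebuild CSC arrays keeping only entries whose stored value is nonzero.
// Returns (colPtrs, rowIndices, values) with explicit zeros removed.
object CscCleanup {
  def dropExplicitZeros(
      numCols: Int,
      colPtrs: Array[Int],
      rowIndices: Array[Int],
      values: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    val newPtrs = new Array[Int](numCols + 1)
    val newIdx = ArrayBuffer.empty[Int]
    val newVals = ArrayBuffer.empty[Double]
    var j = 0
    while (j < numCols) {
      var k = colPtrs(j)
      while (k < colPtrs(j + 1)) {
        if (values(k) != 0.0) {        // skip explicitly stored zeros
          newIdx += rowIndices(k)
          newVals += values(k)
        }
        k += 1
      }
      newPtrs(j + 1) = newVals.length  // pointer past the kept entries of column j
      j += 1
    }
    (newPtrs, newIdx.toArray, newVals.toArray)
  }
}
```

For example, a 2-column matrix stored as `colPtrs = [0, 2, 3]`, `rowIndices = [0, 1, 0]`, `values = [1.0, 0.0, 2.0]` carries one explicit zero; after cleanup the arrays shrink to two genuine nonzeros and the column pointers are rebuilt accordingly.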
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107550502

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -291,31 +396,60 @@ class DenseMatrix @Since("2.0.0") (
   override def numActives: Int = values.length

   /**
-   * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from the given `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order.
    */
-  @Since("2.0.0")
-  def toSparse: SparseMatrix = {
-    val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
-    val colPtrs: Array[Int] = new Array[Int](numCols + 1)
-    val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
-    var nnz = 0
-    var j = 0
-    while (j < numCols) {
-      var i = 0
-      while (i < numRows) {
-        val v = values(index(i, j))
-        if (v != 0.0) {
-          rowIndices += i
-          spVals += v
-          nnz += 1
-        }
-        i += 1
-      }
-      j += 1
-      colPtrs(j) = nnz
-    }
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+    if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose
+    else {
+      val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
+      val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+      val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
+      var nnz = 0
+      var j = 0
+      while (j < numCols) {
+        var i = 0
+        while (i < numRows) {
+          val v = values(index(i, j))
+          if (v != 0.0) {
+            rowIndices += i
+            spVals += v
+            nnz += 1
+          }
+          i += 1
+        }
+        j += 1
+        colPtrs(j) = nnz
+      }
+      new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
+    }
+  }
+
+  /**
+   * Generate a `DenseMatrix` from this `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+    if (!(isTransposed ^ colMajor)) {
+      val newValues = new Array[Double](numCols * numRows)
+      var j = 0
--- End diff --

See above discussion, I am going to change this to use `toArray`.
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107550356

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,110 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toCSR: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
--- End diff --

Well, this would change the behavior of external facing code. Before, if I called `toSparse` on a row major matrix, I'd get a column major matrix. If we maintain the layout, then I'd now get something different (row major). Otherwise, I'd agree it is best to maintain the layout.
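The CSC/CSR distinction behind `toCSC`, `toCSR`, and the layout question above can be sketched with a small standalone encoder. This is an illustration only: `SparseLayouts`, `denseToCSC`, and `denseToCSR` are hypothetical helpers, not Spark API.

```scala
// Sketch of the two sparse layouts under discussion, on plain Scala arrays.
object SparseLayouts {
  // `dense(i)(j)` is row i, column j. CSC walks columns: colPtrs has numCols + 1
  // entries, and colPtrs(j + 1) - colPtrs(j) is the number of nonzeros in column j.
  def denseToCSC(dense: Array[Array[Double]]): (Array[Int], Array[Int], Array[Double]) = {
    val numRows = dense.length
    val numCols = dense.head.length
    val colPtrs = new Array[Int](numCols + 1)
    val rowIdx  = scala.collection.mutable.ArrayBuffer.empty[Int]
    val vals    = scala.collection.mutable.ArrayBuffer.empty[Double]
    var j = 0
    while (j < numCols) {
      var i = 0
      while (i < numRows) {
        val v = dense(i)(j)
        if (v != 0.0) { rowIdx += i; vals += v }   // keep only structural nonzeros
        i += 1
      }
      colPtrs(j + 1) = vals.length
      j += 1
    }
    (colPtrs, rowIdx.toArray, vals.toArray)
  }

  // CSR is just CSC of the transpose: row pointers plus column indices.
  def denseToCSR(dense: Array[Array[Double]]): (Array[Int], Array[Int], Array[Double]) =
    denseToCSC(dense.transpose)
}
```

For the 2x3 matrix [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]], `denseToCSC` yields colPtrs [0, 1, 2, 3], rowIndices [0, 1, 0], and values [1.0, 3.0, 2.0]. The same nonzeros appear in both layouts; only the traversal order of the stored arrays differs, which is exactly the observable difference sethah describes.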
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107550135

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,385 @@ class MatricesSuite extends SparkMLFunSuite {
     assert(sparseMat.values(2) === 10.0)
   }

-  test("toSparse, toDense") {
-    val m = 3
-    val n = 2
-    val values = Array(1.0, 2.0, 4.0, 5.0)
-    val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-    val colPtrs = Array(0, 2, 4)
-    val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+    /*
+      dm1 =  4.0 2.0 -8.0
+            -1.0 7.0  4.0
+
+      dm2 = 5.0 -9.0  4.0
+            1.0 -3.0 -8.0
+     */
+    val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+    val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
--- End diff --

I'm not sure I understand your meaning here. These are made to be two entirely different matrices anyway.
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107549650

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `SparseMatrix.toSparseMatrix` hunk as quoted above, ending at the line "// breeze transpose rearranges values in column major and removes explicit zeros")
--- End diff --

I pretty much agree with you, but this is non-trivial code if we want to do it efficiently. Breeze has a pretty well-optimized implementation to do this. I would leave it as a follow-up JIRA, or do it when/if we ever remove the Breeze dependency. Or do you think this is a blocker for this PR?
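What "removes explicit zeros" amounts to for CSC storage can be sketched independently of Breeze. This is an illustration only: `DropExplicitZeros.compact` is a hypothetical helper, not the Breeze or Spark implementation.

```scala
// Rebuild (colPtrs, rowIndices, values) keeping only entries whose stored value
// is nonzero. Explicit zeros arise when a stored slot holds 0.0.
object DropExplicitZeros {
  def compact(colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double])
      : (Array[Int], Array[Int], Array[Double]) = {
    val numCols = colPtrs.length - 1
    val newPtrs = new Array[Int](numCols + 1)
    val idx  = scala.collection.mutable.ArrayBuffer.empty[Int]
    val vals = scala.collection.mutable.ArrayBuffer.empty[Double]
    var j = 0
    while (j < numCols) {
      var k = colPtrs(j)
      while (k < colPtrs(j + 1)) {
        if (values(k) != 0.0) { idx += rowIndices(k); vals += values(k) }
        k += 1
      }
      newPtrs(j + 1) = vals.length   // pointer reflects entries kept so far
      j += 1
    }
    (newPtrs, idx.toArray, vals.toArray)
  }
}
```

Compacting colPtrs [0, 2, 3], rowIndices [0, 1, 0], values [1.0, 0.0, 2.0] drops the stored 0.0 and yields [0, 1, 2], [0, 0], [1.0, 2.0] — the behavior the Breeze round-trip is being relied on for.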
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107549283

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `DenseMatrix` hunk adding `toSparseMatrix`/`toDenseMatrix` as quoted above, at the line "val newValues = new Array[Double](numCols * numRows)")
--- End diff --

Hm, I don't think this solution is better. The entire point of abstract methods is to allow subclasses to implement a method differently. Since we need different implementations depending on the subclass, we should just implement them in the subclasses. We can do this with the following:

```scala
trait Matrix {
  def toDenseMatrix(colMajor: Boolean): Matrix
}

class DenseMatrix extends Matrix {
  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
    if (isTransposed && colMajor) {
      new DenseMatrix(numRows, numCols, toArray, isTransposed = false)
    } else if (!isTransposed && !colMajor) {
      new DenseMatrix(numRows, numCols, transpose.toArray, isTransposed = true)
    } else {
      this
    }
  }
}

class SparseMatrix extends Matrix {
  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
    if (colMajor) new DenseMatrix(numRows, numCols, toArray)
    else new DenseMatrix(numRows, numCols, this.transpose.toArray, isTransposed = true)
  }
}
```

Which is less verbose than the previous code. I'm going to put that in the next commit. Let me know what you think.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107519619

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `Matrix` trait hunk as quoted above, at the line "def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)")
--- End diff --

After thinking about it again, let's have it as `toSparseColumnMajor` to make the APIs consistent with the dense ones, if you don't mind?
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107518109

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
    */
--- End diff --

BTW, it's nice to have the return type on a public method. Can you add `Unit` as the return type?
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107519221

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `DenseMatrix` hunk adding `toSparseMatrix`/`toDenseMatrix` as quoted above, at the line "val newValues = new Array[Double](numCols * numRows)")
--- End diff --

The following could work, and we only need one implementation in the trait. Thanks.

```scala
trait Matrix {
  var isTransposed: Boolean = true
  var numCols: Int = 0
  var numRows: Int = 0

  def foreachActive(f: (Int, Int, Double) => Unit): Unit

  def toDenseMatrix(colMajor: Boolean): Matrix = {
    this match {
      case _: DenseMatrix if this.isTransposed != colMajor => this
      case _: SparseMatrix | _: DenseMatrix if this.isTransposed == colMajor =>
        val newValues = new Array[Double](numCols * numRows)
        this.foreachActive { case (row, col, value) =>
          // filling the newValues
        }
        new DenseMatrix(numRows, numCols, newValues, isTransposed = !colMajor)
      case _ => throw new IllegalArgumentException("")
    }
  }
}

class DenseMatrix extends Matrix {
  def foreachActive(f: (Int, Int, Double) => Unit): Unit = { }
}

class SparseMatrix extends Matrix {
  def foreachActive(f: (Int, Int, Double) => Unit): Unit = { }
}
```
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107442966

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `DenseMatrix` hunk adding `toSparseMatrix`/`toDenseMatrix` as quoted above, at the line "val newValues = new Array[Double](numCols * numRows)")
--- End diff --

I don't think we can put this in the trait, since when it is called on a dense matrix we would like to return `this` in some cases (when no layout change is needed). But, yes, I think it's simpler to use `toArray`, which calls `foreachActive`. Thanks!
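The `toArray`-via-`foreachActive` idea agreed on here can be sketched on a toy matrix type. This is an illustration only: `CooMatrix` is a hypothetical stand-in exposing a `foreachActive`, not the Spark class.

```scala
// Densify any (row, col, value) stream into a column-major array by letting
// foreachActive drive the writes, the way toArray does in the discussion above.
final case class CooMatrix(numRows: Int, numCols: Int, entries: Seq[(Int, Int, Double)]) {
  def foreachActive(f: (Int, Int, Double) => Unit): Unit =
    entries.foreach { case (i, j, v) => f(i, j, v) }

  // Column-major densification: entry (i, j) lands at offset i + j * numRows.
  def toArray: Array[Double] = {
    val out = new Array[Double](numRows * numCols)
    foreachActive { (i, j, v) => out(i + j * numRows) = v }
    out
  }
}
```

For a 2x3 matrix with active entries (0,0)->1.0, (1,1)->3.0, (0,2)->2.0, `toArray` gives the column-major array [1.0, 0.0, 0.0, 3.0, 2.0, 0.0]. The traversal is layout-agnostic, which is why it sidesteps the trait-vs-subclass question for the copy itself.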
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107436372

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `Matrix` trait hunk as quoted above, at the line "def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)")
--- End diff --

I'm not sure I have a preference. I don't mind leaving them as CSC and CSR.
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107352903

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `DenseMatrix` hunk adding `toSparseMatrix`/`toDenseMatrix` as quoted above, at the line "var j = 0")
--- End diff --

Can we move `if (isTransposed) {` out of the loop like the following, since it is loop-invariant and we want to remove div/mod operations?

```
if (isTransposed) {
  // it is row major and we want column major
  var j = 0
  var col = 0
  while (col < numCols) {
    var row = 0
    while (row < numRows) {
      ...
    }
    col += 1
  }
} else {
  // it is column major and we want row major
  var j = 0
  var row = 0
  while (row < numRows) {
    var col = 0
    while (col < numCols) {
      ...
    }
    row += 1
  }
}
```
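The hoisting kiszk suggests — one loop-invariant branch outside, two specialized loop nests inside — looks like this in a standalone form. Illustration only: `HoistSketch.relayout` is a hypothetical helper showing the pattern, not the PR's code.

```scala
// Copy a flat matrix into the opposite layout. Testing `isTransposed` once,
// outside the loops, lets each nest use straight-line index math with no
// per-element branch and no div/mod to recover (row, col) from a flat index.
object HoistSketch {
  def relayout(values: Array[Double], numRows: Int, numCols: Int,
               isTransposed: Boolean): Array[Double] = {
    val out = new Array[Double](values.length)
    var k = 0
    if (isTransposed) {
      // source is row major; emit column major
      var col = 0
      while (col < numCols) {
        var row = 0
        while (row < numRows) {
          out(k) = values(row * numCols + col); k += 1
          row += 1
        }
        col += 1
      }
    } else {
      // source is column major; emit row major
      var row = 0
      while (row < numRows) {
        var col = 0
        while (col < numCols) {
          out(k) = values(col * numRows + row); k += 1
          col += 1
        }
        row += 1
      }
    }
    out
  }
}
```

For a 2x3 row-major array [1, 2, 3, 4, 5, 6], the column-major result is [1, 4, 2, 5, 3, 6]; applying the other branch to that result recovers the original, and neither inner loop ever re-tests `isTransposed`.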
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107313343

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
(same "dense to dense" test hunk as quoted above, at the `dm2` line)
--- End diff --

Why not just make `dm2` the transpose of `dm1`, but explicitly assign the values? That way, you don't need to type the values into the array for the comparison.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107312629

--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
(same "dense to dense" test hunk as quoted above, continuing with:)
+    val dm3 = dm1.toDense
+    assert(dm3 === dm1)
+    assert(!dm3.isTransposed)
+    assert(dm3.values.equals(dm1.values))
+
+    val dm4 = dm1.toDenseMatrix(false)
--- End diff --

I would like to make `toDenseMatrix` private, and we test against `toDenseRowMajor`, which is more explicit.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107312905

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `Matrix` trait hunk as quoted above, at the line "def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)")
--- End diff --

I'm debating whether we should keep the same layout ordering when we call `toSparse` or `toDense`.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107306663

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
(same `Matrix` trait hunk as quoted above, at the line "def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)")
--- End diff --

I'm not good at naming, but since we use `toDenseRowMajor` for the dense conversions, should we use `toSparseColumnMajor`? Many packages use `toCSC`, but I think we can make them consistent. Just my 2 cents.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107312194 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -587,18 +722,69 @@ class SparseMatrix @Since("2.0.0") ( } } + override def numNonzeros: Int = values.count(_ != 0) + + override def numActives: Int = values.length + /** - * Generate a `DenseMatrix` from the given `SparseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit zero values if they + * exist. + * + * @param colMajor Whether or not the resulting `SparseMatrix` values are in column major + *order. */ - @Since("2.0.0") - def toDense: DenseMatrix = { -new DenseMatrix(numRows, numCols, toArray) + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!(colMajor ^ isTransposed)) { + // breeze transpose rearranges values in column major and removes explicit zeros + if (!isTransposed) { +// it is row major and we want col major +val breezeTransposed = asBreeze.asInstanceOf[BSM[Double]].t + Matrices.fromBreeze(breezeTransposed).transpose.asInstanceOf[SparseMatrix] + } else { +// it is col major and we want row major +val breezeTransposed = asBreeze.asInstanceOf[BSM[Double]] +Matrices.fromBreeze(breezeTransposed).asInstanceOf[SparseMatrix] + } +} else { --- End diff -- Can we document here that it's when the layout of this and colMajor is different? Easier read than `(colMajor ^ isTranspose)` condition here. Even more readable to use pattern matching with exact boolean on both variables. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
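The pattern-matching suggestion above can be sketched roughly as follows. This is a hypothetical standalone helper, not Spark's actual code; it assumes `isTransposed == true` means the values are stored in row major order:

```scala
// Hypothetical sketch of the review suggestion: replace the XOR test with a
// pattern match on both booleans, so each layout combination is explicit.
// `layoutDiffers` is true exactly when `!(colMajor ^ isTransposed)` is true,
// i.e. when the stored layout differs from the requested one.
def layoutDiffers(colMajor: Boolean, isTransposed: Boolean): Boolean =
  (colMajor, isTransposed) match {
    case (true, true)   => true  // want col major, stored row major
    case (false, false) => true  // want row major, stored col major
    case _              => false // stored layout already matches the request
  }
```

Each case carries its own comment, which is the readability gain the reviewer is asking for.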
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107307120 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +162,109 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSC: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSR: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix in row major order. + */ + @Since("2.2.0") + def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false) + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Returns a matrix in either dense or sparse format, whichever uses less storage. 
+   *
+   * @param colMajor Whether the values of the resulting matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  @Since("2.2.0")
+  def compressed(colMajor: Boolean): Matrix = {
--- End diff --
Let's make this private and follow the previous style. We should also add `compressedRowMajor` and `compressedColumnMajor`, since the result can be a dense matrix in certain situations.
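For context, the `compressed` selection discussed above boils down to a storage-size comparison. A rough standalone sketch follows; the helper names and size formulas are assumptions (8-byte values, 4-byte indices, JVM object overhead ignored), not Spark's exact accounting:

```scala
// Approximate storage cost of a dense representation: one double per entry.
def denseSizeInBytes(numRows: Int, numCols: Int): Long =
  8L * numRows * numCols

// Approximate storage cost of a CSC/CSR representation:
// values (8B) + indices (4B) per active entry, plus the pointer array.
def sparseSizeInBytes(majorDim: Int, nnz: Int): Long =
  12L * nnz + 4L * (majorDim + 1)

// Pick whichever representation is smaller, mirroring the compressed(colMajor) logic.
def pickDense(numRows: Int, numCols: Int, nnz: Int, colMajor: Boolean): Boolean = {
  val majorDim = if (colMajor) numCols else numRows
  denseSizeInBytes(numRows, numCols) < sparseSizeInBytes(majorDim, nnz)
}
```

With these formulas, a fully populated matrix always favors dense, while a mostly-zero matrix favors sparse.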
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107309786 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -291,31 +395,60 @@ class DenseMatrix @Since("2.0.0") ( override def numActives: Int = values.length /** - * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from the given `DenseMatrix`. + * + * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order. */ - @Since("2.0.0") - def toSparse: SparseMatrix = { -val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble -val colPtrs: Array[Int] = new Array[Int](numCols + 1) -val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt -var nnz = 0 -var j = 0 -while (j < numCols) { - var i = 0 - while (i < numRows) { -val v = values(index(i, j)) -if (v != 0.0) { - rowIndices += i - spVals += v - nnz += 1 + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose +else { + val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble + val colPtrs: Array[Int] = new Array[Int](numCols + 1) + val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt + var nnz = 0 + var j = 0 + while (j < numCols) { +var i = 0 +while (i < numRows) { + val v = values(index(i, j)) + if (v != 0.0) { +rowIndices += i +spVals += v +nnz += 1 + } + i += 1 } -i += 1 +j += 1 +colPtrs(j) = nnz } - j += 1 - colPtrs(j) = nnz + new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result()) +} + } + + /** + * Generate a `DenseMatrix` from this `DenseMatrix`. + * + * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order. 
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+    if (!(isTransposed ^ colMajor)) {
+      val newValues = new Array[Double](numCols * numRows)
--- End diff --
Would it be simpler to use `foreachActive`? With it, the sparse and dense `toDenseMatrix` could share a single implementation in the trait.
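The `foreachActive` idea above can be sketched with a minimal stand-in; this is not Spark's API, only an illustration that one loop over active entries can serve both the sparse and dense cases:

```scala
// Build a column-major dense value array from any "iterate active (i, j, value)
// entries" callback, the shared implementation the review suggests putting in
// the trait. The callback shape mirrors foreachActive(f: (Int, Int, Double) => Unit).
def denseColMajorValues(
    numRows: Int,
    numCols: Int,
    foreachActive: ((Int, Int, Double) => Unit) => Unit): Array[Double] = {
  val out = new Array[Double](numRows * numCols) // initialized to 0.0
  foreachActive((i, j, v) => out(j * numRows + i) = v)
  out
}
```

A sparse matrix would drive the callback from its CSC arrays and a dense one from its value array, but the array-building code stays identical.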
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107311742
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -587,18 +722,69 @@ class SparseMatrix @Since("2.0.0") (
     }
   }
 
+  override def numNonzeros: Int = values.count(_ != 0)
+
+  override def numActives: Int = values.length
+
   /**
-   * Generate a `DenseMatrix` from the given `SparseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit zero values if they
+   * exist.
+   *
+   * @param colMajor Whether or not the resulting `SparseMatrix` values are in column major
+   *                 order.
    */
-  @Since("2.0.0")
-  def toDense: DenseMatrix = {
-    new DenseMatrix(numRows, numCols, toArray)
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+    if (!(colMajor ^ isTransposed)) {
+      // breeze transpose rearranges values in column major and removes explicit zeros
--- End diff --
I think it's hacky to rely on breeze's transpose behavior to remove zeros in sparse matrices. Can we have our own implementation, given that we may eventually remove breeze?
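A breeze-free alternative, as requested above, can be sketched directly on CSC-style arrays. This is a standalone sketch on raw arrays under assumed conventions, not Spark's `SparseMatrix` API:

```scala
// Remove explicit zeros from CSC-style arrays without going through breeze.
// Returns rebuilt (colPtrs, rowIndices, values) keeping only nonzero entries.
def pruneExplicitZeros(
    colPtrs: Array[Int],
    rowIndices: Array[Int],
    values: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
  val newColPtrs = new Array[Int](colPtrs.length)
  val newRowIndices = Array.newBuilder[Int]
  val newValues = Array.newBuilder[Double]
  var nnz = 0
  var j = 0
  while (j < colPtrs.length - 1) {
    var k = colPtrs(j)
    while (k < colPtrs(j + 1)) {          // walk the active entries of column j
      if (values(k) != 0.0) {
        newRowIndices += rowIndices(k)
        newValues += values(k)
        nnz += 1
      }
      k += 1
    }
    j += 1
    newColPtrs(j) = nnz                   // column pointer reflects kept entries only
  }
  (newColPtrs, newRowIndices.result(), newValues.result())
}
```

This is a single pass over the active entries, so it avoids both the breeze dependency and the cost of a transpose round-trip.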
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107306774 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +162,109 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSC: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSR: SparseMatrix = toSparseMatrix(colMajor = false) --- End diff -- Same question. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107303875 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Returns a matrix in either dense or sparse format, whichever uses less storage. + * + * @param colMajor Whether the values of the resulting matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. 
+   */
+  @Since("2.2.0")
+  def compressed(colMajor: Boolean): Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor)) {
+      toDenseMatrix(colMajor)
+    } else {
+      toSparseMatrix(colMajor)
+    }
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
--- End diff --
+1 on the latter one.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107303720
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
--- End diff --
Fair enough.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105810508 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + --- End diff -- Added. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105807171
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
--- End diff --
You mean `@inline private[ml] def ...`? Do we expect this to be called often enough for that to make a difference?
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105806817 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Returns a matrix in either dense or sparse format, whichever uses less storage. + * + * @param colMajor Whether the values of the resulting matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. 
+   */
+  @Since("2.2.0")
+  def compressed(colMajor: Boolean): Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor)) {
+      toDenseMatrix(colMajor)
+    } else {
+      toSparseMatrix(colMajor)
+    }
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
--- End diff --
It's only a problem if we override or implement it in a subclass. Since it's contained wholly in the trait, it will be fine. I think this is ok to leave, though we could make it final. Alternatively, we could add three methods: `compressed`, `compressedCSC`, and `compressedCSR`. I think the latter is a good solution; thoughts?
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105272926
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
    */
--- End diff --
Should be fine. Small enough change :)
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105274636 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix --- End diff -- we may `inline` this as well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105272811 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + + /** + * Returns a matrix in either dense or sparse format, whichever uses less storage. + * + * @param colMajor Whether the values of the resulting matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. 
+   */
+  @Since("2.2.0")
+  def compressed(colMajor: Boolean): Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor)) {
+      toDenseMatrix(colMajor)
+    } else {
+      toSparseMatrix(colMajor)
+    }
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
--- End diff --
Won't `compressed(colMajor: Boolean)` and `compressed` cause an overload ambiguity issue?
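On the overloading question raised above: Scala itself accepts a parameterless method alongside a one-argument overload of the same name, with call sites disambiguated by arity. A toy sketch with hypothetical names, unrelated to Spark's actual API:

```scala
// A parameterless `compressed` and a Boolean-taking overload coexist in Scala;
// Java callers would see them as compressed() and compressed(boolean).
object LayoutDemo {
  def compressed: String = "auto"
  def compressed(colMajor: Boolean): String = if (colMajor) "csc" else "csr"
}
```

The residual concern is mostly ergonomic (e.g. eta-expansion or Java interop can get confusing), rather than a compile error.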
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105273751 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) --- End diff -- `toCSR`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105273579
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true)
--- End diff --
How about we follow [scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html) and call it `toCSC`?
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105273397
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
    */
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
--- End diff --
Maybe we can `inline` this?
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105274573 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a dense matrix. + * + * @param colMajor Whether the values of the resulting dense matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix + + /** + * Converts this matrix to a dense matrix in column major order. + */ + @Since("2.2.0") + def toDense: DenseMatrix = toDenseMatrix(colMajor = true) + --- End diff -- Could we add `toColumnMajorDense` and `toRowMajorDense`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105268787
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
    */
--- End diff --
I made the change. Not sure if we should do this in a separate PR though.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105261250

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */
  @Since("2.0.0")
  def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix

--- End diff --

Should we just leave `toDenseMatrix` private and have `toDense` always use `colMajor = true`? I think that's ok to do for now.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105032243

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */

--- End diff --

Can you make `foreachActive(f: (Int, Int, Double) => Unit)` public? It is already public for `Vector`. I believe it will be very useful, and I think it's stable enough to make public.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r105068149

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */
  @Since("2.0.0")
  def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix

--- End diff --

Yeah, this is very hacky in my opinion too! The problem is that overloading a parameterless method creates ambiguity: a bare invocation could refer either to the parameterless method itself or to the eta-expanded function from the overload that takes a parameter. The following example demonstrates the issue.

In my opinion, I would like to call it `toSparse(columnMajor: Boolean)` and `toSparse() = toSparse(true)`, but the Vector API already uses the version without parentheses, so that would make the API design inconsistent. I think exposing the ability to convert to `columnMajor` or `rowMajor` is very useful; as a result, we can expose `toCSRMatrix`, `toCSCMatrix`, and `toSparse`, where `toSparse` converts the matrix to the representation with the smallest storage.

```scala
scala> trait A {
     |   def foo(b: Boolean): String
     |   def foo: String = foo(true)
     | }
defined trait A

scala> class B extends A {
     |   def foo(b: Boolean): String = b.toString
     | }
defined class B

scala> val b = new B
b: B = B@67b6d4ae

scala> b.foo
<console>:18: error: ambiguous reference to overloaded definition,
both method foo in class B of type (b: Boolean)String
and  method foo in trait A of type => String
match expected type ?
       b.foo
         ^

scala> val x: String = b.foo
x: String = true

scala> val y: Boolean => String = b.foo
y: Boolean => String = <function1>
```
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r87668326

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */
  @Since("2.0.0")
  def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toSparse: SparseMatrix = toSparseMatrix(columnMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param columnMajor Whether the values of the resulting dense matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private [ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toDense: DenseMatrix = toDenseMatrix(columnMajor = true)

--- End diff --

Nit: since we're already using `numCols`, should we call it `colMajor`? I've seen a couple of packages using `colMajor` as the variable name.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r87669463

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */
  @Since("2.0.0")
  def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toSparse: SparseMatrix = toSparseMatrix(columnMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param columnMajor Whether the values of the resulting dense matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private [ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toDense: DenseMatrix = toDenseMatrix(columnMajor = true)
+
+  /**
+   * Returns a matrix in either dense or sparse format, whichever uses less storage.
+   *
+   * @param columnMajor Whether the values of the resulting matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  @Since("2.1.0")
+  def compressed(columnMajor: Boolean): Matrix = {
+    if (getDenseSizeInBytes < getSparseSizeInBytes(columnMajor)) {
+      toDenseMatrix(columnMajor)
+    } else {
+      toSparseMatrix(columnMajor)
+    }
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.1.0")
+  def compressed: Matrix = {
+    val cscSize = getSparseSizeInBytes(columnMajor = true)
+    val csrSize = getSparseSizeInBytes(columnMajor = false)
+    val minSparseSize = cscSize.min(csrSize)
+    if (getDenseSizeInBytes < minSparseSize) {
+      // size is the same either way, so maintain current layout

--- End diff --

```scala
if (getDenseSizeInBytes < math.min(cscSize, csrSize)) ...
...
if (cscSize < csrSize) ...
```

could be easier to read. Also, can you elaborate the comment, like:

```scala
// sizes for dense matrix in row major or column major are the same, so maintain current layout
```
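The branching discussed in this review can be sketched standalone. Everything below is illustrative: `CompressedSketch`, its method names, and the simplified size formulas are assumptions for this example, not the actual `Matrix` trait members (which compute sizes from the real backing arrays).

```scala
object CompressedSketch {
  // Approximate storage cost of a dense matrix: 8 bytes per double plus a
  // small fixed overhead for the object header and Int fields (assumed here).
  def denseSizeInBytes(numRows: Long, numCols: Long): Long =
    8L * numRows * numCols + 17L

  // Approximate storage cost of a sparse matrix, matching the byte counts in
  // the PR comment: 8B per value + 4B per index + 4B per pointer + overhead.
  def sparseSizeInBytes(numActives: Long, numPtrs: Long): Long =
    12L * numActives + 4L * numPtrs + 17L

  // Mirrors the branching in `compressed`: pick dense if it is strictly
  // smaller than the best sparse layout; otherwise pick the smaller of
  // CSC (numCols + 1 pointers) and CSR (numRows + 1 pointers).
  def chooseFormat(numRows: Long, numCols: Long, numActives: Long): String = {
    val cscSize = sparseSizeInBytes(numActives, numCols + 1)
    val csrSize = sparseSizeInBytes(numActives, numRows + 1)
    if (denseSizeInBytes(numRows, numCols) < math.min(cscSize, csrSize)) "dense"
    else if (cscSize <= csrSize) "csc"
    else "csr"
  }
}
```

For example, a fully populated 3x3 matrix comes out dense, while a 1000x1000 matrix with 10 actives comes out sparse (CSC, since both sparse layouts tie for a square matrix).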
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r86360982

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -1076,4 +1240,15 @@ object Matrices {
      SparseMatrix.fromCOO(numRows, numCols, entries)
    }
  }
+
+  private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = {
+    // 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 8 + 8 + 1
+    12L * numActives + 4L * numPtrs + 17L

--- End diff --

ah right - no that's fine.
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r86358529

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -1076,4 +1240,15 @@ object Matrices {
      SparseMatrix.fromCOO(numRows, numCols, entries)
    }
  }
+
+  private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = {
+    // 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 8 + 8 + 1
+    12L * numActives + 4L * numPtrs + 17L

--- End diff --

I wondered how confusing this comment might be. Since `values.length == rowIndices.length == numActives`:

`8 * values.length + 4 * rowIndices.length = 8 * numActives + 4 * numActives = 12 * numActives`

The comment is meant to show where each number comes from, and the implementation is just a condensed computation. But please let me know if you think it's too confusing.
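The equivalence sethah describes can be checked directly. This is a standalone sketch; the function names below are hypothetical stand-ins for the private `getSparseSize` helper, written out both ways using the byte counts from its comment.

```scala
// Per-array form, as written in the comment: 8B per value, 4B per row index,
// 4B per column pointer, plus 8 + 8 + 1 bytes of fixed overhead.
def sizeFromComment(numActives: Long, numPtrs: Long): Long =
  8L * numActives + 4L * numActives + 4L * numPtrs + 8L + 8L + 1L

// Condensed form, as implemented: values and rowIndices have the same length
// (numActives), so their costs fold into a single 12-byte-per-active term,
// and the fixed overhead folds into 17.
def sizeAsImplemented(numActives: Long, numPtrs: Long): Long =
  12L * numActives + 4L * numPtrs + 17L
```

Both forms agree for all inputs, e.g. 10 actives and 5 pointers give 120 + 20 + 17 = 157 bytes either way.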
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r86305786

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -1076,4 +1240,15 @@ object Matrices {
      SparseMatrix.fromCOO(numRows, numCols, entries)
    }
  }
+
+  private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = {
+    // 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 8 + 8 + 1
+    12L * numActives + 4L * numPtrs + 17L

--- End diff --

The comment says 8 * values while this is 12? Seems like a mistype?
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r85799898

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */
  @Since("2.0.0")
  def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix

--- End diff --

A bit of explanation: I made this a private method with a different name because `toDense: DenseMatrix` and `toSparse: SparseMatrix` should be implemented in the trait, not in the subclasses. But we can't just put them here and use overloading, because we would get ambiguous reference compile errors. So we implement them here and make this private with a different name to avoid that. I'd appreciate feedback on this approach - it feels a bit awkward.
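The workaround sethah describes - keep the parameterized conversion under a separate private-style name, and implement the parameterless public method once in the trait - can be sketched in isolation. The names `Mat`, `DenseMat`, and `toSparseImpl` are illustrative, not the actual Spark classes, and `protected` stands in for `private[ml]` so the sketch is self-contained.

```scala
// Overloading `toSparse` directly (one version taking a Boolean, one without
// parentheses) would make bare references like `m.toSparse` ambiguous, as the
// REPL session earlier in the thread shows. Giving the parameterized version
// a different name sidesteps the problem.
trait Mat {
  protected def toSparseImpl(colMajor: Boolean): String
  // Implemented once in the trait; no ambiguity because the name is unique.
  def toSparse: String = toSparseImpl(colMajor = true)
}

class DenseMat extends Mat {
  protected def toSparseImpl(colMajor: Boolean): String =
    if (colMajor) "sparse-csc" else "sparse-csr"
}
```

With this layout, `(new DenseMat).toSparse` resolves without a compile error, which is exactly what the overloaded version could not guarantee.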
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r85799040

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
   */
  @Since("2.0.0")
  def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toSparse: SparseMatrix = toSparseMatrix(columnMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param columnMajor Whether the values of the resulting dense matrix should be in column major
+   *                    or row major order. If `false`, resulting matrix will be row major.
+   */
+  private [ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix

--- End diff --

minor: space
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/15628

[SPARK-17471][ML] Add compressed method to ML matrices

## What changes were proposed in this pull request?

This patch adds a `compressed` method to the ML `Matrix` class, which returns the minimal storage representation of the matrix - either sparse or dense. Because the space occupied by a sparse matrix depends on its layout (i.e. column major or row major), this method must consider both cases. It may also be useful to force the layout to be column or row major beforehand, so an overload is added which takes a `columnMajor: Boolean` parameter.

The compressed implementation relies upon two new abstract methods, `toDense(columnMajor: Boolean)` and `toSparse(columnMajor: Boolean)`, similar to the compressed method implemented in the `Vector` class. These methods also allow the layout of the resulting matrix to be specified via the `columnMajor` parameter. More detail on the new methods is given below.

## How was this patch tested?

Added many new unit tests.

## New methods (summary, not exhaustive list)

**Matrix trait**

* `def toDense(columnMajor: Boolean): DenseMatrix` (abstract) - converts the matrix (either sparse or dense) to dense format
* `def toSparse(columnMajor: Boolean): SparseMatrix` (abstract) - converts the matrix (either sparse or dense) to sparse format
* `def compressed: Matrix` - finds the minimum space representation of this matrix, considering both column and row major layouts, and converts it
* `def compressed(columnMajor: Boolean): Matrix` - finds the minimum space representation of this matrix considering only column OR row major, and converts it

**DenseMatrix class**

* `def toDense(columnMajor: Boolean): DenseMatrix` - converts the dense matrix to a dense matrix, optionally changing the layout (data is NOT duplicated if the layouts are the same)
* `def toSparse(columnMajor: Boolean): SparseMatrix` - converts the dense matrix to a sparse matrix, using the specified layout

**SparseMatrix class**

* `def toDense(columnMajor: Boolean): DenseMatrix` - converts the sparse matrix to a dense matrix, using the specified layout
* `def toSparse(columnMajor: Boolean): SparseMatrix` - converts the sparse matrix to a sparse matrix. If the sparse matrix contains any explicit zeros, they are removed. If the layout requested does not match the current layout, data is copied to a new representation. If the layouts match and no explicit zeros exist, the current matrix is returned.
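The "explicit zeros are removed" behaviour described for `SparseMatrix.toSparse` can be illustrated with a toy sketch. The function below is hypothetical - it operates on bare value/index arrays rather than a real CSC matrix - and shows only the filtering idea, not the actual Spark implementation.

```scala
// Drop entries whose stored value is an explicit zero, keeping the value and
// index arrays aligned. In the real SparseMatrix these would be the `values`
// and `rowIndices` arrays, with the column pointers rebuilt afterwards.
def dropExplicitZeros(
    values: Array[Double],
    indices: Array[Int]): (Array[Double], Array[Int]) = {
  val kept = values.zip(indices).filter { case (v, _) => v != 0.0 }
  (kept.map(_._1), kept.map(_._2))
}
```

For example, `dropExplicitZeros(Array(1.0, 0.0, 3.0), Array(0, 2, 4))` keeps only the two nonzero entries and their indices.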
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sethah/spark matrix_compress

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15628.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15628

commit 5a29a4513b9a917c05c117cd03efe79a2dd2875a
Author: sethah
Date: 2016-09-08T21:52:42Z

    first commit

commit d2abb730f6a152f43d9afbee416e36fc2d4e16b2
Author: sethah
Date: 2016-09-22T14:48:02Z

    start to add tests

commit ee8ca60096f54beabf8cce9f348bda6f78fdfbd2
Author: sethah
Date: 2016-09-23T22:55:20Z

    sparse to sparse stuff

commit 68fc20e3cf9087e855edcbd12177183a77c3c36b
Author: sethah
Date: 2016-10-25T17:25:47Z

    improve test cases and cleanup

commit 011b6019d78eb73e39a0de51d6a4d905a43fb2ad
Author: sethah
Date: 2016-10-25T18:22:23Z

    adding some helper methods and shoring up test cases

commit d00926efbe637133b0f2d27dbfba14ddd97f9e57
Author: sethah
Date: 2016-10-25T19:34:01Z

    cleanup

commit a51e2173089cf79781b0d9a37492a4c4b4080881
Author: sethah
Date: 2016-10-25T19:51:07Z

    minor cleanup