[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-68304656
  
LGTM. Merged into master. Thanks!!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3319





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-68036616
  
  [Test build #24774 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24774/consoleFull) for PR 3319 at commit [`04c4829`](https://github.com/apache/spark/commit/04c4829d8364a36314485d6bdceed5ab93c67398).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-68036621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24774/
Test FAILed.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-68038204
  
  [Test build #24775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24775/consoleFull) for PR 3319 at commit [`b0354f6`](https://github.com/apache/spark/commit/b0354f616f7f49ee9b19f6b8e5d0dc775b05dba2).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-68038211
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24775/
Test PASSed.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-68033235
  
  [Test build #24774 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24774/consoleFull) for PR 3319 at commit [`04c4829`](https://github.com/apache/spark/commit/04c4829d8364a36314485d6bdceed5ab93c67398).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-68033509
  
  [Test build #24775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24775/consoleFull) for PR 3319 at commit [`b0354f6`](https://github.com/apache/spark/commit/b0354f616f7f49ee9b19f6b8e5d0dc775b05dba2).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67729483
  
  [Test build #24668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24668/consoleFull) for PR 3319 at commit [`10a63a6`](https://github.com/apache/spark/commit/10a63a6e6b583e6e79ced58ea9b73937656f5a24).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67731189
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24668/
Test PASSed.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67731185
  
  [Test build #24668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24668/consoleFull) for PR 3319 at commit [`10a63a6`](https://github.com/apache/spark/commit/10a63a6e6b583e6e79ced58ea9b73937656f5a24).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22094195
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -197,6 +331,145 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+    val len = values.length
+    var i = 0
+    while (i < len) {
+      values(i) = f(values(i))
+      i += 1
+    }
+    this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+    new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
+   * (row, column, value) tuples. Array must be sorted first by *column* index and then by row
+   * index.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of ((row, column), value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[((Int, Int), Double)]): SparseMatrix = {
+    val colPtrs = new ArrayBuffer[Int](numCols + 1)
+    colPtrs.append(0)
+    var nnz = 0
+    var lastCol = 0
+    val values = entries.map { case ((i, j), v) =>
+      while (j != lastCol) {
+        colPtrs.append(nnz)
+        lastCol += 1
+        if (lastCol > numCols) {
+          throw new IndexOutOfBoundsException("Please make sure that the entries array is " +
+            "sorted by COLUMN index first and then by row index.")
+        }
+      }
+      nnz += 1
+      v
+    }
+    while (numCols > lastCol) {
+      colPtrs.append(nnz)
+      lastCol += 1
+    }
+    new SparseMatrix(numRows, numCols, colPtrs.toArray, entries.map(_._1._1), values)
+  }
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
+  }
+
+  /** Generates a `SparseMatrix` with a given random number generator and `method`, which
+    * specifies the distribution. */
+  private def genRandMatrix(
+      numRows: Int,
+      numCols: Int,
--- End diff --

That's possible. That is how MATLAB does it: it calls `rand` and fills the
result according to `sprand` or `sprandn`. I thought it looked more functional
this way, getting it over with in just one loop.
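
A minimal sketch of the "one loop" idea, assuming the distribution is handed in as a `Random => Double` function (as the `method` parameter of `genRandMatrix` is in the revision quoted further down this thread). `RandFillSketch` and `fillValues` are hypothetical names used only for illustration and are not part of the PR:

```scala
import scala.util.Random

object RandFillSketch {
  // Hypothetical helper: draws `n` values by calling `method(rng)` once per
  // slot, so the caller picks the distribution and no second pass is needed.
  def fillValues(n: Int, rng: Random, method: Random => Double): Array[Double] =
    Array.fill(n)(method(rng))

  def main(args: Array[String]): Unit = {
    val rng = new Random(42L)
    val uniform = fillValues(4, rng, _.nextDouble())    // sprand-like: values in [0, 1)
    val gaussian = fillValues(4, rng, _.nextGaussian()) // sprandn-like: standard normal
    println(uniform.mkString(", "))
    println(gaussian.mkString(", "))
  }
}
```

The same generator loop serves both distributions; only the function passed as `method` changes.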





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67617761
  
  [Test build #24637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24637/consoleFull) for PR 3319 at commit [`3971c93`](https://github.com/apache/spark/commit/3971c931d18dfaea0ea66e0cbb19b61dbf310a66).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67623680
  
  [Test build #24639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24639/consoleFull) for PR 3319 at commit [`f62d6c7`](https://github.com/apache/spark/commit/f62d6c795c6293817df195369e2873eb97b11a0e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67626010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24637/
Test PASSed.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67626002
  
  [Test build #24637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24637/consoleFull) for PR 3319 at commit [`3971c93`](https://github.com/apache/spark/commit/3971c931d18dfaea0ea66e0cbb19b61dbf310a66).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67631049
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24639/
Test PASSed.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67631046
  
  [Test build #24639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24639/consoleFull) for PR 3319 at commit [`f62d6c7`](https://github.com/apache/spark/commit/f62d6c795c6293817df195369e2873eb97b11a0e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22120275
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -123,6 +135,130 @@ class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double])
   }
 
   override def copy = new DenseMatrix(numRows, numCols, values.clone())
+
+  private[mllib] def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
+
+  private[mllib] def update(f: Double => Double): DenseMatrix = {
+    val len = values.length
+    var i = 0
+    while (i < len) {
+      values(i) = f(values(i))
+      i += 1
+    }
+    this
+  }
+
+  /** Generate a `SparseMatrix` from the given `DenseMatrix`. */
+  def toSparse(): SparseMatrix = {
+    val spVals: ArrayBuilder[Double] = new ArrayBuilder.ofDouble
+    val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+    val rowIndices: ArrayBuilder[Int] = new ArrayBuilder.ofInt
+    var nnz = 0
+    var lastCol = -1
+    var j = 0
+    while (j < numCols) {
+      var i = 0
+      val indStart = j * numRows
+      while (i < numRows) {
+        val v = values(indStart + i)
+        if (v != 0.0) {
+          rowIndices += i
+          spVals += v
+          while (j != lastCol) {
--- End diff --

It iterates over a full 2-d grid. We can update `colPtrs` after each
`while (i < numRows)` block.
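
A minimal sketch of this suggestion, assuming column-major storage as in `DenseMatrix`. `ToSparseSketch.denseToCsc` is a hypothetical standalone helper that returns the three CSC arrays rather than constructing a `SparseMatrix`; it illustrates the idea and is not the code that was merged:

```scala
import scala.collection.mutable.ArrayBuilder

object ToSparseSketch {
  /** Converts column-major dense values into CSC arrays: (colPtrs, rowIndices, values). */
  def denseToCsc(
      numRows: Int,
      numCols: Int,
      values: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    val spVals = new ArrayBuilder.ofDouble
    val rowIndices = new ArrayBuilder.ofInt
    val colPtrs = new Array[Int](numCols + 1) // colPtrs(0) stays 0
    var nnz = 0
    var j = 0
    while (j < numCols) {
      var i = 0
      val indStart = j * numRows
      while (i < numRows) {
        val v = values(indStart + i)
        if (v != 0.0) {
          rowIndices += i
          spVals += v
          nnz += 1
        }
        i += 1
      }
      j += 1
      // The column just scanned is complete: record where the next one starts.
      colPtrs(j) = nnz
    }
    (colPtrs, rowIndices.result(), spVals.result())
  }
}
```

For a 3 x 2 matrix stored column-major as `Array(1.0, 0.0, 2.0, 0.0, 0.0, 3.0)`, this gives `colPtrs = [0, 2, 3]`, `rowIndices = [0, 2, 2]` and `values = [1.0, 2.0, 3.0]`. Because `colPtrs(j)` is written exactly once per column, neither the inner `while (j != lastCol)` loop nor a trailing fix-up loop is needed.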





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22120278
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -123,6 +135,130 @@ class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double])
   }
 
   override def copy = new DenseMatrix(numRows, numCols, values.clone())
+
+  private[mllib] def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
+
+  private[mllib] def update(f: Double => Double): DenseMatrix = {
+    val len = values.length
+    var i = 0
+    while (i < len) {
+      values(i) = f(values(i))
+      i += 1
+    }
+    this
+  }
+
+  /** Generate a `SparseMatrix` from the given `DenseMatrix`. */
+  def toSparse(): SparseMatrix = {
+    val spVals: ArrayBuilder[Double] = new ArrayBuilder.ofDouble
+    val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+    val rowIndices: ArrayBuilder[Int] = new ArrayBuilder.ofInt
+    var nnz = 0
+    var lastCol = -1
+    var j = 0
+    while (j < numCols) {
+      var i = 0
+      val indStart = j * numRows
+      while (i < numRows) {
+        val v = values(indStart + i)
+        if (v != 0.0) {
+          rowIndices += i
+          spVals += v
+          while (j != lastCol) {
+            colPtrs(lastCol + 1) = nnz
+            lastCol += 1
+          }
+          nnz += 1
+        }
+        i += 1
+      }
+      j += 1
+    }
+    while (numCols > lastCol) {
--- End diff --

This check is not necessary. At the end of the `while (j < numCols)` loop,
`j = numCols`. So it is `colPtrs(j) = nnz`.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22120295
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -197,6 +335,167 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+    val len = values.length
+    var i = 0
+    while (i < len) {
+      values(i) = f(values(i))
+      i += 1
+    }
+    this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+    new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
+   * (row, column, value) tuples.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of (row, column, value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[(Int, Int, Double)]): SparseMatrix = {
+    val sortedEntries = entries.sortBy(v => (v._2, v._1))
+    val colPtrs = new Array[Int](numCols + 1)
+    var nnz = 0
+    var lastCol = -1
+    val values = sortedEntries.map { case (i, j, v) =>
+      while (j != lastCol) {
+        colPtrs(lastCol + 1) = nnz
+        lastCol += 1
+      }
+      nnz += 1
+      v
+    }
+    while (numCols > lastCol) {
+      colPtrs(lastCol + 1) = nnz
+      lastCol += 1
+    }
+    new SparseMatrix(numRows, numCols, colPtrs.toArray, sortedEntries.map(_._1), values)
+  }
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
+  }
+
+  /** Generates a `SparseMatrix` with a given random number generator and `method`, which
+    * specifies the distribution. */
+  private def genRandMatrix(
+      numRows: Int,
+      numCols: Int,
+      density: Double,
+      rng: Random,
+      method: Random => Double): SparseMatrix = {
+    require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
+      s"0.0 <= d <= 1.0. Currently, density: $density")
+    val length = math.ceil(numRows * numCols * density).toInt
+    val entries = MutableMap[(Int, Int), Double]()
+    var i = 0
+    if (density == 0.0) {
+      return new SparseMatrix(numRows, numCols, new Array[Int](numCols + 1),
+        Array[Int](), Array[Double]())
+    } else if (density == 1.0) {
+      return new SparseMatrix(numRows, numCols, (0 to numRows * numCols by numRows).toArray,
--- End diff --

indices out of bound
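
One possible reading of this remark, which the thread itself does not confirm: in the revision of the `density == 1.0` branch quoted further down, the row-index argument is `(0 until numRows * numCols).toArray`, whose entries run past `numRows - 1`. In compressed sparse column storage every row index must lie in `[0, numRows)`. A hedged sketch of the index arrays for a fully populated matrix follows; `FullCscSketch` and its methods are hypothetical names, not part of the PR:

```scala
object FullCscSketch {
  /** Column pointers for a fully populated numRows x numCols CSC matrix. */
  def colPtrs(numRows: Int, numCols: Int): Array[Int] =
    Array.tabulate(numCols + 1)(j => j * numRows)

  /** Row indices for the same matrix: 0 until numRows, repeated once per column. */
  def rowIndices(numRows: Int, numCols: Int): Array[Int] =
    Array.tabulate(numRows * numCols)(k => k % numRows)
}
```

For `numRows = 2` and `numCols = 3` this gives column pointers `[0, 2, 4, 6]` and row indices `[0, 1, 0, 1, 0, 1]`, whereas `(0 until 6).toArray` would contain indices up to 5.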





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22120305
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -197,6 +335,167 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+    val len = values.length
+    var i = 0
+    while (i < len) {
+      values(i) = f(values(i))
+      i += 1
+    }
+    this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+    new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
+   * (row, column, value) tuples.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of (row, column, value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[(Int, Int, Double)]): SparseMatrix = {
+    val sortedEntries = entries.sortBy(v => (v._2, v._1))
+    val colPtrs = new Array[Int](numCols + 1)
+    var nnz = 0
+    var lastCol = -1
+    val values = sortedEntries.map { case (i, j, v) =>
+      while (j != lastCol) {
+        colPtrs(lastCol + 1) = nnz
+        lastCol += 1
+      }
+      nnz += 1
+      v
+    }
+    while (numCols > lastCol) {
+      colPtrs(lastCol + 1) = nnz
+      lastCol += 1
+    }
+    new SparseMatrix(numRows, numCols, colPtrs.toArray, sortedEntries.map(_._1), values)
+  }
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
+  }
+
+  /** Generates a `SparseMatrix` with a given random number generator and `method`, which
+    * specifies the distribution. */
+  private def genRandMatrix(
+      numRows: Int,
+      numCols: Int,
+      density: Double,
+      rng: Random,
+      method: Random => Double): SparseMatrix = {
+    require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
+      s"0.0 <= d <= 1.0. Currently, density: $density")
+    val length = math.ceil(numRows * numCols * density).toInt
+    val entries = MutableMap[(Int, Int), Double]()
+    var i = 0
+    if (density == 0.0) {
+      return new SparseMatrix(numRows, numCols, new Array[Int](numCols + 1),
+        Array[Int](), Array[Double]())
+    } else if (density == 1.0) {
+      return new SparseMatrix(numRows, numCols, (0 to numRows * numCols by numRows).toArray,
+        (0 until numRows * numCols).toArray, Array.fill(numRows * numCols)(method(rng)))
+    }
+    // Expected number of iterations is less than 1.5 * length
+    if (density < 0.34) {
+      while (i < length) {
+        var rowIndex = rng.nextInt(numRows)
+        var colIndex = rng.nextInt(numCols)
+        while (entries.contains((rowIndex, colIndex))) {
+          rowIndex = rng.nextInt(numRows)
+          colIndex = rng.nextInt(numCols)
+        }
+        entries += (rowIndex, colIndex) -> method(rng)
+        i += 1
+      }
+    } else { // selection - rejection method
+      var j = 0
+      val pool = numRows * numCols
+      // loop over columns so that the sort in fromCOO requires less sorting
+      while (i < length && j < numCols) {
+        var passedInPool = j * numRows
+        var r = 0
+        while (i < length && r < numRows) {
+          if (rng.nextDouble() < 1.0 * (length - i) / (pool - passedInPool)) {
+            entries += (r, j) -> method(rng)
+            i += 1
+          }
+          r += 1
+          passedInPool += 1
+        }
+        j += 1
+      }
+    }
+    SparseMatrix.fromCOO(numRows, numCols, entries.toArray.map(v => (v._1._1, v._1._2, v._2)))
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers. The number of non-zero
+   * elements equal the ceiling of `numRows` x `numCols` x `density`
+   *
  

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22120303
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +335,167 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double = Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double = Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i  len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input 
must be an array of
+   * (row, column, value) tuples.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of (row, column, value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[(Int, Int, 
Double)]): SparseMatrix = {
+val sortedEntries = entries.sortBy(v => (v._2, v._1))
+val colPtrs = new Array[Int](numCols + 1)
+var nnz = 0
+var lastCol = -1
+val values = sortedEntries.map { case (i, j, v) =>
+  while (j != lastCol) {
+colPtrs(lastCol + 1) = nnz
+lastCol += 1
+  }
+  nnz += 1
+  v
+}
+while (numCols > lastCol) {
+  colPtrs(lastCol + 1) = nnz
+  lastCol += 1
+}
+new SparseMatrix(numRows, numCols, colPtrs.toArray, 
sortedEntries.map(_._1), values)
+  }
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a `SparseMatrix` with a given random number generator and 
`method`, which
+* specifies the distribution. */
+  private def genRandMatrix(
+  numRows: Int,
+  numCols: Int,
+  density: Double,
+  rng: Random,
+  method: Random => Double): SparseMatrix = {
+require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
+  s"0.0 <= d <= 1.0. Currently, density: $density")
+val length = math.ceil(numRows * numCols * density).toInt
+val entries = MutableMap[(Int, Int), Double]()
+var i = 0
+if (density == 0.0) {
+  return new SparseMatrix(numRows, numCols, new Array[Int](numCols + 
1),
+Array[Int](), Array[Double]())
+} else if (density == 1.0) {
+  return new SparseMatrix(numRows, numCols, (0 to numRows * numCols by 
numRows).toArray,
+(0 until numRows * numCols).toArray, Array.fill(numRows * 
numCols)(method(rng)))
+}
+// Expected number of iterations is less than 1.5 * length
+if (density < 0.34) {
+  while (i < length) {
+var rowIndex = rng.nextInt(numRows)
+var colIndex = rng.nextInt(numCols)
+while (entries.contains((rowIndex, colIndex))) {
+  rowIndex = rng.nextInt(numRows)
+  colIndex = rng.nextInt(numCols)
+}
+entries += (rowIndex, colIndex) -> method(rng)
+i += 1
+  }
+} else { // selection - rejection method
+  var j = 0
+  val pool = numRows * numCols
+  // loop over columns so that the sort in fromCOO requires less 
sorting
+  while (i < length && j < numCols) {
+var passedInPool = j * numRows
+var r = 0
+while (i < length && r < numRows) {
+  if (rng.nextDouble() < 1.0 * (length - i) / (pool - passedInPool)) {
+entries += (r, j) -> method(rng)
+i += 1
+  }
+  r += 1
+  passedInPool += 1
+}
+j += 1
+  }
+}
+SparseMatrix.fromCOO(numRows, numCols, entries.toArray.map(v => (v._1._1, v._1._2, v._2)))
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. uniform random 
numbers. The number of non-zero
+   * elements equal the ceiling of `numRows` x `numCols` x `density`
+   *
  

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22120288
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +335,167 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input 
must be an array of
+   * (row, column, value) tuples.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of (row, column, value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[(Int, Int, 
Double)]): SparseMatrix = {
+val sortedEntries = entries.sortBy(v => (v._2, v._1))
+val colPtrs = new Array[Int](numCols + 1)
+var nnz = 0
+var lastCol = -1
+val values = sortedEntries.map { case (i, j, v) =>
+  while (j != lastCol) {
--- End diff --

We can add a check on `j >= 0`. If there is `j = -2` in the input, we get an 
infinite loop.

It is a little hard to read if we put the `colPtrs` construction code 
inside the map. We can separate the two. There is no performance penalty.
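
A minimal sketch of the two suggestions above, assuming a standalone helper that returns the raw CSC arrays rather than a Spark `SparseMatrix` (the object name `CooToCsc` and the tuple return type are hypothetical, not part of the PR). Indices are validated up front so a negative column fails fast, and `colPtrs` is built in its own pass by counting entries per column and taking a running sum:

```scala
// Hypothetical standalone helper; not the code in this PR.
object CooToCsc {
  /** Converts (row, column, value) triples to CSC arrays (colPtrs, rowIndices, values). */
  def fromCOO(
      numRows: Int,
      numCols: Int,
      entries: Array[(Int, Int, Double)]): (Array[Int], Array[Int], Array[Double]) = {
    // Validate indices first so bad input such as j = -2 is rejected immediately.
    entries.foreach { case (i, j, _) =>
      require(i >= 0 && i < numRows, s"row index out of range: $i")
      require(j >= 0 && j < numCols, s"column index out of range: $j")
    }
    // CSC order: sort by column index, then by row index.
    val sorted = entries.sortBy(e => (e._2, e._1))
    // Count the entries in each column, then prefix-sum the counts into column pointers.
    val colPtrs = new Array[Int](numCols + 1)
    sorted.foreach { case (_, j, _) => colPtrs(j + 1) += 1 }
    var c = 0
    while (c < numCols) {
      colPtrs(c + 1) += colPtrs(c)
      c += 1
    }
    (colPtrs, sorted.map(_._1), sorted.map(_._3))
  }
}
```

The extra pass over the sorted entries is linear, so the sort still dominates the cost.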







[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22120308
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +555,244 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22120292
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +331,145 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input 
must be an array of
+   * (row, column, value) tuples. Array must be sorted first by *column* 
index and then by row
+   * index.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of ((row, column), value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[((Int, Int), 
Double)]): SparseMatrix = {
+val colPtrs = new ArrayBuffer[Int](numCols + 1)
+colPtrs.append(0)
+var nnz = 0
+var lastCol = 0
+val values = entries.map { case ((i, j), v) =>
+  while (j != lastCol) {
+colPtrs.append(nnz)
+lastCol += 1
+if (lastCol > numCols) {
+  throw new IndexOutOfBoundsException("Please make sure that the entries array is " +
+"sorted by COLUMN index first and then by row index.")
+}
+  }
+  nnz += 1
+  v
+}
+while (numCols > lastCol) {
+  colPtrs.append(nnz)
+  lastCol += 1
+}
+new SparseMatrix(numRows, numCols, colPtrs.toArray, 
entries.map(_._1._1), values)
+  }
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a `SparseMatrix` with a given random number generator and 
`method`, which
+* specifies the distribution. */
+  private def genRandMatrix(
+  numRows: Int,
+  numCols: Int,
--- End diff --

It is harder to read this way. Also, if we generate the skeleton first, we 
don't need to handle a dynamically sized values array. Having everything in one 
loop is not a goal, and it doesn't always give us a performance gain.

~~~scala
def genRandPattern(numRows: Int, numCols: Int, density: Double, rng: Random): SparseMatrix = {
  ...
}

def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): SparseMatrix = {
  val mat = genRandPattern(numRows, numCols, density, rng)
  // fill in the values with rng.nextDouble
}
~~~
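
One possible way to flesh out the skeleton-first idea, as a sketch under stated assumptions: it returns raw CSC arrays instead of a Spark `SparseMatrix`, assumes `numRows * numCols` fits in an `Int`, and picks cells by rejection sampling on column-major linear indices. The object name and return types are illustrative, not the PR's API.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical standalone sketch; not the code in this PR.
object SparsePatternSketch {
  /** Chooses ceil(numRows * numCols * density) distinct cells; returns (colPtrs, rowIndices). */
  def genRandPattern(
      numRows: Int,
      numCols: Int,
      density: Double,
      rng: Random): (Array[Int], Array[Int]) = {
    require(density >= 0.0 && density <= 1.0, s"density must be in [0, 1], got $density")
    val pool = numRows * numCols // assumption: no Int overflow
    val nnz = math.ceil(pool * density).toInt
    // Draw distinct column-major linear indices k = col * numRows + row.
    // Rejection sampling gets slow as density approaches 1.0.
    val picked = mutable.HashSet.empty[Int]
    while (picked.size < nnz) picked += rng.nextInt(pool)
    // Sorting linear indices yields column-then-row order, which is CSC order.
    val sorted = picked.toArray.sorted
    val colPtrs = new Array[Int](numCols + 1)
    sorted.foreach(k => colPtrs(k / numRows + 1) += 1)
    var c = 0
    while (c < numCols) {
      colPtrs(c + 1) += colPtrs(c)
      c += 1
    }
    (colPtrs, sorted.map(_ % numRows))
  }

  /** Fills the pattern with U(0, 1) values; the values array has a known, fixed size. */
  def sprand(
      numRows: Int,
      numCols: Int,
      density: Double,
      rng: Random): (Array[Int], Array[Int], Array[Double]) = {
    val (colPtrs, rowIndices) = genRandPattern(numRows, numCols, density, rng)
    (colPtrs, rowIndices, Array.fill(rowIndices.length)(rng.nextDouble()))
  }
}
```

A Gaussian variant would reuse `genRandPattern` unchanged and only swap `rng.nextDouble()` for `rng.nextGaussian()`.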





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22120317
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +555,244 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22120287
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +335,167 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input 
must be an array of
+   * (row, column, value) tuples.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of (row, column, value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[(Int, Int, 
Double)]): SparseMatrix = {
--- End diff --

What's the behavior if there are duplicate coordinates in the input? This 
should be documented. In MATLAB's `sparse`, "any elements of s that have 
duplicate values of i and j are added together."
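
For illustration only, a small sketch of the MATLAB-style convention mentioned above, in which entries sharing a coordinate are summed before conversion. The helper name `sumDuplicates` is hypothetical, and nothing here implies that `fromCOO` in this PR behaves this way:

```scala
// Hypothetical pre-processing helper; not the code in this PR.
object CooDuplicates {
  /** Merges entries that share (row, column) by summing their values; output is in CSC order. */
  def sumDuplicates(entries: Array[(Int, Int, Double)]): Array[(Int, Int, Double)] =
    entries
      .groupBy(e => (e._1, e._2))                               // bucket by coordinate
      .map { case ((i, j), group) => (i, j, group.map(_._3).sum) }
      .toArray
      .sortBy(e => (e._2, e._1))                                // column first, then row
}
```

Alternatives would be to reject duplicates with an exception or to keep the last value; whichever is chosen, the Scaladoc should state it.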





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22120299
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +335,167 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input 
must be an array of
+   * (row, column, value) tuples.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of (row, column, value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[(Int, Int, 
Double)]): SparseMatrix = {
+val sortedEntries = entries.sortBy(v => (v._2, v._1))
+val colPtrs = new Array[Int](numCols + 1)
+var nnz = 0
+var lastCol = -1
+val values = sortedEntries.map { case (i, j, v) =>
+  while (j != lastCol) {
+colPtrs(lastCol + 1) = nnz
+lastCol += 1
+  }
+  nnz += 1
+  v
+}
+while (numCols > lastCol) {
+  colPtrs(lastCol + 1) = nnz
+  lastCol += 1
+}
+new SparseMatrix(numRows, numCols, colPtrs.toArray, 
sortedEntries.map(_._1), values)
+  }
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a `SparseMatrix` with a given random number generator and 
`method`, which
+* specifies the distribution. */
+  private def genRandMatrix(
+  numRows: Int,
+  numCols: Int,
+  density: Double,
+  rng: Random,
+  method: Random => Double): SparseMatrix = {
+require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
+  s"0.0 <= d <= 1.0. Currently, density: $density")
+val length = math.ceil(numRows * numCols * density).toInt
+val entries = MutableMap[(Int, Int), Double]()
+var i = 0
+if (density == 0.0) {
+  return new SparseMatrix(numRows, numCols, new Array[Int](numCols + 
1),
+Array[Int](), Array[Double]())
+} else if (density == 1.0) {
+  return new SparseMatrix(numRows, numCols, (0 to numRows * numCols by 
numRows).toArray,
+(0 until numRows * numCols).toArray, Array.fill(numRows * 
numCols)(method(rng)))
+}
+// Expected number of iterations is less than 1.5 * length
+if (density < 0.34) {
+  while (i < length) {
+var rowIndex = rng.nextInt(numRows)
+var colIndex = rng.nextInt(numCols)
+while (entries.contains((rowIndex, colIndex))) {
+  rowIndex = rng.nextInt(numRows)
+  colIndex = rng.nextInt(numCols)
+}
+entries += (rowIndex, colIndex) -> method(rng)
+i += 1
+  }
+} else { // selection - rejection method
+  var j = 0
+  val pool = numRows * numCols
+  // loop over columns so that the sort in fromCOO requires less 
sorting
+  while (i < length && j < numCols) {
+var passedInPool = j * numRows
+var r = 0
+while (i < length && r < numRows) {
+  if (rng.nextDouble() < 1.0 * (length - i) / (pool - passedInPool)) {
+entries += (r, j) -> method(rng)
--- End diff --

Using a map here is not optimal. The sampled entries are already ordered by 
column and then by row, so we can construct `rowIndices` and `colPtrs` directly.
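
A sketch of what direct construction could look like for the selection-rejection branch, assuming a standalone function that returns raw CSC arrays (the name `OrderedSampling.sample` and its signature are illustrative, not the PR's code). Because cells are visited column by column and row by row, each accepted cell can be appended straight into preallocated arrays, with no map and no sort:

```scala
import scala.util.Random

// Hypothetical standalone sketch; not the code in this PR.
object OrderedSampling {
  /** Selection sampling of exactly `nnz` cells in column-major order; returns CSC arrays. */
  def sample(
      numRows: Int,
      numCols: Int,
      nnz: Int,
      rng: Random,
      method: Random => Double): (Array[Int], Array[Int], Array[Double]) = {
    val pool = numRows * numCols // assumption: no Int overflow
    require(nnz >= 0 && nnz <= pool, s"nnz must be in [0, $pool], got $nnz")
    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = new Array[Int](nnz)
    val values = new Array[Double](nnz)
    var i = 0    // cells accepted so far
    var seen = 0 // cells visited so far
    var j = 0
    while (j < numCols) {
      var r = 0
      while (r < numRows && i < nnz) {
        // Accept with probability (still needed) / (still unvisited).
        if (rng.nextDouble() * (pool - seen) < (nnz - i)) {
          rowIndices(i) = r
          values(i) = method(rng)
          i += 1
        }
        seen += 1
        r += 1
      }
      colPtrs(j + 1) = i // entries up to and including column j
      j += 1
    }
    (colPtrs, rowIndices, values)
  }
}
```

Once the number of cells still needed equals the number still unvisited, the acceptance test always succeeds, so the loop ends with exactly `nnz` entries.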



[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22120314
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +555,244 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67499071
  
  [Test build #24591 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24591/consoleFull)
 for   PR 3319 at commit 
[`75239f8`](https://github.com/apache/spark/commit/75239f8e5b41a275a0f232108b26cb0e16935bbf).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67513626
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24591/





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67513619
  
  [Test build #24591 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24591/consoleFull)
 for   PR 3319 at commit 
[`75239f8`](https://github.com/apache/spark/commit/75239f8e5b41a275a0f232108b26cb0e16935bbf).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22063271
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +524,297 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
--- End diff --

I see your point. The reason we didn't return the exact type in `Vectors` 
and `Matrices` was that RDD is not covariant. But maybe we should return the 
exact types and let algorithms take a generic `RDD[T]` with `T` extending 
`Vector`.
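
To make the variance point concrete, here is a minimal sketch. `Box` is a hypothetical invariant container standing in for `RDD`, and `Vec` / `DenseVec` stand in for `Vector` / `DenseVector`; none of these names come from MLlib, they only illustrate why a `T <: Vector` bound would let algorithms accept exact types.

```scala
// Hypothetical stand-ins: Box plays the role of the invariant RDD[T],
// Vec / DenseVec play the role of Vector / DenseVector.
trait Vec { def size: Int }
final class DenseVec(val values: Array[Double]) extends Vec {
  def size: Int = values.length
}

// Invariant in T: Box[DenseVec] is NOT a subtype of Box[Vec].
final class Box[T](val items: Seq[T])

object VarianceSketch {
  // Accepts only Box[Vec]; passing a Box[DenseVec] does not compile.
  def totalSizeExact(data: Box[Vec]): Int = data.items.map(_.size).sum

  // Accepts any Box[T] whose element type extends Vec.
  def totalSizeGeneric[T <: Vec](data: Box[T]): Int = data.items.map(_.size).sum

  def main(args: Array[String]): Unit = {
    val dense: Box[DenseVec] = new Box(Seq(new DenseVec(Array(1.0, 2.0))))
    // totalSizeExact(dense)  // type mismatch, because Box is invariant
    println(totalSizeGeneric(dense))
  }
}
```

With the bounded type parameter, factory methods could return `DenseVec` directly and callers would not need to upcast before passing the data on.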





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22070489
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +331,145 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input 
must be an array of
+   * (row, column, value) tuples. Array must be sorted first by *column* 
index and then by row
+   * index.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of ((row, column), value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[((Int, Int), 
Double)]): SparseMatrix = {
+val colPtrs = new ArrayBuffer[Int](numCols + 1)
+colPtrs.append(0)
+var nnz = 0
+var lastCol = 0
+val values = entries.map { case ((i, j), v) =>
+  while (j != lastCol) {
+colPtrs.append(nnz)
+lastCol += 1
+if (lastCol > numCols) {
--- End diff --

minor: COO doesn't have this restriction. We should sort the input entries, 
which could be done in a separate PR.
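
A rough sketch of what sorting inside the factory could look like, so callers are free to pass entries in any order. The name `cooToCsc` is hypothetical, and the sketch assumes all indices are in range and does not merge duplicate coordinates; it is not the code from this PR.

```scala
object CooSortSketch {
  /** Returns (colPtrs, rowIndices, values) in CSC layout for unsorted COO input. */
  def cooToCsc(
      numCols: Int,
      entries: Array[((Int, Int), Double)]): (Array[Int], Array[Int], Array[Double]) = {
    // Sort by column first, then by row, so callers need not pre-sort.
    val sorted = entries.sortBy { case ((i, j), _) => (j, i) }
    val colPtrs = new Array[Int](numCols + 1)
    // Count the entries in each column ...
    sorted.foreach { case ((_, j), _) => colPtrs(j + 1) += 1 }
    // ... then prefix-sum the counts into column pointers.
    var j = 0
    while (j < numCols) {
      colPtrs(j + 1) += colPtrs(j)
      j += 1
    }
    (colPtrs, sorted.map(_._1._1), sorted.map(_._2))
  }
}
```

Counting per column and prefix-summing also removes the need for the `lastCol` bookkeeping and the out-of-order exception in the quoted loop.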





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22070477
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -123,6 +135,126 @@ class DenseMatrix(val numRows: Int, val numCols: Int, 
val values: Array[Double])
   }
 
   override def copy = new DenseMatrix(numRows, numCols, values.clone())
+
+  private[mllib] def map(f: Double => Double) = new DenseMatrix(numRows, 
numCols, values.map(f))
+
+  private[mllib] def update(f: Double => Double): DenseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+
+  /** Generate a `SparseMatrix` from the given `DenseMatrix`. */
+  def toSparse(): SparseMatrix = {
+val sparseA: ArrayBuffer[Double] = new ArrayBuffer()
+val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
--- End diff --

If we already know the size, we don't need a buffer. Btw, it would be nice 
if we use the same naming as in `SparseVector` for consistency, `colPtrs` in 
this case.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22070506
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +331,145 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input 
must be an array of
+   * (row, column, value) tuples. Array must be sorted first by *column* 
index and then by row
+   * index.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of ((row, column), value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[((Int, Int), 
Double)]): SparseMatrix = {
+val colPtrs = new ArrayBuffer[Int](numCols + 1)
+colPtrs.append(0)
+var nnz = 0
+var lastCol = 0
+val values = entries.map { case ((i, j), v) =>
+  while (j != lastCol) {
+colPtrs.append(nnz)
+lastCol += 1
+if (lastCol > numCols) {
+  throw new IndexOutOfBoundsException("Please make sure that the 
entries array is " +
+"sorted by COLUMN index first and then by row index.")
+}
+  }
+  nnz += 1
+  v
+}
+while (numCols > lastCol) {
+  colPtrs.append(nnz)
+  lastCol += 1
+}
+new SparseMatrix(numRows, numCols, colPtrs.toArray, 
entries.map(_._1._1), values)
+  }
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a `SparseMatrix` with a given random number generator and 
`method`, which
+* specifies the distribution. */
+  private def genRandMatrix(
+  numRows: Int,
+  numCols: Int,
--- End diff --

minor: It may be cleaner if we just use this function to generate the 
skeleton and fill in values inside `sprandn` and `sprand`.
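
One possible shape for that split, as a sketch only: `genRandSkeleton` is a hypothetical helper that produces just the CSC structure, and `sprand` / `sprandn` fill in the values. The per-position coin flip is an assumption chosen for brevity (it costs O(numRows * numCols)) and is not necessarily how the PR samples positions.

```scala
import java.util.Random

object SparseSkeletonSketch {
  type Csc = (Array[Int], Array[Int], Array[Double])

  /** Hypothetical skeleton: CSC structure only (colPtrs, rowIndices), no values. */
  def genRandSkeleton(
      numRows: Int,
      numCols: Int,
      density: Double,
      rng: Random): (Array[Int], Array[Int]) = {
    val colPtrs = new Array[Int](numCols + 1)
    val rows = Array.newBuilder[Int]
    var nnz = 0
    var j = 0
    while (j < numCols) {
      var i = 0
      while (i < numRows) {
        // Keep each position independently with probability `density`.
        if (rng.nextDouble() < density) {
          rows += i
          nnz += 1
        }
        i += 1
      }
      colPtrs(j + 1) = nnz
      j += 1
    }
    (colPtrs, rows.result())
  }

  /** Uniform values on the random skeleton. */
  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Csc = {
    val (colPtrs, rowIndices) = genRandSkeleton(numRows, numCols, density, rng)
    (colPtrs, rowIndices, Array.fill(rowIndices.length)(rng.nextDouble()))
  }

  /** Gaussian values on the random skeleton. */
  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Csc = {
    val (colPtrs, rowIndices) = genRandSkeleton(numRows, numCols, density, rng)
    (colPtrs, rowIndices, Array.fill(rowIndices.length)(rng.nextGaussian()))
  }
}
```

The two public methods then differ only in the value generator, which avoids passing a `method` flag into the shared helper.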





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22070475
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -123,6 +135,126 @@ class DenseMatrix(val numRows: Int, val numCols: Int, 
val values: Array[Double])
   }
 
   override def copy = new DenseMatrix(numRows, numCols, values.clone())
+
+  private[mllib] def map(f: Double => Double) = new DenseMatrix(numRows, 
numCols, values.map(f))
+
+  private[mllib] def update(f: Double => Double): DenseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+
+  /** Generate a `SparseMatrix` from the given `DenseMatrix`. */
+  def toSparse(): SparseMatrix = {
+val sparseA: ArrayBuffer[Double] = new ArrayBuffer()
--- End diff --

Please use `ArrayBuilder` instead because `ArrayBuffer` is not specialized 
for primitive types, unfortunately.
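
For reference, a small sketch of the `ArrayBuilder` usage meant here. The `nonZeros` helper is hypothetical and only demonstrates the builder; `ArrayBuilder.ofDouble` is the standard-library builder backed by a primitive `Array[Double]`.

```scala
import scala.collection.mutable.ArrayBuilder

object ArrayBuilderSketch {
  /** Collects the nonzero entries of `values` without boxing each Double. */
  def nonZeros(values: Array[Double]): Array[Double] = {
    // ArrayBuilder.ofDouble appends into a primitive Array[Double] internally.
    val builder = new ArrayBuilder.ofDouble
    var i = 0
    while (i < values.length) {
      if (values(i) != 0.0) builder += values(i)
      i += 1
    }
    builder.result()
  }
}
```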





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22070528
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +529,222 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-    while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22070520
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +529,222 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-    while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22070486
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +331,145 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input 
must be an array of
+   * (row, column, value) tuples. Array must be sorted first by *column* 
index and then by row
+   * index.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of ((row, column), value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[((Int, Int), 
Double)]): SparseMatrix = {
--- End diff --

`(Int, Int, Double)` should be sufficient and it saves one object.
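
A short illustration of the two entry shapes. The type aliases and helper names are hypothetical; the point is only that the flat form allocates one tuple per entry instead of two.

```scala
object CooTripleSketch {
  // Nested form from the diff: an outer Tuple2 wrapping an inner Tuple2.
  type NestedEntry = ((Int, Int), Double)
  // Flat form suggested in review: a single Tuple3 per entry.
  type FlatEntry = (Int, Int, Double)

  /** Converts nested entries to the flat (row, column, value) form. */
  def flatten(entries: Array[NestedEntry]): Array[FlatEntry] =
    entries.map { case ((i, j), v) => (i, j, v) }

  /** Row indices read directly from the flat form. */
  def rowIndices(entries: Array[FlatEntry]): Array[Int] = entries.map(_._1)
}
```

With the flat form, the row index access in `fromCOO` would become `entries.map(_._1)` rather than `entries.map(_._1._1)`.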





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22070478
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -123,6 +135,126 @@ class DenseMatrix(val numRows: Int, val numCols: Int, 
val values: Array[Double])
   }
 
   override def copy = new DenseMatrix(numRows, numCols, values.clone())
+
+  private[mllib] def map(f: Double => Double) = new DenseMatrix(numRows, 
numCols, values.map(f))
+
+  private[mllib] def update(f: Double => Double): DenseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+
+  /** Generate a `SparseMatrix` from the given `DenseMatrix`. */
+  def toSparse(): SparseMatrix = {
+val sparseA: ArrayBuffer[Double] = new ArrayBuffer()
+val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
+val sRows: ArrayBuffer[Int] = new ArrayBuffer()
+var i = 0
+var nnz = 0
+var lastCol = -1
+values.foreach { v =>
--- End diff --

Is it simpler to use two while loops with a counter?

~~~
var i = 0
var j = 0
var idx = 0
while (j < n) {
  while (i < m) {
if (values(idx) != 0) {
 ...
}
i += 1
idx += 1
  }
  ...
  j += 1
}
~~~
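
A fuller sketch of the two-loop version, with the elided parts filled in as assumptions: the row counter is reset at the start of each column, the column pointers live in a preallocated array, and the nonzeros are collected with `ArrayBuilder`. The `toCsc` name and the tuple return type are hypothetical; the real method would wrap the result in a `SparseMatrix`.

```scala
import scala.collection.mutable.ArrayBuilder

object ToSparseSketch {
  /** Converts a column-major dense array to CSC (colPtrs, rowIndices, values). */
  def toCsc(
      numRows: Int,
      numCols: Int,
      values: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = new ArrayBuilder.ofInt
    val nzValues = new ArrayBuilder.ofDouble
    var nnz = 0
    var idx = 0 // position in the column-major `values` array
    var j = 0
    while (j < numCols) {
      var i = 0 // reset for every column
      while (i < numRows) {
        val v = values(idx)
        if (v != 0.0) {
          rowIndices += i
          nzValues += v
          nnz += 1
        }
        i += 1
        idx += 1
      }
      // Column j ends here; record the running nonzero count.
      colPtrs(j + 1) = nnz
      j += 1
    }
    (colPtrs, rowIndices.result(), nzValues.result())
  }
}
```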





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22070530
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +529,222 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-    while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22070521
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +529,222 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-    while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22070531
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +529,222 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22070510
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +331,145 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input 
must be an array of
+   * (row, column, value) tuples. Array must be sorted first by *column* 
index and then by row
+   * index.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of ((row, column), value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[((Int, Int), 
Double)]): SparseMatrix = {
+val colPtrs = new ArrayBuffer[Int](numCols + 1)
+colPtrs.append(0)
+var nnz = 0
+var lastCol = 0
+val values = entries.map { case ((i, j), v) =>
+  while (j != lastCol) {
+colPtrs.append(nnz)
+lastCol += 1
+if (lastCol > numCols) {
+  throw new IndexOutOfBoundsException("Please make sure that the entries array is " +
+"sorted by COLUMN index first and then by row index.")
+}
+  }
+  nnz += 1
+  v
+}
+while (numCols > lastCol) {
+  colPtrs.append(nnz)
+  lastCol += 1
+}
+new SparseMatrix(numRows, numCols, colPtrs.toArray, 
entries.map(_._1._1), values)
+  }
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a `SparseMatrix` with a given random number generator and 
`method`, which
+* specifies the distribution. */
+  private def genRandMatrix(
+  numRows: Int,
+  numCols: Int,
+  density: Double,
+  rng: Random,
+  method: Random => Double): SparseMatrix = {
+require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
+  s"0.0 <= d <= 1.0. Currently, density: $density")
+val length = math.ceil(numRows * numCols * density).toInt
+val entries = Map[(Int, Int), Double]()
+var i = 0
+while (i < length) {
+  var rowIndex = rng.nextInt(numRows)
+  var colIndex = rng.nextInt(numCols)
+  while (entries.contains((rowIndex, colIndex))) {
--- End diff --

If `density` is close to `1`, this while loop is slow to terminate, because most
randomly drawn positions are already occupied. We can combine this approach with
selection-rejection to achieve `O(nnz)` complexity:


https://github.com/mengxr/spark-sampling/blob/master/src/main/scala/org/apache/spark/sampling/RDDSamplingFunctions.scala#L98
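
One possible reading of this suggestion, as a minimal sketch rather than the code in the linked file or in this PR: draw positions one at a time while the matrix is sparse, and switch to sequential selection sampling once it is dense. The object name `SparseSampling`, the method `samplePositions`, and the `0.34` switch-over threshold are all hypothetical choices made for illustration.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical helper, not part of Spark: picks which cells of a
// numRows x numCols matrix will hold non-zero values.
object SparseSampling {

  /** Returns ceil(numRows * numCols * density) distinct column-major
    * linear indices (row + col * numRows), in increasing order. */
  def samplePositions(
      numRows: Int,
      numCols: Int,
      density: Double,
      rng: Random): Array[Long] = {
    require(density >= 0.0 && density <= 1.0, s"density must be in [0, 1], got $density")
    val size = numRows.toLong * numCols
    val expected = math.ceil(size * density)
    require(expected <= Int.MaxValue, "too many non-zeros for a single array")
    val nnz = expected.toInt

    if (density < 0.34) { // assumed threshold: collisions stay rare below it
      // Draw-by-draw: duplicates are rejected by the set, so the loop
      // needs only a small constant factor more than nnz draws.
      val seen = mutable.HashSet[Long]()
      while (seen.size < nnz) {
        seen += rng.nextInt(numRows).toLong + rng.nextInt(numCols).toLong * numRows
      }
      seen.toArray.sorted
    } else {
      // Selection sampling: visit cells in order and keep cell idx with
      // probability (still needed) / (still available). When the two
      // counts are equal, every remaining cell is kept, so exactly nnz
      // cells are selected.
      val out = new Array[Long](nnz)
      var idx = 0L
      var selected = 0
      while (idx < size && selected < nnz) {
        if (rng.nextDouble() * (size - idx) < (nnz - selected)) {
          out(selected) = idx
          selected += 1
        }
        idx += 1
      }
      out
    }
  }
}
```

In the dense branch the scan visits at most `numRows * numCols` cells, which is at most `nnz / 0.34`, so both branches stay proportional to `nnz` (plus a sort in the sparse branch), and neither can stall when `density` approaches `1`.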





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22070523
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +529,222 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67299603
  
  [Test build #24541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24541/consoleFull) for PR 3319 at commit [`e4bd0c0`](https://github.com/apache/spark/commit/e4bd0c02df49b07ed0ee3687c3ac8e44868c857a).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67308668
  
  [Test build #24541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24541/consoleFull) for PR 3319 at commit [`e4bd0c0`](https://github.com/apache/spark/commit/e4bd0c02df49b07ed0ee3687c3ac8e44868c857a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67308677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24541/





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22010770
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +300,171 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a SparseMatrix given an Array[Double] of size numRows * 
numCols. The number of
+* non-zeros in `raw` is provided for efficiency. */
+  private def genRand(
+  numRows: Int,
+  numCols: Int,
+  raw: Array[Double],
+  nonZero: Int): SparseMatrix = {
+val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
+val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
+val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
+
+var i = 0
+var nnz = 0
+var lastCol = -1
+raw.foreach { v =>
+  val r = i % numRows
+  val c = (i - r) / numRows
+  if (v != 0.0) {
+sRows.append(r)
+sparseA.append(v)
+while (c != lastCol) {
+  sCols.append(nnz)
+  lastCol += 1
+}
+nnz += 1
+  }
+  i += 1
+}
+while (numCols > lastCol) {
+  sCols.append(sparseA.length)
+  lastCol += 1
+}
+new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, 
sparseA.toArray)
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `SparseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
SparseMatrix = {
+require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
+  s"0.0 <= d <= 1.0. Currently, density: $density")
+val length = numRows * numCols
+val rawA = new Array[Double](length)
+var nnz = 0
+for (i <- 0 until length) {
--- End diff --

`sprand` should not be generated this way; this approach has `O(m * n)` complexity.
Please check MATLAB's implementation or Octave's.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22010793
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +524,297 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22010805
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +524,297 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22010780
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +300,171 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a SparseMatrix given an Array[Double] of size numRows * 
numCols. The number of
+* non-zeros in `raw` is provided for efficiency. */
+  private def genRand(
+  numRows: Int,
+  numCols: Int,
+  raw: Array[Double],
+  nonZero: Int): SparseMatrix = {
+val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
+val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
+val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
+
+var i = 0
+var nnz = 0
+var lastCol = -1
+raw.foreach { v =>
+  val r = i % numRows
+  val c = (i - r) / numRows
+  if (v != 0.0) {
+sRows.append(r)
+sparseA.append(v)
+while (c != lastCol) {
+  sCols.append(nnz)
+  lastCol += 1
+}
+nnz += 1
+  }
+  i += 1
+}
+while (numCols > lastCol) {
+  sCols.append(sparseA.length)
+  lastCol += 1
+}
+new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, 
sparseA.toArray)
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `SparseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
SparseMatrix = {
+require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
+  s"0.0 <= d <= 1.0. Currently, density: $density")
+val length = numRows * numCols
+val rawA = new Array[Double](length)
+var nnz = 0
+for (i <- 0 until length) {
+  val p = rng.nextDouble()
+  if (p <= density) {
+rawA.update(i, rng.nextDouble())
+nnz += 1
+  }
+}
+genRand(numRows, numCols, rawA, nnz)
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `SparseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
SparseMatrix = {
+require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
+  s"0.0 <= d <= 1.0. Currently, density: $density")
+val length = numRows * numCols
+val rawA = new Array[Double](length)
+var nnz = 0
+for (i <- 0 until length) {
+  val p = rng.nextDouble()
+  if (p <= density) {
+rawA.update(i, rng.nextGaussian())
+nnz += 1
+  }
+}
+genRand(numRows, numCols, rawA, nnz)
+  }
+
+  /**
+   * Generate a diagonal matrix in `SparseMatrix` format from the supplied 
values.
+   * @param vector a `Vector` that will form the values on the diagonal of 
the matrix
+   * @return Square `SparseMatrix` with size `values.length` x 
`values.length` and non-zero
+   * `values` on the diagonal
+   */
+  def diag(vector: Vector): SparseMatrix = {
+val n = vector.size
+vector match {
+  case sVec: SparseVector =>
+val rows = sVec.indices
   

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22010832
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +524,297 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22010786
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +524,297 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
--- End diff --

It is nice to put all operators under `Matrices`. Then maybe we can mark 
the ones under `SparseMatrix` and `DenseMatrix` private.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22010764
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -123,6 +135,97 @@ class DenseMatrix(val numRows: Int, val numCols: Int, 
val values: Array[Double])
   }
 
   override def copy = new DenseMatrix(numRows, numCols, values.clone())
+
+  private[mllib] def map(f: Double => Double) = new DenseMatrix(numRows, 
numCols, values.map(f))
+
+  private[mllib] def update(f: Double => Double): DenseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.DenseMatrix]].
+ */
+object DenseMatrix {
+
+  /**
+   * Generate a `DenseMatrix` consisting of zeros.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   */
+  def zeros(numRows: Int, numCols: Int): DenseMatrix =
+new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+
+  /**
+   * Generate a `DenseMatrix` consisting of ones.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   */
+  def ones(numRows: Int, numCols: Int): DenseMatrix =
+new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+
+  /**
+   * Generate an Identity Matrix in `DenseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def eye(n: Int): DenseMatrix = {
+val identity = DenseMatrix.zeros(n, n)
+var i = 0
+while (i < n) {
+  identity.update(i, i, 1.0)
+  i += 1
+}
+identity
+  }
+
+  /**
+   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param rng a random number generator
+   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   */
+  def rand(numRows: Int, numCols: Int, rng: Random): DenseMatrix = {
+new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
+  }
+
+  /**
+   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param rng a random number generator
+   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   */
+  def randn(numRows: Int, numCols: Int, rng: Random): DenseMatrix = {
+new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
+  }
+
+  /**
+   * Generate a diagonal matrix in `DenseMatrix` format from the supplied 
values.
+   * @param vector a `Vector` that will form the values on the diagonal of 
the matrix
+   * @return Square `DenseMatrix` with size `values.length` x 
`values.length` and `values`
+   * on the diagonal
+   */
+  def diag(vector: Vector): DenseMatrix = {
+val n = vector.size
+val matrix = DenseMatrix.eye(n)
--- End diff --

`eye(n)` -> `zeros(n, n)`
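
A minimal sketch of why starting from zeros is enough, using a hypothetical standalone `Dense` class rather than MLlib's `DenseMatrix` (the class, its companion, and the array-based `diag` signature are assumptions made for illustration). Every diagonal entry is written exactly once, so ones written by an identity matrix first would presumably just be overwritten:

```scala
// Hypothetical stand-in for a column-major dense matrix; not MLlib's DenseMatrix.
final case class Dense(numRows: Int, numCols: Int, values: Array[Double]) {
  // Entry (i, j) lives at offset i + numRows * j in column-major storage.
  def apply(i: Int, j: Int): Double = values(i + numRows * j)
}

object Dense {
  def zeros(m: Int, n: Int): Dense = Dense(m, n, new Array[Double](m * n))

  // Build a diagonal matrix by starting from zeros and writing each diagonal entry once.
  def diag(d: Array[Double]): Dense = {
    val n = d.length
    val out = zeros(n, n)
    var i = 0
    while (i < n) {
      out.values(i + n * i) = d(i)
      i += 1
    }
    out
  }
}
```

With this shape, `Dense.diag(Array(2.0, 0.0, 5.0))` leaves every off-diagonal entry at zero and never allocates or fills an identity matrix.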





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22010767
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +300,171 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a SparseMatrix given an Array[Double] of size numRows * 
numCols. The number of
+* non-zeros in `raw` is provided for efficiency. */
+  private def genRand(
--- End diff --

This is a little confusing. First of all, there is no randomness. Secondly, 
the doc doesn't describe how the values get filled in. Is it supposed to be a 
method in `DenseMatrix` called `toSparse`?
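
For illustration, a dense-to-CSC conversion along the lines of the suggested `toSparse` might look like the sketch below. The `Csc` container and the `CscSketch.toSparse` helper are hypothetical names, not MLlib API; the field names only mirror those in the diff, and the input is assumed to be a column-major array:

```scala
// Hypothetical CSC container; field names mirror the diff but this is not MLlib's SparseMatrix.
final case class Csc(
    numRows: Int,
    numCols: Int,
    colPtrs: Array[Int],
    rowIndices: Array[Int],
    values: Array[Double])

object CscSketch {
  // Convert a column-major dense array into CSC by keeping only the non-zero entries.
  def toSparse(numRows: Int, numCols: Int, dense: Array[Double]): Csc = {
    require(dense.length == numRows * numCols, "dense must hold numRows * numCols values")
    val nnz = dense.count(_ != 0.0)
    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = new Array[Int](nnz)
    val values = new Array[Double](nnz)
    var k = 0
    var j = 0
    while (j < numCols) {
      colPtrs(j) = k // index of the first stored entry of column j
      var i = 0
      while (i < numRows) {
        val v = dense(i + numRows * j)
        if (v != 0.0) {
          rowIndices(k) = i
          values(k) = v
          k += 1
        }
        i += 1
      }
      j += 1
    }
    colPtrs(numCols) = k // total number of stored entries
    Csc(numRows, numCols, colPtrs, rowIndices, values)
  }
}
```

Naming the operation after what it does (a format conversion) also makes it clear that the random draws happen in the caller, not in the helper.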





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22010776
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +300,171 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a SparseMatrix given an Array[Double] of size numRows * 
numCols. The number of
+* non-zeros in `raw` is provided for efficiency. */
+  private def genRand(
+  numRows: Int,
+  numCols: Int,
+  raw: Array[Double],
+  nonZero: Int): SparseMatrix = {
+val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
+val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
+val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
+
+var i = 0
+var nnz = 0
+var lastCol = -1
+raw.foreach { v =>
+  val r = i % numRows
+  val c = (i - r) / numRows
+  if (v != 0.0) {
+sRows.append(r)
+sparseA.append(v)
+while (c != lastCol) {
+  sCols.append(nnz)
+  lastCol += 1
+}
+nnz += 1
+  }
+  i += 1
+}
+while (numCols > lastCol) {
+  sCols.append(sparseA.length)
+  lastCol += 1
+}
+new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, 
sparseA.toArray)
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `SparseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
SparseMatrix = {
+require(density >= 0.0 && density <= 1.0, "density must be a double in 
the range " +
+  s"0.0 <= d <= 1.0. Currently, density: $density")
+val length = numRows * numCols
+val rawA = new Array[Double](length)
+var nnz = 0
+for (i <- 0 until length) {
+  val p = rng.nextDouble()
+  if (p <= density) {
+rawA.update(i, rng.nextDouble())
+nnz += 1
+  }
+}
+genRand(numRows, numCols, rawA, nnz)
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `SparseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
SparseMatrix = {
+require(density >= 0.0 && density <= 1.0, "density must be a double in 
the range " +
+  s"0.0 <= d <= 1.0. Currently, density: $density")
+val length = numRows * numCols
+val rawA = new Array[Double](length)
+var nnz = 0
+for (i <- 0 until length) {
--- End diff --

Ditto. `O(m * n)` is too expensive.
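
One way to avoid the `O(m * n)` scan, sketched under stated assumptions and not necessarily what the PR ended up doing: fix the number of non-zeros at `round(density * m * n)`, draw that many distinct column-major positions, and build the CSC arrays from the sorted positions. `SprandSketch.sprand` and its tuple return type are hypothetical. Note that this yields an exact non-zero count rather than an independent coin flip per entry, and the rejection loop is only efficient when `density` is well below 1:

```scala
import scala.collection.mutable
import scala.util.Random

object SprandSketch {
  // Returns (colPtrs, rowIndices, values) in CSC layout; a hypothetical helper, not MLlib's API.
  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random)
      : (Array[Int], Array[Int], Array[Double]) = {
    require(density >= 0.0 && density <= 1.0, s"density must be in [0, 1], got $density")
    val size = numRows.toLong * numCols
    val target = math.round(density * size)
    require(target <= Int.MaxValue, "too many non-zeros for Int-indexed arrays")
    val nnz = target.toInt

    // Draw distinct column-major positions; duplicates are simply redrawn.
    val picked = mutable.HashSet.empty[Long]
    while (picked.size < nnz) {
      picked += math.min((rng.nextDouble() * size).toLong, size - 1)
    }
    // Sorting column-major positions orders entries by column, then by row.
    val sorted = picked.toArray.sorted

    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = new Array[Int](nnz)
    val values = new Array[Double](nnz)
    var k = 0
    while (k < nnz) {
      val idx = sorted(k)
      rowIndices(k) = (idx % numRows).toInt
      colPtrs((idx / numRows).toInt + 1) += 1 // count entries per column
      values(k) = rng.nextDouble()
      k += 1
    }
    // Prefix-sum the per-column counts into column pointers.
    var j = 0
    while (j < numCols) {
      colPtrs(j + 1) += colPtrs(j)
      j += 1
    }
    (colPtrs, rowIndices, values)
  }
}
```

Swapping `rng.nextDouble()` for `rng.nextGaussian()` in the value assignment would give the `sprandn` variant with the same cost profile.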





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22010799
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +524,297 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22010808
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +524,297 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
 
   /**
* Generate a `DenseMatrix` consisting of ones.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, 
numCols)
 
   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
* @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
*/
-  def eye(n: Int): Matrix = {
-val identity = Matrices.zeros(n, n)
-var i = 0
-while (i < n){
-  identity.update(i, i, 1.0)
-  i += 1
-}
-identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
*/
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprand(numRows, numCols, density, rng)
 
   /**
* Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
*/
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
Matrix =
+SparseMatrix.sprandn(numRows, numCols, density, rng)
 
   /**
* Generate a diagonal matrix in `DenseMatrix` format 

[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-17 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r22026283
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -256,72 +524,297 @@ object Matrices {
* Generate a `DenseMatrix` consisting of zeros.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of 
zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): Matrix =
-new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = 
DenseMatrix.zeros(numRows, numCols)
--- End diff --

I specifically don't want to mark them private, otherwise the user will 
have to always write `.asInstanceOf[SparseMatrix]`. We could mark it 
`private[mllib]` and still use them, but not having `.asInstanceOf` everywhere, 
especially while writing tests on spark-shell is a very nice convenience.
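
The trade-off can be sketched with a miniature, hypothetical version of the type hierarchy (the names echo the diff, but this is not MLlib code and the `SparseMatrix` here carries only `colPtrs`). A factory with a concrete return type exposes subclass-only members directly, while the widened `Matrices` factory forces a cast:

```scala
// Hypothetical miniature of the type hierarchy; names echo the diff but this is not MLlib code.
sealed trait Matrix {
  def numRows: Int
  def numCols: Int
}

final class SparseMatrix(val numRows: Int, val numCols: Int, val colPtrs: Array[Int])
  extends Matrix

object SparseMatrix {
  // Concrete return type: callers can use SparseMatrix-only members directly.
  def speye(n: Int): SparseMatrix = new SparseMatrix(n, n, (0 to n).toArray)
}

object Matrices {
  // Widened return type: callers only see the Matrix interface.
  def speye(n: Int): Matrix = SparseMatrix.speye(n)
}

object Demo {
  // No cast needed when the factory on the concrete type is accessible.
  val direct: Array[Int] = SparseMatrix.speye(3).colPtrs
  // With only the Matrices factory available, the caller has to downcast.
  val viaCast: Array[Int] = Matrices.speye(3).asInstanceOf[SparseMatrix].colPtrs
}
```

Keeping both factories public means interactive users get the concrete type, while code that only needs the `Matrix` interface can still go through `Matrices`.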





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67129497
  
  [Test build #24492 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24492/consoleFull)
 for   PR 3319 at commit 
[`065b531`](https://github.com/apache/spark/commit/065b53181349fa0cc56d4828044b1d564791ea80).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67130002
  
  [Test build #24493 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24493/consoleFull)
 for   PR 3319 at commit 
[`d8be7bc`](https://github.com/apache/spark/commit/d8be7bc07b23982c4fced647f85982c6b7cadd4b).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67136033
  
  [Test build #24492 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24492/consoleFull)
 for   PR 3319 at commit 
[`065b531`](https://github.com/apache/spark/commit/065b53181349fa0cc56d4828044b1d564791ea80).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67136043
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24492/
Test FAILed.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67136740
  
  [Test build #24493 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24493/consoleFull)
 for   PR 3319 at commit 
[`d8be7bc`](https://github.com/apache/spark/commit/d8be7bc07b23982c4fced647f85982c6b7cadd4b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67136754
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24493/
Test FAILed.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67141113
  
  [Test build #24497 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24497/consoleFull)
 for   PR 3319 at commit 
[`65c562e`](https://github.com/apache/spark/commit/65c562e57078ccb31de281b238a9348dd9a1f7c2).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-67149415
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24497/
Test PASSed.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-16 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r21925528
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -313,15 +593,145 @@ object Matrices {
* @return Square `DenseMatrix` with size `values.length` x 
`values.length` and `values`
* on the diagonal
*/
-  def diag(vector: Vector): Matrix = {
-val n = vector.size
-val matrix = Matrices.eye(n)
-val values = vector.toArray
-var i = 0
-while (i < n) {
-  matrix.update(i, i, values(i))
-  i += 1
+  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
+
+  /**
+   * Horizontally concatenate a sequence of matrices. The returned matrix 
will be in the format
+   * the matrices are supplied in. Supplying a mix of dense and sparse 
matrices will result in
+   * a dense matrix.
--- End diff --

I like the MATLAB approach better. Usually a sparse matrix is very sparse, 
while a dense component is quite small, for example,

~~~
A^T A  A^T
A  I
~~~
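
To make the storage argument concrete, here is a rough sketch (not code from this PR; `ConcatStorage`, `denseEntries`, and `cscEntries` are hypothetical names, and the block sizes are made up) that counts stored array entries when a very sparse block is concatenated with a small dense block:

```scala
// Hypothetical helper; storage is counted in array entries, not bytes.
object ConcatStorage {
  /** Entries a dense column-major matrix stores: one Double per cell. */
  def denseEntries(rows: Int, cols: Int): Long = rows.toLong * cols

  /** Entries a CSC matrix stores: values + row indices + column pointers. */
  def cscEntries(cols: Int, nnz: Long): Long = 2L * nnz + cols + 1

  def main(args: Array[String]): Unit = {
    val rows = 10000
    // Assumed sizes: a 10000 x 10000 sparse block with 50000 non-zeros,
    // next to a fully dense 10000 x 10 block.
    val sparseCols = 10000
    val sparseNnz = 50000L
    val denseCols = 10
    val totalCols = sparseCols + denseCols
    val asDense = denseEntries(rows, totalCols)
    val asSparse = cscEntries(totalCols, sparseNnz + rows.toLong * denseCols)
    println(s"dense result: $asDense entries, sparse result: $asSparse entries")
  }
}
```

Under these assumed sizes the sparse result is a few hundred times smaller, which is the case the MATLAB convention favors.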





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-16 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r21929684
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -313,15 +593,145 @@ object Matrices {
* @return Square `DenseMatrix` with size `values.length` x 
`values.length` and `values`
* on the diagonal
*/
-  def diag(vector: Vector): Matrix = {
-val n = vector.size
-val matrix = Matrices.eye(n)
-val values = vector.toArray
-var i = 0
-while (i < n) {
-  matrix.update(i, i, values(i))
-  i += 1
+  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
+
+  /**
+   * Horizontally concatenate a sequence of matrices. The returned matrix 
will be in the format
+   * the matrices are supplied in. Supplying a mix of dense and sparse 
matrices will result in
+   * a dense matrix.
--- End diff --

Okay, will do.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-15 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r21866315
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -80,6 +81,12 @@ sealed trait Matrix extends Serializable {
 
   /** A human readable representation of the matrix */
   override def toString: String = toBreeze.toString()
+
+  /** Map the values of this matrix using a function. Generates a new 
matrix. */
+  private[mllib] def map(f: Double => Double): Matrix
+
+  /** Update all the values of this matrix using the function f. Performed 
in-place. */
+  private[mllib] def update(f: Double => Double): Matrix
--- End diff --

Ditto. What happens when there are zero values that are not stored explicitly?





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-15 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r21866309
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -17,10 +17,11 @@
 
 package org.apache.spark.mllib.linalg
 
-import java.util.{Random, Arrays}
-
 import breeze.linalg.{Matrix => BM, DenseMatrix => BDM, CSCMatrix => BSM}
 
+import java.util.{Random, Arrays}
+import scala.collection.mutable.ArrayBuffer
--- End diff --

organize imports





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-15 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r21866327
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +295,171 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a SparseMatrix given an Array[Double] of size numRows * 
numCols. The number of
+* non-zeros in `raw` is provided for efficiency. */
+  private def genRand(
+  numRows: Int,
+  numCols: Int,
+  raw: Array[Double],
+  nonZero: Int): SparseMatrix = {
+val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
+val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
+val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
+
+var i = 0
+var nnz = 0
+var lastCol = -1
+raw.foreach { v =>
+  val r = i % numRows
+  val c = (i - r) / numRows
+  if ( v != 0.0) {
+sRows.append(r)
+sparseA.append(v)
+while (c != lastCol){
+  sCols.append(nnz)
+  lastCol += 1
+}
+nnz += 1
+  }
+  i += 1
+}
+while (numCols > lastCol){
--- End diff --

space before `{` (and please fix other places)





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-15 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r21866336
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +295,171 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a SparseMatrix given an Array[Double] of size numRows * 
numCols. The number of
+* non-zeros in `raw` is provided for efficiency. */
+  private def genRand(
+  numRows: Int,
+  numCols: Int,
+  raw: Array[Double],
+  nonZero: Int): SparseMatrix = {
+val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
+val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
+val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
+
+var i = 0
+var nnz = 0
+var lastCol = -1
+raw.foreach { v =>
+  val r = i % numRows
+  val c = (i - r) / numRows
+  if ( v != 0.0) {
+sRows.append(r)
+sparseA.append(v)
+while (c != lastCol){
+  sCols.append(nnz)
+  lastCol += 1
+}
+nnz += 1
+  }
+  i += 1
+}
+while (numCols > lastCol){
+  sCols.append(sparseA.length)
+  lastCol += 1
+}
+new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, 
sparseA.toArray)
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `SparseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
SparseMatrix = {
+require(density > 0.0 && density < 1.0, "density must be a double in the range " +
+  s"0.0 < d < 1.0. Currently, density: $density")
+val length = numRows * numCols
+val rawA = new Array[Double](length)
+var nnz = 0
+for (i <- 0 until length) {
+  val p = rng.nextDouble()
+  if (p <= density) {
+rawA.update(i, rng.nextDouble())
+nnz += 1
+  }
+}
+genRand(numRows, numCols, rawA, nnz)
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `SparseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
SparseMatrix = {
+require(density > 0.0 && density < 1.0, "density must be a double in the range " +
+  s"0.0 < d < 1.0. Currently, density: $density")
+val length = numRows * numCols
+val rawA = new Array[Double](length)
+var nnz = 0
+for (i <- 0 until length) {
+  val p = rng.nextDouble()
+  if (p <= density) {
+rawA.update(i, rng.nextGaussian())
+nnz += 1
+  }
+}
+genRand(numRows, numCols, rawA, nnz)
+  }
+
+  /**
+   * Generate a diagonal matrix in `DenseMatrix` format from the supplied 
values.
--- End diff --

`DenseMatrix`?




[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-15 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r21866331
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +295,171 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a SparseMatrix given an Array[Double] of size numRows * 
numCols. The number of
+* non-zeros in `raw` is provided for efficiency. */
+  private def genRand(
+  numRows: Int,
+  numCols: Int,
+  raw: Array[Double],
+  nonZero: Int): SparseMatrix = {
+val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
+val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
+val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
+
+var i = 0
+var nnz = 0
+var lastCol = -1
+raw.foreach { v =>
+  val r = i % numRows
+  val c = (i - r) / numRows
+  if ( v != 0.0) {
+sRows.append(r)
+sparseA.append(v)
+while (c != lastCol){
+  sCols.append(nnz)
+  lastCol += 1
+}
+nnz += 1
+  }
+  i += 1
+}
+while (numCols > lastCol){
+  sCols.append(sparseA.length)
+  lastCol += 1
+}
+new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, 
sparseA.toArray)
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `SparseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
SparseMatrix = {
+require(density > 0.0 && density < 1.0, "density must be a double in the range " +
+  s"0.0 < d < 1.0. Currently, density: $density")
--- End diff --

`density = 0.0` and `density = 1.0` should be valid.
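
A minimal sketch of an inclusive check, assuming the closed range [0.0, 1.0] is what is wanted here; `DensityCheck.checkDensity` is a hypothetical helper and not part of the patch:

```scala
// Hypothetical helper illustrating inclusive bounds for `density`.
object DensityCheck {
  /** Accepts any density in [0.0, 1.0]; NaN fails both comparisons and is rejected. */
  def checkDensity(density: Double): Unit =
    require(density >= 0.0 && density <= 1.0,
      "density must be a double in the range " +
        s"0.0 <= d <= 1.0. Currently, density: $density")

  def main(args: Array[String]): Unit = {
    checkDensity(0.0) // all-zero matrix
    checkDensity(1.0) // fully populated matrix
    println(scala.util.Try(checkDensity(1.5)).isFailure)
  }
}
```

With `density = 0.0` the generator would simply produce an empty sparse matrix, and with `density = 1.0` a fully populated one.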





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-15 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r21866323
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +295,171 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a SparseMatrix given an Array[Double] of size numRows * 
numCols. The number of
+* non-zeros in `raw` is provided for efficiency. */
+  private def genRand(
+  numRows: Int,
+  numCols: Int,
+  raw: Array[Double],
+  nonZero: Int): SparseMatrix = {
+val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
+val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
+val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
+
+var i = 0
+var nnz = 0
+var lastCol = -1
+raw.foreach { v =>
+  val r = i % numRows
+  val c = (i - r) / numRows
+  if ( v != 0.0) {
--- End diff --

Is it handled inside the constructor of `SparseMatrix`?





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-15 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r21866346
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -313,15 +593,145 @@ object Matrices {
* @return Square `DenseMatrix` with size `values.length` x 
`values.length` and `values`
* on the diagonal
*/
-  def diag(vector: Vector): Matrix = {
-val n = vector.size
-val matrix = Matrices.eye(n)
-val values = vector.toArray
-var i = 0
-while (i < n) {
-  matrix.update(i, i, values(i))
-  i += 1
+  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
+
+  /**
+   * Horizontally concatenate a sequence of matrices. The returned matrix 
will be in the format
+   * the matrices are supplied in. Supplying a mix of dense and sparse 
matrices will result in
+   * a dense matrix.
--- End diff --

Is it the same behavior as in MATLAB? (Sorry I don't have MATLAB installed.)





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-15 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r21866313
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -80,6 +81,12 @@ sealed trait Matrix extends Serializable {
 
   /** A human readable representation of the matrix */
   override def toString: String = toBreeze.toString()
+
+  /** Map the values of this matrix using a function. Generates a new 
matrix. */
--- End diff --

Should comment on the behavior for sparse matrices, for example, `map(_ + 1)`.
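
For illustration only, a small stand-in CSC type (the hypothetical `Csc` below, not Spark's `SparseMatrix`) shows why this needs a note: applying the function to stored values only gives a different matrix from applying it to every cell whenever `f(0.0) != 0.0`.

```scala
// Minimal stand-in for a CSC matrix; an assumption for this sketch, not Spark's class.
final case class Csc(
    numRows: Int,
    numCols: Int,
    colPtrs: Array[Int],
    rowIndices: Array[Int],
    values: Array[Double]) {

  /** Applies f to stored entries only, like the `map` in this diff. */
  def mapStored(f: Double => Double): Csc = copy(values = values.map(f))

  /** Expands to a column-major dense array; entries that are not stored are 0.0. */
  def toDenseArray: Array[Double] = {
    val out = new Array[Double](numRows * numCols)
    var j = 0
    while (j < numCols) {
      var k = colPtrs(j)
      while (k < colPtrs(j + 1)) {
        out(j * numRows + rowIndices(k)) = values(k)
        k += 1
      }
      j += 1
    }
    out
  }
}

object MapOnSparse {
  def main(args: Array[String]): Unit = {
    // 2 x 2 matrix with a single stored entry: 5.0 at (0, 0).
    val m = Csc(2, 2, Array(0, 1, 1), Array(0), Array(5.0))
    val storedOnly = m.mapStored(_ + 1).toDenseArray // implicit zeros stay 0.0
    val everyCell = m.toDenseArray.map(_ + 1)        // implicit zeros become 1.0
    println(storedOnly.mkString(", "))
    println(everyCell.mkString(", "))
  }
}
```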





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-15 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r21881450
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -313,15 +593,145 @@ object Matrices {
* @return Square `DenseMatrix` with size `values.length` x 
`values.length` and `values`
* on the diagonal
*/
-  def diag(vector: Vector): Matrix = {
-val n = vector.size
-val matrix = Matrices.eye(n)
-val values = vector.toArray
-var i = 0
-while (i < n) {
-  matrix.update(i, i, values(i))
-  i += 1
+  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
+
+  /**
+   * Horizontally concatenate a sequence of matrices. The returned matrix 
will be in the format
+   * the matrices are supplied in. Supplying a mix of dense and sparse 
matrices will result in
+   * a dense matrix.
--- End diff --

MATLAB does it the other way around. If one matrix is sparse, then the 
final matrix turns out to be sparse as well.
That's why I added the note. Should I make it consistent with MATLAB?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-15 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r21881510
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +295,171 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a SparseMatrix given an Array[Double] of size numRows * 
numCols. The number of
+* non-zeros in `raw` is provided for efficiency. */
+  private def genRand(
+  numRows: Int,
+  numCols: Int,
+  raw: Array[Double],
+  nonZero: Int): SparseMatrix = {
+val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
+val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
+val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
+
+var i = 0
+var nnz = 0
+var lastCol = -1
+raw.foreach { v =>
+  val r = i % numRows
+  val c = (i - r) / numRows
+  if ( v != 0.0) {
+sRows.append(r)
+sparseA.append(v)
+while (c != lastCol){
+  sCols.append(nnz)
+  lastCol += 1
+}
+nnz += 1
+  }
+  i += 1
+}
+while (numCols > lastCol){
+  sCols.append(sparseA.length)
+  lastCol += 1
+}
+new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, 
sparseA.toArray)
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `SparseMatrix` with size `numRows` x `numCols` and values in 
U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): 
SparseMatrix = {
+require(density > 0.0 && density < 1.0, "density must be a double in the range " +
+  s"0.0 < d < 1.0. Currently, density: $density")
+val length = numRows * numCols
+val rawA = new Array[Double](length)
+var nnz = 0
+for (i <- 0 until length) {
+  val p = rng.nextDouble()
+  if (p <= density) {
+rawA.update(i, rng.nextDouble())
+nnz += 1
+  }
+}
+genRand(numRows, numCols, rawA, nnz)
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random 
numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `SparseMatrix` with size `numRows` x `numCols` and values in 
N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): 
SparseMatrix = {
+require(density > 0.0 && density < 1.0, "density must be a double in the range " +
+  s"0.0 < d < 1.0. Currently, density: $density")
+val length = numRows * numCols
+val rawA = new Array[Double](length)
+var nnz = 0
+for (i <- 0 until length) {
+  val p = rng.nextDouble()
+  if (p <= density) {
+rawA.update(i, rng.nextGaussian())
+nnz += 1
+  }
+}
+genRand(numRows, numCols, rawA, nnz)
+  }
+
+  /**
+   * Generate a diagonal matrix in `DenseMatrix` format from the supplied 
values.
--- End diff --

Good catch!




[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-15 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3319#discussion_r21881678
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -197,6 +295,171 @@ class SparseMatrix(
   }
 
   override def copy = new SparseMatrix(numRows, numCols, colPtrs, 
rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+val len = values.length
+var i = 0
+while (i < len) {
+  values(i) = f(values(i))
+  i += 1
+}
+this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the 
diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, 
Array.fill(n)(1.0))
+  }
+
+  /** Generates a SparseMatrix given an Array[Double] of size numRows * 
numCols. The number of
+* non-zeros in `raw` is provided for efficiency. */
+  private def genRand(
+  numRows: Int,
+  numCols: Int,
+  raw: Array[Double],
+  nonZero: Int): SparseMatrix = {
+val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
+val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
+val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
+
+var i = 0
+var nnz = 0
+var lastCol = -1
+raw.foreach { v =>
+  val r = i % numRows
+  val c = (i - r) / numRows
+  if ( v != 0.0) {
--- End diff --

Right now, it's not. Currently users can supply zero values during the 
construction of SparseMatrix. Two things:
1) Should I add a check in the constructor of SparseMatrix?
2) Should I transform genRand into something like .toSparse() inside 
DenseMatrix, and add a .toDense() inside SparseMatrix? (I actually had these 
two methods in my multi model training repo)
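
A rough sketch of what option 2 could look like, assuming column-major input as in `genRand`; `DenseToCsc.toCsc` is a hypothetical name, and it returns the raw CSC arrays rather than a `SparseMatrix` so that it stays self-contained:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper; a sketch of a dense-to-CSC conversion, not the PR's code.
object DenseToCsc {
  /**
   * Converts a column-major dense array into CSC arrays, dropping zeros.
   * Returns (colPtrs, rowIndices, values).
   */
  def toCsc(
      numRows: Int,
      numCols: Int,
      raw: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    require(raw.length == numRows * numCols, "raw must have numRows * numCols entries")
    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = new ArrayBuffer[Int]()
    val values = new ArrayBuffer[Double]()
    var j = 0
    while (j < numCols) {
      var i = 0
      while (i < numRows) {
        val v = raw(j * numRows + i)
        if (v != 0.0) {
          rowIndices += i
          values += v
        }
        i += 1
      }
      // Entry j + 1 marks where column j ends, so empty columns repeat the previous pointer.
      colPtrs(j + 1) = values.length
      j += 1
    }
    (colPtrs, rowIndices.toArray, values.toArray)
  }
}
```

Walking column by column keeps the column-pointer bookkeeping to a single assignment per column, which avoids the catch-up loops on `lastCol`.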





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-64681676
  
  [Test build #23900 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23900/consoleFull)
 for   PR 3319 at commit 
[`a8120d2`](https://github.com/apache/spark/commit/a8120d2a83720b621b36942add3a98aa4b96bcc3).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-64693996
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23900/
Test PASSed.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-64693986
  
  [Test build #23900 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23900/consoleFull)
 for   PR 3319 at commit 
[`a8120d2`](https://github.com/apache/spark/commit/a8120d2a83720b621b36942add3a98aa4b96bcc3).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-26 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-64706309
  
@brkyvz I didn't know MATLAB has `horzcat` and `vertcat` along with `[A, 
B]` or `[A; B]`. I'm okay with adapting method names from MATLAB. I hope there are 
no copyright issues. (I don't see any special statement from Octave.)

If we want to use MATLAB operators, maybe we should also stick to lowercase 
method names.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-26 Thread brkyvz
Github user brkyvz commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-64713893
  
I checked MATLAB's webpage and didn't see any copyright mentions for the 
method names. It's best to triple check though. Since NumPy and SciPy share 
method names with MATLAB, I don't expect there to be problems.
With the last commit, I made the method names lowercase :)





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-64504931
  
  [Test build #23863 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23863/consoleFull)
 for   PR 3319 at commit 
[`c75f3cd`](https://github.com/apache/spark/commit/c75f3cdec438042c10e31009dee87a14fdce4053).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-25 Thread brkyvz
Github user brkyvz commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-64505065
  
@mengxr:
Thanks for the feedback. Added the Java tests!
horzcat and vertcat are in fact MATLAB methods:
http://www.mathworks.com/help/matlab/ref/horzcat.html
http://www.mathworks.com/help/matlab/ref/vertcat.html
They are the underlying methods that are called when someone writes 
`A = [A1 A2; A3 A4];`
I felt the naming was more intuitive because, like `strcat`, it says that you 
are concatenating matrices, either horizontally or vertically. I'd be happy to 
change them to `hstack` and `vstack`, but `horzcat` sounds more intuitive to me 
(maybe I'm biased, because I used to use it more).
Your call :)
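
To make the semantics concrete, here is a minimal sketch of horizontal concatenation for column-major dense matrices. The `HorzcatSketch` object and its `Dense` case class are hypothetical stand-ins, not the MLlib classes; the point is only that with column-major storage, placing matrices side by side amounts to appending their value arrays.

```scala
object HorzcatSketch {
  /** A bare column-major dense matrix, used only for this illustration. */
  final case class Dense(numRows: Int, numCols: Int, values: Array[Double])

  /** Concatenates matrices left to right; all must share the same row count. */
  def horzcat(mats: Seq[Dense]): Dense = {
    require(mats.nonEmpty, "need at least one matrix")
    val numRows = mats.head.numRows
    require(mats.forall(_.numRows == numRows), "all matrices must have the same number of rows")
    // Column-major layout: the columns of each matrix are already contiguous,
    // so the result is the value arrays appended in order.
    val values = mats.iterator.flatMap(_.values.iterator).toArray
    Dense(numRows, mats.map(_.numCols).sum, values)
  }
}
```

For example, a 2x1 matrix followed by a 2x2 matrix yields a 2x3 matrix whose first column is the first input.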






[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-64510456
  
  [Test build #23863 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23863/consoleFull)
 for   PR 3319 at commit 
[`c75f3cd`](https://github.com/apache/spark/commit/c75f3cdec438042c10e31009dee87a14fdce4053).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-64510461
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23863/
Test PASSed.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-24 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-64310622
  
@brkyvz Two comments on the API:

1) For the APIs we provide, could you add a Java test suite and verify that 
all methods work in Java?
2) `horzCat` and `vertCat` are not MATLAB operators, nor NumPy's. Maybe we 
should rename them to `hstack` and `vstack`, which are at least known by NumPy 
users.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-17 Thread brkyvz
GitHub user brkyvz opened a pull request:

https://github.com/apache/spark/pull/3319

[SPARK-4409][MLlib] Additional Linear Algebra Utils

Addition of a very limited number of local matrix manipulation and 
generation methods that would be helpful in the further development of 
algorithms on top of BlockMatrix (SPARK-3974), such as Randomized SVD, and 
Multi Model Training (SPARK-1486).
The proposed methods for addition are:

For `Matrix`
 - map: maps the values in the matrix with a given function. Produces a new 
matrix.
 - update: the values in the matrix are updated with a given function. 
Occurs in place.
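
The difference between the two can be sketched on a toy dense matrix. `MapUpdateSketch` and its `Dense` class are hypothetical, and returning `this` from `update` is an assumption made here for chaining, not something the description above specifies.

```scala
object MapUpdateSketch {
  /** A bare column-major dense matrix, used only for this illustration. */
  final class Dense(val numRows: Int, val numCols: Int, val values: Array[Double]) {

    /** Applies f to every entry and returns a new matrix; this one is untouched. */
    def map(f: Double => Double): Dense =
      new Dense(numRows, numCols, values.map(f))

    /** Applies f to every entry in place, overwriting the backing array. */
    def update(f: Double => Double): this.type = {
      var i = 0
      while (i < values.length) {
        values(i) = f(values(i))
        i += 1
      }
      this
    }
  }
}
```

`map` allocates a second array of the same size, while `update` reuses the existing one, which matters when the matrix is large.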

Factory methods for `DenseMatrix`:
 - *zeros: Generate a matrix consisting of zeros
 - *ones: Generate a matrix consisting of ones
 - *eye: Generate an identity matrix
 - *rand: Generate a matrix consisting of i.i.d. uniform random numbers
 - *randn: Generate a matrix consisting of i.i.d. Gaussian random numbers
 - *diag: Generate a diagonal matrix from a supplied vector
*These methods already exist in the factory methods for `Matrices`. However, 
for cases where we require a `DenseMatrix`, you constantly have to add 
`.asInstanceOf[DenseMatrix]` everywhere, which makes the code dirtier. I 
propose moving these functions to factory methods for `DenseMatrix`, where the 
output will be a `DenseMatrix`; the factory methods for `Matrices` will call 
these functions directly and output a generic `Matrix`.
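
The delegation described here can be sketched with stand-in types. Everything inside `FactorySketch` is hypothetical and only mirrors the names used above; it shows how a typed factory removes the need for a cast while the generic factory keeps returning `Matrix`.

```scala
object FactorySketch {
  sealed trait Matrix {
    def numRows: Int
    def numCols: Int
  }

  final class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double])
    extends Matrix

  object DenseMatrix {
    /** Typed factory: callers get a DenseMatrix without casting. */
    def zeros(numRows: Int, numCols: Int): DenseMatrix =
      new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
  }

  object Matrices {
    /** Generic factory: delegates to the typed one and widens the result to Matrix. */
    def zeros(numRows: Int, numCols: Int): Matrix =
      DenseMatrix.zeros(numRows, numCols)
  }
}
```

With this layout, code that needs the `values` array calls `DenseMatrix.zeros` directly, and code that only needs the `Matrix` interface keeps calling `Matrices.zeros`.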

Factory methods for `SparseMatrix`:
 - speye: Identity matrix in sparse format. Saves a ton of memory when 
dimensions are large, especially in Multi Model Training, where each row 
requires being multiplied by a scalar.
 - sprand: Generate a sparse matrix with a given density consisting of 
i.i.d. uniform random numbers.
 - sprandn: Generate a sparse matrix with a given density consisting of 
i.i.d. Gaussian random numbers.
 - diag: Generate a diagonal matrix from a supplied vector. It is memory 
efficient because it stores only the diagonal. Again, very helpful in Multi 
Model Training.
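
The memory argument can be illustrated with a sketch of a sparse identity in compressed sparse column (CSC) form. `SpeyeSketch` and `Csc` are hypothetical names, not the MLlib `SparseMatrix` class: an n x n identity needs n values, n row indices, and n + 1 column pointers, instead of the n * n doubles a dense matrix would hold.

```scala
object SpeyeSketch {
  /** CSC arrays for a square matrix of size n. */
  final case class Csc(n: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double])

  /** Builds the n x n identity: column j holds a single 1.0 at row j. */
  def speye(n: Int): Csc =
    Csc(
      n,
      Array.range(0, n + 1), // column j starts at offset j, one entry per column
      Array.range(0, n),     // the entry in column j sits on row j
      Array.fill(n)(1.0))
}
```

A diagonal matrix built from a vector has the same pointer and index arrays, with the vector's entries in place of the ones (assuming none of them is zero).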

Factory methods for `Matrices`:
 - Include all the factory methods given above, but return a generic 
`Matrix` rather than `SparseMatrix` or `DenseMatrix`.
 - horzCat: Horizontally concatenate matrices to form one larger matrix. 
Very useful in both Multi Model Training, and for the repartitioning of 
BlockMatrix.
 - vertCat: Vertically concatenate matrices to form one larger matrix. Very 
useful for the repartitioning of BlockMatrix.
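
Vertical concatenation is less direct than the horizontal case when values are stored column by column, because each output column interleaves a slice from every input. A minimal sketch, with the hypothetical `VertcatSketch` object and `Dense` case class standing in for the real classes:

```scala
object VertcatSketch {
  /** A bare column-major dense matrix, used only for this illustration. */
  final case class Dense(numRows: Int, numCols: Int, values: Array[Double])

  /** Stacks matrices top to bottom; all must share the same column count. */
  def vertcat(mats: Seq[Dense]): Dense = {
    require(mats.nonEmpty, "need at least one matrix")
    val numCols = mats.head.numCols
    require(mats.forall(_.numCols == numCols), "all matrices must have the same number of columns")
    val numRows = mats.map(_.numRows).sum
    val out = new Array[Double](numRows * numCols)
    var rowOffset = 0
    mats.foreach { m =>
      var c = 0
      while (c < numCols) {
        // Copy column c of m into rows [rowOffset, rowOffset + m.numRows) of output column c.
        System.arraycopy(m.values, c * m.numRows, out, c * numRows + rowOffset, m.numRows)
        c += 1
      }
      rowOffset += m.numRows
    }
    Dense(numRows, numCols, out)
  }
}
```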

The names for these methods were selected from MATLAB.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brkyvz/spark SPARK-4409

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3319


commit a14c0da0360b4202a2db787b85ce631562014f0d
Author: Burak Yavuz brk...@gmail.com
Date:   2014-11-17T09:33:36Z

[SPARK-4409] Initial commit to add methods







[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-63358380
  
  [Test build #23485 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23485/consoleFull)
 for   PR 3319 at commit 
[`94d7ae9`](https://github.com/apache/spark/commit/94d7ae977858ba4d785429df5a324012a438bc80).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-63358572
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23485/
Test FAILed.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-63358570
  
  [Test build #23485 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23485/consoleFull)
 for   PR 3319 at commit 
[`94d7ae9`](https://github.com/apache/spark/commit/94d7ae977858ba4d785429df5a324012a438bc80).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-63365041
  
  [Test build #23492 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23492/consoleFull)
 for   PR 3319 at commit 
[`d662f9d`](https://github.com/apache/spark/commit/d662f9d963c21aca720bab87a8279a938e1d924e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-11-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-63378128
  
  [Test build #23492 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23492/consoleFull)
 for   PR 3319 at commit 
[`d662f9d`](https://github.com/apache/spark/commit/d662f9d963c21aca720bab87a8279a938e1d924e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.

