[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-68304656 LGTM. Merged into master. Thanks!! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3319
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-68036616 [Test build #24774 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24774/consoleFull) for PR 3319 at commit [`04c4829`](https://github.com/apache/spark/commit/04c4829d8364a36314485d6bdceed5ab93c67398). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-68036621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24774/ Test FAILed.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-68038204 [Test build #24775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24775/consoleFull) for PR 3319 at commit [`b0354f6`](https://github.com/apache/spark/commit/b0354f616f7f49ee9b19f6b8e5d0dc775b05dba2). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-68038211 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24775/ Test PASSed.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-68033235 [Test build #24774 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24774/consoleFull) for PR 3319 at commit [`04c4829`](https://github.com/apache/spark/commit/04c4829d8364a36314485d6bdceed5ab93c67398). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-68033509 [Test build #24775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24775/consoleFull) for PR 3319 at commit [`b0354f6`](https://github.com/apache/spark/commit/b0354f616f7f49ee9b19f6b8e5d0dc775b05dba2). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67729483 [Test build #24668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24668/consoleFull) for PR 3319 at commit [`10a63a6`](https://github.com/apache/spark/commit/10a63a6e6b583e6e79ced58ea9b73937656f5a24). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67731189 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24668/ Test PASSed.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67731185 [Test build #24668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24668/consoleFull) for PR 3319 at commit [`10a63a6`](https://github.com/apache/spark/commit/10a63a6e6b583e6e79ced58ea9b73937656f5a24). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22094195

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +331,145 @@ class SparseMatrix(
       }

       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
    +  def toDense(): DenseMatrix = {
    +    new DenseMatrix(numRows, numCols, toArray)
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
    +   * (row, column, value) tuples. Array must be sorted first by *column* index and then by row
    +   * index.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param entries Array of ((row, column), value) tuples
    +   * @return The corresponding `SparseMatrix`
    +   */
    +  def fromCOO(numRows: Int, numCols: Int, entries: Array[((Int, Int), Double)]): SparseMatrix = {
    +    val colPtrs = new ArrayBuffer[Int](numCols + 1)
    +    colPtrs.append(0)
    +    var nnz = 0
    +    var lastCol = 0
    +    val values = entries.map { case ((i, j), v) =>
    +      while (j != lastCol) {
    +        colPtrs.append(nnz)
    +        lastCol += 1
    +        if (lastCol > numCols) {
    +          throw new IndexOutOfBoundsException("Please make sure that the entries array is " +
    +            "sorted by COLUMN index first and then by row index.")
    +        }
    +      }
    +      nnz += 1
    +      v
    +    }
    +    while (numCols > lastCol) {
    +      colPtrs.append(nnz)
    +      lastCol += 1
    +    }
    +    new SparseMatrix(numRows, numCols, colPtrs.toArray, entries.map(_._1._1), values)
    +  }
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  /** Generates a `SparseMatrix` with a given random number generator and `method`, which
    +   * specifies the distribution. */
    +  private def genRandMatrix(
    +      numRows: Int,
    +      numCols: Int,
    --- End diff --

    That's possible. This is how MATLAB does it, calls rand and fills it according to sprand or sprandn. I thought it looked more functional this way, get over with it with just one loop.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67617761 [Test build #24637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24637/consoleFull) for PR 3319 at commit [`3971c93`](https://github.com/apache/spark/commit/3971c931d18dfaea0ea66e0cbb19b61dbf310a66). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67623680 [Test build #24639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24639/consoleFull) for PR 3319 at commit [`f62d6c7`](https://github.com/apache/spark/commit/f62d6c795c6293817df195369e2873eb97b11a0e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67626010 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24637/ Test PASSed.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67626002 [Test build #24637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24637/consoleFull) for PR 3319 at commit [`3971c93`](https://github.com/apache/spark/commit/3971c931d18dfaea0ea66e0cbb19b61dbf310a66). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67631049 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24639/ Test PASSed.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67631046 [Test build #24639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24639/consoleFull) for PR 3319 at commit [`f62d6c7`](https://github.com/apache/spark/commit/f62d6c795c6293817df195369e2873eb97b11a0e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22120275

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -123,6 +135,130 @@ class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double])
       }

       override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  /** Generate a `SparseMatrix` from the given `DenseMatrix`. */
    +  def toSparse(): SparseMatrix = {
    +    val spVals: ArrayBuilder[Double] = new ArrayBuilder.ofDouble
    +    val colPtrs: Array[Int] = new Array[Int](numCols + 1)
    +    val rowIndices: ArrayBuilder[Int] = new ArrayBuilder.ofInt
    +    var nnz = 0
    +    var lastCol = -1
    +    var j = 0
    +    while (j < numCols) {
    +      var i = 0
    +      val indStart = j * numRows
    +      while (i < numRows) {
    +        val v = values(indStart + i)
    +        if (v != 0.0) {
    +          rowIndices += i
    +          spVals += v
    +          while (j != lastCol) {
    --- End diff --

    It iterates over a full 2-d grid. We can update `colPtrs` after each `while (i < numRows)` block.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22120278

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -123,6 +135,130 @@ class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double])
       }

       override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  /** Generate a `SparseMatrix` from the given `DenseMatrix`. */
    +  def toSparse(): SparseMatrix = {
    +    val spVals: ArrayBuilder[Double] = new ArrayBuilder.ofDouble
    +    val colPtrs: Array[Int] = new Array[Int](numCols + 1)
    +    val rowIndices: ArrayBuilder[Int] = new ArrayBuilder.ofInt
    +    var nnz = 0
    +    var lastCol = -1
    +    var j = 0
    +    while (j < numCols) {
    +      var i = 0
    +      val indStart = j * numRows
    +      while (i < numRows) {
    +        val v = values(indStart + i)
    +        if (v != 0.0) {
    +          rowIndices += i
    +          spVals += v
    +          while (j != lastCol) {
    +            colPtrs(lastCol + 1) = nnz
    +            lastCol += 1
    +          }
    +          nnz += 1
    +        }
    +        i += 1
    +      }
    +      j += 1
    +    }
    +    while (numCols > lastCol) {
    --- End diff --

    This check is not necessary. At the end of the `while (j < numCols)` loop, `j = numCols + 1`. So it is `colPtrs(j) = nnz`.
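A minimal standalone sketch of what the two review comments above on `toSparse` seem to suggest: write the column pointer once after each column's inner loop, which also makes the trailing `while` loop unnecessary. `DenseToCsc` and `toCsc` are hypothetical names, and the method returns plain arrays instead of an MLlib `SparseMatrix` so that it runs without Spark. This is an illustration under those assumptions, not the code that was merged.

```scala
// Hypothetical standalone helper, not part of MLlib.
object DenseToCsc {
  /** Converts a column-major dense array into CSC arrays (colPtrs, rowIndices, values). */
  def toCsc(
      numRows: Int,
      numCols: Int,
      values: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    require(values.length == numRows * numCols, "values must hold numRows * numCols entries")
    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = Array.newBuilder[Int]
    val spVals = Array.newBuilder[Double]
    var nnz = 0
    var j = 0
    while (j < numCols) {
      val indStart = j * numRows
      var i = 0
      while (i < numRows) {
        val v = values(indStart + i)
        if (v != 0.0) {
          rowIndices += i
          spVals += v
          nnz += 1
        }
        i += 1
      }
      // Column j is finished, so nnz is the offset at which column j + 1 starts.
      j += 1
      colPtrs(j) = nnz
    }
    (colPtrs, rowIndices.result(), spVals.result())
  }
}
```

Because every column writes its own end offset, empty columns need no special handling and no bookkeeping variable such as `lastCol` is required.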
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22120295

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +335,167 @@ class SparseMatrix(
       }

       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
    +  def toDense(): DenseMatrix = {
    +    new DenseMatrix(numRows, numCols, toArray)
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
    +   * (row, column, value) tuples.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param entries Array of (row, column, value) tuples
    +   * @return The corresponding `SparseMatrix`
    +   */
    +  def fromCOO(numRows: Int, numCols: Int, entries: Array[(Int, Int, Double)]): SparseMatrix = {
    +    val sortedEntries = entries.sortBy(v => (v._2, v._1))
    +    val colPtrs = new Array[Int](numCols + 1)
    +    var nnz = 0
    +    var lastCol = -1
    +    val values = sortedEntries.map { case (i, j, v) =>
    +      while (j != lastCol) {
    +        colPtrs(lastCol + 1) = nnz
    +        lastCol += 1
    +      }
    +      nnz += 1
    +      v
    +    }
    +    while (numCols > lastCol) {
    +      colPtrs(lastCol + 1) = nnz
    +      lastCol += 1
    +    }
    +    new SparseMatrix(numRows, numCols, colPtrs.toArray, sortedEntries.map(_._1), values)
    +  }
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  /** Generates a `SparseMatrix` with a given random number generator and `method`, which
    +   * specifies the distribution. */
    +  private def genRandMatrix(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      rng: Random,
    +      method: Random => Double): SparseMatrix = {
    +    require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
    +      s"0.0 <= d <= 1.0. Currently, density: $density")
    +    val length = math.ceil(numRows * numCols * density).toInt
    +    val entries = MutableMap[(Int, Int), Double]()
    +    var i = 0
    +    if (density == 0.0) {
    +      return new SparseMatrix(numRows, numCols, new Array[Int](numCols + 1),
    +        Array[Int](), Array[Double]())
    +    } else if (density == 1.0) {
    +      return new SparseMatrix(numRows, numCols, (0 to numRows * numCols by numRows).toArray,
    --- End diff --

    indices out of bound
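The comment does not say which array is out of bounds. One plausible reading, and it is only an assumption, is that it refers to the row indices of the `density == 1.0` case: the longer quotes of this diff elsewhere in the thread pass `(0 until numRows * numCols).toArray`, which yields values up to `numRows * numCols - 1` where each row index must be smaller than `numRows`. The sketch below shows index arrays that stay in range for a fully populated CSC matrix; `FullCsc` and `fullIndices` are hypothetical names, not MLlib API.

```scala
// Hypothetical standalone helper, not part of MLlib.
object FullCsc {
  /** CSC index arrays (colPtrs, rowIndices) for a matrix in which every entry is stored. */
  def fullIndices(numRows: Int, numCols: Int): (Array[Int], Array[Int]) = {
    // Column c starts at offset c * numRows; the last pointer equals numRows * numCols.
    val colPtrs = Array.tabulate(numCols + 1)(_ * numRows)
    // Within each column the row index cycles through 0 until numRows.
    val rowIndices = Array.tabulate(numRows * numCols)(_ % numRows)
    (colPtrs, rowIndices)
  }
}
```

The column pointers computed here match `(0 to numRows * numCols by numRows).toArray` from the quoted line; only the row indices differ.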
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22120305

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +335,167 @@ class SparseMatrix(
       }

       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
    +  def toDense(): DenseMatrix = {
    +    new DenseMatrix(numRows, numCols, toArray)
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
    +   * (row, column, value) tuples.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param entries Array of (row, column, value) tuples
    +   * @return The corresponding `SparseMatrix`
    +   */
    +  def fromCOO(numRows: Int, numCols: Int, entries: Array[(Int, Int, Double)]): SparseMatrix = {
    +    val sortedEntries = entries.sortBy(v => (v._2, v._1))
    +    val colPtrs = new Array[Int](numCols + 1)
    +    var nnz = 0
    +    var lastCol = -1
    +    val values = sortedEntries.map { case (i, j, v) =>
    +      while (j != lastCol) {
    +        colPtrs(lastCol + 1) = nnz
    +        lastCol += 1
    +      }
    +      nnz += 1
    +      v
    +    }
    +    while (numCols > lastCol) {
    +      colPtrs(lastCol + 1) = nnz
    +      lastCol += 1
    +    }
    +    new SparseMatrix(numRows, numCols, colPtrs.toArray, sortedEntries.map(_._1), values)
    +  }
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  /** Generates a `SparseMatrix` with a given random number generator and `method`, which
    +   * specifies the distribution. */
    +  private def genRandMatrix(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      rng: Random,
    +      method: Random => Double): SparseMatrix = {
    +    require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
    +      s"0.0 <= d <= 1.0. Currently, density: $density")
    +    val length = math.ceil(numRows * numCols * density).toInt
    +    val entries = MutableMap[(Int, Int), Double]()
    +    var i = 0
    +    if (density == 0.0) {
    +      return new SparseMatrix(numRows, numCols, new Array[Int](numCols + 1),
    +        Array[Int](), Array[Double]())
    +    } else if (density == 1.0) {
    +      return new SparseMatrix(numRows, numCols, (0 to numRows * numCols by numRows).toArray,
    +        (0 until numRows * numCols).toArray, Array.fill(numRows * numCols)(method(rng)))
    +    }
    +    // Expected number of iterations is less than 1.5 * length
    +    if (density < 0.34) {
    +      while (i < length) {
    +        var rowIndex = rng.nextInt(numRows)
    +        var colIndex = rng.nextInt(numCols)
    +        while (entries.contains((rowIndex, colIndex))) {
    +          rowIndex = rng.nextInt(numRows)
    +          colIndex = rng.nextInt(numCols)
    +        }
    +        entries += (rowIndex, colIndex) -> method(rng)
    +        i += 1
    +      }
    +    } else { // selection - rejection method
    +      var j = 0
    +      val pool = numRows * numCols
    +      // loop over columns so that the sort in fromCOO requires less sorting
    +      while (i < length && j < numCols) {
    +        var passedInPool = j * numRows
    +        var r = 0
    +        while (i < length && r < numRows) {
    +          if (rng.nextDouble() < 1.0 * (length - i) / (pool - passedInPool)) {
    +            entries += (r, j) -> method(rng)
    +            i += 1
    +          }
    +          r += 1
    +          passedInPool += 1
    +        }
    +        j += 1
    +      }
    +    }
    +    SparseMatrix.fromCOO(numRows, numCols, entries.toArray.map(v => (v._1._1, v._1._2, v._2)))
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers. The number of non-zero
    +   * elements equal the ceiling of `numRows` x `numCols` x `density`
    +   *
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22120303

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +335,167 @@ class SparseMatrix(
       }

       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
    +  def toDense(): DenseMatrix = {
    +    new DenseMatrix(numRows, numCols, toArray)
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
    +   * (row, column, value) tuples.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param entries Array of (row, column, value) tuples
    +   * @return The corresponding `SparseMatrix`
    +   */
    +  def fromCOO(numRows: Int, numCols: Int, entries: Array[(Int, Int, Double)]): SparseMatrix = {
    +    val sortedEntries = entries.sortBy(v => (v._2, v._1))
    +    val colPtrs = new Array[Int](numCols + 1)
    +    var nnz = 0
    +    var lastCol = -1
    +    val values = sortedEntries.map { case (i, j, v) =>
    +      while (j != lastCol) {
    +        colPtrs(lastCol + 1) = nnz
    +        lastCol += 1
    +      }
    +      nnz += 1
    +      v
    +    }
    +    while (numCols > lastCol) {
    +      colPtrs(lastCol + 1) = nnz
    +      lastCol += 1
    +    }
    +    new SparseMatrix(numRows, numCols, colPtrs.toArray, sortedEntries.map(_._1), values)
    +  }
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  /** Generates a `SparseMatrix` with a given random number generator and `method`, which
    +   * specifies the distribution. */
    +  private def genRandMatrix(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      rng: Random,
    +      method: Random => Double): SparseMatrix = {
    +    require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
    +      s"0.0 <= d <= 1.0. Currently, density: $density")
    +    val length = math.ceil(numRows * numCols * density).toInt
    +    val entries = MutableMap[(Int, Int), Double]()
    +    var i = 0
    +    if (density == 0.0) {
    +      return new SparseMatrix(numRows, numCols, new Array[Int](numCols + 1),
    +        Array[Int](), Array[Double]())
    +    } else if (density == 1.0) {
    +      return new SparseMatrix(numRows, numCols, (0 to numRows * numCols by numRows).toArray,
    +        (0 until numRows * numCols).toArray, Array.fill(numRows * numCols)(method(rng)))
    +    }
    +    // Expected number of iterations is less than 1.5 * length
    +    if (density < 0.34) {
    +      while (i < length) {
    +        var rowIndex = rng.nextInt(numRows)
    +        var colIndex = rng.nextInt(numCols)
    +        while (entries.contains((rowIndex, colIndex))) {
    +          rowIndex = rng.nextInt(numRows)
    +          colIndex = rng.nextInt(numCols)
    +        }
    +        entries += (rowIndex, colIndex) -> method(rng)
    +        i += 1
    +      }
    +    } else { // selection - rejection method
    +      var j = 0
    +      val pool = numRows * numCols
    +      // loop over columns so that the sort in fromCOO requires less sorting
    +      while (i < length && j < numCols) {
    +        var passedInPool = j * numRows
    +        var r = 0
    +        while (i < length && r < numRows) {
    +          if (rng.nextDouble() < 1.0 * (length - i) / (pool - passedInPool)) {
    +            entries += (r, j) -> method(rng)
    +            i += 1
    +          }
    +          r += 1
    +          passedInPool += 1
    +        }
    +        j += 1
    +      }
    +    }
    +    SparseMatrix.fromCOO(numRows, numCols, entries.toArray.map(v => (v._1._1, v._1._2, v._2)))
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers. The number of non-zero
    +   * elements equal the ceiling of `numRows` x `numCols` x `density`
    +   *
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22120288

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +335,167 @@ class SparseMatrix(
       }

       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
    +  def toDense(): DenseMatrix = {
    +    new DenseMatrix(numRows, numCols, toArray)
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
    +   * (row, column, value) tuples.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param entries Array of (row, column, value) tuples
    +   * @return The corresponding `SparseMatrix`
    +   */
    +  def fromCOO(numRows: Int, numCols: Int, entries: Array[(Int, Int, Double)]): SparseMatrix = {
    +    val sortedEntries = entries.sortBy(v => (v._2, v._1))
    +    val colPtrs = new Array[Int](numCols + 1)
    +    var nnz = 0
    +    var lastCol = -1
    +    val values = sortedEntries.map { case (i, j, v) =>
    +      while (j != lastCol) {
    --- End diff --

    We can add a check on `j >= 0`. If there is `j = -2` in the input, we get an infinite loop. It is a little hard to read if we put the `colPtrs` construction code inside the map. We can separate the code; there is no performance penalty.
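The two suggestions in this comment (reject out-of-range column indices up front, and build `colPtrs` outside the `map`) can be sketched as follows. This is a minimal sketch and not the code that was merged: `CooToCsc` and the `Csc` case class are hypothetical stand-ins for `SparseMatrix`, and building `colPtrs` by counting entries per column and taking a running sum is one assumed way to separate the code.

```scala
object CooToCsc {
  // Hypothetical stand-in for MLlib's SparseMatrix; it only carries the CSC arrays.
  final case class Csc(
      numRows: Int,
      numCols: Int,
      colPtrs: Array[Int],
      rowIndices: Array[Int],
      values: Array[Double])

  def fromCOO(numRows: Int, numCols: Int, entries: Array[(Int, Int, Double)]): Csc = {
    // Validate indices first, so an entry such as j = -2 fails fast instead of looping.
    entries.foreach { case (i, j, _) =>
      require(i >= 0 && i < numRows, s"row index out of range: $i")
      require(j >= 0 && j < numCols, s"column index out of range: $j")
    }
    // Sort by column, then by row, to obtain CSC order.
    val sorted = entries.sortBy { case (i, j, _) => (j, i) }
    // Build colPtrs in its own pass: count entries per column, then take a running sum.
    val colPtrs = new Array[Int](numCols + 1)
    sorted.foreach { case (_, j, _) => colPtrs(j + 1) += 1 }
    var c = 0
    while (c < numCols) {
      colPtrs(c + 1) += colPtrs(c)
      c += 1
    }
    Csc(numRows, numCols, colPtrs, sorted.map(_._1), sorted.map(_._3))
  }
}
```

With the checks in front, the value extraction reduces to two plain `map` calls over the sorted entries.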
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22120308

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -256,72 +555,244 @@ object Matrices {
        * Generate a `DenseMatrix` consisting of zeros.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
        */
    -  def zeros(numRows: Int, numCols: Int): Matrix =
    -    new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)

       /**
        * Generate a `DenseMatrix` consisting of ones.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
        */
    -  def ones(numRows: Int, numCols: Int): Matrix =
    -    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)

       /**
    -   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * Generate a dense Identity Matrix in `Matrix` format.
        * @param n number of rows and columns of the matrix
    -   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
        */
    -  def eye(n: Int): Matrix = {
    -    val identity = Matrices.zeros(n, n)
    -    var i = 0
    -    while (i < n){
    -      identity.update(i, i, 1.0)
    -      i += 1
    -    }
    -    identity
    -  }
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate a sparse Identity Matrix in `Matrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)

       /**
        * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
        * @param rng a random number generator
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
        */
    -  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
    -    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble()))
    -  }
    +  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
    +    DenseMatrix.rand(numRows, numCols, rng)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param rng a random number generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, rng)

       /**
        * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
        * @param rng a random number generator
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
        */
    -  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
    -    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian()))
    -  }
    +  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
    +    DenseMatrix.randn(numRows, numCols, rng)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param rng a random number generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, rng)

       /**
        * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22120292

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +331,145 @@ class SparseMatrix(
       }

       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
    +  def toDense(): DenseMatrix = {
    +    new DenseMatrix(numRows, numCols, toArray)
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
    +   * (row, column, value) tuples. Array must be sorted first by *column* index and then by row
    +   * index.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param entries Array of ((row, column), value) tuples
    +   * @return The corresponding `SparseMatrix`
    +   */
    +  def fromCOO(numRows: Int, numCols: Int, entries: Array[((Int, Int), Double)]): SparseMatrix = {
    +    val colPtrs = new ArrayBuffer[Int](numCols + 1)
    +    colPtrs.append(0)
    +    var nnz = 0
    +    var lastCol = 0
    +    val values = entries.map { case ((i, j), v) =>
    +      while (j != lastCol) {
    +        colPtrs.append(nnz)
    +        lastCol += 1
    +        if (lastCol > numCols) {
    +          throw new IndexOutOfBoundsException("Please make sure that the entries array is " +
    +            "sorted by COLUMN index first and then by row index.")
    +        }
    +      }
    +      nnz += 1
    +      v
    +    }
    +    while (numCols > lastCol) {
    +      colPtrs.append(nnz)
    +      lastCol += 1
    +    }
    +    new SparseMatrix(numRows, numCols, colPtrs.toArray, entries.map(_._1._1), values)
    +  }
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  /** Generates a `SparseMatrix` with a given random number generator and `method`, which
    +    * specifies the distribution. */
    +  private def genRandMatrix(
    +      numRows: Int,
    +      numCols: Int,
    --- End diff --

    It is harder to read this way. Also, if we generate the skeleton first, we don't need to handle a dynamic values array. Having everything in one loop is not a goal, and it doesn't always give us a performance gain.

    ~~~scala
    def genRandPattern(numRows, numCols, density, rng): SparseMatrix {
      ...
    }

    def sprand(numRows, numCols, density, rng): SparseMatrix {
      val mat = genRandPattern(numRows, numCols, density, rng)
      // fill-in the values with rng.nextDouble
    }
    ~~~
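The skeleton-first idea in this comment can be made concrete with a small self-contained sketch. It is only an illustration under stated assumptions: `Csc` is a hypothetical stand-in for `SparseMatrix`, the `fill` helper is invented for the example, and the pattern is drawn with a deliberately simple shuffle-and-take sampler rather than the sampling scheme in the patch.

```scala
import scala.util.Random

object SparseRandomSketch {
  // Hypothetical stand-in for MLlib's SparseMatrix (CSC layout).
  final class Csc(
      val numRows: Int,
      val numCols: Int,
      val colPtrs: Array[Int],
      val rowIndices: Array[Int],
      val values: Array[Double])

  /** Draws only the sparsity pattern; `values` is allocated at its final size and left as zeros. */
  def genRandPattern(numRows: Int, numCols: Int, density: Double, rng: Random): Csc = {
    require(density >= 0.0 && density <= 1.0, s"density must be in [0, 1], got $density")
    val nnz = math.ceil(numRows.toLong * numCols * density).toInt
    // Simple, not efficient: shuffle all column-major positions, keep nnz of them, sort.
    val positions = rng.shuffle((0 until numRows * numCols).toVector).take(nnz).sorted
    val colPtrs = new Array[Int](numCols + 1)
    positions.foreach(p => colPtrs(p / numRows + 1) += 1)
    for (c <- 0 until numCols) colPtrs(c + 1) += colPtrs(c)
    new Csc(numRows, numCols, colPtrs, positions.map(_ % numRows).toArray, new Array[Double](nnz))
  }

  // Overwrites every stored value with a fresh draw; `draw` is re-evaluated on each use.
  private def fill(mat: Csc)(draw: => Double): Csc = {
    var k = 0
    while (k < mat.values.length) {
      mat.values(k) = draw
      k += 1
    }
    mat
  }

  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Csc =
    fill(genRandPattern(numRows, numCols, density, rng))(rng.nextDouble())

  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Csc =
    fill(genRandPattern(numRows, numCols, density, rng))(rng.nextGaussian())
}
```

Because the pattern fixes the number of non-zeros before any value is drawn, the values array never has to grow, which is the point the reviewer makes about not handling a dynamic array.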
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22120317

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -256,72 +555,244 @@ object Matrices {
        * Generate a `DenseMatrix` consisting of zeros.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
        */
    -  def zeros(numRows: Int, numCols: Int): Matrix =
    -    new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)

       /**
        * Generate a `DenseMatrix` consisting of ones.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
        */
    -  def ones(numRows: Int, numCols: Int): Matrix =
    -    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)

       /**
    -   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * Generate a dense Identity Matrix in `Matrix` format.
        * @param n number of rows and columns of the matrix
    -   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
        */
    -  def eye(n: Int): Matrix = {
    -    val identity = Matrices.zeros(n, n)
    -    var i = 0
    -    while (i < n){
    -      identity.update(i, i, 1.0)
    -      i += 1
    -    }
    -    identity
    -  }
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate a sparse Identity Matrix in `Matrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)

       /**
        * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
        * @param rng a random number generator
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
        */
    -  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
    -    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble()))
    -  }
    +  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
    +    DenseMatrix.rand(numRows, numCols, rng)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param rng a random number generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, rng)

       /**
        * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
        * @param rng a random number generator
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
        */
    -  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
    -    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian()))
    -  }
    +  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
    +    DenseMatrix.randn(numRows, numCols, rng)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param rng a random number generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, rng)

       /**
        * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22120287

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +335,167 @@ class SparseMatrix(
       }

       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
    +  def toDense(): DenseMatrix = {
    +    new DenseMatrix(numRows, numCols, toArray)
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
    +   * (row, column, value) tuples.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param entries Array of (row, column, value) tuples
    +   * @return The corresponding `SparseMatrix`
    +   */
    +  def fromCOO(numRows: Int, numCols: Int, entries: Array[(Int, Int, Double)]): SparseMatrix = {
    --- End diff --

    What's the behavior if there are duplicate coordinates in the input? This should be documented. In MATLAB's `sparse`, "Any elements of s that have duplicate values of i and j are added together."
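For illustration only, the MATLAB behavior quoted in this comment (values that share a coordinate are added together) could look like the sketch below. Whether `fromCOO` should behave this way is the open question the comment raises, so this shows one possible answer and not what the patch does; `CooDuplicates.sumDuplicates` is a hypothetical helper, not part of MLlib.

```scala
object CooDuplicates {
  /** Sums values that share a (row, column) coordinate; result is in column-major order. */
  def sumDuplicates(entries: Array[(Int, Int, Double)]): Array[(Int, Int, Double)] =
    entries
      .groupBy { case (i, j, _) => (i, j) }                       // bucket by coordinate
      .map { case ((i, j), group) => (i, j, group.map(_._3).sum) } // add values per bucket
      .toArray
      .sortBy { case (i, j, _) => (j, i) }                         // column first, then row
}
```

Running such a step before the CSC arrays are built would make the duplicate-handling rule explicit and easy to document.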
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22120299

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +335,167 @@ class SparseMatrix(
       }

       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
    +  def toDense(): DenseMatrix = {
    +    new DenseMatrix(numRows, numCols, toArray)
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
    +   * (row, column, value) tuples.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param entries Array of (row, column, value) tuples
    +   * @return The corresponding `SparseMatrix`
    +   */
    +  def fromCOO(numRows: Int, numCols: Int, entries: Array[(Int, Int, Double)]): SparseMatrix = {
    +    val sortedEntries = entries.sortBy(v => (v._2, v._1))
    +    val colPtrs = new Array[Int](numCols + 1)
    +    var nnz = 0
    +    var lastCol = -1
    +    val values = sortedEntries.map { case (i, j, v) =>
    +      while (j != lastCol) {
    +        colPtrs(lastCol + 1) = nnz
    +        lastCol += 1
    +      }
    +      nnz += 1
    +      v
    +    }
    +    while (numCols > lastCol) {
    +      colPtrs(lastCol + 1) = nnz
    +      lastCol += 1
    +    }
    +    new SparseMatrix(numRows, numCols, colPtrs.toArray, sortedEntries.map(_._1), values)
    +  }
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  /** Generates a `SparseMatrix` with a given random number generator and `method`, which
    +    * specifies the distribution. */
    +  private def genRandMatrix(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      rng: Random,
    +      method: Random => Double): SparseMatrix = {
    +    require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
    +      s"0.0 <= d <= 1.0. Currently, density: $density")
    +    val length = math.ceil(numRows * numCols * density).toInt
    +    val entries = MutableMap[(Int, Int), Double]()
    +    var i = 0
    +    if (density == 0.0) {
    +      return new SparseMatrix(numRows, numCols, new Array[Int](numCols + 1),
    +        Array[Int](), Array[Double]())
    +    } else if (density == 1.0) {
    +      return new SparseMatrix(numRows, numCols, (0 to numRows * numCols by numRows).toArray,
    +        (0 until numRows * numCols).toArray, Array.fill(numRows * numCols)(method(rng)))
    +    }
    +    // Expected number of iterations is less than 1.5 * length
    +    if (density < 0.34) {
    +      while (i < length) {
    +        var rowIndex = rng.nextInt(numRows)
    +        var colIndex = rng.nextInt(numCols)
    +        while (entries.contains((rowIndex, colIndex))) {
    +          rowIndex = rng.nextInt(numRows)
    +          colIndex = rng.nextInt(numCols)
    +        }
    +        entries += (rowIndex, colIndex) -> method(rng)
    +        i += 1
    +      }
    +    } else { // selection - rejection method
    +      var j = 0
    +      val pool = numRows * numCols
    +      // loop over columns so that the sort in fromCOO requires less sorting
    +      while (i < length && j < numCols) {
    +        var passedInPool = j * numRows
    +        var r = 0
    +        while (i < length && r < numRows) {
    +          if (rng.nextDouble() < 1.0 * (length - i) / (pool - passedInPool)) {
    +            entries += (r, j) -> method(rng)
    --- End diff --

    Using a map here is not optimal. The sampled entries are ordered. We can construct `rowIndices` and `colPtrs` directly.
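Since selection sampling visits cells in column-major order, the accepted cells already arrive in CSC order, so the index arrays can be written directly without an intermediate map. The following is a minimal sketch of that idea under stated assumptions: `SelectionSamplingSketch.samplePattern` is a hypothetical helper that returns only `(colPtrs, rowIndices)`, and it takes the target count `nnz` as a parameter instead of deriving it from a density.

```scala
import scala.util.Random

object SelectionSamplingSketch {
  /** Returns (colPtrs, rowIndices) for exactly `nnz` cells chosen without replacement. */
  def samplePattern(numRows: Int, numCols: Int, nnz: Int, rng: Random): (Array[Int], Array[Int]) = {
    val pool = numRows.toLong * numCols
    require(nnz >= 0 && nnz <= pool, s"nnz must be in [0, $pool], got $nnz")
    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = new Array[Int](nnz)
    var selected = 0 // cells accepted so far
    var visited = 0L // cells examined so far
    var j = 0
    while (j < numCols) {
      var r = 0
      while (r < numRows && selected < nnz) {
        // Accept this cell with probability (still needed) / (cells not yet examined).
        if (rng.nextDouble() * (pool - visited) < (nnz - selected)) {
          rowIndices(selected) = r
          selected += 1
        }
        visited += 1
        r += 1
      }
      j += 1
      // Everything accepted so far belongs to columns 0 until j.
      colPtrs(j) = selected
    }
    (colPtrs, rowIndices)
  }
}
```

Both arrays are allocated once at their final size, and no sort is needed afterwards because rows are visited in increasing order within each column.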
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22120314

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -256,72 +555,244 @@ object Matrices {
        * Generate a `DenseMatrix` consisting of zeros.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
        */
    -  def zeros(numRows: Int, numCols: Int): Matrix =
    -    new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)

       /**
        * Generate a `DenseMatrix` consisting of ones.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
    +   * @return `Matrix` with size `numRows` x `numCols` and values of ones
        */
    -  def ones(numRows: Int, numCols: Int): Matrix =
    -    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
    +  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)

       /**
    -   * Generate an Identity Matrix in `DenseMatrix` format.
    +   * Generate a dense Identity Matrix in `Matrix` format.
        * @param n number of rows and columns of the matrix
    -   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
        */
    -  def eye(n: Int): Matrix = {
    -    val identity = Matrices.zeros(n, n)
    -    var i = 0
    -    while (i < n){
    -      identity.update(i, i, 1.0)
    -      i += 1
    -    }
    -    identity
    -  }
    +  def eye(n: Int): Matrix = DenseMatrix.eye(n)
    +
    +  /**
    +   * Generate a sparse Identity Matrix in `Matrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): Matrix = SparseMatrix.speye(n)

       /**
        * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
        * @param rng a random number generator
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
        */
    -  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
    -    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble()))
    -  }
    +  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
    +    DenseMatrix.rand(numRows, numCols, rng)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param rng a random number generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix =
    +    SparseMatrix.sprand(numRows, numCols, density, rng)

       /**
        * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
        * @param rng a random number generator
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
        */
    -  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
    -    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian()))
    -  }
    +  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
    +    DenseMatrix.randn(numRows, numCols, rng)
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param rng a random number generator
    +   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix =
    +    SparseMatrix.sprandn(numRows, numCols, density, rng)

       /**
        * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3319#issuecomment-67499071

    [Test build #24591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24591/consoleFull) for PR 3319 at commit [`75239f8`](https://github.com/apache/spark/commit/75239f8e5b41a275a0f232108b26cb0e16935bbf).
     * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3319#issuecomment-67513626

    Test PASSed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24591/
    Test PASSed.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3319#issuecomment-67513619

    [Test build #24591 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24591/consoleFull) for PR 3319 at commit [`75239f8`](https://github.com/apache/spark/commit/75239f8e5b41a275a0f232108b26cb0e16935bbf).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22063271

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -256,72 +524,297 @@ object Matrices {
        * Generate a `DenseMatrix` consisting of zeros.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
        */
    -  def zeros(numRows: Int, numCols: Int): Matrix =
    -    new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
    --- End diff --

    I see your point. The reason we didn't return the exact type in `Vectors` and `Matrices` was that RDD is not covariant. But maybe we should return the exact types and let algorithms take a generic `RDD[T]` with `T` extending `Vector`.
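The variance issue behind this comment can be shown without Spark. In the sketch below, `Vec`, `DenseVec`, and the invariant container `Coll[T]` are hypothetical stand-ins for MLlib's `Vector`, `DenseVector`, and Spark's `RDD[T]`; the example only assumes, as the comment states, that the container is not covariant in its element type.

```scala
object InvarianceSketch {
  // Hypothetical stand-ins for Vector, DenseVector, and an invariant RDD-like container.
  trait Vec { def size: Int }
  final case class DenseVec(values: Array[Double]) extends Vec { def size: Int = values.length }
  final class Coll[T](val items: Seq[T])

  // Accepts only Coll[Vec]; a Coll[DenseVec] is rejected because Coll is invariant in T.
  def totalSizeExact(data: Coll[Vec]): Int = data.items.map(_.size).sum

  // Accepts any Coll[T] whose element type is a subtype of Vec.
  def totalSize[T <: Vec](data: Coll[T]): Int = data.items.map(_.size).sum

  val dense: Coll[DenseVec] = new Coll(Seq(DenseVec(Array(1.0, 2.0)), DenseVec(Array(3.0))))

  // totalSizeExact(dense) // does not compile: found Coll[DenseVec], required Coll[Vec]
  val n: Int = totalSize(dense)
}
```

With the bounded type parameter, factory methods are free to return the exact type while algorithms still accept collections of any subtype.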
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22070489

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +331,145 @@ class SparseMatrix(
       }

       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
    +  def toDense(): DenseMatrix = {
    +    new DenseMatrix(numRows, numCols, toArray)
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
    +   * (row, column, value) tuples. Array must be sorted first by *column* index and then by row
    +   * index.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param entries Array of ((row, column), value) tuples
    +   * @return The corresponding `SparseMatrix`
    +   */
    +  def fromCOO(numRows: Int, numCols: Int, entries: Array[((Int, Int), Double)]): SparseMatrix = {
    +    val colPtrs = new ArrayBuffer[Int](numCols + 1)
    +    colPtrs.append(0)
    +    var nnz = 0
    +    var lastCol = 0
    +    val values = entries.map { case ((i, j), v) =>
    +      while (j != lastCol) {
    +        colPtrs.append(nnz)
    +        lastCol += 1
    +        if (lastCol > numCols) {
    --- End diff --

    minor: COO doesn't have this restriction. We should sort the input entries, which could be done in a separate PR.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22070477

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -123,6 +135,126 @@ class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double])
       }

       override def copy = new DenseMatrix(numRows, numCols, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): DenseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  /** Generate a `SparseMatrix` from the given `DenseMatrix`. */
    +  def toSparse(): SparseMatrix = {
    +    val sparseA: ArrayBuffer[Double] = new ArrayBuffer()
    +    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
    --- End diff --

    If we already know the size, we don't need a buffer. Btw, it would be nice if we use the same naming as in `SparseVector` for consistency, `colPtrs` in this case.
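One way to act on the "no buffer" remark is to count the non-zeros first and then allocate every array at its final size. The sketch below illustrates that two-pass approach and is not the code from the patch: `DenseToCscSketch.toCsc` is a hypothetical helper that works on a plain column-major `Array[Double]` and returns the three CSC arrays as a tuple, using the `colPtrs` naming the reviewer asks for.

```scala
object DenseToCscSketch {
  /** Converts column-major dense `values` into CSC arrays (colPtrs, rowIndices, nonZeros). */
  def toCsc(
      numRows: Int,
      numCols: Int,
      values: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    require(values.length == numRows * numCols, "values must hold numRows * numCols entries")
    // First pass: count non-zeros so that no growable buffer is needed.
    val nnz = values.count(_ != 0.0)
    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = new Array[Int](nnz)
    val nonZeros = new Array[Double](nnz)
    // Second pass: copy the non-zeros in column-major order.
    var k = 0
    var j = 0
    while (j < numCols) {
      var i = 0
      while (i < numRows) {
        val v = values(j * numRows + i)
        if (v != 0.0) {
          rowIndices(k) = i
          nonZeros(k) = v
          k += 1
        }
        i += 1
      }
      j += 1
      colPtrs(j) = k // end of column j - 1, start of column j
    }
    (colPtrs, rowIndices, nonZeros)
  }
}
```

The extra counting pass trades one additional scan of the dense data for fixed-size allocations; `colPtrs` has a known length of `numCols + 1` in either approach.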
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22070506

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -197,6 +331,145 @@ class SparseMatrix(
   }

   override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+    val len = values.length
+    var i = 0
+    while (i < len) {
+      values(i) = f(values(i))
+      i += 1
+    }
+    this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+    new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
+   * (row, column, value) tuples. Array must be sorted first by *column* index and then by row
+   * index.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of ((row, column), value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[((Int, Int), Double)]): SparseMatrix = {
+    val colPtrs = new ArrayBuffer[Int](numCols + 1)
+    colPtrs.append(0)
+    var nnz = 0
+    var lastCol = 0
+    val values = entries.map { case ((i, j), v) =>
+      while (j != lastCol) {
+        colPtrs.append(nnz)
+        lastCol += 1
+        if (lastCol > numCols) {
+          throw new IndexOutOfBoundsException("Please make sure that the entries array is " +
+            "sorted by COLUMN index first and then by row index.")
+        }
+      }
+      nnz += 1
+      v
+    }
+    while (numCols > lastCol) {
+      colPtrs.append(nnz)
+      lastCol += 1
+    }
+    new SparseMatrix(numRows, numCols, colPtrs.toArray, entries.map(_._1._1), values)
+  }
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
+  }
+
+  /** Generates a `SparseMatrix` with a given random number generator and `method`, which
+   *  specifies the distribution. */
+  private def genRandMatrix(
+      numRows: Int,
+      numCols: Int,
--- End diff --

minor: It may be cleaner if we just use this function to generate the skeleton and fill in values inside `sprandn` and `sprand`.
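One way to read the "skeleton, then fill" suggestion is sketched below. Everything in it is hypothetical: the `Csc` case class stands in for `SparseMatrix`, `diagonalSkeleton` stands in for a random structure generator, and `uniformFill` / `gaussianFill` play the roles of `sprand` / `sprandn`. The point is only that the structure generator no longer needs a `method` parameter.

```scala
import scala.util.Random

object SkeletonSketch {
  /** CSC structure plus a values array that callers overwrite. */
  final case class Csc(
      numRows: Int,
      numCols: Int,
      colPtrs: Array[Int],
      rowIndices: Array[Int],
      values: Array[Double])

  /** Hypothetical skeleton generator: a diagonal pattern with zeroed values. */
  def diagonalSkeleton(n: Int): Csc =
    Csc(n, n, (0 to n).toArray, (0 until n).toArray, new Array[Double](n))

  /** Overwrites every stored value in place with a fresh draw. */
  def fill(m: Csc)(draw: () => Double): Csc = {
    var i = 0
    while (i < m.values.length) {
      m.values(i) = draw()
      i += 1
    }
    m
  }

  // The distribution is chosen by the caller, not by the skeleton generator.
  def uniformFill(n: Int, rng: Random): Csc =
    fill(diagonalSkeleton(n))(() => rng.nextDouble())

  def gaussianFill(n: Int, rng: Random): Csc =
    fill(diagonalSkeleton(n))(() => rng.nextGaussian())
}
```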
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22070475

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -123,6 +135,126 @@ class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double])
   }

   override def copy = new DenseMatrix(numRows, numCols, values.clone())
+
+  private[mllib] def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
+
+  private[mllib] def update(f: Double => Double): DenseMatrix = {
+    val len = values.length
+    var i = 0
+    while (i < len) {
+      values(i) = f(values(i))
+      i += 1
+    }
+    this
+  }
+
+  /** Generate a `SparseMatrix` from the given `DenseMatrix`. */
+  def toSparse(): SparseMatrix = {
+    val sparseA: ArrayBuffer[Double] = new ArrayBuffer()
+--- End diff --

Please use `ArrayBuilder` instead because `ArrayBuffer` is not specialized for primitive types, unfortunately.
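A minimal sketch of the `ArrayBuilder` suggestion, assuming only the Scala standard library. The object and method names are made up for illustration; the builder obtained from `ArrayBuilder.make[Double]` accumulates into a primitive array, so elements are not boxed on the way in.

```scala
import scala.collection.mutable.ArrayBuilder

object BuilderSketch {
  /** Collects the nonzero entries of `values` without boxing each Double. */
  def nonZeros(values: Array[Double]): Array[Double] = {
    val builder = ArrayBuilder.make[Double]
    var i = 0
    while (i < values.length) {
      if (values(i) != 0.0) builder += values(i)
      i += 1
    }
    builder.result()
  }
}
```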
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22070528 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -256,72 +529,222 @@ object Matrices { * Generate a `DenseMatrix` consisting of zeros. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros + * @return `Matrix` with size `numRows` x `numCols` and values of zeros */ - def zeros(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols)) + def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols) /** * Generate a `DenseMatrix` consisting of ones. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones + * @return `Matrix` with size `numRows` x `numCols` and values of ones */ - def ones(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0)) + def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols) /** - * Generate an Identity Matrix in `DenseMatrix` format. + * Generate a dense Identity Matrix in `Matrix` format. * @param n number of rows and columns of the matrix - * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal */ - def eye(n: Int): Matrix = { -val identity = Matrices.zeros(n, n) -var i = 0 -while (i n){ - identity.update(i, i, 1.0) - i += 1 -} -identity - } + def eye(n: Int): Matrix = DenseMatrix.eye(n) + + /** + * Generate a sparse Identity Matrix in `Matrix` format. 
+ * @param n number of rows and columns of the matrix + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal + */ + def speye(n: Int): Matrix = SparseMatrix.speye(n) /** * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) */ - def rand(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble())) - } + def rand(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.rand(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprand(numRows, numCols, density, rng) /** * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) */ - def randn(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian())) - } + def randn(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.randn(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. 
gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprandn(numRows, numCols, density, rng) /** * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22070520 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -256,72 +529,222 @@ object Matrices { * Generate a `DenseMatrix` consisting of zeros. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros + * @return `Matrix` with size `numRows` x `numCols` and values of zeros */ - def zeros(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols)) + def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols) /** * Generate a `DenseMatrix` consisting of ones. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones + * @return `Matrix` with size `numRows` x `numCols` and values of ones */ - def ones(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0)) + def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols) /** - * Generate an Identity Matrix in `DenseMatrix` format. + * Generate a dense Identity Matrix in `Matrix` format. * @param n number of rows and columns of the matrix - * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal */ - def eye(n: Int): Matrix = { -val identity = Matrices.zeros(n, n) -var i = 0 -while (i n){ - identity.update(i, i, 1.0) - i += 1 -} -identity - } + def eye(n: Int): Matrix = DenseMatrix.eye(n) + + /** + * Generate a sparse Identity Matrix in `Matrix` format. 
+ * @param n number of rows and columns of the matrix + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal + */ + def speye(n: Int): Matrix = SparseMatrix.speye(n) /** * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) */ - def rand(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble())) - } + def rand(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.rand(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprand(numRows, numCols, density, rng) /** * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) */ - def randn(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian())) - } + def randn(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.randn(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. 
gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprandn(numRows, numCols, density, rng) /** * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22070486

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -197,6 +331,145 @@ class SparseMatrix(
   }

   override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+    val len = values.length
+    var i = 0
+    while (i < len) {
+      values(i) = f(values(i))
+      i += 1
+    }
+    this
+  }
+
+  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
+  def toDense(): DenseMatrix = {
+    new DenseMatrix(numRows, numCols, toArray)
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
+   * (row, column, value) tuples. Array must be sorted first by *column* index and then by row
+   * index.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param entries Array of ((row, column), value) tuples
+   * @return The corresponding `SparseMatrix`
+   */
+  def fromCOO(numRows: Int, numCols: Int, entries: Array[((Int, Int), Double)]): SparseMatrix = {
--- End diff --

`(Int, Int, Double)` should be sufficient and it saves one object.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22070478

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -123,6 +135,126 @@ class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double])
   }

   override def copy = new DenseMatrix(numRows, numCols, values.clone())
+
+  private[mllib] def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
+
+  private[mllib] def update(f: Double => Double): DenseMatrix = {
+    val len = values.length
+    var i = 0
+    while (i < len) {
+      values(i) = f(values(i))
+      i += 1
+    }
+    this
+  }
+
+  /** Generate a `SparseMatrix` from the given `DenseMatrix`. */
+  def toSparse(): SparseMatrix = {
+    val sparseA: ArrayBuffer[Double] = new ArrayBuffer()
+    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
+    val sRows: ArrayBuffer[Int] = new ArrayBuffer()
+    var i = 0
+    var nnz = 0
+    var lastCol = -1
+    values.foreach { v =>
--- End diff --

Is it simpler to use two while loops with a counter?

~~~
var i = 0
var j = 0
var idx = 0
while (j < n) {
  while (i < m) {
    if (values(idx) != 0) {
      ...
    }
    i += 1
    idx += 1
  }
  ...
  j += 1
}
~~~
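Filled in, the two-loop idea might look like the sketch below. It is one possible completion of the elided parts, not the code that was eventually merged: `ToSparseSketch` and `denseToCsc` are hypothetical names, the input is assumed to be column-major (as in `DenseMatrix`), and the row counter is reset at the start of every column.

```scala
import scala.collection.mutable.ArrayBuilder

object ToSparseSketch {
  /** Converts a column-major dense array into CSC arrays (colPtrs, rowIndices, values). */
  def denseToCsc(
      numRows: Int,
      numCols: Int,
      values: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    require(values.length == numRows * numCols, "values must hold numRows * numCols entries")
    val colPtrs = new Array[Int](numCols + 1) // size is known up front, so no buffer
    val rowIndices = ArrayBuilder.make[Int]
    val nzValues = ArrayBuilder.make[Double]
    var nnz = 0
    var idx = 0
    var j = 0
    while (j < numCols) {
      var i = 0 // reset the row counter for every column
      while (i < numRows) {
        val v = values(idx)
        if (v != 0.0) {
          rowIndices += i
          nzValues += v
          nnz += 1
        }
        i += 1
        idx += 1
      }
      j += 1
      colPtrs(j) = nnz // running count of nonzeros seen through column j - 1
    }
    (colPtrs, rowIndices.result(), nzValues.result())
  }
}
```

Compared with the `foreach` version in the diff, no `lastCol` bookkeeping is needed, because the column index is the outer loop variable.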
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22070530 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -256,72 +529,222 @@ object Matrices { * Generate a `DenseMatrix` consisting of zeros. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros + * @return `Matrix` with size `numRows` x `numCols` and values of zeros */ - def zeros(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols)) + def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols) /** * Generate a `DenseMatrix` consisting of ones. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones + * @return `Matrix` with size `numRows` x `numCols` and values of ones */ - def ones(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0)) + def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols) /** - * Generate an Identity Matrix in `DenseMatrix` format. + * Generate a dense Identity Matrix in `Matrix` format. * @param n number of rows and columns of the matrix - * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal */ - def eye(n: Int): Matrix = { -val identity = Matrices.zeros(n, n) -var i = 0 -while (i n){ - identity.update(i, i, 1.0) - i += 1 -} -identity - } + def eye(n: Int): Matrix = DenseMatrix.eye(n) + + /** + * Generate a sparse Identity Matrix in `Matrix` format. 
+ * @param n number of rows and columns of the matrix + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal + */ + def speye(n: Int): Matrix = SparseMatrix.speye(n) /** * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) */ - def rand(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble())) - } + def rand(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.rand(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprand(numRows, numCols, density, rng) /** * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) */ - def randn(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian())) - } + def randn(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.randn(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. 
gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprandn(numRows, numCols, density, rng) /** * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22070521 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -256,72 +529,222 @@ object Matrices { * Generate a `DenseMatrix` consisting of zeros. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros + * @return `Matrix` with size `numRows` x `numCols` and values of zeros */ - def zeros(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols)) + def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols) /** * Generate a `DenseMatrix` consisting of ones. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones + * @return `Matrix` with size `numRows` x `numCols` and values of ones */ - def ones(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0)) + def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols) /** - * Generate an Identity Matrix in `DenseMatrix` format. + * Generate a dense Identity Matrix in `Matrix` format. * @param n number of rows and columns of the matrix - * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal */ - def eye(n: Int): Matrix = { -val identity = Matrices.zeros(n, n) -var i = 0 -while (i n){ - identity.update(i, i, 1.0) - i += 1 -} -identity - } + def eye(n: Int): Matrix = DenseMatrix.eye(n) + + /** + * Generate a sparse Identity Matrix in `Matrix` format. 
+ * @param n number of rows and columns of the matrix + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal + */ + def speye(n: Int): Matrix = SparseMatrix.speye(n) /** * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) */ - def rand(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble())) - } + def rand(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.rand(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprand(numRows, numCols, density, rng) /** * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) */ - def randn(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian())) - } + def randn(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.randn(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. 
gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprandn(numRows, numCols, density, rng) /** * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22070531 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -256,72 +529,222 @@ object Matrices { * Generate a `DenseMatrix` consisting of zeros. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros + * @return `Matrix` with size `numRows` x `numCols` and values of zeros */ - def zeros(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols)) + def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols) /** * Generate a `DenseMatrix` consisting of ones. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones + * @return `Matrix` with size `numRows` x `numCols` and values of ones */ - def ones(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0)) + def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols) /** - * Generate an Identity Matrix in `DenseMatrix` format. + * Generate a dense Identity Matrix in `Matrix` format. * @param n number of rows and columns of the matrix - * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal */ - def eye(n: Int): Matrix = { -val identity = Matrices.zeros(n, n) -var i = 0 -while (i n){ - identity.update(i, i, 1.0) - i += 1 -} -identity - } + def eye(n: Int): Matrix = DenseMatrix.eye(n) + + /** + * Generate a sparse Identity Matrix in `Matrix` format. 
+ * @param n number of rows and columns of the matrix + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal + */ + def speye(n: Int): Matrix = SparseMatrix.speye(n) /** * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) */ - def rand(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble())) - } + def rand(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.rand(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprand(numRows, numCols, density, rng) /** * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) */ - def randn(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian())) - } + def randn(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.randn(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. 
gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprandn(numRows, numCols, density, rng) /** * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22070510

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +331,145 @@ class SparseMatrix(
       }

       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +
    +  /** Generate a `DenseMatrix` from the given `SparseMatrix`. */
    +  def toDense(): DenseMatrix = {
    +    new DenseMatrix(numRows, numCols, toArray)
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
    +   * (row, column, value) tuples. Array must be sorted first by *column* index and then by row
    +   * index.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param entries Array of ((row, column), value) tuples
    +   * @return The corresponding `SparseMatrix`
    +   */
    +  def fromCOO(numRows: Int, numCols: Int, entries: Array[((Int, Int), Double)]): SparseMatrix = {
    +    val colPtrs = new ArrayBuffer[Int](numCols + 1)
    +    colPtrs.append(0)
    +    var nnz = 0
    +    var lastCol = 0
    +    val values = entries.map { case ((i, j), v) =>
    +      while (j != lastCol) {
    +        colPtrs.append(nnz)
    +        lastCol += 1
    +        if (lastCol > numCols) {
    +          throw new IndexOutOfBoundsException("Please make sure that the entries array is " +
    +            "sorted by COLUMN index first and then by row index.")
    +        }
    +      }
    +      nnz += 1
    +      v
    +    }
    +    while (numCols > lastCol) {
    +      colPtrs.append(nnz)
    +      lastCol += 1
    +    }
    +    new SparseMatrix(numRows, numCols, colPtrs.toArray, entries.map(_._1._1), values)
    +  }
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  /** Generates a `SparseMatrix` with a given random number generator and `method`, which
    +    * specifies the distribution. */
    +  private def genRandMatrix(
    +      numRows: Int,
    +      numCols: Int,
    +      density: Double,
    +      rng: Random,
    +      method: Random => Double): SparseMatrix = {
    +    require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
    +      s"0.0 <= d <= 1.0. Currently, density: $density")
    +    val length = math.ceil(numRows * numCols * density).toInt
    +    val entries = Map[(Int, Int), Double]()
    +    var i = 0
    +    while (i < length) {
    +      var rowIndex = rng.nextInt(numRows)
    +      var colIndex = rng.nextInt(numCols)
    +      while (entries.contains((rowIndex, colIndex))) {
    --- End diff --

    If `density` is close to `1`, this while loop can take a long time to terminate, because almost every sampled position is already taken. We can combine this approach with selection-rejection to achieve `O(nnz)` complexity:

    https://github.com/mengxr/spark-sampling/blob/master/src/main/scala/org/apache/spark/sampling/RDDSamplingFunctions.scala#L98
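The suggestion above can be sketched roughly as follows. This is a minimal illustration and not code from the PR or from the linked repository: the object name `DistinctIndexSampling`, the `0.34` cutoff between the two strategies, and the use of linear (column-major) indices instead of `(row, column)` pairs are all assumptions made for the example. For low density it keeps the hash-set rejection loop; for high density it switches to sequential selection sampling, which always finishes after one pass over the `n` candidate positions.

```scala
import scala.collection.mutable
import scala.util.Random

object DistinctIndexSampling {
  /** Picks `k` distinct indices from `0 until n`, returned in increasing order. */
  def sample(n: Int, k: Int, rng: Random): Array[Int] = {
    require(k >= 0 && k <= n, s"k must satisfy 0 <= k <= n, got k=$k, n=$n")
    if (k == 0) {
      Array.empty[Int]
    } else if (k.toDouble / n < 0.34) { // cutoff is an assumption, not a tuned value
      // Rejection: collisions are rare at low density, so expected work is O(k).
      val seen = mutable.HashSet.empty[Int]
      while (seen.size < k) seen += rng.nextInt(n)
      seen.toArray.sorted
    } else {
      // Selection sampling: visit each index once and keep index i with
      // probability (k - chosen) / (n - i). Work is O(n), which is O(k) here
      // because k is a constant fraction of n.
      val out = new Array[Int](k)
      var chosen = 0
      var i = 0
      while (chosen < k) {
        if (rng.nextDouble() * (n - i) < (k - chosen)) {
          out(chosen) = i
          chosen += 1
        }
        i += 1
      }
      out
    }
  }
}
```

When the number of remaining candidates equals the number of indices still needed, the selection probability becomes 1, so the second branch cannot run past `n`. With `n = numRows * numCols`, an index `idx` maps back to row `idx % numRows` and column `idx / numRows`.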
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22070523 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -256,72 +529,222 @@ object Matrices { * Generate a `DenseMatrix` consisting of zeros. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros + * @return `Matrix` with size `numRows` x `numCols` and values of zeros */ - def zeros(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols)) + def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols) /** * Generate a `DenseMatrix` consisting of ones. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones + * @return `Matrix` with size `numRows` x `numCols` and values of ones */ - def ones(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0)) + def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols) /** - * Generate an Identity Matrix in `DenseMatrix` format. + * Generate a dense Identity Matrix in `Matrix` format. * @param n number of rows and columns of the matrix - * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal */ - def eye(n: Int): Matrix = { -val identity = Matrices.zeros(n, n) -var i = 0 -while (i n){ - identity.update(i, i, 1.0) - i += 1 -} -identity - } + def eye(n: Int): Matrix = DenseMatrix.eye(n) + + /** + * Generate a sparse Identity Matrix in `Matrix` format. 
+ * @param n number of rows and columns of the matrix + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal + */ + def speye(n: Int): Matrix = SparseMatrix.speye(n) /** * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) */ - def rand(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble())) - } + def rand(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.rand(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprand(numRows, numCols, density, rng) /** * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) */ - def randn(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian())) - } + def randn(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.randn(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. 
gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprandn(numRows, numCols, density, rng) /** * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67299603 [Test build #24541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24541/consoleFull) for PR 3319 at commit [`e4bd0c0`](https://github.com/apache/spark/commit/e4bd0c02df49b07ed0ee3687c3ac8e44868c857a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67308668 [Test build #24541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24541/consoleFull) for PR 3319 at commit [`e4bd0c0`](https://github.com/apache/spark/commit/e4bd0c02df49b07ed0ee3687c3ac8e44868c857a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67308677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24541/ Test PASSed.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22010770

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +300,171 @@ class SparseMatrix(
       }

       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  /** Generates a SparseMatrix given an Array[Double] of size numRows * numCols. The number of
    +    * non-zeros in `raw` is provided for efficiency. */
    +  private def genRand(
    +      numRows: Int,
    +      numCols: Int,
    +      raw: Array[Double],
    +      nonZero: Int): SparseMatrix = {
    +    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
    +    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
    +    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
    +
    +    var i = 0
    +    var nnz = 0
    +    var lastCol = -1
    +    raw.foreach { v =>
    +      val r = i % numRows
    +      val c = (i - r) / numRows
    +      if (v != 0.0) {
    +        sRows.append(r)
    +        sparseA.append(v)
    +        while (c != lastCol) {
    +          sCols.append(nnz)
    +          lastCol += 1
    +        }
    +        nnz += 1
    +      }
    +      i += 1
    +    }
    +    while (numCols > lastCol) {
    +      sCols.append(sparseA.length)
    +      lastCol += 1
    +    }
    +    new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, sparseA.toArray)
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param rng a random number generator
    +   * @return `SparseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): SparseMatrix = {
    +    require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
    +      s"0.0 <= d <= 1.0. Currently, density: $density")
    +    val length = numRows * numCols
    +    val rawA = new Array[Double](length)
    +    var nnz = 0
    +    for (i <- 0 until length) {
    --- End diff --

    `sprand` should not be generated this way, since it has `O(m * n)` complexity. Please check MATLAB's implementation or Octave's.
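To illustrate the complexity point in the comment above: once the positions of the non-zeros are known, the compressed sparse column (CSC) arrays can be built without allocating or scanning a dense `numRows * numCols` array. The sketch below is only an illustration under stated assumptions. The object name `CscFromSortedIndices` is hypothetical, and the input is assumed to be a sorted array of distinct column-major linear indices (`idx = col * numRows + row`); it is not the approach MATLAB, Octave, or the final Spark code necessarily uses.

```scala
object CscFromSortedIndices {
  /**
   * Builds (colPtrs, rowIndices) for a numRows x numCols CSC matrix from sorted,
   * distinct column-major linear indices. Runs in O(nnz + numCols).
   */
  def build(numRows: Int, numCols: Int, sortedIdx: Array[Int]): (Array[Int], Array[Int]) = {
    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = new Array[Int](sortedIdx.length)
    var p = 0
    while (p < sortedIdx.length) {
      val idx = sortedIdx(p)
      rowIndices(p) = idx % numRows   // row of the p-th non-zero
      colPtrs(idx / numRows + 1) += 1 // count non-zeros per column
      p += 1
    }
    // Prefix sum turns per-column counts into column start offsets.
    var j = 0
    while (j < numCols) {
      colPtrs(j + 1) += colPtrs(j)
      j += 1
    }
    (colPtrs, rowIndices)
  }
}
```

The values array would then be filled with `nnz` draws from the chosen distribution, so the total cost stays proportional to the number of non-zeros plus the number of columns.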
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22010793 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -256,72 +524,297 @@ object Matrices { * Generate a `DenseMatrix` consisting of zeros. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros + * @return `Matrix` with size `numRows` x `numCols` and values of zeros */ - def zeros(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols)) + def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols) /** * Generate a `DenseMatrix` consisting of ones. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones + * @return `Matrix` with size `numRows` x `numCols` and values of ones */ - def ones(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0)) + def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols) /** - * Generate an Identity Matrix in `DenseMatrix` format. + * Generate a dense Identity Matrix in `Matrix` format. * @param n number of rows and columns of the matrix - * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal */ - def eye(n: Int): Matrix = { -val identity = Matrices.zeros(n, n) -var i = 0 -while (i n){ - identity.update(i, i, 1.0) - i += 1 -} -identity - } + def eye(n: Int): Matrix = DenseMatrix.eye(n) + + /** + * Generate a sparse Identity Matrix in `Matrix` format. 
+ * @param n number of rows and columns of the matrix + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal + */ + def speye(n: Int): Matrix = SparseMatrix.speye(n) /** * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) */ - def rand(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble())) - } + def rand(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.rand(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprand(numRows, numCols, density, rng) /** * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) */ - def randn(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian())) - } + def randn(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.randn(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. 
gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprandn(numRows, numCols, density, rng) /** * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22010805 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -256,72 +524,297 @@ object Matrices { * Generate a `DenseMatrix` consisting of zeros. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros + * @return `Matrix` with size `numRows` x `numCols` and values of zeros */ - def zeros(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols)) + def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols) /** * Generate a `DenseMatrix` consisting of ones. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones + * @return `Matrix` with size `numRows` x `numCols` and values of ones */ - def ones(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0)) + def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols) /** - * Generate an Identity Matrix in `DenseMatrix` format. + * Generate a dense Identity Matrix in `Matrix` format. * @param n number of rows and columns of the matrix - * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal */ - def eye(n: Int): Matrix = { -val identity = Matrices.zeros(n, n) -var i = 0 -while (i n){ - identity.update(i, i, 1.0) - i += 1 -} -identity - } + def eye(n: Int): Matrix = DenseMatrix.eye(n) + + /** + * Generate a sparse Identity Matrix in `Matrix` format. 
+ * @param n number of rows and columns of the matrix + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal + */ + def speye(n: Int): Matrix = SparseMatrix.speye(n) /** * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) */ - def rand(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble())) - } + def rand(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.rand(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprand(numRows, numCols, density, rng) /** * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) */ - def randn(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian())) - } + def randn(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.randn(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. 
gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprandn(numRows, numCols, density, rng) /** * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22010780 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -197,6 +300,171 @@ class SparseMatrix( } override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone()) + + private[mllib] def map(f: Double = Double) = +new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f)) + + private[mllib] def update(f: Double = Double): SparseMatrix = { +val len = values.length +var i = 0 +while (i len) { + values(i) = f(values(i)) + i += 1 +} +this + } +} + +/** + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]]. + */ +object SparseMatrix { + + /** + * Generate an Identity Matrix in `SparseMatrix` format. + * @param n number of rows and columns of the matrix + * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal + */ + def speye(n: Int): SparseMatrix = { +new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0)) + } + + /** Generates a SparseMatrix given an Array[Double] of size numRows * numCols. The number of +* non-zeros in `raw` is provided for efficiency. */ + private def genRand( + numRows: Int, + numCols: Int, + raw: Array[Double], + nonZero: Int): SparseMatrix = { +val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero) +val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1) +val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero) + +var i = 0 +var nnz = 0 +var lastCol = -1 +raw.foreach { v = + val r = i % numRows + val c = (i - r) / numRows + if (v != 0.0) { +sRows.append(r) +sparseA.append(v) +while (c != lastCol) { + sCols.append(nnz) + lastCol += 1 +} +nnz += 1 + } + i += 1 +} +while (numCols lastCol) { + sCols.append(sparseA.length) + lastCol += 1 +} +new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, sparseA.toArray) + } + + /** + * Generate a `SparseMatrix` consisting of i.i.d. 
uniform random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `SparseMatrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): SparseMatrix = { +require(density = 0.0 density = 1.0, density must be a double in the range + + s0.0 = d = 1.0. Currently, density: $density) +val length = numRows * numCols +val rawA = new Array[Double](length) +var nnz = 0 +for (i - 0 until length) { + val p = rng.nextDouble() + if (p = density) { +rawA.update(i, rng.nextDouble()) +nnz += 1 + } +} +genRand(numRows, numCols, rawA, nnz) + } + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `SparseMatrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): SparseMatrix = { +require(density = 0.0 density = 1.0, density must be a double in the range + + s0.0 = d = 1.0. Currently, density: $density) +val length = numRows * numCols +val rawA = new Array[Double](length) +var nnz = 0 +for (i - 0 until length) { + val p = rng.nextDouble() + if (p = density) { +rawA.update(i, rng.nextGaussian()) +nnz += 1 + } +} +genRand(numRows, numCols, rawA, nnz) + } + + /** + * Generate a diagonal matrix in `SparseMatrix` format from the supplied values. 
+ * @param vector a `Vector` that will form the values on the diagonal of the matrix + * @return Square `SparseMatrix` with size `values.length` x `values.length` and non-zero + * `values` on the diagonal + */ + def diag(vector: Vector): SparseMatrix = { +val n = vector.size +vector match { + case sVec: SparseVector = +val rows = sVec.indices
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22010832 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -256,72 +524,297 @@ object Matrices { * Generate a `DenseMatrix` consisting of zeros. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros + * @return `Matrix` with size `numRows` x `numCols` and values of zeros */ - def zeros(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols)) + def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols) /** * Generate a `DenseMatrix` consisting of ones. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix - * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones + * @return `Matrix` with size `numRows` x `numCols` and values of ones */ - def ones(numRows: Int, numCols: Int): Matrix = -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0)) + def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols) /** - * Generate an Identity Matrix in `DenseMatrix` format. + * Generate a dense Identity Matrix in `Matrix` format. * @param n number of rows and columns of the matrix - * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal */ - def eye(n: Int): Matrix = { -val identity = Matrices.zeros(n, n) -var i = 0 -while (i n){ - identity.update(i, i, 1.0) - i += 1 -} -identity - } + def eye(n: Int): Matrix = DenseMatrix.eye(n) + + /** + * Generate a sparse Identity Matrix in `Matrix` format. 
+ * @param n number of rows and columns of the matrix + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal + */ + def speye(n: Int): Matrix = SparseMatrix.speye(n) /** * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) */ - def rand(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble())) - } + def rand(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.rand(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprand(numRows, numCols, density, rng) /** * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers. * @param numRows number of rows of the matrix * @param numCols number of columns of the matrix * @param rng a random number generator - * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1) + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) */ - def randn(numRows: Int, numCols: Int, rng: Random): Matrix = { -new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian())) - } + def randn(numRows: Int, numCols: Int, rng: Random): Matrix = +DenseMatrix.randn(numRows, numCols, rng) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. 
gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param rng a random number generator + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix = +SparseMatrix.sprandn(numRows, numCols, density, rng) /** * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r22010786

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -256,72 +524,297 @@ object Matrices {
        * Generate a `DenseMatrix` consisting of zeros.
        * @param numRows number of rows of the matrix
        * @param numCols number of columns of the matrix
    -   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
    +   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
        */
    -  def zeros(numRows: Int, numCols: Int): Matrix =
    -    new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
    +  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
    --- End diff --

    It is nice to put all operators under `Matrices`. Then maybe we can mark the ones under `SparseMatrix` and `DenseMatrix` private.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22010764

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -123,6 +135,97 @@ class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double])
   }

   override def copy = new DenseMatrix(numRows, numCols, values.clone())
+
+  private[mllib] def map(f: Double => Double) = new DenseMatrix(numRows, numCols, values.map(f))
+
+  private[mllib] def update(f: Double => Double): DenseMatrix = {
+    val len = values.length
+    var i = 0
+    while (i < len) {
+      values(i) = f(values(i))
+      i += 1
+    }
+    this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.DenseMatrix]].
+ */
+object DenseMatrix {
+
+  /**
+   * Generate a `DenseMatrix` consisting of zeros.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
+   */
+  def zeros(numRows: Int, numCols: Int): DenseMatrix =
+    new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+
+  /**
+   * Generate a `DenseMatrix` consisting of ones.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
+   */
+  def ones(numRows: Int, numCols: Int): DenseMatrix =
+    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+
+  /**
+   * Generate an Identity Matrix in `DenseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
+   */
+  def eye(n: Int): DenseMatrix = {
+    val identity = DenseMatrix.zeros(n, n)
+    var i = 0
+    while (i < n) {
+      identity.update(i, i, 1.0)
+      i += 1
+    }
+    identity
+  }
+
+  /**
+   * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param rng a random number generator
+   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def rand(numRows: Int, numCols: Int, rng: Random): DenseMatrix = {
+    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble()))
+  }
+
+  /**
+   * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param rng a random number generator
+   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def randn(numRows: Int, numCols: Int, rng: Random): DenseMatrix = {
+    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian()))
+  }
+
+  /**
+   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
+   * @param vector a `Vector` that will form the values on the diagonal of the matrix
+   * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
+   *         on the diagonal
+   */
+  def diag(vector: Vector): DenseMatrix = {
+    val n = vector.size
+    val matrix = DenseMatrix.eye(n)
--- End diff --

`eye(n)` -> `zeros(n, n)`
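The suggestion above (start `diag` from `zeros(n, n)` rather than `eye(n)`) can be illustrated with a minimal sketch. Every diagonal entry is overwritten anyway, so filling it with ones first is wasted work. The object `DiagSketch` and the use of a bare column-major `Array[Double]` are assumptions made for illustration; this is not the code that was merged.

```scala
object DiagSketch {
  /** Returns an n x n matrix in column-major order with `diagonal` on the main diagonal. */
  def diag(diagonal: Array[Double]): Array[Double] = {
    val n = diagonal.length
    // A fresh JVM array is already zero-filled, so no identity pass is needed.
    val values = new Array[Double](n * n)
    var i = 0
    while (i < n) {
      values(i * n + i) = diagonal(i) // entry (i, i) in column-major layout
      i += 1
    }
    values
  }
}
```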
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22010767

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -197,6 +300,171 @@ class SparseMatrix(
   }

   override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+    val len = values.length
+    var i = 0
+    while (i < len) {
+      values(i) = f(values(i))
+      i += 1
+    }
+    this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
+  }
+
+  /** Generates a SparseMatrix given an Array[Double] of size numRows * numCols. The number of
+    * non-zeros in `raw` is provided for efficiency. */
+  private def genRand(
--- End diff --

This is a little confusing. First of all, there is no randomness. Secondly, the doc doesn't describe how the values get filled in. Is it supposed to be a method in `DenseMatrix` called `toSparse`?
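To make the `toSparse` idea in the comment above concrete, here is a minimal sketch of a deterministic dense-to-CSC conversion. The names `ToSparseSketch` and `Csc` are hypothetical, and the input is assumed to be a column-major `Array[Double]`, as in the `DenseMatrix` of the diff. It shows one way such a method could look, not the method that ended up in Spark.

```scala
object ToSparseSketch {
  /** Hypothetical CSC container; fields mirror the SparseMatrix constructor arguments in the diff. */
  final case class Csc(
      numRows: Int,
      numCols: Int,
      colPtrs: Array[Int],
      rowIndices: Array[Int],
      values: Array[Double])

  /** Converts a column-major dense array to CSC, keeping only the non-zero entries. */
  def toSparse(numRows: Int, numCols: Int, dense: Array[Double]): Csc = {
    require(dense.length == numRows * numCols, "dense must hold numRows * numCols values")
    val nnz = dense.count(_ != 0.0)
    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = new Array[Int](nnz)
    val values = new Array[Double](nnz)
    var k = 0 // number of non-zeros written so far
    var j = 0
    while (j < numCols) {
      var i = 0
      while (i < numRows) {
        val v = dense(j * numRows + i)
        if (v != 0.0) {
          rowIndices(k) = i
          values(k) = v
          k += 1
        }
        i += 1
      }
      j += 1
      colPtrs(j) = k // column j - 1 ends where column j begins
    }
    Csc(numRows, numCols, colPtrs, rowIndices, values)
  }
}
```

Counting the non-zeros up front lets the sketch allocate exact-size arrays, which is the role the `nonZero` argument plays in `genRand`.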
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22010776

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -197,6 +300,171 @@ class SparseMatrix(
   }

   override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+    val len = values.length
+    var i = 0
+    while (i < len) {
+      values(i) = f(values(i))
+      i += 1
+    }
+    this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
+  }
+
+  /** Generates a SparseMatrix given an Array[Double] of size numRows * numCols. The number of
+    * non-zeros in `raw` is provided for efficiency. */
+  private def genRand(
+      numRows: Int,
+      numCols: Int,
+      raw: Array[Double],
+      nonZero: Int): SparseMatrix = {
+    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
+    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
+    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
+
+    var i = 0
+    var nnz = 0
+    var lastCol = -1
+    raw.foreach { v =>
+      val r = i % numRows
+      val c = (i - r) / numRows
+      if (v != 0.0) {
+        sRows.append(r)
+        sparseA.append(v)
+        while (c != lastCol) {
+          sCols.append(nnz)
+          lastCol += 1
+        }
+        nnz += 1
+      }
+      i += 1
+    }
+    while (numCols > lastCol) {
+      sCols.append(sparseA.length)
+      lastCol += 1
+    }
+    new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, sparseA.toArray)
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `SparseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): SparseMatrix = {
+    require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
+      s"0.0 <= d <= 1.0. Currently, density: $density")
+    val length = numRows * numCols
+    val rawA = new Array[Double](length)
+    var nnz = 0
+    for (i <- 0 until length) {
+      val p = rng.nextDouble()
+      if (p <= density) {
+        rawA.update(i, rng.nextDouble())
+        nnz += 1
+      }
+    }
+    genRand(numRows, numCols, rawA, nnz)
+  }
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `SparseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): SparseMatrix = {
+    require(density >= 0.0 && density <= 1.0, "density must be a double in the range " +
+      s"0.0 <= d <= 1.0. Currently, density: $density")
+    val length = numRows * numCols
+    val rawA = new Array[Double](length)
+    var nnz = 0
+    for (i <- 0 until length) {
--- End diff --

Ditto. `O(m * n)` is too expensive.
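One way to avoid the `O(m * n)` scan criticised above is to draw the non-zero positions directly, so that work and memory scale with the number of non-zeros. The sketch below is an assumption-laden illustration rather than the fix adopted in the pull request: `SprandSketch` and `Csc` are hypothetical names, the matrix gets exactly `round(density * m * n)` non-zeros instead of an independent coin flip per cell, and the rejection loop is only efficient when `density` is well below 1.

```scala
import java.util.Random
import scala.collection.mutable

object SprandSketch {
  /** Hypothetical CSC container; fields mirror the SparseMatrix constructor arguments in the diff. */
  final case class Csc(
      numRows: Int,
      numCols: Int,
      colPtrs: Array[Int],
      rowIndices: Array[Int],
      values: Array[Double])

  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Csc = {
    require(density >= 0.0 && density <= 1.0, s"density must be in [0, 1], got $density")
    val size = numRows.toLong * numCols
    require(density * size <= Int.MaxValue, "too many non-zeros for a single array")
    val nnz = math.round(density * size).toInt

    // Rejection-sample nnz distinct column-major positions (cheap while density is small).
    val picked = mutable.HashSet.empty[Long]
    while (picked.size < nnz) {
      picked += rng.nextInt(numCols).toLong * numRows + rng.nextInt(numRows)
    }
    // Sorting linear positions orders entries by column, then by row within a column.
    val sorted = picked.toArray.sorted

    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = new Array[Int](nnz)
    val values = new Array[Double](nnz)
    var k = 0
    while (k < nnz) {
      val pos = sorted(k)
      rowIndices(k) = (pos % numRows).toInt
      colPtrs((pos / numRows).toInt + 1) += 1 // per-column counts, shifted by one
      values(k) = rng.nextDouble()
      k += 1
    }
    // Prefix sum turns the per-column counts into column pointers.
    var j = 0
    while (j < numCols) {
      colPtrs(j + 1) += colPtrs(j)
      j += 1
    }
    Csc(numRows, numCols, colPtrs, rowIndices, values)
  }
}
```

Swapping `rng.nextDouble()` for `rng.nextGaussian()` in the value loop would give the `sprandn` counterpart under the same assumptions.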
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22010799

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -256,72 +524,297 @@ object Matrices {
    * Generate a `DenseMatrix` consisting of zeros.
    * @param numRows number of rows of the matrix
    * @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    */
-  def zeros(numRows: Int, numCols: Int): Matrix =
-    new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)

   /**
    * Generate a `DenseMatrix` consisting of ones.
    * @param numRows number of rows of the matrix
    * @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
    */
-  def ones(numRows: Int, numCols: Int): Matrix =
-    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)

   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
    * @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    */
-  def eye(n: Int): Matrix = {
-    val identity = Matrices.zeros(n, n)
-    var i = 0
-    while (i < n){
-      identity.update(i, i, 1.0)
-      i += 1
-    }
-    identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)

   /**
    * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    * @param numRows number of rows of the matrix
    * @param numCols number of columns of the matrix
    * @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    */
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+    DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix =
+    SparseMatrix.sprand(numRows, numCols, density, rng)

   /**
    * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    * @param numRows number of rows of the matrix
    * @param numCols number of columns of the matrix
    * @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    */
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+    DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix =
+    SparseMatrix.sprandn(numRows, numCols, density, rng)

   /**
    * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22010808

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -256,72 +524,297 @@ object Matrices {
    * Generate a `DenseMatrix` consisting of zeros.
    * @param numRows number of rows of the matrix
    * @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    */
-  def zeros(numRows: Int, numCols: Int): Matrix =
-    new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)

   /**
    * Generate a `DenseMatrix` consisting of ones.
    * @param numRows number of rows of the matrix
    * @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
+   * @return `Matrix` with size `numRows` x `numCols` and values of ones
    */
-  def ones(numRows: Int, numCols: Int): Matrix =
-    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols)

   /**
-   * Generate an Identity Matrix in `DenseMatrix` format.
+   * Generate a dense Identity Matrix in `Matrix` format.
    * @param n number of rows and columns of the matrix
-   * @return `DenseMatrix` with size `n` x `n` and values of ones on the diagonal
+   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
    */
-  def eye(n: Int): Matrix = {
-    val identity = Matrices.zeros(n, n)
-    var i = 0
-    while (i < n){
-      identity.update(i, i, 1.0)
-      i += 1
-    }
-    identity
-  }
+  def eye(n: Int): Matrix = DenseMatrix.eye(n)
+
+  /**
+   * Generate a sparse Identity Matrix in `Matrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `Matrix` with size `n` x `n` and values of ones on the diagonal
+   */
+  def speye(n: Int): Matrix = SparseMatrix.speye(n)

   /**
    * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers.
    * @param numRows number of rows of the matrix
    * @param numCols number of columns of the matrix
    * @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
    */
-  def rand(numRows: Int, numCols: Int, rng: Random): Matrix = {
-    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble()))
-  }
+  def rand(numRows: Int, numCols: Int, rng: Random): Matrix =
+    DenseMatrix.rand(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1)
+   */
+  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix =
+    SparseMatrix.sprand(numRows, numCols, density, rng)

   /**
    * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers.
    * @param numRows number of rows of the matrix
    * @param numCols number of columns of the matrix
    * @param rng a random number generator
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
    */
-  def randn(numRows: Int, numCols: Int, rng: Random): Matrix = {
-    new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian()))
-  }
+  def randn(numRows: Int, numCols: Int, rng: Random): Matrix =
+    DenseMatrix.randn(numRows, numCols, rng)
+
+  /**
+   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
+   * @param numRows number of rows of the matrix
+   * @param numCols number of columns of the matrix
+   * @param density the desired density for the matrix
+   * @param rng a random number generator
+   * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1)
+   */
+  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): Matrix =
+    SparseMatrix.sprandn(numRows, numCols, density, rng)

   /**
    * Generate a diagonal matrix in `DenseMatrix` format
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r22026283

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -256,72 +524,297 @@ object Matrices {
    * Generate a `DenseMatrix` consisting of zeros.
    * @param numRows number of rows of the matrix
    * @param numCols number of columns of the matrix
-   * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
+   * @return `Matrix` with size `numRows` x `numCols` and values of zeros
    */
-  def zeros(numRows: Int, numCols: Int): Matrix =
-    new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
--- End diff --

I specifically don't want to mark them private; otherwise the user will always have to write `.asInstanceOf[SparseMatrix]`. We could mark them `private[mllib]` and still use them, but not needing `.asInstanceOf` everywhere, especially when writing tests in spark-shell, is a very nice convenience.
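The convenience described above comes down to static return types. A minimal sketch follows, using simplified stand-ins for the real classes; everything inside `ReturnTypeSketch` is a hypothetical reduction for illustration, not Spark's actual definitions.

```scala
object ReturnTypeSketch {
  sealed trait Matrix {
    def numRows: Int
    def numCols: Int
  }

  final class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double])
    extends Matrix

  object DenseMatrix {
    // Concrete factory: the static type is DenseMatrix.
    def zeros(numRows: Int, numCols: Int): DenseMatrix =
      new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
  }

  object Matrices {
    // Trait-level factory: the static type is widened to Matrix.
    def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols)
  }

  // Going through the trait-level factory requires a cast to reach `values`.
  val viaTrait: Matrix = Matrices.zeros(2, 2)
  val castValues: Array[Double] = viaTrait.asInstanceOf[DenseMatrix].values

  // Going through the concrete factory needs no cast.
  val directValues: Array[Double] = DenseMatrix.zeros(2, 2).values
}
```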
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67129497

[Test build #24492 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24492/consoleFull) for PR 3319 at commit [`065b531`](https://github.com/apache/spark/commit/065b53181349fa0cc56d4828044b1d564791ea80).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67130002

[Test build #24493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24493/consoleFull) for PR 3319 at commit [`d8be7bc`](https://github.com/apache/spark/commit/d8be7bc07b23982c4fced647f85982c6b7cadd4b).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67136033

[Test build #24492 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24492/consoleFull) for PR 3319 at commit [`065b531`](https://github.com/apache/spark/commit/065b53181349fa0cc56d4828044b1d564791ea80).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67136043 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24492/ Test FAILed.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67136740

[Test build #24493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24493/consoleFull) for PR 3319 at commit [`d8be7bc`](https://github.com/apache/spark/commit/d8be7bc07b23982c4fced647f85982c6b7cadd4b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67136754 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24493/ Test FAILed.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67141113

[Test build #24497 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24497/consoleFull) for PR 3319 at commit [`65c562e`](https://github.com/apache/spark/commit/65c562e57078ccb31de281b238a9348dd9a1f7c2).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-67149415 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24497/ Test PASSed.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r21925528

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -313,15 +593,145 @@ object Matrices {
    * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
    *         on the diagonal
    */
-  def diag(vector: Vector): Matrix = {
-    val n = vector.size
-    val matrix = Matrices.eye(n)
-    val values = vector.toArray
-    var i = 0
-    while (i < n) {
-      matrix.update(i, i, values(i))
-      i += 1
+  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
+
+  /**
+   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
+   * the matrices are supplied in. Supplying a mix of dense and sparse matrices will result in
+   * a dense matrix.
--- End diff --

I like the MATLAB approach better. Usually a sparse matrix is very sparse, while a dense component is quite small, for example,
~~~
A^T A A^T A I
~~~
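The MATLAB convention referred to above is that concatenating sparse and full matrices yields a sparse result. A minimal sketch of a horizontal concatenation following that rule is shown here. `HorzcatSketch`, `Mat`, `Dense`, and `Sparse` are hypothetical stand-ins (column-major dense storage, CSC sparse storage), so this illustrates the policy only and is not the implementation that was eventually merged.

```scala
object HorzcatSketch {
  sealed trait Mat { def numRows: Int; def numCols: Int }
  final case class Dense(numRows: Int, numCols: Int, values: Array[Double]) extends Mat
  final case class Sparse(
      numRows: Int,
      numCols: Int,
      colPtrs: Array[Int],
      rowIndices: Array[Int],
      values: Array[Double]) extends Mat

  /** Converts a column-major dense block to CSC, dropping zeros. */
  private def toSparse(d: Dense): Sparse = {
    val rows = Array.newBuilder[Int]
    val vals = Array.newBuilder[Double]
    val colPtrs = new Array[Int](d.numCols + 1)
    var nnz = 0
    for (j <- 0 until d.numCols) {
      for (i <- 0 until d.numRows) {
        val v = d.values(j * d.numRows + i)
        if (v != 0.0) { rows += i; vals += v; nnz += 1 }
      }
      colPtrs(j + 1) = nnz
    }
    Sparse(d.numRows, d.numCols, colPtrs, rows.result(), vals.result())
  }

  /** All-dense input stays dense; any sparse input makes the result sparse. */
  def horzcat(mats: Seq[Mat]): Mat = {
    require(mats.nonEmpty, "need at least one matrix")
    val numRows = mats.head.numRows
    require(mats.forall(_.numRows == numRows), "all matrices must have the same number of rows")
    val totalCols = mats.map(_.numCols).sum
    val denseParts = mats.collect { case d: Dense => d }
    if (denseParts.length == mats.length) {
      // Column-major blocks placed side by side are just concatenated value arrays.
      Dense(numRows, totalCols, Array.concat(denseParts.map(_.values): _*))
    } else {
      val parts = mats.map {
        case d: Dense => toSparse(d)
        case s: Sparse => s
      }
      val colPtrs = new Array[Int](totalCols + 1)
      var col = 0
      var offset = 0 // non-zeros contributed by the blocks already processed
      parts.foreach { p =>
        var j = 0
        while (j < p.numCols) {
          colPtrs(col + 1) = offset + p.colPtrs(j + 1)
          col += 1
          j += 1
        }
        offset += p.values.length
      }
      Sparse(numRows, totalCols, colPtrs,
        Array.concat(parts.map(_.rowIndices): _*),
        Array.concat(parts.map(_.values): _*))
    }
  }
}
```

Keeping the result sparse means a small dense block attached to a large, very sparse one costs only its own non-zeros, which is the situation the comment describes.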
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r21929684

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -313,15 +593,145 @@ object Matrices {
    * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
    *         on the diagonal
    */
-  def diag(vector: Vector): Matrix = {
-    val n = vector.size
-    val matrix = Matrices.eye(n)
-    val values = vector.toArray
-    var i = 0
-    while (i < n) {
-      matrix.update(i, i, values(i))
-      i += 1
+  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
+
+  /**
+   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
+   * the matrices are supplied in. Supplying a mix of dense and sparse matrices will result in
+   * a dense matrix.
--- End diff --

Okay, will do.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r21866315

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -80,6 +81,12 @@ sealed trait Matrix extends Serializable {

   /** A human readable representation of the matrix */
   override def toString: String = toBreeze.toString()
+
+  /** Map the values of this matrix using a function. Generates a new matrix. */
+  private[mllib] def map(f: Double => Double): Matrix
+
+  /** Update all the values of this matrix using the function f. Performed in-place. */
+  private[mllib] def update(f: Double => Double): Matrix
--- End diff --

Ditto. What happens to zero values that are not explicitly stored?
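The question above matters because the `SparseMatrix.map` shown in this pull request applies `f` only to the stored `values`. Whenever `f(0.0) != 0.0`, the result differs from mapping every entry. Here is a minimal sketch of that discrepancy; `SparseMapSketch` and its `toDense` helper are assumptions for illustration and work on raw CSC arrays rather than Spark's classes.

```scala
object SparseMapSketch {
  /** Expands CSC arrays into a column-major dense array. */
  def toDense(
      numRows: Int,
      numCols: Int,
      colPtrs: Array[Int],
      rowIndices: Array[Int],
      values: Array[Double]): Array[Double] = {
    val dense = new Array[Double](numRows * numCols)
    var j = 0
    while (j < numCols) {
      var k = colPtrs(j)
      while (k < colPtrs(j + 1)) {
        dense(j * numRows + rowIndices(k)) = values(k)
        k += 1
      }
      j += 1
    }
    dense
  }

  // 2 x 2 identity matrix in CSC form: only the two diagonal ones are stored.
  val colPtrs = Array(0, 1, 2)
  val rowIndices = Array(0, 1)
  val values = Array(1.0, 1.0)

  val f: Double => Double = _ + 1.0 // note that f(0.0) is 1.0, not 0.0

  // Mapping only the stored values leaves the implicit zeros untouched.
  val storedOnly: Array[Double] = toDense(2, 2, colPtrs, rowIndices, values.map(f))

  // Mapping every entry also transforms the zeros that were never stored.
  val everyEntry: Array[Double] = toDense(2, 2, colPtrs, rowIndices, values).map(f)
}
```

For functions with `f(0.0) == 0.0`, such as scaling by a constant, the two results agree, so a documented restriction of that kind would be one possible way to resolve the question.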
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r21866309

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -17,10 +17,11 @@
 package org.apache.spark.mllib.linalg

-import java.util.{Random, Arrays}
-
 import breeze.linalg.{Matrix => BM, DenseMatrix => BDM, CSCMatrix => BSM}
+import java.util.{Random, Arrays}
+import scala.collection.mutable.ArrayBuffer
--- End diff --

organize imports
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3319#discussion_r21866327

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -197,6 +295,171 @@ class SparseMatrix(
   }

   override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
+
+  private[mllib] def map(f: Double => Double) =
+    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
+
+  private[mllib] def update(f: Double => Double): SparseMatrix = {
+    val len = values.length
+    var i = 0
+    while (i < len) {
+      values(i) = f(values(i))
+      i += 1
+    }
+    this
+  }
+}
+
+/**
+ * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
+ */
+object SparseMatrix {
+
+  /**
+   * Generate an Identity Matrix in `SparseMatrix` format.
+   * @param n number of rows and columns of the matrix
+   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
+   */
+  def speye(n: Int): SparseMatrix = {
+    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
+  }
+
+  /** Generates a SparseMatrix given an Array[Double] of size numRows * numCols. The number of
+    * non-zeros in `raw` is provided for efficiency. */
+  private def genRand(
+      numRows: Int,
+      numCols: Int,
+      raw: Array[Double],
+      nonZero: Int): SparseMatrix = {
+    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
+    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
+    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
+
+    var i = 0
+    var nnz = 0
+    var lastCol = -1
+    raw.foreach { v =>
+      val r = i % numRows
+      val c = (i - r) / numRows
+      if ( v != 0.0) {
+        sRows.append(r)
+        sparseA.append(v)
+        while (c != lastCol){
+          sCols.append(nnz)
+          lastCol += 1
+        }
+        nnz += 1
+      }
+      i += 1
+    }
+    while (numCols > lastCol){
--- End diff --

space before `{` (and please fix other places)
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r21866336

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +295,171 @@ class SparseMatrix(
       }
       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  /** Generates a SparseMatrix given an Array[Double] of size numRows * numCols. The number of
    +    * non-zeros in `raw` is provided for efficiency. */
    +  private def genRand(
    +      numRows: Int,
    +      numCols: Int,
    +      raw: Array[Double],
    +      nonZero: Int): SparseMatrix = {
    +    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
    +    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
    +    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
    +
    +    var i = 0
    +    var nnz = 0
    +    var lastCol = -1
    +    raw.foreach { v =>
    +      val r = i % numRows
    +      val c = (i - r) / numRows
    +      if ( v != 0.0) {
    +        sRows.append(r)
    +        sparseA.append(v)
    +        while (c != lastCol){
    +          sCols.append(nnz)
    +          lastCol += 1
    +        }
    +        nnz += 1
    +      }
    +      i += 1
    +    }
    +    while (numCols > lastCol){
    +      sCols.append(sparseA.length)
    +      lastCol += 1
    +    }
    +    new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, sparseA.toArray)
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param rng a random number generator
    +   * @return `SparseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): SparseMatrix = {
    +    require(density > 0.0 && density < 1.0, "density must be a double in the range " +
    +      s"0.0 < d < 1.0. Currently, density: $density")
    +    val length = numRows * numCols
    +    val rawA = new Array[Double](length)
    +    var nnz = 0
    +    for (i <- 0 until length) {
    +      val p = rng.nextDouble()
    +      if (p <= density) {
    +        rawA.update(i, rng.nextDouble())
    +        nnz += 1
    +      }
    +    }
    +    genRand(numRows, numCols, rawA, nnz)
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param rng a random number generator
    +   * @return `SparseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): SparseMatrix = {
    +    require(density > 0.0 && density < 1.0, "density must be a double in the range " +
    +      s"0.0 < d < 1.0. Currently, density: $density")
    +    val length = numRows * numCols
    +    val rawA = new Array[Double](length)
    +    var nnz = 0
    +    for (i <- 0 until length) {
    +      val p = rng.nextDouble()
    +      if (p <= density) {
    +        rawA.update(i, rng.nextGaussian())
    +        nnz += 1
    +      }
    +    }
    +    genRand(numRows, numCols, rawA, nnz)
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    --- End diff --

    `DenseMatrix`?
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r21866331

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +295,171 @@ class SparseMatrix(
       }
       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  /** Generates a SparseMatrix given an Array[Double] of size numRows * numCols. The number of
    +    * non-zeros in `raw` is provided for efficiency. */
    +  private def genRand(
    +      numRows: Int,
    +      numCols: Int,
    +      raw: Array[Double],
    +      nonZero: Int): SparseMatrix = {
    +    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
    +    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
    +    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
    +
    +    var i = 0
    +    var nnz = 0
    +    var lastCol = -1
    +    raw.foreach { v =>
    +      val r = i % numRows
    +      val c = (i - r) / numRows
    +      if ( v != 0.0) {
    +        sRows.append(r)
    +        sparseA.append(v)
    +        while (c != lastCol){
    +          sCols.append(nnz)
    +          lastCol += 1
    +        }
    +        nnz += 1
    +      }
    +      i += 1
    +    }
    +    while (numCols > lastCol){
    +      sCols.append(sparseA.length)
    +      lastCol += 1
    +    }
    +    new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, sparseA.toArray)
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param rng a random number generator
    +   * @return `SparseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): SparseMatrix = {
    +    require(density > 0.0 && density < 1.0, "density must be a double in the range " +
    +      s"0.0 < d < 1.0. Currently, density: $density")
    --- End diff --

    `density = 0.0` and `density = 1.0` should be valid.
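A minimal sketch of how the bounds check could read if both endpoints are accepted, as the comment above asks. This is not the code that was merged: `checkDensity` and `sampleMask` are hypothetical helper names introduced only for illustration, and the sketch relies on nothing beyond `scala.util.Random`.

```scala
import scala.util.Random

object DensityCheckSketch {
  /** Hypothetical helper: accepts 0.0 and 1.0 as valid densities. */
  def checkDensity(density: Double): Unit =
    require(density >= 0.0 && density <= 1.0,
      s"density must be a double in the range 0.0 <= d <= 1.0. Currently, density: $density")

  /** Hypothetical helper: marks each of `length` entries as non-zero with probability `density`. */
  def sampleMask(length: Int, density: Double, rng: Random): Array[Boolean] = {
    checkDensity(density)
    // nextDouble() lies in [0.0, 1.0), so the strict comparison selects no entries
    // for density = 0.0 and every entry for density = 1.0.
    Array.fill(length)(rng.nextDouble() < density)
  }
}
```

With the strict `<` comparison the two boundary cases behave as one would expect: an all-zero matrix for `density = 0.0` and a fully populated one for `density = 1.0`.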
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r21866323

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +295,171 @@ class SparseMatrix(
       }
       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  /** Generates a SparseMatrix given an Array[Double] of size numRows * numCols. The number of
    +    * non-zeros in `raw` is provided for efficiency. */
    +  private def genRand(
    +      numRows: Int,
    +      numCols: Int,
    +      raw: Array[Double],
    +      nonZero: Int): SparseMatrix = {
    +    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
    +    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
    +    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
    +
    +    var i = 0
    +    var nnz = 0
    +    var lastCol = -1
    +    raw.foreach { v =>
    +      val r = i % numRows
    +      val c = (i - r) / numRows
    +      if ( v != 0.0) {
    --- End diff --

    Is it handled inside the constructor of `SparseMatrix`?
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r21866346

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -313,15 +593,145 @@ object Matrices {
        * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
        *         on the diagonal
        */
    -  def diag(vector: Vector): Matrix = {
    -    val n = vector.size
    -    val matrix = Matrices.eye(n)
    -    val values = vector.toArray
    -    var i = 0
    -    while (i < n) {
    -      matrix.update(i, i, values(i))
    -      i += 1
    +  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
    +
    +  /**
    +   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
    +   * the matrices are supplied in. Supplying a mix of dense and sparse matrices will result in
    +   * a dense matrix.
    --- End diff --

    Is it the same behavior as in MATLAB? (Sorry I don't have MATLAB installed.)
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r21866313

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -80,6 +81,12 @@ sealed trait Matrix extends Serializable {
       /** A human readable representation of the matrix */
       override def toString: String = toBreeze.toString()
    +
    +  /** Map the values of this matrix using a function. Generates a new matrix. */
    --- End diff --

    Should comment on the behavior for sparse matrices, for example, `map(_ + 1)`.
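To make the concern about `map(_ + 1)` concrete, here is a small sketch of what happens when a function is applied only to the explicitly stored values of a CSC matrix, which is what the `SparseMatrix.map` in this diff does. `Csc` is a hypothetical stand-in written for this example, not the MLlib class.

```scala
object SparseMapSketch {
  /** Hypothetical stand-in for a CSC sparse matrix; not the MLlib class. */
  final case class Csc(
      numRows: Int,
      numCols: Int,
      colPtrs: Array[Int],
      rowIndices: Array[Int],
      values: Array[Double]) {

    /** Applies f to the explicitly stored values only. */
    def map(f: Double => Double): Csc = copy(values = values.map(f))

    /** Expands to a dense column-major array so the implicit zeros become visible. */
    def toDenseArray: Array[Double] = {
      val out = new Array[Double](numRows * numCols)
      for (j <- 0 until numCols; k <- colPtrs(j) until colPtrs(j + 1)) {
        out(j * numRows + rowIndices(k)) = values(k)
      }
      out
    }
  }

  // 2 x 2 identity: only the two diagonal entries are stored.
  val eye2: Csc = Csc(2, 2, Array(0, 1, 2), Array(0, 1), Array(1.0, 1.0))

  // The off-diagonal zeros are never passed to the function, so they stay 0.0.
  val plusOne: Csc = eye2.map(_ + 1)
}
```

A dense matrix mapped with the same function would have 1.0 in the off-diagonal positions, so the two representations disagree whenever `f(0.0) != 0.0`. That difference is what the doc comment would need to spell out.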
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r21881450

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -313,15 +593,145 @@ object Matrices {
        * @return Square `DenseMatrix` with size `values.length` x `values.length` and `values`
        *         on the diagonal
        */
    -  def diag(vector: Vector): Matrix = {
    -    val n = vector.size
    -    val matrix = Matrices.eye(n)
    -    val values = vector.toArray
    -    var i = 0
    -    while (i < n) {
    -      matrix.update(i, i, values(i))
    -      i += 1
    +  def diag(vector: Vector): Matrix = DenseMatrix.diag(vector)
    +
    +  /**
    +   * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format
    +   * the matrices are supplied in. Supplying a mix of dense and sparse matrices will result in
    +   * a dense matrix.
    --- End diff --

    MATLAB does it the other way around. If one matrix is sparse, then the final matrix turns out to be sparse as well. That's why I added the note. Should I make it consistent with MATLAB?
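For reference, horizontal concatenation is cheap when every input is already in CSC form, since the column pointers only need to be shifted. The following is a rough sketch under that assumption; `Csc` and `horzcat` are hypothetical names for this example and the code is not the implementation from the PR. It does not settle the dense-versus-sparse question for mixed inputs discussed above.

```scala
object HorzcatSketch {
  /** Hypothetical stand-in for a CSC sparse matrix; not the MLlib class. */
  final case class Csc(
      numRows: Int,
      numCols: Int,
      colPtrs: Array[Int],
      rowIndices: Array[Int],
      values: Array[Double])

  /** Horizontally concatenates CSC matrices that share the same number of rows. */
  def horzcat(mats: Seq[Csc]): Csc = {
    require(mats.nonEmpty, "need at least one matrix")
    val numRows = mats.head.numRows
    require(mats.forall(_.numRows == numRows), "all matrices must have the same number of rows")

    val colPtrs = Array.newBuilder[Int]
    colPtrs += 0
    var offset = 0
    mats.foreach { m =>
      // Skip each matrix's leading 0 pointer and shift the rest by the entries seen so far.
      m.colPtrs.drop(1).foreach(p => colPtrs += (p + offset))
      offset += m.values.length
    }

    Csc(
      numRows,
      mats.map(_.numCols).sum,
      colPtrs.result(),
      mats.flatMap(_.rowIndices.toSeq).toArray,
      mats.flatMap(_.values.toSeq).toArray)
  }
}
```

Row indices and values can be appended unchanged because concatenating columns never moves an entry to a different row.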
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r21881510

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +295,171 @@ class SparseMatrix(
       }
       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  /** Generates a SparseMatrix given an Array[Double] of size numRows * numCols. The number of
    +    * non-zeros in `raw` is provided for efficiency. */
    +  private def genRand(
    +      numRows: Int,
    +      numCols: Int,
    +      raw: Array[Double],
    +      nonZero: Int): SparseMatrix = {
    +    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
    +    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
    +    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
    +
    +    var i = 0
    +    var nnz = 0
    +    var lastCol = -1
    +    raw.foreach { v =>
    +      val r = i % numRows
    +      val c = (i - r) / numRows
    +      if ( v != 0.0) {
    +        sRows.append(r)
    +        sparseA.append(v)
    +        while (c != lastCol){
    +          sCols.append(nnz)
    +          lastCol += 1
    +        }
    +        nnz += 1
    +      }
    +      i += 1
    +    }
    +    while (numCols > lastCol){
    +      sCols.append(sparseA.length)
    +      lastCol += 1
    +    }
    +    new SparseMatrix(numRows, numCols, sCols.toArray, sRows.toArray, sparseA.toArray)
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. uniform random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param rng a random number generator
    +   * @return `SparseMatrix` with size `numRows` x `numCols` and values in U(0, 1)
    +   */
    +  def sprand(numRows: Int, numCols: Int, density: Double, rng: Random): SparseMatrix = {
    +    require(density > 0.0 && density < 1.0, "density must be a double in the range " +
    +      s"0.0 < d < 1.0. Currently, density: $density")
    +    val length = numRows * numCols
    +    val rawA = new Array[Double](length)
    +    var nnz = 0
    +    for (i <- 0 until length) {
    +      val p = rng.nextDouble()
    +      if (p <= density) {
    +        rawA.update(i, rng.nextDouble())
    +        nnz += 1
    +      }
    +    }
    +    genRand(numRows, numCols, rawA, nnz)
    +  }
    +
    +  /**
    +   * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers.
    +   * @param numRows number of rows of the matrix
    +   * @param numCols number of columns of the matrix
    +   * @param density the desired density for the matrix
    +   * @param rng a random number generator
    +   * @return `SparseMatrix` with size `numRows` x `numCols` and values in N(0, 1)
    +   */
    +  def sprandn(numRows: Int, numCols: Int, density: Double, rng: Random): SparseMatrix = {
    +    require(density > 0.0 && density < 1.0, "density must be a double in the range " +
    +      s"0.0 < d < 1.0. Currently, density: $density")
    +    val length = numRows * numCols
    +    val rawA = new Array[Double](length)
    +    var nnz = 0
    +    for (i <- 0 until length) {
    +      val p = rng.nextDouble()
    +      if (p <= density) {
    +        rawA.update(i, rng.nextGaussian())
    +        nnz += 1
    +      }
    +    }
    +    genRand(numRows, numCols, rawA, nnz)
    +  }
    +
    +  /**
    +   * Generate a diagonal matrix in `DenseMatrix` format from the supplied values.
    --- End diff --

    Good catch!
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3319#discussion_r21881678

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -197,6 +295,171 @@ class SparseMatrix(
       }
       override def copy = new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.clone())
    +
    +  private[mllib] def map(f: Double => Double) =
    +    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values.map(f))
    +
    +  private[mllib] def update(f: Double => Double): SparseMatrix = {
    +    val len = values.length
    +    var i = 0
    +    while (i < len) {
    +      values(i) = f(values(i))
    +      i += 1
    +    }
    +    this
    +  }
    +}
    +
    +/**
    + * Factory methods for [[org.apache.spark.mllib.linalg.SparseMatrix]].
    + */
    +object SparseMatrix {
    +
    +  /**
    +   * Generate an Identity Matrix in `SparseMatrix` format.
    +   * @param n number of rows and columns of the matrix
    +   * @return `SparseMatrix` with size `n` x `n` and values of ones on the diagonal
    +   */
    +  def speye(n: Int): SparseMatrix = {
    +    new SparseMatrix(n, n, (0 to n).toArray, (0 until n).toArray, Array.fill(n)(1.0))
    +  }
    +
    +  /** Generates a SparseMatrix given an Array[Double] of size numRows * numCols. The number of
    +    * non-zeros in `raw` is provided for efficiency. */
    +  private def genRand(
    +      numRows: Int,
    +      numCols: Int,
    +      raw: Array[Double],
    +      nonZero: Int): SparseMatrix = {
    +    val sparseA: ArrayBuffer[Double] = new ArrayBuffer(nonZero)
    +    val sCols: ArrayBuffer[Int] = new ArrayBuffer(numCols + 1)
    +    val sRows: ArrayBuffer[Int] = new ArrayBuffer(nonZero)
    +
    +    var i = 0
    +    var nnz = 0
    +    var lastCol = -1
    +    raw.foreach { v =>
    +      val r = i % numRows
    +      val c = (i - r) / numRows
    +      if ( v != 0.0) {
    --- End diff --

    Right now, it's not. Currently users can supply zero values during the construction of SparseMatrix. Two things:
    1) Should I add a check in the constructor of SparseMatrix?
    2) Should I transform genRand into something like .toSparse() inside DenseMatrix, and add a .toDense() inside SparseMatrix? (I actually had these two methods in my multi model training repo)
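The `.toSparse()` idea in point 2 amounts to a column-by-column scan of the dense column-major array. Below is a minimal sketch of such a conversion, assuming exact zeros are the only entries dropped. `toSparse` here is a hypothetical free-standing helper returning raw CSC arrays, not a method that exists on `DenseMatrix` in this PR.

```scala
import scala.collection.mutable.ArrayBuffer

object ToSparseSketch {
  /**
   * Converts a column-major dense array into CSC arrays (colPtrs, rowIndices, values),
   * dropping exact zeros. Hypothetical helper, not an MLlib method.
   */
  def toSparse(
      numRows: Int,
      numCols: Int,
      raw: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    require(raw.length == numRows * numCols, "raw must hold numRows * numCols entries")
    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = new ArrayBuffer[Int]()
    val values = new ArrayBuffer[Double]()
    for (j <- 0 until numCols) {
      for (i <- 0 until numRows) {
        val v = raw(j * numRows + i)
        if (v != 0.0) {
          rowIndices += i
          values += v
        }
      }
      // After finishing column j, record where column j + 1 starts.
      colPtrs(j + 1) = values.length
    }
    (colPtrs, rowIndices.toArray, values.toArray)
  }
}
```

Iterating by column removes the need for the `lastCol` bookkeeping in `genRand`, because empty columns get their pointer written even when no non-zero value is found in them.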
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-64681676 [Test build #23900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23900/consoleFull) for PR 3319 at commit [`a8120d2`](https://github.com/apache/spark/commit/a8120d2a83720b621b36942add3a98aa4b96bcc3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-64693996 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23900/ Test PASSed.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-64693986 [Test build #23900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23900/consoleFull) for PR 3319 at commit [`a8120d2`](https://github.com/apache/spark/commit/a8120d2a83720b621b36942add3a98aa4b96bcc3). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/3319#issuecomment-64706309

    @brkyvz I didn't know MATLAB has `horzcat` and `vertcat` along with `[A, B]` or `[A; B]`. I'm okay with adopting method names from MATLAB. I hope there are no copyright issues. (I don't see any special statement from Octave.) If we want to use MATLAB operators, maybe we should also stick to lowercase method names.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user brkyvz commented on the pull request:

    https://github.com/apache/spark/pull/3319#issuecomment-64713893

    I checked MATLAB's webpage and didn't see any copyright mentions for the method names. It's best to triple check though. Since NumPy and SciPy share method names with MATLAB, I don't expect there to be problems. With the last commit, I made the method names lowercase :)
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-64504931 [Test build #23863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23863/consoleFull) for PR 3319 at commit [`c75f3cd`](https://github.com/apache/spark/commit/c75f3cdec438042c10e31009dee87a14fdce4053). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user brkyvz commented on the pull request:

    https://github.com/apache/spark/pull/3319#issuecomment-64505065

    @mengxr: Thanks for the feedback. Added the Java tests!

    horzcat and vertcat are in fact MATLAB methods:
    http://www.mathworks.com/help/matlab/ref/horzcat.html
    http://www.mathworks.com/help/matlab/ref/vertcat.html
    They are the underlying methods that are called when someone writes `A = [A1 A2; A3 A4];`.

    I felt the naming was more intuitive as it is like `strcat`, because you are concatenating matrices either horizontally or vertically. I'd be happy to change them to `hstack` and `vstack`, but horzcat sounds more intuitive to me (maybe I'm biased, because I used to use it more). Your call :)
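As a rough illustration of how `[A1 A2; A3 A4]` decomposes into the two operations, the sketch below builds a block matrix from dense column-major pieces. `Dense`, `horzcat` and `vertcat` are hypothetical definitions written for this example under the assumption of column-major storage; they are not the MLlib signatures.

```scala
object DenseCatSketch {
  /** Column-major dense matrix; hypothetical stand-in, not the MLlib class. */
  final case class Dense(numRows: Int, numCols: Int, values: Array[Double])

  /** Horizontal concatenation: with column-major storage this is a plain append. */
  def horzcat(a: Dense, b: Dense): Dense = {
    require(a.numRows == b.numRows, "row counts must match")
    Dense(a.numRows, a.numCols + b.numCols, a.values ++ b.values)
  }

  /** Vertical concatenation: the columns of a and b have to be interleaved. */
  def vertcat(a: Dense, b: Dense): Dense = {
    require(a.numCols == b.numCols, "column counts must match")
    val values = (0 until a.numCols).flatMap { j =>
      a.values.slice(j * a.numRows, (j + 1) * a.numRows).toSeq ++
        b.values.slice(j * b.numRows, (j + 1) * b.numRows).toSeq
    }.toArray
    Dense(a.numRows + b.numRows, a.numCols, values)
  }

  /** Equivalent of MATLAB's `[a1 a2; a3 a4]`: concatenate each row of blocks, then stack. */
  def block(a1: Dense, a2: Dense, a3: Dense, a4: Dense): Dense =
    vertcat(horzcat(a1, a2), horzcat(a3, a4))
}
```

The asymmetry between the two functions comes from the storage order: a row-major layout would make `vertcat` the plain append instead.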
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-64510456 [Test build #23863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23863/consoleFull) for PR 3319 at commit [`c75f3cd`](https://github.com/apache/spark/commit/c75f3cdec438042c10e31009dee87a14fdce4053). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-64510461 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23863/ Test PASSed.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/3319#issuecomment-64310622

    @brkyvz Two comments on the API:
    1) For the APIs we provide, could you add a Java test suite and verify that all methods work in Java?
    2) `horzCat` and `vertCat` are not MATLAB operators, nor NumPy's. Maybe we should rename them to `hstack` and `vstack`, which are at least known by NumPy users.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
GitHub user brkyvz opened a pull request:

    https://github.com/apache/spark/pull/3319

    [SPARK-4409][MLlib] Additional Linear Algebra Utils

    Addition of a very limited number of local matrix manipulation and generation methods that would be helpful in the further development of algorithms on top of BlockMatrix (SPARK-3974), such as Randomized SVD, and Multi Model Training (SPARK-1486).
    The proposed methods for addition are:

    For `Matrix`
     - map: maps the values in the matrix with a given function. Produces a new matrix.
     - update: the values in the matrix are updated with a given function. Occurs in place.

    Factory methods for `DenseMatrix`:
     - *zeros: Generate a matrix consisting of zeros
     - *ones: Generate a matrix consisting of ones
     - *eye: Generate an identity matrix
     - *rand: Generate a matrix consisting of i.i.d. uniform random numbers
     - *randn: Generate a matrix consisting of i.i.d. gaussian random numbers
     - *diag: Generate a diagonal matrix from a supplied vector

    *These methods already exist in the factory methods for `Matrices`, however for cases where we require a `DenseMatrix`, you constantly have to add `.asInstanceOf[DenseMatrix]` everywhere, which makes the code dirtier. I propose moving these functions to factory methods for `DenseMatrix` where the output will be a `DenseMatrix` and the factory methods for `Matrices` will call these functions directly and output a generic `Matrix`.

    Factory methods for `SparseMatrix`:
     - speye: Identity matrix in sparse format. Saves a ton of memory when dimensions are large, especially in Multi Model Training, where each row requires being multiplied by a scalar.
     - sprand: Generate a sparse matrix with a given density consisting of i.i.d. uniform random numbers.
     - sprandn: Generate a sparse matrix with a given density consisting of i.i.d. gaussian random numbers.
     - diag: Generate a diagonal matrix from a supplied vector, but is memory efficient, because it just stores the diagonal. Again, very helpful in Multi Model Training.

    Factory methods for `Matrices`:
     - Include all the factory methods given above, but return a generic `Matrix` rather than `SparseMatrix` or `DenseMatrix`.
     - horzCat: Horizontally concatenate matrices to form one larger matrix. Very useful in both Multi Model Training, and for the repartitioning of BlockMatrix.
     - vertCat: Vertically concatenate matrices to form one larger matrix. Very useful for the repartitioning of BlockMatrix.

    The names for these methods were selected from MATLAB.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brkyvz/spark SPARK-4409

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3319.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3319

commit a14c0da0360b4202a2db787b85ce631562014f0d
Author: Burak Yavuz brk...@gmail.com
Date:   2014-11-17T09:33:36Z

    [SPARK-4409] Initial commit to add methods
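To give a sense of why the sparse `diag` and `speye` factories save memory, here is a small sketch of the CSC arrays a diagonal matrix needs, together with an element lookup. The layout mirrors the `speye` body quoted in the review diffs (`colPtrs = 0 to n`, `rowIndices = 0 until n`); `spdiag` and `get` are hypothetical helpers for this example, not the methods proposed in the PR.

```scala
object SparseDiagSketch {
  /** CSC arrays (colPtrs, rowIndices, values) of a diagonal matrix; hypothetical helper. */
  def spdiag(d: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    val n = d.length
    // Column j holds exactly one stored entry, located at row j.
    ((0 to n).toArray, (0 until n).toArray, d.clone())
  }

  /** Looks up entry (i, j) by scanning the stored entries of column j. */
  def get(csc: (Array[Int], Array[Int], Array[Double]), i: Int, j: Int): Double = {
    val (colPtrs, rowIndices, values) = csc
    (colPtrs(j) until colPtrs(j + 1))
      .find(k => rowIndices(k) == i)
      .map(k => values(k))
      .getOrElse(0.0)
  }
}
```

For an `n` x `n` diagonal matrix this stores `n` doubles and `2n + 1` integers, whereas the dense representation holds `n * n` doubles.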
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-63358380 [Test build #23485 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23485/consoleFull) for PR 3319 at commit [`94d7ae9`](https://github.com/apache/spark/commit/94d7ae977858ba4d785429df5a324012a438bc80). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-63358572 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23485/ Test FAILed.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-63358570 [Test build #23485 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23485/consoleFull) for PR 3319 at commit [`94d7ae9`](https://github.com/apache/spark/commit/94d7ae977858ba4d785429df5a324012a438bc80). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-63365041 [Test build #23492 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23492/consoleFull) for PR 3319 at commit [`d662f9d`](https://github.com/apache/spark/commit/d662f9d963c21aca720bab87a8279a938e1d924e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3319#issuecomment-63378128 [Test build #23492 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23492/consoleFull) for PR 3319 at commit [`d662f9d`](https://github.com/apache/spark/commit/d662f9d963c21aca720bab87a8279a938e1d924e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.