[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3200


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-28 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-71884182
  
LGTM. Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-71871259
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26225/





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-71871237
  
  [Test build #26225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26225/consoleFull) for PR 3200 at commit [`a8eace2`](https://github.com/apache/spark/commit/a8eace2681840b90e6c5690afd317273d461e2c4).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class BlockMatrix(`






[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-28 Thread brkyvz
Github user brkyvz commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-71859402
  
@mengxr I don't know if `rows` and `cols` will be confusing in terms of naming in GridPartitioner... However, since it is private and internal, maybe it's not that big of a problem?





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-71856884
  
  [Test build #26225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26225/consoleFull) for PR 3200 at commit [`a8eace2`](https://github.com/apache/spark/commit/a8eace2681840b90e6c5690afd317273d461e2c4).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-71725639
  
  [Test build #26178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26178/consoleFull) for PR 3200 at commit [`5eecd48`](https://github.com/apache/spark/commit/5eecd487c54f65a9eacdf8b29eea218e0a27eb20).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class DenseMatrix(`
   * `class BlockMatrix(`






[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-71725650
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26178/





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-71713783
  
  [Test build #26178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26178/consoleFull) for PR 3200 at commit [`5eecd48`](https://github.com/apache/spark/commit/5eecd487c54f65a9eacdf8b29eea218e0a27eb20).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-27 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23596665
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -0,0 +1,242 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark.{Logging, Partitioner}
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the matrix.
+ */
+private[mllib] class GridPartitioner(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val numParts: Int) extends Partitioner {
+  // Having the number of partitions greater than the number of sub-matrices does not help
+  override val numPartitions = math.min(numParts, numRowBlocks * numColBlocks)
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to. Tries to achieve block-wise
+   * partitioning.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid (its column-major index)
+   *            or a tuple of three integers that are the final row index after the multiplication,
+   *            the index of the block to multiply with, and the final column index after the
+   *            multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+    key match {
+      case (blockRowIndex: Int, blockColIndex: Int) =>
+        getBlockId(blockRowIndex, blockColIndex)
+      case (blockRowIndex: Int, innerIndex: Int, blockColIndex: Int) =>
+        getBlockId(blockRowIndex, blockColIndex)
+      case _ =>
+        throw new IllegalArgumentException(s"Unrecognized key. key: $key")
+    }
+  }
+
+  /** Partitions sub-matrices as blocks with neighboring sub-matrices. */
+  private def getBlockId(blockRowIndex: Int, blockColIndex: Int): Int = {
+    val totalBlocks = numRowBlocks * numColBlocks
+    // Gives the number of blocks that need to be in each partition
+    val partitionRatio = math.ceil(totalBlocks * 1.0 / numPartitions).toInt
+    // Number of neighboring blocks to take in each row
+    val subBlocksPerRow = math.ceil(numRowBlocks * 1.0 / partitionRatio).toInt
+    // Number of neighboring blocks to take in each column
+    val subBlocksPerCol = math.ceil(numColBlocks * 1.0 / partitionRatio).toInt
+    // Coordinates of the block
+    val i = blockRowIndex / subBlocksPerRow
+    val j = blockColIndex / subBlocksPerCol
+    val blocksPerRow = math.ceil(numRowBlocks * 1.0 / subBlocksPerRow).toInt
+    j * blocksPerRow + i
+  }
+
+  /** Checks whether the partitioners have the same characteristics. */
+  override def equals(obj: Any): Boolean = {
+    obj match {
+      case r: GridPartitioner =>
+        (this.numRowBlocks == r.numRowBlocks) && (this.numColBlocks == r.numColBlocks) &&
+          (this.numPartitions == r.numPartitions)
+      case _ =>
+        false
+    }
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ * @param nRows Number of rows of this matrix
+ * @param nCols Number of columns of this matrix
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this matrix
+ * @param rowsPerBlock Number of rows that make up each block. The blocks forming the fi
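The block-to-partition mapping in `getBlockId` above is pure arithmetic, so it can be mirrored outside Spark. Below is a minimal Python sketch of that computation (an illustrative re-implementation for this thread, not Spark code; the function and argument names are mine). With a 6x6 block grid and 9 partitions, it groups the grid into 2x2 neighborhoods and yields partition ids 0 through 8:

```python
import math

def get_block_id(block_row, block_col, num_row_blocks, num_col_blocks, num_parts):
    """Mirror of GridPartitioner.getBlockId from the diff above (illustrative only)."""
    # More partitions than blocks does not help, so cap the count
    num_partitions = min(num_parts, num_row_blocks * num_col_blocks)
    total_blocks = num_row_blocks * num_col_blocks
    # Number of blocks that should land in each partition
    partition_ratio = math.ceil(total_blocks / num_partitions)
    # Number of neighboring blocks taken along each dimension
    sub_blocks_per_row = math.ceil(num_row_blocks / partition_ratio)
    sub_blocks_per_col = math.ceil(num_col_blocks / partition_ratio)
    # Coordinates of the neighborhood the block falls into
    i = block_row // sub_blocks_per_row
    j = block_col // sub_blocks_per_col
    blocks_per_row = math.ceil(num_row_blocks / sub_blocks_per_row)
    return j * blocks_per_row + i
```

Note that both key shapes handled by `getPartition` reduce to this same computation, since the inner index of the three-tuple form is ignored.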

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-26 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23589329
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-26 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23582378
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ * @param nRows Number of rows of this matrix
--- End diff --

Document the behavior when nRows is negative, e.g., whether this is allowed. I'm okay with marking them as required.
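One way to act on this review note is to fail fast on non-positive dimensions in the constructor. A hypothetical sketch of such a check (in Python for brevity; `validate_dims`, `n_rows`, and `n_cols` are my names, not from the PR, and whether negative should mean "error" or "compute lazily" is exactly the open question above; this sketch picks the strict option):

```python
def validate_dims(n_rows, n_cols):
    """Hypothetical check: treat non-positive matrix dimensions as an error."""
    if n_rows <= 0 or n_cols <= 0:
        raise ValueError(f"Matrix dimensions must be positive, got {n_rows} x {n_cols}")
    return n_rows, n_cols
```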



[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-26 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23582393
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-26 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23582384
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-26 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23582389
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,242 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark.{Logging, Partitioner}
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the matrix.
+ */
+private[mllib] class GridPartitioner(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val numParts: Int) extends Partitioner {
+  // Having the number of partitions greater than the number of sub matrices does not help
+  override val numPartitions = math.min(numParts, numRowBlocks * numColBlocks)
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to. Tries to achieve block wise
+   * partitioning.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid (its column major index)
+   *            or a tuple of three integers that are the final row index after the multiplication,
+   *            the index of the block to multiply with, and the final column index after the
+   *            multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+    key match {
+      case (blockRowIndex: Int, blockColIndex: Int) =>
+        getBlockId(blockRowIndex, blockColIndex)
+      case (blockRowIndex: Int, innerIndex: Int, blockColIndex: Int) =>
+        getBlockId(blockRowIndex, blockColIndex)
+      case _ =>
+        throw new IllegalArgumentException(s"Unrecognized key. key: $key")
+    }
+  }
+
+  /** Partitions sub-matrices as blocks with neighboring sub-matrices. */
+  private def getBlockId(blockRowIndex: Int, blockColIndex: Int): Int = {
+    val totalBlocks = numRowBlocks * numColBlocks
+    // Gives the number of blocks that need to be in each partition
+    val partitionRatio = math.ceil(totalBlocks * 1.0 / numPartitions).toInt
+    // Number of neighboring blocks to take in each row
+    val subBlocksPerRow = math.ceil(numRowBlocks * 1.0 / partitionRatio).toInt
+    // Number of neighboring blocks to take in each column
+    val subBlocksPerCol = math.ceil(numColBlocks * 1.0 / partitionRatio).toInt
+    // Coordinates of the block
+    val i = blockRowIndex / subBlocksPerRow
+    val j = blockColIndex / subBlocksPerCol
+    val blocksPerRow = math.ceil(numRowBlocks * 1.0 / subBlocksPerRow).toInt
+    j * blocksPerRow + i
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+    obj match {
+      case r: GridPartitioner =>
+        (this.numRowBlocks == r.numRowBlocks) && (this.numColBlocks == r.numColBlocks) &&
+          (this.numPartitions == r.numPartitions)
+      case _ =>
+        false
+    }
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ * @param nRows Number of rows of this matrix
+ * @param nCols Number of columns of this matrix
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this matrix
+ * @param rowsPerBlock Number of rows that make up each block. The blocks forming the fi
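For intuition, the following is a minimal, self-contained sketch of what a grid partitioner aims to do. It is NOT the PR's code: the names and the fixed `rowParts` x `colParts` mesh split are hypothetical. It tiles the block grid so that neighboring blocks share a partition while every id stays within `[0, rowParts * colParts)`.

```scala
// Hypothetical grid-partitioning sketch (plain Scala, no Spark).
// Tiles a numRowBlocks x numColBlocks grid into a rowParts x colParts mesh.
object GridSketch {
  def partition(numRowBlocks: Int, numColBlocks: Int,
                rowParts: Int, colParts: Int)
               (blockRow: Int, blockCol: Int): Int = {
    // Blocks covered by each partition along each dimension (ceiling division).
    val rowsPerPart = (numRowBlocks + rowParts - 1) / rowParts
    val colsPerPart = (numColBlocks + colParts - 1) / colParts
    (blockRow / rowsPerPart) * colParts + (blockCol / colsPerPart)
  }

  def main(args: Array[String]): Unit = {
    // 10x20 grid of blocks over a 2x5 mesh = 10 partitions.
    val p = partition(10, 20, 2, 5) _
    val ids = for (i <- 0 until 10; j <- 0 until 20) yield p(i, j)
    assert(ids.forall(id => id >= 0 && id < 10)) // ids never exceed the count
    println(ids.distinct.size)                   // 10 partitions in use
  }
}
```

Each partition here owns a contiguous 5x4 neighborhood of blocks, which is the locality property the quoted `getBlockId` is after.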

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-26 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23582376
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,242 @@
+  /** Partitions sub-matrices as blocks with neighboring sub-matrices. */
+  private def getBlockId(blockRowIndex: Int, blockColIndex: Int): Int = {
+    val totalBlocks = numRowBlocks * numColBlocks
+    // Gives the number of blocks that need to be in each partition
+    val partitionRatio = math.ceil(totalBlocks * 1.0 / numPartitions).toInt
+    // Number of neighboring blocks to take in each row
+    val subBlocksPerRow = math.ceil(numRowBlocks * 1.0 / partitionRatio).toInt
--- End diff --

This is wrong. If we have 10x20 blocks and 10 partitions, then `partitionRatio = 20`, `subBlocksPerRow = 1`, and `subBlocksPerCol = 1`, so we will end up with 200 partitions. Please update the assignment logic and add tests. Btw, it may be worth computing `subBlocksPerRow` and `subBlocksPerCol` in the constructor. I would recommend renaming them to `numRowBlocksPerPartition` and `numColBlocksPerPartition`.
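The counting argument can be checked with a standalone transcription of the quoted `getBlockId` arithmetic (plain Scala, no Spark; the object name is illustrative):

```scala
// Transcribes the quoted getBlockId math to reproduce the 10x20 / 10 example.
object PartitionIdCheck {
  def blockId(numRowBlocks: Int, numColBlocks: Int, numParts: Int)
             (blockRowIndex: Int, blockColIndex: Int): Int = {
    val numPartitions = math.min(numParts, numRowBlocks * numColBlocks)
    val totalBlocks = numRowBlocks * numColBlocks
    val partitionRatio = math.ceil(totalBlocks * 1.0 / numPartitions).toInt
    val subBlocksPerRow = math.ceil(numRowBlocks * 1.0 / partitionRatio).toInt
    val subBlocksPerCol = math.ceil(numColBlocks * 1.0 / partitionRatio).toInt
    val i = blockRowIndex / subBlocksPerRow
    val j = blockColIndex / subBlocksPerCol
    val blocksPerRow = math.ceil(numRowBlocks * 1.0 / subBlocksPerRow).toInt
    j * blocksPerRow + i
  }

  def main(args: Array[String]): Unit = {
    val id = blockId(10, 20, 10) _
    val ids = for (r <- 0 until 10; c <- 0 until 20) yield id(r, c)
    // 200 distinct ids are produced even though numPartitions is 10.
    println(ids.distinct.size)
    assert(ids.max > 9) // some ids escape the valid range [0, 10)
  }
}
```

With `subBlocksPerRow = subBlocksPerCol = 1`, the formula degenerates to `blockColIndex * 10 + blockRowIndex`, i.e. one id per block rather than one per partition.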


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-26 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23582382
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,242 @@

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-26 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23582386
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,242 @@

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-26 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23582388
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,242 @@

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-26 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23582372
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,242 @@
+  /** Partitions sub-matrices as blocks with neighboring sub-matrices. */
+  private def getBlockId(blockRowIndex: Int, blockColIndex: Int): Int = {
--- End diff --

Should it be called `getPartition` or `getPartitionId`?





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-26 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23582380
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,242 @@
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ * @param nRows Number of rows of this matrix
+ * @param nCols Number of columns of this matrix
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
--- End diff --

Could it be derived from `nRows` and `rowsPerBlock`? Having `nRows`, `numRowBlocks`, and `rowsPerBlock` as separate arguments would leave room for inconsistent inputs.
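A sketch of the suggested consolidation (illustrative names only, not the PR's code): derive the block counts from the matrix dimensions with ceiling division instead of accepting them as independent constructor arguments.

```scala
// Hypothetical helper: block count implied by a dimension and a block size.
object BlockCountSketch {
  def numBlocks(n: Long, perBlock: Int): Int =
    math.ceil(n * 1.0 / perBlock).toInt

  def main(args: Array[String]): Unit = {
    val nRows = 70L       // example: a 70-row matrix
    val rowsPerBlock = 32 // blocks of 32 rows
    println(numBlocks(nRows, rowsPerBlock)) // 3 blocks: 32 + 32 + 6 rows
  }
}
```

Computing the count this way makes it impossible for `nRows`, `rowsPerBlock`, and the block count to disagree.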

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-26 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23582374
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,242 @@
+  /** Partitions sub-matrices as blocks with neighboring sub-matrices. */
+  private def getBlockId(blockRowIndex: Int, blockColIndex: Int): Int = {
+    val totalBlocks = numRowBlocks * numColBlocks
+    // Gives the number of blocks that need to be in each partition
+    val partitionRatio = math.ceil(totalBlocks * 1.0 / numPartitions).toInt
--- End diff --

`partitionRatio` -> `targetNumBlocksPerPartition`?
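
Whatever the name, the quantity in question is the number of consecutive blocks (in column-major order) that each partition should hold, so that neighboring sub-matrices land together. A small sketch of that arithmetic — the function name and the final division step are assumptions, since the quoted diff cuts off before the index computation:

```python
import math

def get_block_id(block_row, block_col, num_row_blocks, num_col_blocks, num_partitions):
    """Sketch of block-wise partition assignment: consecutive blocks in
    column-major order are grouped into the same partition."""
    total_blocks = num_row_blocks * num_col_blocks
    # "partitionRatio" / "targetNumBlocksPerPartition" from the diff above
    target_blocks_per_partition = math.ceil(total_blocks / num_partitions)
    linear_index = block_row + block_col * num_row_blocks  # column-major index
    return linear_index // target_blocks_per_partition
```

With a 3x3 grid of blocks and 4 partitions, blocks (0,0) and (1,0) map to the same partition, which is the "neighboring sub-matrices" behavior the doc comment describes.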


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-26 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23582369
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,242 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark.{Logging, Partitioner}
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the 
matrix.
+ */
+private[mllib] class GridPartitioner(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val numParts: Int) extends Partitioner {
--- End diff --

Add doc.

Remove `val`. Otherwise, `numParts` is a public field of `GridPartitioner`. 
We may also rename it to `suggestedNumPartitions` so users know that it might 
change.
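
The rename suggestion reflects that the requested partition count is only a hint: the partitioner caps it, since each block is atomic and extra partitions would simply sit empty. A minimal sketch of that cap (standalone function, hypothetical name):

```python
def effective_num_partitions(suggested_num_partitions, num_row_blocks, num_col_blocks):
    # More partitions than sub-matrices does not help, because a block
    # cannot be split across partitions: cap at the total block count.
    return min(suggested_num_partitions, num_row_blocks * num_col_blocks)
```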





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-70945119
  
  [Test build #25923 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25923/consoleFull)
 for   PR 3200 at commit 
[`f9d664b`](https://github.com/apache/spark/commit/f9d664b7ebae288d778c7694dcc380f52e6cde30).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-70945130
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25923/
Test PASSed.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-70936456
  
  [Test build #25923 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25923/consoleFull)
 for   PR 3200 at commit 
[`f9d664b`](https://github.com/apache/spark/commit/f9d664b7ebae288d778c7694dcc380f52e6cde30).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23319026
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the 
matrix.
+ * @param rowsPerBlock Number of rows that make up each block.
+ * @param colsPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rowsPerBlock: Int,
--- End diff --

Why do we use the partitioner to decide whether two matrices can be 
multiplied? That should be done by the matrix itself, which knows the dims and 
numRowsPerBlock and numColsPerBlock. That information should be sufficient to 
decide whether two matrices are compatible.
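
The check the comment describes needs only the matrix-level metadata, not the partitioner. A hedged sketch of such a compatibility test for block-matrix multiplication (names are illustrative, not the PR's API):

```python
def can_multiply(a_num_cols, a_cols_per_block, b_num_rows, b_rows_per_block):
    """A * B is well-defined only if A's column dimension matches B's row
    dimension, both in total size and in per-block size, so that the
    column-blocks of A pair one-to-one with the row-blocks of B."""
    return a_num_cols == b_num_rows and a_cols_per_block == b_rows_per_block
```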





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23318868
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
--- End diff --

Oh, I didn't see `mllib` ...





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23319267
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the 
matrix.
+ * @param rowsPerBlock Number of rows that make up each block.
+ * @param colsPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rowsPerBlock: Int,
+val colsPerBlock: Int,
+override val numPartitions: Int) extends Partitioner {
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid 
(its column major index)
+   *or a tuple of three integers that are the final row index 
after the multiplication,
+   *the index of the block to multiply with, and the final 
column index after the
+   *multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+key match {
+  case (rowIndex: Int, colIndex: Int) =>
+Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, 
numPartitions)
+  case (rowIndex: Int, innerIndex: Int, colIndex: Int) =>
+Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, 
numPartitions)
+  case _ =>
+throw new IllegalArgumentException("Unrecognized key")
+}
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+obj match {
+  case r: GridPartitioner =>
+(this.numPartitions == r.numPartitions) && (this.rowsPerBlock == 
r.rowsPerBlock) &&
+  (this.colsPerBlock == r.colsPerBlock)
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this 
matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rdd: RDD[((Int, Int), Matrix)]) extends DistributedMatrix with 
Logging {
+
+  type SubMatrix = ((Int, Int), Matrix) // ((blockRowIndex, 
blockColIndex), matrix)
+
+  /**
+   * Alternate constructor for BlockMatrix without the input of a 
partitioner. Will use a Grid
+   * Partitioner by default.
+   *
+   * @param numRowBlocks Number of blocks that form the rows of this matrix
+   * @param numColBlocks Number of blocks that form the columns of this 
matrix
+   * @param rdd The RDD of SubMatrices (local matrices) that form this 
matrix
+   * @param rowsPerBlock Number of rows that make up each block.
+   * @param colsPerBlock Number of columns that make up each block.
+   */
+  def this(
+  numRowBlocks: Int,
+  numColBlocks: Int,
+  rdd: RDD[((Int, Int), Matrix)],
+  rowsPerBlock: Int,
+  colsPerBlock: Int) = {
--- End diff --

We cannot make such assumption a

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23271456
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the 
matrix.
+ * @param rowsPerBlock Number of rows that make up each block.
+ * @param colsPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rowsPerBlock: Int,
+val colsPerBlock: Int,
+override val numPartitions: Int) extends Partitioner {
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid 
(its column major index)
+   *or a tuple of three integers that are the final row index 
after the multiplication,
+   *the index of the block to multiply with, and the final 
column index after the
+   *multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+key match {
+  case (rowIndex: Int, colIndex: Int) =>
+Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, 
numPartitions)
+  case (rowIndex: Int, innerIndex: Int, colIndex: Int) =>
+Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, 
numPartitions)
+  case _ =>
+throw new IllegalArgumentException("Unrecognized key")
+}
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+obj match {
+  case r: GridPartitioner =>
+(this.numPartitions == r.numPartitions) && (this.rowsPerBlock == 
r.rowsPerBlock) &&
+  (this.colsPerBlock == r.colsPerBlock)
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this 
matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rdd: RDD[((Int, Int), Matrix)]) extends DistributedMatrix with 
Logging {
+
+  type SubMatrix = ((Int, Int), Matrix) // ((blockRowIndex, 
blockColIndex), matrix)
+
+  /**
+   * Alternate constructor for BlockMatrix without the input of a 
partitioner. Will use a Grid
+   * Partitioner by default.
+   *
+   * @param numRowBlocks Number of blocks that form the rows of this matrix
+   * @param numColBlocks Number of blocks that form the columns of this 
matrix
+   * @param rdd The RDD of SubMatrices (local matrices) that form this 
matrix
+   * @param rowsPerBlock Number of rows that make up each block.
+   * @param colsPerBlock Number of columns that make up each block.
+   */
+  def this(
+  numRowBlocks: Int,
+  numColBlocks: Int,
+  rdd: RDD[((Int, Int), Matrix)],
+  rowsPerBlock: Int,
+  colsPerBlock: Int) = {
--- End diff --

Will it really be the case that 

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23271303
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the 
matrix.
+ * @param rowsPerBlock Number of rows that make up each block.
+ * @param colsPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rowsPerBlock: Int,
+val colsPerBlock: Int,
+override val numPartitions: Int) extends Partitioner {
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid 
(its column major index)
+   *or a tuple of three integers that are the final row index 
after the multiplication,
+   *the index of the block to multiply with, and the final 
column index after the
+   *multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+key match {
+  case (rowIndex: Int, colIndex: Int) =>
+Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, 
numPartitions)
+  case (rowIndex: Int, innerIndex: Int, colIndex: Int) =>
+Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, 
numPartitions)
+  case _ =>
+throw new IllegalArgumentException("Unrecognized key")
+}
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+obj match {
+  case r: GridPartitioner =>
+(this.numPartitions == r.numPartitions) && (this.rowsPerBlock == 
r.rowsPerBlock) &&
+  (this.colsPerBlock == r.colsPerBlock)
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this 
matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rdd: RDD[((Int, Int), Matrix)]) extends DistributedMatrix with 
Logging {
+
+  type SubMatrix = ((Int, Int), Matrix) // ((blockRowIndex, 
blockColIndex), matrix)
+
+  /**
+   * Alternate constructor for BlockMatrix without the input of a 
partitioner. Will use a Grid
+   * Partitioner by default.
+   *
+   * @param numRowBlocks Number of blocks that form the rows of this matrix
+   * @param numColBlocks Number of blocks that form the columns of this 
matrix
+   * @param rdd The RDD of SubMatrices (local matrices) that form this 
matrix
+   * @param rowsPerBlock Number of rows that make up each block.
+   * @param colsPerBlock Number of columns that make up each block.
+   */
+  def this(
+  numRowBlocks: Int,
+  numColBlocks: Int,
+  rdd: RDD[((Int, Int), Matrix)],
+  rowsPerBlock: Int,
+  colsPerBlock: Int) = {
+this(numRowBlocks, numColBlocks, rdd)
+val p

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23271097
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the 
matrix.
+ * @param rowsPerBlock Number of rows that make up each block.
+ * @param colsPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rowsPerBlock: Int,
--- End diff --

That is what the partitioners were initially used for. If we want to check 
whether the partitioning is appropriate for `multiply` or `add`, then the 
partitioner needs to know `rowsPerBlock` and `colsPerBlock`.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23271028
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
--- End diff --

There is `treeAggregate` in `getDim`
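
A `getDim`-style aggregation computes the overall matrix dimensions from per-block extents; `treeAggregate` combines those pairwise in a tree to keep load off the driver. A plain `reduce` shows the same math (a sketch under the assumption that blocks are keyed by `(rowIndex, colIndex)` with known per-block sizes):

```python
from functools import reduce

def get_dim(blocks, rows_per_block, cols_per_block):
    """blocks: iterable of ((row_idx, col_idx), (n_rows, n_cols)).
    Returns (total_rows, total_cols) as the max extent over all blocks."""
    def extent(block):
        (row_idx, col_idx), (n_rows, n_cols) = block
        # a block's bottom-right corner in global coordinates
        return (row_idx * rows_per_block + n_rows,
                col_idx * cols_per_block + n_cols)
    return reduce(lambda a, b: (max(a[0], b[0]), max(a[1], b[1])),
                  map(extent, blocks))
```

Taking the elementwise max is associative and commutative, which is exactly what makes a tree-shaped aggregation valid here.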





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23267239
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
--- End diff --

Try to be more explicit on the imports.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23267274
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the matrix.
+ * @param rowsPerBlock Number of rows that make up each block.
+ * @param colsPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val rowsPerBlock: Int,
+    val colsPerBlock: Int,
+    override val numPartitions: Int) extends Partitioner {
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid (its column major index)
+   *            or a tuple of three integers that are the final row index after the multiplication,
+   *            the index of the block to multiply with, and the final column index after the
+   *            multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+    key match {
+      case (rowIndex: Int, colIndex: Int) =>
+        Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, numPartitions)
+      case (rowIndex: Int, innerIndex: Int, colIndex: Int) =>
+        Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, numPartitions)
+      case _ =>
+        throw new IllegalArgumentException("Unrecognized key")
+    }
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+    obj match {
+      case r: GridPartitioner =>
+        (this.numPartitions == r.numPartitions) && (this.rowsPerBlock == r.rowsPerBlock) &&
+          (this.colsPerBlock == r.colsPerBlock)
+      case _ =>
+        false
+    }
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val rdd: RDD[((Int, Int), Matrix)]) extends DistributedMatrix with Logging {
+
+  type SubMatrix = ((Int, Int), Matrix) // ((blockRowIndex, blockColIndex), matrix)
--- End diff --

private?





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23267287
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+  /**
+   * Alternate constructor for BlockMatrix without the input of a partitioner. Will use a Grid
+   * Partitioner by default.
+   *
+   * @param numRowBlocks Number of blocks that form the rows of this matrix
+   * @param numColBlocks Number of blocks that form the columns of this matrix
+   * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+   * @param rowsPerBlock Number of rows that make up each block.
+   * @param colsPerBlock Number of columns that make up each block.
+   */
+  def this(
+      numRowBlocks: Int,
+      numColBlocks: Int,
+      rdd: RDD[((Int, Int), Matrix)],
+      rowsPerBlock: Int,
+      colsPerBlock: Int) = {
+    this(numRowBlocks, numColBlocks, rdd)
+    val p

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23267256
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+  override def getPartition(key: Any): Int = {
+    key match {
+      case (rowIndex: Int, colIndex: Int) =>
+        Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, numPartitions)
--- End diff --

This is a row-major partitioning. Do we want to have block partitioning?

~~~
1 5 9   13
2 6 10 14
3 7 11 15
4 8 12 16
~~~

The current implementation would have ((1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15), (4, 8, 12, 16)).
A better partitioning scheme would be ((1, 2, 5, 6), (9, 10, 13, 14), (3, 4, 7, 8), (11, 12, 15, 16)).
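The contrast can be sketched as plain index arithmetic (a hypothetical illustration, not the PR's code; `currentScheme` and `gridScheme` are invented names, assuming a 4x4 grid of blocks and 4 partitions):

```scala
object PartitioningSketch {
  val numRowBlocks = 4
  val numPartitions = 4

  // The PR's mapping: column-major linear block index mod numPartitions.
  // When numRowBlocks == numPartitions this collapses to "partition = row",
  // so every block in a grid row lands in the same partition.
  def currentScheme(row: Int, col: Int): Int =
    (row + col * numRowBlocks) % numPartitions

  // One block-partitioning alternative: carve the grid into 2x2 sub-grids so
  // that neighboring blocks share a partition, improving locality when a
  // block is multiplied with its neighbors.
  def gridScheme(row: Int, col: Int): Int =
    (row / 2) + (col / 2) * (numRowBlocks / 2)
}
```

Here `currentScheme` reproduces the row-grouped layout ((1, 5, 9, 13), ...) described above, while `gridScheme` yields one 2x2 grouping of the kind being suggested.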


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23267266
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+  override def getPartition(key: Any): Int = {
+    key match {
+      case (rowIndex: Int, colIndex: Int) =>
+        Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, numPartitions)
+      case (rowIndex: Int, innerIndex: Int, colIndex: Int) =>
+        Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, numPartitions)
+      case _ =>
+        throw new IllegalArgumentException("Unrecognized key")
--- End diff --

Put the key inside the error message.
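A minimal sketch of that suggestion (hypothetical names, not the merged code; `nonNegativeMod` stands in for Spark's `Utils.nonNegativeMod`, and string interpolation carries the key into the message):

```scala
object KeyedPartitionSketch {
  val numRowBlocks = 4
  val numPartitions = 4

  // Stand-in for Utils.nonNegativeMod, correct for any integer input.
  private def nonNegativeMod(x: Int, mod: Int): Int = ((x % mod) + mod) % mod

  def getPartition(key: Any): Int = key match {
    case (rowIndex: Int, colIndex: Int) =>
      nonNegativeMod(rowIndex + colIndex * numRowBlocks, numPartitions)
    case _ =>
      // The suggested change: include the offending key in the message.
      throw new IllegalArgumentException(s"Unrecognized key: $key")
  }
}
```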





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23267245
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+private[mllib] class GridPartitioner(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val rowsPerBlock: Int,
--- End diff --

Does the partitioner need to know `rowsPerBlock` and `colsPerBlock`?





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23267293
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala
 ---
@@ -0,0 +1,79 @@
+package org.apache.spark.mllib.linalg.distributed
+
+import org.scalatest.FunSuite
+
--- End diff --

remove this line. `scalatest` counts as a 3rd-party import but not scala 
import.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23267280
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+  /**
+   * Alternate constructor for BlockMatrix without the input of a partitioner. Will use a Grid
+   * Partitioner by default.
+   *
+   * @param numRowBlocks Number of blocks that form the rows of this matrix
+   * @param numColBlocks Number of blocks that form the columns of this matrix
+   * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+   * @param rowsPerBlock Number of rows that make up each block.
+   * @param colsPerBlock Number of columns that make up each block.
+   */
+  def this(
+      numRowBlocks: Int,
+      numColBlocks: Int,
+      rdd: RDD[((Int, Int), Matrix)],
+      rowsPerBlock: Int,
+      colsPerBlock: Int) = {
--- End diff --

The list of arguments cannot pro

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23267268
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+    obj match {
+      case r: GridPartitioner =>
+        (this.numPartitions == r.numPartitions) && (this.rowsPerBlock == r.rowsPerBlock) &&
--- End diff --

Compare numRowBlocks and numColBlocks instead?
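A hedged sketch of what that comparison could look like (hypothetical class name, not the merged code): two grid partitioners place blocks identically whenever the grid shape and partition count agree, independent of the block dimensions.

```scala
class GridPartitionerSketch(
    val numRowBlocks: Int,
    val numColBlocks: Int,
    val numPartitions: Int) {

  // Equality based on grid shape and partition count, since those are the
  // only fields getPartition reads; block sizes do not affect placement.
  override def equals(obj: Any): Boolean = obj match {
    case r: GridPartitionerSketch =>
      numRowBlocks == r.numRowBlocks &&
        numColBlocks == r.numColBlocks &&
        numPartitions == r.numPartitions
    case _ => false
  }

  // Keep hashCode consistent with equals.
  override def hashCode: Int = (numRowBlocks, numColBlocks, numPartitions).hashCode
}
```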





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23267249
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+  override def getPartition(key: Any): Int = {
+    key match {
+      case (rowIndex: Int, colIndex: Int) =>
--- End diff --

Is it the row index or row block index? `rowBlockIndex` would make the code 
clearer.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23267240
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,215 @@
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
--- End diff --

This is not necessary.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-70739754
  
  [Test build #25838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25838/consoleFull) for PR 3200 at commit [`1a63b20`](https://github.com/apache/spark/commit/1a63b204a59601734de986964d09515a0b5ab082).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-70739766
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25838/
Test FAILed.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-70729927
  
[Test build #25838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25838/consoleFull) for PR 3200 at commit [`1a63b20`](https://github.com/apache/spark/commit/1a63b204a59601734de986964d09515a0b5ab082).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-70724262
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25830/
Test PASSed.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-70724245
  
[Test build #25830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25830/consoleFull) for PR 3200 at commit [`1e8bb2a`](https://github.com/apache/spark/commit/1e8bb2a592de611f3796c28798356878c6c89541).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-70712379
  
[Test build #25830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25830/consoleFull) for PR 3200 at commit [`1e8bb2a`](https://github.com/apache/spark/commit/1e8bb2a592de611f3796c28798356878c6c89541).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23241236
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,223 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the 
matrix.
+ * @param rowsPerBlock Number of rows that make up each block.
+ * @param colsPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rowsPerBlock: Int,
+val colsPerBlock: Int,
+override val numPartitions: Int) extends Partitioner {
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid 
(its column major index)
+   *or a tuple of three integers that are the final row index 
after the multiplication,
+   *the index of the block to multiply with, and the final 
column index after the
+   *multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+key match {
+  case (rowIndex: Int, colIndex: Int) =>
+Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, 
numPartitions)
+  case (rowIndex: Int, innerIndex: Int, colIndex: Int) =>
+Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, 
numPartitions)
+  case _ =>
+throw new IllegalArgumentException("Unrecognized key")
+}
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+obj match {
+  case r: GridPartitioner =>
+(this.numPartitions == r.numPartitions) && (this.rowsPerBlock == 
r.rowsPerBlock) &&
+  (this.colsPerBlock == r.colsPerBlock)
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this 
matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rdd: RDD[((Int, Int), Matrix)]) extends DistributedMatrix with 
Logging {
--- End diff --

Hi @mbhynes. Great question. We did have such a class; it was called 
`BlockPartition`, which was later renamed to `SubMatrix`. I did try what you 
suggested, but here was the catch: even though the partition IDs matched, the 
partitions didn't necessarily end up on the same executors. Spark distributes 
partitions deterministically, so in theory they should land on the same 
executors, but in practice an executor lags, "something" happens, and the 
partitions get jumbled across executors. We couldn't (Spark doesn't) guarantee 
that these partitions end up on the same machine, for fault-tolerance reasons 
(it's how the scheduler works). Therefore we needed to keep explicit indices as 
above (which the `SubMatrix` class had). To ensure that we add the correct 
blocks with each other, calling a `.join` was inevitable. Instead 
of storing the index inside both
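
The index-based matching described above can be sketched locally. This is a minimal sketch with plain Scala `Map`s standing in for `RDD[((Int, Int), Matrix)]`, and blocks simplified to `Double` so the join semantics stay visible; the names here are illustrative, not the PR's API:

```scala
// Block-wise addition keyed by (blockRowIndex, blockColIndex).
// The local analogue of a.join(b).mapValues { case (x, y) => x + y }:
// blocks are matched by their grid index, never by partition position.
type BlockKey = (Int, Int)

val a: Map[BlockKey, Double] = Map((0, 0) -> 1.0, (0, 1) -> 2.0, (1, 0) -> 3.0)
val b: Map[BlockKey, Double] = Map((0, 0) -> 10.0, (0, 1) -> 20.0, (1, 0) -> 30.0)

val sum: Map[BlockKey, Double] =
  a.map { case (k, v) => k -> (v + b.getOrElse(k, 0.0)) }

println(sum((0, 1)))  // 22.0
```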

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-20 Thread mbhynes
Github user mbhynes commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23226987
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,223 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the 
matrix.
+ * @param rowsPerBlock Number of rows that make up each block.
+ * @param colsPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rowsPerBlock: Int,
+val colsPerBlock: Int,
+override val numPartitions: Int) extends Partitioner {
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid 
(its column major index)
+   *or a tuple of three integers that are the final row index 
after the multiplication,
+   *the index of the block to multiply with, and the final 
column index after the
+   *multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+key match {
+  case (rowIndex: Int, colIndex: Int) =>
+Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, 
numPartitions)
+  case (rowIndex: Int, innerIndex: Int, colIndex: Int) =>
+Utils.nonNegativeMod(rowIndex + colIndex * numRowBlocks, 
numPartitions)
+  case _ =>
+throw new IllegalArgumentException("Unrecognized key")
+}
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+obj match {
+  case r: GridPartitioner =>
+(this.numPartitions == r.numPartitions) && (this.rowsPerBlock == 
r.rowsPerBlock) &&
+  (this.colsPerBlock == r.colsPerBlock)
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this 
matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rdd: RDD[((Int, Int), Matrix)]) extends DistributedMatrix with 
Logging {
--- End diff --

@brkyvz @rezazadeh I'm wondering why you are using (Int, Int) as the block 
index, as opposed to a class that implements hashCode() to partition the matrix 
in lexicographic order. Then the call rdd.partitionBy(new 
HashPartitioner(numRowBlocks * numColBlocks)) would ensure the same partitioning 
between two distributed matrices. When performing addition or 
elementwise multiplication with another distributed matrix, the operation A_ij 
+ B_ij would be local, without any shuffle.
Could you please explain the use of (Int, Int) over an Index class?
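
For reference, the column-major key-to-partition mapping in the PR's `GridPartitioner` can be reproduced standalone. This is a sketch without Spark; `nonNegativeMod` is re-implemented here because `org.apache.spark.util.Utils` is not public API:

```scala
// Standalone sketch of GridPartitioner's key -> partition mapping.
// nonNegativeMod mirrors org.apache.spark.util.Utils.nonNegativeMod.
def nonNegativeMod(x: Int, mod: Int): Int = {
  val rawMod = x % mod
  rawMod + (if (rawMod < 0) mod else 0)
}

def getPartition(numRowBlocks: Int, numPartitions: Int, key: Any): Int = key match {
  case (rowIndex: Int, colIndex: Int) =>
    nonNegativeMod(rowIndex + colIndex * numRowBlocks, numPartitions)
  case (rowIndex: Int, _: Int, colIndex: Int) =>
    nonNegativeMod(rowIndex + colIndex * numRowBlocks, numPartitions)
  case _ =>
    throw new IllegalArgumentException("Unrecognized key")
}

// Two matrices sharing this partitioner send block (1, 2) to the same
// partition, whether keyed by (row, col) or (row, inner, col).
println(getPartition(3, 9, (1, 2)))     // 7
println(getPartition(3, 9, (1, 5, 2)))  // 7
```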



[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-70547665
  
[Test build #25766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25766/consoleFull) for PR 3200 at commit [`239ab4b`](https://github.com/apache/spark/commit/239ab4b47d910428460160311165f620afb605fa).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-70547676
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25766/
Test FAILed.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-70540877
  
[Test build #25766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25766/consoleFull) for PR 3200 at commit [`239ab4b`](https://github.com/apache/spark/commit/239ab4b47d910428460160311165f620afb605fa).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23128478
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the 
matrix.
+ * @param rowPerBlock Number of rows that make up each block.
+ * @param colPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rowPerBlock: Int,
+val colPerBlock: Int,
+override val numPartitions: Int) extends Partitioner {
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid 
(its column major index)
+   *or a tuple of three integers that are the final row index 
after the multiplication,
+   *the index of the block to multiply with, and the final 
column index after the
+   *multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+key match {
+  case ind: (Int, Int) =>
+Utils.nonNegativeMod(ind._1 + ind._2 * numRowBlocks, numPartitions)
+  case indices: (Int, Int, Int) =>
+Utils.nonNegativeMod(indices._1 + indices._3 * numRowBlocks, 
numPartitions)
+  case _ =>
+throw new IllegalArgumentException("Unrecognized key")
+}
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+obj match {
+  case r: GridPartitioner =>
+(this.numPartitions == r.numPartitions) && (this.rowPerBlock == 
r.rowPerBlock) &&
+  (this.colPerBlock == r.colPerBlock)
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this 
matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rdd: RDD[((Int, Int), Matrix)]) extends DistributedMatrix with 
Logging {
+
+  type SubMatrix = ((Int, Int), Matrix) // ((blockRowIndex, 
blockColIndex), matrix)
+
+  /**
+   * Alternate constructor for BlockMatrix without the input of a 
partitioner. Will use a Grid
+   * Partitioner by default.
+   *
+   * @param numRowBlocks Number of blocks that form the rows of this matrix
+   * @param numColBlocks Number of blocks that form the columns of this 
matrix
+   * @param rdd The RDD of SubMatrices (local matrices) that form this 
matrix
+   * @param rowPerBlock Number of rows that make up each block.
+   * @param colPerBlock Number of columns that make up each block.
+   */
+  def this(
+  numRowBlocks: Int,
+  numColBlocks: Int,
+  rdd: RDD[((Int, Int), Matrix)],
+  rowPerBlock: Int,
+  colPerBlock: Int) = {
+this(numRowBlocks, numColBlocks, rdd)
+val part = new GridPartitioner(numRowBlocks, numColBl

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23118561
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the 
matrix.
+ * @param rowPerBlock Number of rows that make up each block.
+ * @param colPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rowPerBlock: Int,
+val colPerBlock: Int,
+override val numPartitions: Int) extends Partitioner {
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid 
(its column major index)
+   *or a tuple of three integers that are the final row index 
after the multiplication,
+   *the index of the block to multiply with, and the final 
column index after the
+   *multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+key match {
+  case ind: (Int, Int) =>
+Utils.nonNegativeMod(ind._1 + ind._2 * numRowBlocks, numPartitions)
+  case indices: (Int, Int, Int) =>
+Utils.nonNegativeMod(indices._1 + indices._3 * numRowBlocks, 
numPartitions)
+  case _ =>
+throw new IllegalArgumentException("Unrecognized key")
+}
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+obj match {
+  case r: GridPartitioner =>
+(this.numPartitions == r.numPartitions) && (this.rowPerBlock == 
r.rowPerBlock) &&
+  (this.colPerBlock == r.colPerBlock)
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this 
matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rdd: RDD[((Int, Int), Matrix)]) extends DistributedMatrix with 
Logging {
+
+  type SubMatrix = ((Int, Int), Matrix) // ((blockRowIndex, 
blockColIndex), matrix)
+
+  /**
+   * Alternate constructor for BlockMatrix without the input of a 
partitioner. Will use a Grid
+   * Partitioner by default.
+   *
+   * @param numRowBlocks Number of blocks that form the rows of this matrix
+   * @param numColBlocks Number of blocks that form the columns of this 
matrix
+   * @param rdd The RDD of SubMatrices (local matrices) that form this 
matrix
+   * @param rowPerBlock Number of rows that make up each block.
+   * @param colPerBlock Number of columns that make up each block.
+   */
+  def this(
+  numRowBlocks: Int,
+  numColBlocks: Int,
+  rdd: RDD[((Int, Int), Matrix)],
+  rowPerBlock: Int,
+  colPerBlock: Int) = {
+this(numRowBlocks, numColBlocks, rdd)
+val part = new GridPartitioner(numRowBlocks, numColBlock

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-70317456
  
@brkyvz Those are my only comments for now, except that the tests could also 
check that the partitioner has the right info.

Also, I wonder if it would be better to force the RDD partitioner to be a 
grid partitioner and to use that partitioner, rather than storing a new 
instance of the partitioner (which must be kept in sync with the actual 
partitioner). Users could construct block matrices via methods we provide, 
which would set up the RDD with the right partitioner.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23106797
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the 
matrix.
+ * @param rowPerBlock Number of rows that make up each block.
+ * @param colPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rowPerBlock: Int,
+val colPerBlock: Int,
+override val numPartitions: Int) extends Partitioner {
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid 
(its column major index)
+   *or a tuple of three integers that are the final row index 
after the multiplication,
+   *the index of the block to multiply with, and the final 
column index after the
+   *multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+key match {
+  case ind: (Int, Int) =>
+Utils.nonNegativeMod(ind._1 + ind._2 * numRowBlocks, numPartitions)
+  case indices: (Int, Int, Int) =>
+Utils.nonNegativeMod(indices._1 + indices._3 * numRowBlocks, 
numPartitions)
+  case _ =>
+throw new IllegalArgumentException("Unrecognized key")
+}
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+obj match {
+  case r: GridPartitioner =>
+(this.numPartitions == r.numPartitions) && (this.rowPerBlock == 
r.rowPerBlock) &&
+  (this.colPerBlock == r.colPerBlock)
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this 
matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
+val numRowBlocks: Int,
+val numColBlocks: Int,
+val rdd: RDD[((Int, Int), Matrix)]) extends DistributedMatrix with 
Logging {
+
+  type SubMatrix = ((Int, Int), Matrix) // ((blockRowIndex, 
blockColIndex), matrix)
+
+  /**
+   * Alternate constructor for BlockMatrix without the input of a 
partitioner. Will use a Grid
+   * Partitioner by default.
+   *
+   * @param numRowBlocks Number of blocks that form the rows of this matrix
+   * @param numColBlocks Number of blocks that form the columns of this 
matrix
+   * @param rdd The RDD of SubMatrices (local matrices) that form this 
matrix
+   * @param rowPerBlock Number of rows that make up each block.
+   * @param colPerBlock Number of columns that make up each block.
+   */
+  def this(
+  numRowBlocks: Int,
+  numColBlocks: Int,
+  rdd: RDD[((Int, Int), Matrix)],
+  rowPerBlock: Int,
+  colPerBlock: Int) = {
+this(numRowBlocks, numColBlocks, rdd)
+val part = new GridPartitioner(numRowBlocks, numColBl

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23106790
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the matrix.
+ * @param rowPerBlock Number of rows that make up each block.
+ * @param colPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val rowPerBlock: Int,
+    val colPerBlock: Int,
+    override val numPartitions: Int) extends Partitioner {
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid (its column major index)
+   *            or a tuple of three integers that are the final row index after the multiplication,
+   *            the index of the block to multiply with, and the final column index after the
+   *            multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+    key match {
+      case ind: (Int, Int) =>
+        Utils.nonNegativeMod(ind._1 + ind._2 * numRowBlocks, numPartitions)
+      case indices: (Int, Int, Int) =>
+        Utils.nonNegativeMod(indices._1 + indices._3 * numRowBlocks, numPartitions)
+      case _ =>
+        throw new IllegalArgumentException("Unrecognized key")
+    }
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+    obj match {
+      case r: GridPartitioner =>
+        (this.numPartitions == r.numPartitions) && (this.rowPerBlock == r.rowPerBlock) &&
+          (this.colPerBlock == r.colPerBlock)
+      case _ =>
+        false
+    }
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val rdd: RDD[((Int, Int), Matrix)]) extends DistributedMatrix with Logging {
+
+  type SubMatrix = ((Int, Int), Matrix) // ((blockRowIndex, blockColIndex), matrix)
+
+  /**
+   * Alternate constructor for BlockMatrix without the input of a partitioner. Will use a Grid
+   * Partitioner by default.
+   *
+   * @param numRowBlocks Number of blocks that form the rows of this matrix
+   * @param numColBlocks Number of blocks that form the columns of this matrix
+   * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+   * @param rowPerBlock Number of rows that make up each block.
+   * @param colPerBlock Number of columns that make up each block.
+   */
+  def this(
+      numRowBlocks: Int,
+      numColBlocks: Int,
+      rdd: RDD[((Int, Int), Matrix)],
+      rowPerBlock: Int,
+      colPerBlock: Int) = {
+    this(numRowBlocks, numColBlocks, rdd)
+    val part = new GridPartitioner(numRowBlocks, numColBl
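
The quoted `equals` override compares partitioner characteristics but the diff shows no matching `hashCode`. On the JVM, overriding `equals` without `hashCode` violates the `Object` contract and breaks hash-based collections. A minimal sketch of the paired form (the class name is illustrative, not the PR's final code):

```scala
// Sketch of a characteristics-based equals/hashCode pair for a partitioner-like
// class. Equal objects must produce equal hash codes, so the two overrides are
// shown together. Names here are illustrative.
class GridLike(val numPartitions: Int, val rowPerBlock: Int, val colPerBlock: Int) {

  override def equals(obj: Any): Boolean = obj match {
    case r: GridLike =>
      numPartitions == r.numPartitions &&
        rowPerBlock == r.rowPerBlock &&
        colPerBlock == r.colPerBlock
    case _ => false
  }

  // Consistent with equals: built from exactly the fields equals compares.
  override def hashCode(): Int =
    java.util.Arrays.hashCode(Array(numPartitions, rowPerBlock, colPerBlock))
}
```

With this pair, two partitioners built with the same characteristics can be deduplicated in a `HashSet` or used interchangeably as map keys.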

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23106791
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -0,0 +1,217 @@
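
One recurring issue in the `getPartition` pattern match under review: tuple type parameters are erased on the JVM, so `case ind: (Int, Int)` only distinguishes tuple arity, not element types. Destructuring the tuple makes the runtime checks explicit. A hedged sketch of that alternative shape (names illustrative; `Math.floorMod` stands in for `Utils.nonNegativeMod`):

```scala
// Sketch: destructuring tuple keys instead of matching on erased tuple types.
// `case (i: Int, j: Int)` checks the element types at runtime, whereas
// `case ind: (Int, Int)` matches any Tuple2 and triggers unchecked warnings.
object PartitionKeySketch {

  def getPartition(key: Any, numRowBlocks: Int, numPartitions: Int): Int = key match {
    case (i: Int, j: Int) =>
      // (blockRowIndex, blockColIndex): column-major index mod partition count.
      java.lang.Math.floorMod(i + j * numRowBlocks, numPartitions)
    case (i: Int, _, j: Int) =>
      // Three-tuple multiplication key: only the outer row/column indices matter.
      java.lang.Math.floorMod(i + j * numRowBlocks, numPartitions)
    case _ =>
      throw new IllegalArgumentException(s"Unrecognized key: $key")
  }
}
```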

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23106803
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -0,0 +1,217 @@
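
The `BlockMatrix` constructor under review takes an `RDD[((Int, Int), Matrix)]` keyed by `(blockRowIndex, blockColIndex)`. A hedged sketch of building such an RDD of blocks with the MLlib local-matrix factories (local-mode setup and all names besides the MLlib API are illustrative):

```scala
// Sketch: constructing the ((blockRowIndex, blockColIndex), Matrix) RDD that
// the BlockMatrix constructor in this PR expects. Illustrative only.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrices, Matrix}

object BlockRddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("block-rdd-sketch"))

    // A 2 x 2 grid of 2 x 2 dense blocks; keys are (blockRowIndex, blockColIndex).
    val blocks: Seq[((Int, Int), Matrix)] = for {
      i <- 0 until 2
      j <- 0 until 2
    } yield ((i, j), Matrices.dense(2, 2, Array.fill(4)((i * 2 + j).toDouble)))

    val blockRdd = sc.parallelize(blocks)
    // Per the PR's API, `new BlockMatrix(2, 2, blockRdd)` would wrap this RDD.
    println(s"number of blocks: ${blockRdd.count()}")
    sc.stop()
  }
}
```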

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23106793
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -0,0 +1,217 @@

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23106801
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -0,0 +1,217 @@

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23106792
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -0,0 +1,217 @@

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23106795
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -0,0 +1,217 @@

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23106782
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the matrix.
+ * @param rowPerBlock Number of rows that make up each block.
+ * @param colPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val rowPerBlock: Int,
+    val colPerBlock: Int,
+    override val numPartitions: Int) extends Partitioner {
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid (its column major index)
+   *            or a tuple of three integers that are the final row index after the multiplication,
+   *            the index of the block to multiply with, and the final column index after the
+   *            multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+    key match {
+      case ind: (Int, Int) =>
+        Utils.nonNegativeMod(ind._1 + ind._2 * numRowBlocks, numPartitions)
+      case indices: (Int, Int, Int) =>
+        Utils.nonNegativeMod(indices._1 + indices._3 * numRowBlocks, numPartitions)
+      case _ =>
+        throw new IllegalArgumentException("Unrecognized key")
+    }
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+    obj match {
+      case r: GridPartitioner =>
+        (this.numPartitions == r.numPartitions) && (this.rowPerBlock == r.rowPerBlock) &&
+          (this.colPerBlock == r.colPerBlock)
+      case _ =>
+        false
+    }
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val rdd: RDD[((Int, Int), Matrix)]) extends DistributedMatrix with Logging {
+
+  type SubMatrix = ((Int, Int), Matrix) // ((blockRowIndex, blockColIndex), matrix)
+
+  /**
+   * Alternate constructor for BlockMatrix without the input of a partitioner. Will use a Grid
+   * Partitioner by default.
+   *
+   * @param numRowBlocks Number of blocks that form the rows of this matrix
+   * @param numColBlocks Number of blocks that form the columns of this matrix
+   * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+   * @param rowPerBlock Number of rows that make up each block.
+   * @param colPerBlock Number of columns that make up each block.
+   */
+  def this(
+      numRowBlocks: Int,
+      numColBlocks: Int,
+      rdd: RDD[((Int, Int), Matrix)],
+      rowPerBlock: Int,
+      colPerBlock: Int) = {
+    this(numRowBlocks, numColBlocks, rdd)
+    val part = new GridPartitioner(numRowBlocks, numColBl
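For readers skimming the quoted diff: `getPartition`'s column-major mapping for a `(blockRow, blockCol)` key can be reproduced without Spark. This sketch reimplements `nonNegativeMod` locally (the real code calls `org.apache.spark.util.Utils.nonNegativeMod`); `gridPartition` is a name invented for the example.

```scala
// Mirror of Utils.nonNegativeMod: a mod that never returns a negative value.
def nonNegativeMod(x: Int, mod: Int): Int = {
  val rawMod = x % mod
  rawMod + (if (rawMod < 0) mod else 0)
}

// (blockRow, blockCol) -> partition index, column-major over the block grid,
// matching the `case ind: (Int, Int)` branch of getPartition above.
def gridPartition(blockRow: Int, blockCol: Int,
                  numRowBlocks: Int, numPartitions: Int): Int =
  nonNegativeMod(blockRow + blockCol * numRowBlocks, numPartitions)
```

For example, with 3 row-blocks and 12 partitions, block (1, 2) lands in partition 1 + 2 * 3 = 7.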

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23106799
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23104441
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import org.scalatest.FunSuite
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark.mllib.linalg.{DenseMatrix, Matrices, Matrix}
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+
+class BlockMatrixSuite extends FunSuite with MLlibTestSparkContext {
+
+  val m = 5
--- End diff --

I would put these fixed values in a private BlockMatrixSuite object and then import them inside the class.
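A minimal sketch of that suggestion, with names of my choosing (only `m = 5` comes from the quoted diff; `n`, the block sizes, and the `Fixtures`/`Sketch` suffixes are illustrative):

```scala
// Sketch of the reviewer's suggestion: shared fixture values live in a
// separate object and are imported inside the suite class, so every test
// references one definition instead of repeated literals.
object BlockMatrixSuiteFixtures {
  val m = 5            // from the diff: rows in the test matrix
  val n = 4            // illustrative: columns in the test matrix
  val rowPerBlock = 2  // illustrative block height
  val colPerBlock = 2  // illustrative block width
}

class BlockMatrixSuiteSketch {
  import BlockMatrixSuiteFixtures._
  // derived grid dimensions computed once from the shared fixtures
  def numRowBlocks: Int = math.ceil(m.toDouble / rowPerBlock).toInt
  def numColBlocks: Int = math.ceil(n.toDouble / colPerBlock).toInt
}
```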


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23104439
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23104436
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23104431
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the matrix.
+ * @param rowPerBlock Number of rows that make up each block.
+ * @param colPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val rowPerBlock: Int,
--- End diff --

Rename "rowPerBlock" --> "rowsPerBlock"?  Same for "colPerBlock" and in other places in this file.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23104435
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r23104432
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
--- End diff --

I assume this constructor should be private:
```
class BlockMatrix private (...
```
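A minimal, Spark-free sketch of that visibility change (the `Sketch` suffix and the constructor body are illustrative; the real class also carries the block RDD and builds a `GridPartitioner` in the auxiliary constructor):

```scala
// Sketch of the suggested change: the primary constructor is private, so
// the only public way in is the auxiliary constructor that also supplies
// block dimensions.
class BlockMatrixSketch private (
    val numRowBlocks: Int,
    val numColBlocks: Int) {

  var rowPerBlock: Int = 0
  var colPerBlock: Int = 0

  def this(numRowBlocks: Int, numColBlocks: Int, rowPerBlock: Int, colPerBlock: Int) = {
    this(numRowBlocks, numColBlocks)
    this.rowPerBlock = rowPerBlock
    this.colPerBlock = colPerBlock
    // the real code would construct a GridPartitioner here
  }
}
```

With this shape, `new BlockMatrixSketch(2, 3)` no longer compiles outside the class, while `new BlockMatrixSketch(2, 3, 5, 5)` still does.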





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-69873803
  
  [Test build #25512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25512/consoleFull) for PR 3200 at commit [`ba414d2`](https://github.com/apache/spark/commit/ba414d2c6a16de987c9aa456dafa193d191014d5).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-69873811
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25512/
Test PASSed.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-69869900
  
  [Test build #25512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25512/consoleFull) for PR 3200 at commit [`ba414d2`](https://github.com/apache/spark/commit/ba414d2c6a16de987c9aa456dafa193d191014d5).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-69866191
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25507/
Test FAILed.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-69866190
  
  [Test build #25507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25507/consoleFull) for PR 3200 at commit [`ab6cde0`](https://github.com/apache/spark/commit/ab6cde0d90b917e89f97c86ef3c84dcdc64a9b57).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2015-01-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-69866135
  
  [Test build #25507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25507/consoleFull) for PR 3200 at commit [`ab6cde0`](https://github.com/apache/spark/commit/ab6cde0d90b917e89f97c86ef3c84dcdc64a9b57).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63883302
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23679/
Test PASSed.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63883289
  
  [Test build #23679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23679/consoleFull) for PR 3200 at commit [`9ae85aa`](https://github.com/apache/spark/commit/9ae85aa1ebabdc099d7f655bc1d9021d34d2910f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SubMatrix(blockRowIndex: Int, blockColIndex: Int, mat: DenseMatrix) extends Serializable`
  * `class GridPartitioner(`
  * `class RowBasedPartitioner(`
  * `class ColumnBasedPartitioner(`
  * `class BlockMatrix(`






[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63870038
  
  [Test build #23679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23679/consoleFull) for PR 3200 at commit [`9ae85aa`](https://github.com/apache/spark/commit/9ae85aa1ebabdc099d7f655bc1d9021d34d2910f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63154793
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23402/
Test PASSed.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63154789
  
  [Test build #23402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23402/consoleFull) for PR 3200 at commit [`d033861`](https://github.com/apache/spark/commit/d033861d5a2f88b223f601feb4445308399901e8).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SubMatrix(blockRowIndex: Int, blockColIndex: Int, mat: DenseMatrix) extends Serializable`
  * `class GridPartitioner(`
  * `class RowBasedPartitioner(`
  * `class ColumnBasedPartitioner(`
  * `class BlockMatrix(`






[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63150236
  
  [Test build #23402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23402/consoleFull) for PR 3200 at commit [`d033861`](https://github.com/apache/spark/commit/d033861d5a2f88b223f601feb4445308399901e8).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63138048
  
  [Test build #23383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23383/consoleFull) for PR 3200 at commit [`49b9586`](https://github.com/apache/spark/commit/49b9586a18cf7338a46c88a51ed23890914d3be4).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SubMatrix(blockRowIndex: Int, blockColIndex: Int, mat: DenseMatrix) extends Serializable`
  * `case class SubMatrixInfo(`
  * `abstract class BlockMatrixPartitioner(`
  * `class GridPartitioner(`
  * `class RowBasedPartitioner(`
  * `class ColumnBasedPartitioner(`
  * `class BlockMatrix(`






[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63138057
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23383/
Test PASSed.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63130191
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23382/
Test FAILed.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63130184
  
  [Test build #23382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23382/consoleFull) for PR 3200 at commit [`b05aabb`](https://github.com/apache/spark/commit/b05aabbdd5c3f5db69bbaff3582139b691d696fa).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63126875
  
  [Test build #23383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23383/consoleFull) for PR 3200 at commit [`49b9586`](https://github.com/apache/spark/commit/49b9586a18cf7338a46c88a51ed23890914d3be4).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63123607
  
  [Test build #23382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23382/consoleFull) for PR 3200 at commit [`b05aabb`](https://github.com/apache/spark/commit/b05aabbdd5c3f5db69bbaff3582139b691d696fa).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63123158
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23379/
Test FAILed.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63123151
  
  [Test build #23379 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23379/consoleFull) for PR 3200 at commit [`19c17e8`](https://github.com/apache/spark/commit/19c17e8d1594a3f7bd5a973a09b341de3a1c857a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63122272
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23378/
Test FAILed.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63122251
  
  [Test build #23378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23378/consoleFull) for PR 3200 at commit [`589fbb6`](https://github.com/apache/spark/commit/589fbb65478851d88ea5a7f5bf54c1fa8d53f055).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SubMatrix(blockIdRow: Int, blockIdCol: Int, mat: DenseMatrix) extends Serializable`
  * `case class SubMatrixInfo(`
  * `abstract class BlockMatrixPartitioner(`
  * `class GridPartitioner(`
  * `class RowBasedPartitioner(`
  * `class ColumnBasedPartitioner(`
  * `class BlockMatrix(`






[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63116214
  
  [Test build #23379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23379/consoleFull) for PR 3200 at commit [`19c17e8`](https://github.com/apache/spark/commit/19c17e8d1594a3f7bd5a973a09b341de3a1c857a).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63115145
  
  [Test build #23378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23378/consoleFull) for PR 3200 at commit [`589fbb6`](https://github.com/apache/spark/commit/589fbb65478851d88ea5a7f5bf54c1fa8d53f055).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-14 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3200#discussion_r20378789
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,331 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg.DenseMatrix
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * Represents a local matrix that makes up one block of a distributed 
BlockMatrix
+ *
+ * @param blockIdRow The row index of this block
+ * @param blockIdCol The column index of this block
+ * @param mat The underlying local matrix
+ */
+case class BlockPartition(blockIdRow: Int, blockIdCol: Int, mat: 
DenseMatrix) extends Serializable
+
+/**
+ * Information about the BlockMatrix maintained on the driver
+ *
+ * @param partitionId The id of the partition the block is found in
+ * @param blockIdRow The row index of this block
+ * @param blockIdCol The column index of this block
+ * @param startRow The starting row index with respect to the distributed 
BlockMatrix
+ * @param numRows The number of rows in this block
+ * @param startCol The starting column index with respect to the 
distributed BlockMatrix
+ * @param numCols The number of columns in this block
+ */
+case class BlockPartitionInfo(
+partitionId: Int,
+blockIdRow: Int,
+blockIdCol: Int,
+startRow: Long,
--- End diff --

That's a good question... I left it in for the following reason: I'm not sure it will ever be required, but it lets us cover corner cases such as irregular grids.
Assume you have a matrix A with dimensions 280 x d, where each `SubMatrix` has dimensions 30 x d/3. The last row of blocks will then consist of SubMatrices of dimensions 10 x d/3. If you then vertically append a matrix B with dimensions n x d, you're left with an irregular grid.
Vertical concatenation may not be as common as horizontal concatenation, but being ready to support such operations seems beneficial for users.
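
[Editor's note] The irregular-grid scenario described above can be sketched without Spark. The snippet below is hypothetical (the `blockHeights` helper is invented for illustration and is not part of this PR); it only shows why a block's `startRow` must be stored as metadata rather than derived as `blockRowIndex * rowsPerBlock`:

```scala
// Hypothetical illustration (not code from this PR): after vertical
// concatenation, row-block heights become irregular, so startRow can no
// longer be computed from the block index alone.
object IrregularGrid {
  // Heights of the row blocks when numRows is split into rowsPerBlock chunks.
  def blockHeights(numRows: Int, rowsPerBlock: Int): Seq[Int] = {
    val full = numRows / rowsPerBlock
    val rem  = numRows % rowsPerBlock
    Seq.fill(full)(rowsPerBlock) ++ (if (rem > 0) Seq(rem) else Seq.empty)
  }

  def main(args: Array[String]): Unit = {
    val a = blockHeights(280, 30)   // nine 30-row blocks plus one 10-row block
    val b = blockHeights(90, 30)    // matrix B, appended below A
    val stacked = a ++ b            // irregular grid: ..., 30, 10, 30, 30, 30
    // startRow of each block is the running sum of the heights above it.
    val startRows = stacked.scanLeft(0L)(_ + _).init
    assert(a.last == 10)
    assert(startRows(10) == 280L)   // first block of B starts right after A
  }
}
```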





[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-13 Thread brkyvz
Github user brkyvz commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-62982228
  
@mengxr 


> If we have two block matrices, A and B, and A's column block partitioning 
matches B's row block partitioning, can we take advantage of this fact in 
computing A * B? I support having only one block matrix partitioner 
implementation. Then we do the following:
> 
> if (A.partitioner.colBlockPartitioner == 
B.partitioner.rowBlockPartitioner) {
>   // zip ...
> } else {
>   ...
> }

By `partitioner.rowBlockPartitioner` and `partitioner.colBlockPartitioner`, are you asking whether the number of blocks along each dimension and the number of rows per block have to match?

One problem with zip was that I couldn't guarantee data locality. I tried to force it, but the best way to force it turned out to be a join...
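
[Editor's note] The fast-path check mengxr's pseudocode describes can be sketched as follows. The `BlockSplit` type and all names here are invented for illustration (they are not the PR's API); the point is only that A's column-block split is compared against B's row-block split before choosing zip over join:

```scala
// Hypothetical sketch of the co-partitioning check for A * B
// (BlockSplit is invented for illustration, not from this PR).
final case class BlockSplit(numBlocks: Int, sizePerBlock: Int)

object MultiplyPlan {
  // If A's column-block split equals B's row-block split, the k-th column
  // block of A lines up with the k-th row block of B and the matrices can
  // be zipped; otherwise a join is needed to bring matching inner block
  // indices together (at the cost of a shuffle).
  def choose(aColSplit: BlockSplit, bRowSplit: BlockSplit): String =
    if (aColSplit == bRowSplit) "zip" else "join"

  def main(args: Array[String]): Unit = {
    assert(choose(BlockSplit(4, 250), BlockSplit(4, 250)) == "zip")
    assert(choose(BlockSplit(4, 250), BlockSplit(5, 200)) == "join")
  }
}
```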







