[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-72128781 closing this PR as a lot of functionality has changed
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user brkyvz closed the pull request at: https://github.com/apache/spark/pull/2451
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56245433 @brkyvz I've made a rough pass, and have listed all of my comments. I can make future passes as needed. Lots of work & it will be great to have!
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17813577 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala --- (same hunk as the next message; the comment is on the absTol line) +x.toArray.zip(y.toArray).forall(x => x._1 ~= x._2 absTol eps) --- End diff -- confusing having 2 things called x
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17813573 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala --- @@ -169,4 +169,67 @@ object TestingUtils { override def toString = x.toString } + case class CompareMatrixRightSide( + fun: (Matrix, Matrix, Double) => Boolean, y: Matrix, eps: Double, method: String) + + /** + * Implicit class for comparing two matrices using relative tolerance or absolute tolerance. + */ + implicit class MatrixWithAlmostEquals(val x: Matrix) { + +/** + * When the difference of two vectors are within eps, returns true; otherwise, returns false. + */ +def ~=(r: CompareMatrixRightSide): Boolean = r.fun(x, r.y, r.eps) + +/** + * When the difference of two vectors are within eps, returns false; otherwise, returns true. + */ +def !~=(r: CompareMatrixRightSide): Boolean = !r.fun(x, r.y, r.eps) + +/** + * Throws exception when the difference of two vectors are NOT within eps; + * otherwise, returns true. + */ +def ~==(r: CompareMatrixRightSide): Boolean = { + if (!r.fun(x, r.y, r.eps)) { +throw new TestFailedException( + s"Expected \n$x\n and \n${r.y}\n to be within ${r.eps}${r.method} for all elements.", 0) + } + true +} + +/** + * Throws exception when the difference of two matrices are within eps; otherwise, returns true. + */ +def !~==(r: CompareMatrixRightSide): Boolean = { + if (r.fun(x, r.y, r.eps)) { +throw new TestFailedException( + s"Did not expect \n$x\n and \n${r.y}\n to be within " + +"${r.eps}${r.method} for all elements.", 0) + } + true +} + +/** + * Comparison using absolute tolerance. + */ +def absTol(eps: Double): CompareMatrixRightSide = CompareMatrixRightSide( + (x: Matrix, y: Matrix, eps: Double) => { +x.toArray.zip(y.toArray).forall(x => x._1 ~= x._2 absTol eps) + }, x, eps, ABS_TOL_MSG) + +/** + * Comparison using relative tolerance. Note that comparing against sparse vector + * with elements having value of zero will raise exception because it involves with + * comparing against zero. + */ +def relTol(eps: Double): CompareMatrixRightSide = CompareMatrixRightSide( + (x: Matrix, y: Matrix, eps: Double) => { +x.toArray.zip(y.toArray).forall(x => x._1 ~= x._2 relTol eps) --- End diff -- confusing having 2 things called "x" --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
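Both naming comments above point at the same issue: the implicit class already binds the wrapped matrix to x, and the comparison lambdas reuse x both as the lambda parameter and as the zipped element pair. A minimal sketch of one way to disambiguate, shown for absTol only and assuming the surrounding TestingUtils implicits stay as quoted (this is an illustration, not the change actually made in the PR):

    def absTol(eps: Double): CompareMatrixRightSide = CompareMatrixRightSide(
      // Fresh names throughout, so nothing shadows the wrapped matrix "x".
      (m1: Matrix, m2: Matrix, tol: Double) =>
        m1.toArray.zip(m2.toArray).forall { case (a, b) => a ~= b absTol tol },
      x, eps, ABS_TOL_MSG)

The same renaming applies to the relTol lambda.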
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17813479 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescentSuite.scala --- @@ -0,0 +1,444 @@ +package org.apache.spark.mllib.optimization + +import scala.collection.JavaConversions._ +import scala.util.Random + +import org.scalatest.{FunSuite, Matchers} + +import org.apache.spark.mllib.linalg.{DenseMatrix, Matrices, Vectors} +import org.apache.spark.mllib.regression._ +import org.apache.spark.mllib.util.{LinearDataGenerator, LocalClusterSparkContext, LocalSparkContext} +import org.apache.spark.mllib.util.TestingUtils._ + +object MultiModelGradientDescentSuite { + + def generateLogisticInputAsList( + offset: Double, + scale: Double, + nPoints: Int, + seed: Int): java.util.List[LabeledPoint] = { +seqAsJavaList(generateGDInput(offset, scale, nPoints, seed)) + } + + // Generate input of the form Y = logistic(offset + scale * X) + def generateGDInput( + offset: Double, + scale: Double, + nPoints: Int, + seed: Int): Seq[LabeledPoint] = { +val rnd = new Random(seed) +val x1 = Array.fill[Double](nPoints)(rnd.nextGaussian()) + +val unifRand = new Random(45) +val rLogis = (0 until nPoints).map { i => + val u = unifRand.nextDouble() + math.log(u) - math.log(1.0-u) +} + +val y: Seq[Int] = (0 until nPoints).map { i => + val yVal = offset + scale * x1(i) + rLogis(i) + if (yVal > 0) 1 else 0 +} + +(0 until nPoints).map(i => LabeledPoint(y(i), Vectors.dense(x1(i + } + + def generateSVMInputAsList( + intercept: Double, + weights: Array[Double], + nPoints: Int, + seed: Int): java.util.List[LabeledPoint] = { +seqAsJavaList(generateSVMInput(intercept, weights, nPoints, seed)) + } + + // Generate noisy input of the form Y = signum(x.dot(weights) + intercept + noise) + def generateSVMInput( +intercept: Double, +weights: Array[Double], +nPoints: Int, +seed: Int): Seq[LabeledPoint] = { +val rnd = new Random(seed) +val weightsMat = new DenseMatrix(weights.length, 1, weights) +val x = Array.fill[Array[Double]](nPoints)( + Array.fill[Double](weights.length)(rnd.nextDouble() * 2.0 - 1.0)) +val y = x.map { xi => + val yD = (new DenseMatrix(1, xi.length, xi) multiply weightsMat) + +intercept + 0.01 * rnd.nextGaussian() + if (yD.toArray(0) < 0) 0.0 else 1.0 +} +y.zip(x).map(p => LabeledPoint(p._1, Vectors.dense(p._2))) + } +} + +class MultiModelGradientDescentSuite extends FunSuite with LocalSparkContext with Matchers { + test("Assert the loss is decreasing.") { +val nPoints = 1 +val A = 2.0 +val B = -1.5 + +val initialB = -1.0 +val initialWeights = Array(initialB) + +val gradient = new MultiModelLogisticGradient() +val updater: Array[MultiModelUpdater] = Array(new MultiModelSimpleUpdater()) +val stepSize = Array(1.0, 0.1) +val numIterations = Array(10) +val regParam = Array(0.0) +val miniBatchFrac = 1.0 + +// Add a extra variable consisting of all 1.0's for the intercept. 
+val testData = GradientDescentSuite.generateGDInput(A, B, nPoints, 42) +val data = testData.map { case LabeledPoint(label, features) => + label -> Vectors.dense(1.0 +: features.toArray) +} + +val dataRDD = sc.parallelize(data, 2).cache() +val initialWeightsWithIntercept = Vectors.dense(1.0 +: initialWeights.toArray) + +val (_, loss) = MultiModelGradientDescent.runMiniBatchMMSGD( + dataRDD, + gradient, + updater, + stepSize, + numIterations, + regParam, + miniBatchFrac, + initialWeightsWithIntercept) + +assert(loss.last(0) - loss.head(0) < 0, "loss isn't decreasing.") + +val lossDiff = loss.init.zip(loss.tail).map { case (lhs, rhs) => lhs(0) - rhs(0) } +assert(lossDiff.count(_ > 0).toDouble / lossDiff.size > 0.8) + } + + test("Test the loss and gradient of first iteration with regularization.") { + +val gradient = new MultiModelLogisticGradient()
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17813038 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/BreezeMatrixConversionSuite.scala --- @@ -37,4 +37,26 @@ class BreezeMatrixConversionSuite extends FunSuite { assert(mat.numCols === breeze.cols) assert(mat.values.eq(breeze.data), "should not copy data") } + + test("sparse matrix to breeze") { +val values = Array(1.0, 2.0, 4.0, 5.0) +val colPtrs = Array(0, 2, 4) +val rowIndices = Array(1, 2, 1, 2) +val mat = Matrices.sparse(3, 2, colPtrs, rowIndices, values) +val breeze = mat.toBreeze.asInstanceOf[BSM[Double]] +assert(breeze.rows === mat.numRows) +assert(breeze.cols === mat.numCols) +assert(breeze.data.eq(mat.asInstanceOf[SparseMatrix].values), "should not copy data") + } + + test("sparse breeze matrix to sparse matrix") { --- End diff -- Ditto
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17813034 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/BreezeMatrixConversionSuite.scala --- (same hunk as the previous message; the comment is on the new test) + test("sparse matrix to breeze") { --- End diff -- Check values too
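A sketch of what "Check values too" (and the "Ditto" for the reverse conversion above) could look like in the sparse-matrix-to-breeze test, assuming Matrix.toArray returns elements in column-major order; this is an illustration, not the assertion that was actually added:

    // Compare every element of the converted Breeze matrix against the original MLlib matrix.
    for (j <- 0 until mat.numCols; i <- 0 until mat.numRows) {
      assert(breeze(i, j) === mat.toArray(j * mat.numRows + i))
    }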
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17812584 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Updater.scala --- @@ -145,12 +150,151 @@ class SquaredL2Updater extends Updater { // w' = w - thisIterStepSize * (gradient + regParam * w) // w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient val thisIterStepSize = stepSize / math.sqrt(iter) -val brzWeights: BV[Double] = weightsOld.toBreeze.toDenseVector -brzWeights :*= (1.0 - thisIterStepSize * regParam) -brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights) -val norm = brzNorm(brzWeights, 2.0) +scal(1.0 - thisIterStepSize * regParam, weightsOld) +axpy(-thisIterStepSize, gradient, weightsOld) +val norm = brzNorm(weightsOld.toBreeze, 2.0) -(Vectors.fromBreeze(brzWeights), 0.5 * regParam * norm * norm) +(weightsOld, 0.5 * regParam * norm * norm) } } +/** + * :: DeveloperApi :: + * Class used to perform steps (weight update) using Gradient Descent methods. + * + * For general minimization problems, or for regularized problems of the form + * min L(w) + regParam * R(w), + * the compute function performs the actual update step, when given some + * (e.g. stochastic) gradient direction for the loss L(w), + * and a desired step-size (learning rate). + * + * The updater is responsible to also perform the update coming from the + * regularization term R(w) (if any regularization is used). + */ +@DeveloperApi +abstract class MultiModelUpdater extends Serializable { + /** + * Compute an updated value for weights given the gradient, stepSize, iteration number and + * regularization parameter. Also returns the regularization value regParam * R(w) + * computed using the *updated* weights. + * + * @param weightsOld - Column matrix of size dx1 where d is the number of features. + * @param gradient - Column matrix of size dx1 where d is the number of features. + * @param stepSize - step size across iterations + * @param iter - Iteration number + * @param regParam - Regularization parameter + * + * @return A tuple of 2 elements. The first element is a column matrix containing updated weights, + * and the second element is the regularization value computed using updated weights. + */ + def compute( + weightsOld: DenseMatrix, + gradient: DenseMatrix, + stepSize: DenseMatrix, + iter: Int, + regParam: Matrix): (DenseMatrix, Matrix) +} + +/** + * :: DeveloperApi :: + * A simple updater for gradient descent *without* any regularization. + * Uses a step-size decreasing with the square root of the number of iterations. + */ +@DeveloperApi +class MultiModelSimpleUpdater extends MultiModelUpdater { + def compute( + weightsOld: DenseMatrix, + gradient: DenseMatrix, + stepSize: DenseMatrix, + iter: Int, + regParam: Matrix): (DenseMatrix, Matrix) = { +val thisIterStepSize = + SparseMatrix.diag(Vectors.dense(stepSize.map(-_ / sqrt(iter)).toArray)) + +gemm(1.0, gradient,thisIterStepSize, 1.0, weightsOld) --- End diff -- spacing --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
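Beyond the missing space after the comma, the flagged call reads as a per-model update: thisIterStepSize is a diagonal matrix of negated step sizes, so the gemm scales each gradient column by its own step size and adds the result to the weights (interpretation inferred from the quoted code, not stated in it):

    // weightsOld := 1.0 * gradient * thisIterStepSize + 1.0 * weightsOld,
    // where thisIterStepSize = diag(-stepSize(k) / sqrt(iter)) scales column k of the gradient.
    gemm(1.0, gradient, thisIterStepSize, 1.0, weightsOld)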
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17812518 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Updater.scala --- (same hunk as the previous message; the comment is on the scaladoc line) + * @param weightsOld - Column matrix of size dx1 where d is the number of features. --- End diff -- update doc (matrix size)
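The scaladoc still describes a d x 1 column vector even though MultiModelUpdater.compute now takes matrices. A possible rewording, where the d x k layout (one weight column per model) is an assumption based on how the multi-model classes are used elsewhere in the PR:

     * @param weightsOld - Matrix of size d x k, where d is the number of features and k is the
     *                     number of models trained simultaneously (one column of weights per model).
     * @param gradient - Matrix of the same size as weightsOld, with one gradient column per model.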
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17812287 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Updater.scala --- @@ -111,18 +112,22 @@ class L1Updater extends Updater { regParam: Double): (Vector, Double) = { val thisIterStepSize = stepSize / math.sqrt(iter) // Take gradient step -val brzWeights: BV[Double] = weightsOld.toBreeze.toDenseVector -brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights) +//println(s"\n$iter:") --- End diff -- old comments
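The flagged //println is leftover debug output. The cleaned-up gradient step would presumably mirror the in-place SquaredL2Updater change quoted in the previous messages, roughly (a sketch; the actual replacement lines are not visible in this excerpt):

    val thisIterStepSize = stepSize / math.sqrt(iter)
    // Take gradient step in place, without the debug println.
    axpy(-thisIterStepSize, gradient, weightsOld)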
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17811312 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala --- @@ -0,0 +1,256 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.optimization + +import scala.collection.mutable.ArrayBuffer + +import breeze.linalg.{DenseVector => BDV} + +import org.apache.spark.annotation.{Experimental, DeveloperApi} +import org.apache.spark.Logging +import org.apache.spark.rdd.RDD +import org.apache.spark.mllib.linalg._ +import org.apache.spark.mllib.rdd.RDDFunctions._ + +class MultiModelGradientDescent private[mllib] ( +private var gradient: MultiModelGradient, +private var updater: Array[MultiModelUpdater]) extends Optimizer[Matrix] with Logging { + + private var stepSize: Array[Double] = Array(1.0, 0.1) + private var numIterations: Array[Int] = Array(100) + private var regParam: Array[Double] = Array(0.0, 0.1, 1.0) + private var miniBatchFraction: Double = 1.0 + + /** + * Set the initial step size of SGD for the first step. Default (1.0, 0.1). + * In subsequent steps, the step size will decrease with stepSize/sqrt(t) + */ + def setStepSize(step: Array[Double]): this.type = { +this.stepSize = step +this + } + + /** + * :: Experimental :: + * Set fraction of data to be used for each SGD iteration. + * Default 1.0 (corresponding to deterministic/classical gradient descent) + */ + @Experimental + def setMiniBatchFraction(fraction: Double): this.type = { +this.miniBatchFraction = fraction +this + } + + /** + * Set the number of iterations for SGD. Default 100. + */ + def setNumIterations(iters: Array[Int]): this.type = { +this.numIterations = iters +this + } + + /** + * Set the regularization parameter. Default (0.0, 0.1, 1.0). + */ + def setRegParam(regParam: Array[Double]): this.type = { +this.regParam = regParam +this + } + + /** + * Set the gradient function (of the loss function of one single data example) + * to be used for SGD. + */ + def setGradient(gradient: MultiModelGradient): this.type = { +this.gradient = gradient +this + } + + + /** + * Set the updater function to actually perform a gradient step in a given direction. + * The updater is responsible to perform the update from the regularization term as well, + * and therefore determines what kind or regularization is used, if any. + */ + def setUpdater(updater: Array[MultiModelUpdater]): this.type = { +this.updater = updater +this + } + + /** + * :: DeveloperApi :: + * Runs gradient descent on the given training data. 
+ * @param data training data + * @param initialWeights initial weights + * @return solution vector + */ + @DeveloperApi + def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Matrix = { +val (weights, _) = MultiModelGradientDescent.runMiniBatchMMSGD( + data, + gradient, + updater, + stepSize, + numIterations, + regParam, + miniBatchFraction, + initialWeights) +weights + } + +} + +/** + * :: DeveloperApi :: + * Top-level method to run gradient descent. + */ +@DeveloperApi +object MultiModelGradientDescent extends Logging { + /** + * Run stochastic gradient descent (SGD) in parallel using mini batches. + * In each iteration, we sample a subset (fraction miniBatchFraction) of the total data + * in order to compute a gradient estimate. + * Sampling, and averaging the subgradients over this subset is performed using one standard + * spark map-reduce in each iteration.
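For orientation, a sketch of how the quoted builder-style setters combine into a grid-search-like run (hypothetical usage with arbitrary values; the gradient and updater class names are taken from the test suite quoted earlier, and the shape of the returned Matrix is an assumption):

    // data: RDD[(Double, Vector)] and initialWeights: Vector are placeholders.
    val optimizer = new MultiModelGradientDescent(
        new MultiModelLogisticGradient(),
        Array[MultiModelUpdater](new MultiModelSimpleUpdater()))
      .setStepSize(Array(1.0, 0.1))
      .setNumIterations(Array(50, 100))
      .setRegParam(Array(0.0, 0.1, 1.0))
      .setMiniBatchFraction(1.0)
    // optimize() returns a Matrix, presumably holding one weight column per
    // step-size/iteration/regularization combination.
    val allWeights: Matrix = optimizer.optimize(data, initialWeights)

Note the constructor is private[mllib], so this only works from inside the mllib package.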
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17811267 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala --- (quotes the same MultiModelGradientDescent.scala hunk as the r17811312 message above; the inline comment is not preserved in this archive)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17811169 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala --- (quotes the same MultiModelGradientDescent.scala hunk as the r17811312 message above; the inline comment is not preserved in this archive)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17811037 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala --- (quotes the same MultiModelGradientDescent.scala hunk as the r17811312 message above; the inline comment is not preserved in this archive)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56239321 @brkyvz Let's try to split this PR into smaller ones. For example, functions like factory methods for sparse matrices should not be included in this PR. We want to keep the vector and matrix classes in MLlib simple and let users use breeze for linear algebra operations. If breeze has performance issues, maybe we should contribute the optimization to breeze to centralize the effort on single-machine linear algebra computation.
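An illustration of the direction suggested here (not code from the PR): build and query a sparse matrix through Breeze directly instead of adding new factory methods to MLlib's matrix classes.

    import breeze.linalg.CSCMatrix

    // Assemble a 3x2 sparse matrix with Breeze's builder, then read elements back.
    val builder = new CSCMatrix.Builder[Double](rows = 3, cols = 2)
    builder.add(1, 0, 1.0)
    builder.add(2, 0, 2.0)
    builder.add(1, 1, 4.0)
    builder.add(2, 1, 5.0)
    val sm = builder.result()
    assert(sm(1, 0) == 1.0 && sm(0, 0) == 0.0)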
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17810895 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala --- (quotes the same MultiModelGradientDescent.scala hunk as the r17811312 message above; the inline comment is not preserved in this archive)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17810823 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala --- (quotes the same MultiModelGradientDescent.scala hunk as the r17811312 message above; the inline comment is not preserved in this archive)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17810786 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala --- (quotes the same MultiModelGradientDescent.scala hunk as the r17811312 message above; the inline comment is not preserved in this archive)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17810600 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala --- (quotes the same MultiModelGradientDescent.scala hunk as the r17811312 message above; the inline comment is not preserved in this archive)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17809784 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala --- @@ -0,0 +1,256 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.optimization + +import scala.collection.mutable.ArrayBuffer + +import breeze.linalg.{DenseVector => BDV} + +import org.apache.spark.annotation.{Experimental, DeveloperApi} +import org.apache.spark.Logging +import org.apache.spark.rdd.RDD +import org.apache.spark.mllib.linalg._ +import org.apache.spark.mllib.rdd.RDDFunctions._ + +class MultiModelGradientDescent private[mllib] ( +private var gradient: MultiModelGradient, +private var updater: Array[MultiModelUpdater]) extends Optimizer[Matrix] with Logging { + + private var stepSize: Array[Double] = Array(1.0, 0.1) + private var numIterations: Array[Int] = Array(100) + private var regParam: Array[Double] = Array(0.0, 0.1, 1.0) + private var miniBatchFraction: Double = 1.0 + + /** + * Set the initial step size of SGD for the first step. Default (1.0, 0.1). + * In subsequent steps, the step size will decrease with stepSize/sqrt(t) + */ + def setStepSize(step: Array[Double]): this.type = { +this.stepSize = step +this + } + + /** + * :: Experimental :: + * Set fraction of data to be used for each SGD iteration. + * Default 1.0 (corresponding to deterministic/classical gradient descent) + */ + @Experimental + def setMiniBatchFraction(fraction: Double): this.type = { +this.miniBatchFraction = fraction +this + } + + /** + * Set the number of iterations for SGD. Default 100. + */ + def setNumIterations(iters: Array[Int]): this.type = { +this.numIterations = iters +this + } + + /** + * Set the regularization parameter. Default (0.0, 0.1, 1.0). + */ + def setRegParam(regParam: Array[Double]): this.type = { +this.regParam = regParam +this + } + + /** + * Set the gradient function (of the loss function of one single data example) + * to be used for SGD. + */ + def setGradient(gradient: MultiModelGradient): this.type = { +this.gradient = gradient +this + } + + + /** + * Set the updater function to actually perform a gradient step in a given direction. + * The updater is responsible to perform the update from the regularization term as well, + * and therefore determines what kind or regularization is used, if any. + */ + def setUpdater(updater: Array[MultiModelUpdater]): this.type = { +this.updater = updater +this + } + + /** + * :: DeveloperApi :: + * Runs gradient descent on the given training data. 
+ * @param data training data + * @param initialWeights initial weights + * @return solution vector + */ + @DeveloperApi + def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Matrix = { +val (weights, _) = MultiModelGradientDescent.runMiniBatchMMSGD( + data, + gradient, + updater, + stepSize, + numIterations, + regParam, + miniBatchFraction, + initialWeights) +weights + } + +} + +/** + * :: DeveloperApi :: + * Top-level method to run gradient descent. + */ +@DeveloperApi +object MultiModelGradientDescent extends Logging { + /** + * Run stochastic gradient descent (SGD) in parallel using mini batches. + * In each iteration, we sample a subset (fraction miniBatchFraction) of the total data + * in order to compute a gradient estimate. + * Sampling and averaging the subgradients over this subset are performed using one standard + * Spark map-reduce in each iteration.
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17809626 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/MultiModelGradientDescent.scala --- @@ -0,0 +1,256 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.optimization + +import scala.collection.mutable.ArrayBuffer + +import breeze.linalg.{DenseVector => BDV} + +import org.apache.spark.annotation.{Experimental, DeveloperApi} +import org.apache.spark.Logging +import org.apache.spark.rdd.RDD +import org.apache.spark.mllib.linalg._ +import org.apache.spark.mllib.rdd.RDDFunctions._ + +class MultiModelGradientDescent private[mllib] ( +private var gradient: MultiModelGradient, +private var updater: Array[MultiModelUpdater]) extends Optimizer[Matrix] with Logging { + + private var stepSize: Array[Double] = Array(1.0, 0.1) + private var numIterations: Array[Int] = Array(100) + private var regParam: Array[Double] = Array(0.0, 0.1, 1.0) + private var miniBatchFraction: Double = 1.0 + + /** + * Set the initial step size of SGD for the first step. Default (1.0, 0.1). + * In subsequent steps, the step size will decrease with stepSize/sqrt(t) + */ + def setStepSize(step: Array[Double]): this.type = { --- End diff -- Here and in the other methods, maybe append an "s" if it takes multiple parameter settings: "setStepSizes" --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
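As an aside for anyone skimming the quoted diff, here is a rough usage sketch of what these array-valued setters imply. This is not code from the PR: the constructor is package-private, so it would only compile inside org.apache.spark.mllib, and `myUpdaters` is a placeholder for whatever MultiModelUpdater implementations the branch provides.

    import org.apache.spark.mllib.optimization._

    // Placeholder: any MultiModelUpdater implementations from this branch would do here.
    val myUpdaters: Array[MultiModelUpdater] = ???

    // Every setter takes an array because a single run trains a grid of models at once;
    // the plural names suggested above (setStepSizes, ...) would make that explicit.
    val optimizer = new MultiModelGradientDescent(new MultiModelLogisticGradient, myUpdaters)
      .setStepSize(Array(1.0, 0.1, 0.01))
      .setNumIterations(Array(50, 100))
      .setRegParam(Array(0.0, 0.1, 1.0))
      .setMiniBatchFraction(1.0)

Each array presumably spans one axis of the hyperparameter grid, which is also why optimize returns a Matrix of solutions rather than a single Vector; plural setter names would make that intent clearer at the call site.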
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17808588 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala --- @@ -181,6 +181,7 @@ object GradientDescent extends Logging { var regVal = updater.compute( weights, Vectors.dense(new Array[Double](weights.size)), 0, 1, regParam)._2 +//println(s"initial:\n$weights\n\n") --- End diff -- remove
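If the value is still useful for debugging, the usual alternative in this file is a guarded log call; a minimal sketch of a drop-in replacement for that line (the hunk header above shows GradientDescent already extends Logging, and Spark's logDebug takes its message by name, so the string is only built when debug logging is enabled):

    logDebug(s"Initial weights:\n$weights")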
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17808221 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala --- @@ -157,3 +157,221 @@ class HingeGradient extends Gradient { } } } + +/** + * :: DeveloperApi :: + * Class used to compute the gradient for a loss function, given a series of data points. + */ +@DeveloperApi +abstract class MultiModelGradient extends Serializable { + /** + * Compute the gradient and loss given the features of all data points. + * + * @param data features for one data point + * @param label label for this data point + * @param weights weights/coefficients corresponding to features + * + * @return (gradient: DenseMatrix, loss: Double) + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) + + /** + * Compute the gradient and loss given the features of a series of data point, + * add the gradient to a provided matrix to avoid creating new objects, and return loss. + * + * @param data features for the data points + * @param label label for the data points + * @param weights weights/coefficients corresponding to features + * @param cumGradient the computed gradient will be added to this matrix + * + * @return loss + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix, cumGradient: DenseMatrix): Matrix +} + +/** + * :: DeveloperApi :: + * Compute gradient and loss for a logistic loss function, as used in binary classification. + * See also the documentation for the precise formulation. + */ +@DeveloperApi +class MultiModelLogisticGradient extends MultiModelGradient { + + private def sigmoid(p: DenseMatrix): DenseMatrix = { +def takeSigmoid(p: Double): Double = { + 1.0 / (math.exp(-p) + 1.0) +} +p.map(takeSigmoid) + } + + override def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) = { +val margin = data transposeMultiply weights +val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols) + +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 0.0, gradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). + elementWiseOperateInPlace(_ + _, addMargin) + +val lossVector = + if (data.isInstanceOf[DenseMatrix]) { +val numFeatures = data.numRows +val zeroEntries = data.compare(0.0, _ == _) +val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) +loss.colSums(false, shouldSkip) + } else { +loss.colSums + } +(gradient, lossVector) + } + + override def compute(data: Matrix, + label: DenseMatrix, + weights: DenseMatrix, + cumGradient: DenseMatrix): Matrix = { +val margin = data transposeMultiply weights +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 1.0, cumGradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). 
+ elementWiseOperateInPlace(_ + _, addMargin) + +if (data.isInstanceOf[DenseMatrix]) { + val numFeatures = data.numRows + val zeroEntries = data.compare(0.0, _ == _) + val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) + loss.colSums(false, shouldSkip) +} else { + loss.colSums +} + } +} + +/** + * :: DeveloperApi :: + * Compute gradient and loss for a Least-squared loss function, as used in linear regression. + * This is correct for the averaged least squares loss function (mean squared error) + * L = 1/n ||A weights-y||^2 + * See also the documentation for the precise formulation. + */ +@DeveloperApi +class MultiModelLeastSquaresGradient extends MultiModelGradient { + override def compute(data: Matrix, label: DenseMatrix, --- End diff -- Ditto about computing in terms of below compute() method.
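As background for the least-squares pieces quoted above, here is the single-model loss and gradient this generalizes, written as a self-contained scalar sketch rather than the matrix form in the diff (the matrix implementation is cut off in the quote, so this is illustrative only and uses the standard convention d/dw (w.x - y)^2 = 2 (w.x - y) x):

    // Per-example squared error and its gradient for a single model.
    def squaredErrorLossAndGradient(
        features: Array[Double],
        label: Double,
        weights: Array[Double]): (Array[Double], Double) = {
      require(features.length == weights.length)
      val prediction = features.zip(weights).map { case (x, w) => x * w }.sum
      val diff = prediction - label
      val gradient = features.map(x => 2.0 * diff * x)  // gradient of (w.x - y)^2
      (gradient, diff * diff)                           // squared error for this example
    }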
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17808193 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala --- @@ -157,3 +157,221 @@ class HingeGradient extends Gradient { } } } + +/** + * :: DeveloperApi :: + * Class used to compute the gradient for a loss function, given a series of data points. + */ +@DeveloperApi +abstract class MultiModelGradient extends Serializable { + /** + * Compute the gradient and loss given the features of all data points. + * + * @param data features for one data point + * @param label label for this data point + * @param weights weights/coefficients corresponding to features + * + * @return (gradient: DenseMatrix, loss: Double) + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) + + /** + * Compute the gradient and loss given the features of a series of data point, + * add the gradient to a provided matrix to avoid creating new objects, and return loss. + * + * @param data features for the data points + * @param label label for the data points + * @param weights weights/coefficients corresponding to features + * @param cumGradient the computed gradient will be added to this matrix + * + * @return loss + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix, cumGradient: DenseMatrix): Matrix +} + +/** + * :: DeveloperApi :: + * Compute gradient and loss for a logistic loss function, as used in binary classification. + * See also the documentation for the precise formulation. + */ +@DeveloperApi +class MultiModelLogisticGradient extends MultiModelGradient { + + private def sigmoid(p: DenseMatrix): DenseMatrix = { +def takeSigmoid(p: Double): Double = { + 1.0 / (math.exp(-p) + 1.0) +} +p.map(takeSigmoid) + } + + override def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) = { +val margin = data transposeMultiply weights +val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols) + +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 0.0, gradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). + elementWiseOperateInPlace(_ + _, addMargin) + +val lossVector = + if (data.isInstanceOf[DenseMatrix]) { +val numFeatures = data.numRows +val zeroEntries = data.compare(0.0, _ == _) +val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) +loss.colSums(false, shouldSkip) + } else { +loss.colSums + } +(gradient, lossVector) + } + + override def compute(data: Matrix, + label: DenseMatrix, + weights: DenseMatrix, + cumGradient: DenseMatrix): Matrix = { +val margin = data transposeMultiply weights +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 1.0, cumGradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). 
+ elementWiseOperateInPlace(_ + _, addMargin) + +if (data.isInstanceOf[DenseMatrix]) { + val numFeatures = data.numRows + val zeroEntries = data.compare(0.0, _ == _) + val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) + loss.colSums(false, shouldSkip) +} else { + loss.colSums +} + } +} + +/** + * :: DeveloperApi :: + * Compute gradient and loss for a Least-squared loss function, as used in linear regression. + * This is correct for the averaged least squares loss function (mean squared error) + * L = 1/n ||A weights-y||^2 + * See also the documentation for the precise formulation. + */ +@DeveloperApi +class MultiModelLeastSquaresGradient extends MultiModelGradient { + override def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) = { + +val diff = (data transposeMultiply weights).elementWiseOperateOnColumnsInPlace(_ - _, label) + +v
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17808151 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala --- @@ -157,3 +157,221 @@ class HingeGradient extends Gradient { } } } + +/** + * :: DeveloperApi :: + * Class used to compute the gradient for a loss function, given a series of data points. + */ +@DeveloperApi +abstract class MultiModelGradient extends Serializable { + /** + * Compute the gradient and loss given the features of all data points. + * + * @param data features for one data point + * @param label label for this data point + * @param weights weights/coefficients corresponding to features + * + * @return (gradient: DenseMatrix, loss: Double) + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) + + /** + * Compute the gradient and loss given the features of a series of data point, + * add the gradient to a provided matrix to avoid creating new objects, and return loss. + * + * @param data features for the data points + * @param label label for the data points + * @param weights weights/coefficients corresponding to features + * @param cumGradient the computed gradient will be added to this matrix + * + * @return loss + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix, cumGradient: DenseMatrix): Matrix +} + +/** + * :: DeveloperApi :: + * Compute gradient and loss for a logistic loss function, as used in binary classification. + * See also the documentation for the precise formulation. + */ +@DeveloperApi +class MultiModelLogisticGradient extends MultiModelGradient { + + private def sigmoid(p: DenseMatrix): DenseMatrix = { +def takeSigmoid(p: Double): Double = { + 1.0 / (math.exp(-p) + 1.0) +} +p.map(takeSigmoid) + } + + override def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) = { +val margin = data transposeMultiply weights +val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols) + +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 0.0, gradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). + elementWiseOperateInPlace(_ + _, addMargin) + +val lossVector = + if (data.isInstanceOf[DenseMatrix]) { +val numFeatures = data.numRows +val zeroEntries = data.compare(0.0, _ == _) +val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) +loss.colSums(false, shouldSkip) + } else { +loss.colSums + } +(gradient, lossVector) + } + + override def compute(data: Matrix, + label: DenseMatrix, + weights: DenseMatrix, + cumGradient: DenseMatrix): Matrix = { +val margin = data transposeMultiply weights +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 1.0, cumGradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). + elementWiseOperateInPlace(_ + _, addMargin) + +if (data.isInstanceOf[DenseMatrix]) { + val numFeatures = data.numRows + val zeroEntries = data.compare(0.0, _ == _) + val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) --- End diff -- This applies elsewhere too, but I won't repeat the comment. 
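For anyone puzzling over the negativeLabels/addMargin trick in the quoted logistic gradient, here is the scalar case it vectorizes, assuming 0/1 labels as in MLlib's existing LogisticGradient (an assumption, since the label encoding is not spelled out in the quoted diff): adding the margin back for label-0 examples turns log1p(exp(-m)) into log1p(exp(m)), the correct loss for the negative class.

    // Scalar illustration only; the PR computes this for a whole matrix of margins at once.
    def logisticLoss(margin: Double, label: Double): Double = {
      val positiveClassLoss = math.log1p(math.exp(-margin)) // -log(sigmoid(margin))
      if (label > 0.5) {
        positiveClassLoss
      } else {
        positiveClassLoss + margin // = log1p(exp(margin)) = -log(1 - sigmoid(margin))
      }
    }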
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17808127 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala --- @@ -157,3 +157,221 @@ class HingeGradient extends Gradient { } } } + +/** + * :: DeveloperApi :: + * Class used to compute the gradient for a loss function, given a series of data points. + */ +@DeveloperApi +abstract class MultiModelGradient extends Serializable { + /** + * Compute the gradient and loss given the features of all data points. + * + * @param data features for one data point + * @param label label for this data point + * @param weights weights/coefficients corresponding to features + * + * @return (gradient: DenseMatrix, loss: Double) + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) + + /** + * Compute the gradient and loss given the features of a series of data point, + * add the gradient to a provided matrix to avoid creating new objects, and return loss. + * + * @param data features for the data points + * @param label label for the data points + * @param weights weights/coefficients corresponding to features + * @param cumGradient the computed gradient will be added to this matrix + * + * @return loss + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix, cumGradient: DenseMatrix): Matrix +} + +/** + * :: DeveloperApi :: + * Compute gradient and loss for a logistic loss function, as used in binary classification. + * See also the documentation for the precise formulation. + */ +@DeveloperApi +class MultiModelLogisticGradient extends MultiModelGradient { + + private def sigmoid(p: DenseMatrix): DenseMatrix = { +def takeSigmoid(p: Double): Double = { + 1.0 / (math.exp(-p) + 1.0) +} +p.map(takeSigmoid) + } + + override def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) = { +val margin = data transposeMultiply weights +val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols) + +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 0.0, gradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). + elementWiseOperateInPlace(_ + _, addMargin) + +val lossVector = + if (data.isInstanceOf[DenseMatrix]) { +val numFeatures = data.numRows +val zeroEntries = data.compare(0.0, _ == _) +val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) +loss.colSums(false, shouldSkip) + } else { +loss.colSums + } +(gradient, lossVector) + } + + override def compute(data: Matrix, + label: DenseMatrix, + weights: DenseMatrix, + cumGradient: DenseMatrix): Matrix = { +val margin = data transposeMultiply weights +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 1.0, cumGradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). 
+ elementWiseOperateInPlace(_ + _, addMargin) + +if (data.isInstanceOf[DenseMatrix]) { + val numFeatures = data.numRows + val zeroEntries = data.compare(0.0, _ == _) + val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) + loss.colSums(false, shouldSkip) +} else { + loss.colSums +} + } +} + +/** + * :: DeveloperApi :: + * Compute gradient and loss for a Least-squared loss function, as used in linear regression. + * This is correct for the averaged least squares loss function (mean squared error) + * L = 1/n ||A weights-y||^2 + * See also the documentation for the precise formulation. + */ +@DeveloperApi +class MultiModelLeastSquaresGradient extends MultiModelGradient { + override def compute(data: Matrix, label: DenseMatrix, --- End diff -- line formatting --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17808106 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala --- @@ -157,3 +157,221 @@ class HingeGradient extends Gradient { } } } + +/** + * :: DeveloperApi :: + * Class used to compute the gradient for a loss function, given a series of data points. + */ +@DeveloperApi +abstract class MultiModelGradient extends Serializable { + /** + * Compute the gradient and loss given the features of all data points. + * + * @param data features for one data point + * @param label label for this data point + * @param weights weights/coefficients corresponding to features + * + * @return (gradient: DenseMatrix, loss: Double) + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) + + /** + * Compute the gradient and loss given the features of a series of data point, + * add the gradient to a provided matrix to avoid creating new objects, and return loss. + * + * @param data features for the data points + * @param label label for the data points + * @param weights weights/coefficients corresponding to features + * @param cumGradient the computed gradient will be added to this matrix + * + * @return loss + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix, cumGradient: DenseMatrix): Matrix +} + +/** + * :: DeveloperApi :: + * Compute gradient and loss for a logistic loss function, as used in binary classification. + * See also the documentation for the precise formulation. + */ +@DeveloperApi +class MultiModelLogisticGradient extends MultiModelGradient { + + private def sigmoid(p: DenseMatrix): DenseMatrix = { +def takeSigmoid(p: Double): Double = { + 1.0 / (math.exp(-p) + 1.0) +} +p.map(takeSigmoid) + } + + override def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) = { +val margin = data transposeMultiply weights +val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols) + +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 0.0, gradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). + elementWiseOperateInPlace(_ + _, addMargin) + +val lossVector = + if (data.isInstanceOf[DenseMatrix]) { +val numFeatures = data.numRows +val zeroEntries = data.compare(0.0, _ == _) +val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) +loss.colSums(false, shouldSkip) + } else { +loss.colSums + } +(gradient, lossVector) + } + + override def compute(data: Matrix, + label: DenseMatrix, + weights: DenseMatrix, + cumGradient: DenseMatrix): Matrix = { +val margin = data transposeMultiply weights +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 1.0, cumGradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). 
+ elementWiseOperateInPlace(_ + _, addMargin) + +if (data.isInstanceOf[DenseMatrix]) { + val numFeatures = data.numRows + val zeroEntries = data.compare(0.0, _ == _) + val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) + loss.colSums(false, shouldSkip) +} else { + loss.colSums +} + } +} + +/** + * :: DeveloperApi :: + * Compute gradient and loss for a Least-squared loss function, as used in linear regression. + * This is correct for the averaged least squares loss function (mean squared error) + * L = 1/n ||A weights-y||^2 + * See also the documentation for the precise formulation. + */ +@DeveloperApi +class MultiModelLeastSquaresGradient extends MultiModelGradient { --- End diff -- At some point, we should rename this to SquaredError (not LeastSquares). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabl
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17808047 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -37,11 +44,197 @@ trait Matrix extends Serializable { private[mllib] def toBreeze: BM[Double] /** Gets the (i, j)-th element. */ - private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j) + private[mllib] def apply(i: Int, j: Int): Double + + /** Return the index for the (i, j)-th element in the backing array. */ + private[mllib] def index(i: Int, j: Int): Int + + /** Update element at (i, j) */ + private[mllib] def update(i: Int, j: Int, v: Double): Unit + + /** Get a deep copy of the matrix. */ + def copy: Matrix + /** Convenience method for `Matrix`-`Matrix` multiplication. +* Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */ + def multiply(y: Matrix): DenseMatrix = { +val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols) +BLAS.gemm(false, false, 1.0, this, y, 0.0, C) +C + } + + /** Convenience method for `Matrix`-`DenseVector` multiplication. */ + def multiply(y: DenseVector): DenseVector = { +val output = new DenseVector(new Array[Double](numRows)) +BLAS.gemv(1.0, this, y, 0.0, output) +output + } + + /** Convenience method for `Matrix`^T^-`Matrix` multiplication. +* Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */ + def transposeMultiply(y: Matrix): DenseMatrix = { +val C: DenseMatrix = DenseMatrix.zeros(numCols, y.numCols) +BLAS.gemm(true, false, 1.0, this, y, 0.0, C) +C + } + + /** Convenience method for `Matrix`^T^-`DenseVector` multiplication. */ + def transposeMultiply(y: DenseVector): DenseVector = { +val output = new DenseVector(new Array[Double](numCols)) +BLAS.gemv(true, 1.0, this, y, 0.0, output) +output + } + + /** A human readable representation of the matrix */ override def toString: String = toBreeze.toString() + + private[mllib] def map(f: Double => Double): Matrix + + private[mllib] def update(f: Double => Double): Matrix + + private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, +y: Matrix): Matrix + + private[mllib] def elementWiseOperateOnRowsInPlace(f: (Double, Double) => Double, + y: Matrix): Matrix + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, + y: Double): Matrix + + private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix + + private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def *=(y: Matrix) = operateInPlace(_ * _, y) + + private[mllib] def *(y: Matrix) = operate(_ * _, y) + + private[mllib] def +=(y: Matrix) = operateInPlace(_ + _, y) + + private[mllib] def +(y: Matrix) = operate(_ + _, y) + + private[mllib] def -=(y: Matrix) = operateInPlace(_ - _, y) + + private[mllib] def -(y: Matrix) = operate(_ - _, y) + + private[mllib] def /=(y: Matrix) = operateInPlace(_ / _, y) + + private[mllib] def /(y: Matrix) = operate(_ / _, y) + + private[mllib] def *=(y: Double) = 
elementWiseOperateScalarInPlace(_ * _, y) + + private[mllib] def +=(y: Double) = elementWiseOperateScalarInPlace(_ + _, y) + + private[mllib] def -=(y: Double) = elementWiseOperateScalarInPlace(_ - _, y) + + private[mllib] def /=(y: Double) = elementWiseOperateScalarInPlace(_ / _, y) + + private[mllib] def *(y: Double) = elementWiseOperateScalar(_ * _, y) + + private[mllib] def +(y: Double) = elementWiseOperateScalar(_ + _, y) + + private[mllib] def -(y: Double) = elementWiseOperateScalar(_ - _, y) + + private[mllib] def /(y: Double) = elementWiseOperateScalar(_ / _, y) + + private[mllib] def neg: Matrix + + private[mllib] def negInPlace: Matrix + + /** Less-than-or-equal-to check. Outputs
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17808020 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -37,11 +44,197 @@ trait Matrix extends Serializable { private[mllib] def toBreeze: BM[Double] /** Gets the (i, j)-th element. */ - private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j) + private[mllib] def apply(i: Int, j: Int): Double + + /** Return the index for the (i, j)-th element in the backing array. */ + private[mllib] def index(i: Int, j: Int): Int + + /** Update element at (i, j) */ + private[mllib] def update(i: Int, j: Int, v: Double): Unit + + /** Get a deep copy of the matrix. */ + def copy: Matrix + /** Convenience method for `Matrix`-`Matrix` multiplication. +* Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */ + def multiply(y: Matrix): DenseMatrix = { +val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols) +BLAS.gemm(false, false, 1.0, this, y, 0.0, C) +C + } + + /** Convenience method for `Matrix`-`DenseVector` multiplication. */ + def multiply(y: DenseVector): DenseVector = { +val output = new DenseVector(new Array[Double](numRows)) +BLAS.gemv(1.0, this, y, 0.0, output) +output + } + + /** Convenience method for `Matrix`^T^-`Matrix` multiplication. +* Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */ + def transposeMultiply(y: Matrix): DenseMatrix = { +val C: DenseMatrix = DenseMatrix.zeros(numCols, y.numCols) +BLAS.gemm(true, false, 1.0, this, y, 0.0, C) +C + } + + /** Convenience method for `Matrix`^T^-`DenseVector` multiplication. */ + def transposeMultiply(y: DenseVector): DenseVector = { +val output = new DenseVector(new Array[Double](numCols)) +BLAS.gemv(true, 1.0, this, y, 0.0, output) +output + } + + /** A human readable representation of the matrix */ override def toString: String = toBreeze.toString() + + private[mllib] def map(f: Double => Double): Matrix + + private[mllib] def update(f: Double => Double): Matrix + + private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, +y: Matrix): Matrix + + private[mllib] def elementWiseOperateOnRowsInPlace(f: (Double, Double) => Double, + y: Matrix): Matrix + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, + y: Double): Matrix + + private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix + + private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def *=(y: Matrix) = operateInPlace(_ * _, y) + + private[mllib] def *(y: Matrix) = operate(_ * _, y) + + private[mllib] def +=(y: Matrix) = operateInPlace(_ + _, y) + + private[mllib] def +(y: Matrix) = operate(_ + _, y) + + private[mllib] def -=(y: Matrix) = operateInPlace(_ - _, y) + + private[mllib] def -(y: Matrix) = operate(_ - _, y) + + private[mllib] def /=(y: Matrix) = operateInPlace(_ / _, y) + + private[mllib] def /(y: Matrix) = operate(_ / _, y) + + private[mllib] def *=(y: Double) = 
elementWiseOperateScalarInPlace(_ * _, y) + + private[mllib] def +=(y: Double) = elementWiseOperateScalarInPlace(_ + _, y) + + private[mllib] def -=(y: Double) = elementWiseOperateScalarInPlace(_ - _, y) + + private[mllib] def /=(y: Double) = elementWiseOperateScalarInPlace(_ / _, y) + + private[mllib] def *(y: Double) = elementWiseOperateScalar(_ * _, y) + + private[mllib] def +(y: Double) = elementWiseOperateScalar(_ + _, y) + + private[mllib] def -(y: Double) = elementWiseOperateScalar(_ - _, y) + + private[mllib] def /(y: Double) = elementWiseOperateScalar(_ / _, y) + + private[mllib] def neg: Matrix + + private[mllib] def negInPlace: Matrix + + /** Less-than-or-equal-to check. Outputs
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17807973 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala --- @@ -157,3 +157,221 @@ class HingeGradient extends Gradient { } } } + +/** + * :: DeveloperApi :: + * Class used to compute the gradient for a loss function, given a series of data points. + */ +@DeveloperApi +abstract class MultiModelGradient extends Serializable { + /** + * Compute the gradient and loss given the features of all data points. + * + * @param data features for one data point + * @param label label for this data point + * @param weights weights/coefficients corresponding to features + * + * @return (gradient: DenseMatrix, loss: Double) + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) + + /** + * Compute the gradient and loss given the features of a series of data point, + * add the gradient to a provided matrix to avoid creating new objects, and return loss. + * + * @param data features for the data points + * @param label label for the data points + * @param weights weights/coefficients corresponding to features + * @param cumGradient the computed gradient will be added to this matrix + * + * @return loss + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix, cumGradient: DenseMatrix): Matrix +} + +/** + * :: DeveloperApi :: + * Compute gradient and loss for a logistic loss function, as used in binary classification. + * See also the documentation for the precise formulation. + */ +@DeveloperApi +class MultiModelLogisticGradient extends MultiModelGradient { + + private def sigmoid(p: DenseMatrix): DenseMatrix = { +def takeSigmoid(p: Double): Double = { + 1.0 / (math.exp(-p) + 1.0) +} +p.map(takeSigmoid) + } + + override def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) = { +val margin = data transposeMultiply weights +val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols) + +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 0.0, gradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). + elementWiseOperateInPlace(_ + _, addMargin) + +val lossVector = + if (data.isInstanceOf[DenseMatrix]) { +val numFeatures = data.numRows +val zeroEntries = data.compare(0.0, _ == _) +val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) +loss.colSums(false, shouldSkip) + } else { +loss.colSums + } +(gradient, lossVector) + } + + override def compute(data: Matrix, + label: DenseMatrix, + weights: DenseMatrix, + cumGradient: DenseMatrix): Matrix = { +val margin = data transposeMultiply weights +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 1.0, cumGradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). + elementWiseOperateInPlace(_ + _, addMargin) + +if (data.isInstanceOf[DenseMatrix]) { + val numFeatures = data.numRows + val zeroEntries = data.compare(0.0, _ == _) + val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) --- End diff -- Is this really worthwhile? 
Computation is still linear in the size of the data, and the computation for colSums is pretty light.
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17807758 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala --- @@ -157,3 +157,221 @@ class HingeGradient extends Gradient { } } } + +/** + * :: DeveloperApi :: + * Class used to compute the gradient for a loss function, given a series of data points. + */ +@DeveloperApi +abstract class MultiModelGradient extends Serializable { + /** + * Compute the gradient and loss given the features of all data points. + * + * @param data features for one data point + * @param label label for this data point + * @param weights weights/coefficients corresponding to features + * + * @return (gradient: DenseMatrix, loss: Double) + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) + + /** + * Compute the gradient and loss given the features of a series of data point, + * add the gradient to a provided matrix to avoid creating new objects, and return loss. + * + * @param data features for the data points + * @param label label for the data points + * @param weights weights/coefficients corresponding to features + * @param cumGradient the computed gradient will be added to this matrix + * + * @return loss + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix, cumGradient: DenseMatrix): Matrix +} + +/** + * :: DeveloperApi :: + * Compute gradient and loss for a logistic loss function, as used in binary classification. + * See also the documentation for the precise formulation. + */ +@DeveloperApi +class MultiModelLogisticGradient extends MultiModelGradient { + + private def sigmoid(p: DenseMatrix): DenseMatrix = { +def takeSigmoid(p: Double): Double = { + 1.0 / (math.exp(-p) + 1.0) +} +p.map(takeSigmoid) + } + + override def compute(data: Matrix, label: DenseMatrix, --- End diff -- Can this be implemented using the below compute method to avoid code duplication? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
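A sketch of the refactoring suggested here, assuming only the signatures shown in the diff: because the four-argument compute accumulates the gradient into cumGradient and returns the loss, the two-argument overload can allocate a zero matrix and delegate, so the margin/loss logic is written exactly once.

    override def compute(
        data: Matrix,
        label: DenseMatrix,
        weights: DenseMatrix): (DenseMatrix, Matrix) = {
      // Start from zeros so the accumulating overload leaves exactly this batch's gradient here.
      val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols)
      val loss = compute(data, label, weights, gradient)
      (gradient, loss)
    }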
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17807514 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala --- @@ -157,3 +157,221 @@ class HingeGradient extends Gradient { } } } + +/** + * :: DeveloperApi :: + * Class used to compute the gradient for a loss function, given a series of data points. + */ +@DeveloperApi +abstract class MultiModelGradient extends Serializable { + /** + * Compute the gradient and loss given the features of all data points. + * + * @param data features for one data point + * @param label label for this data point + * @param weights weights/coefficients corresponding to features + * + * @return (gradient: DenseMatrix, loss: Double) + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) --- End diff -- spacing (here and in methods below) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17807436 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala --- @@ -241,4 +241,4 @@ class SparseVector( } private[mllib] override def toBreeze: BV[Double] = new BSV[Double](indices, values, size) -} +} --- End diff -- newline?
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56230988 Lots more tests to do for the MatricesSuite.scala
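Not part of the PR, but as an illustration of the kind of extra coverage being asked for, a small sketch in the FunSuite style of the existing MLlib linalg suites, using only factory and multiply methods visible in the diffs above (treat the class and test names as hypothetical):

    import org.scalatest.FunSuite

    import org.apache.spark.mllib.linalg.Matrices

    class MatricesOpsSuite extends FunSuite {

      test("multiplying by the identity leaves a dense matrix unchanged") {
        val a = Matrices.rand(3, 3)
        val prod = a.multiply(Matrices.eye(3))
        assert(prod.numRows === 3)
        assert(prod.numCols === 3)
        assert(prod.toArray.sameElements(a.toArray))
      }

      test("transposeMultiply of ones matrices counts the shared dimension") {
        val a = Matrices.ones(2, 3)    // 2 x 3
        val b = Matrices.ones(2, 4)    // 2 x 4
        val c = a.transposeMultiply(b) // (3 x 2) * (2 x 4) = 3 x 4, every entry 2.0
        assert(c.numRows === 3)
        assert(c.numCols === 4)
        assert(c.toArray.forall(_ == 2.0))
      }
    }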
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17807257 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < len){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(i)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateOnRowsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +require(y_vals.length == numCols) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < numRows){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(j)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val y_val = y.toArray +val len = values.length +require(y_val.length == values.length) +var j = 0 +while (j < len){ + values(j) = f(values(j), y_val(j)) + j += 1 +} +this + } + + private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix = { +var j = 0 +val len = values.length +while (j < len){ + values(j) = f(values(j), y) + j += 1 +} +this + } + + private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +if (y.numCols==1 || y.numRows == 1){ + require(numCols != numRows, "Operation is ambiguous. 
Please use elementWiseOperateOnRows " + +"or elementWiseOperateOnColumns instead") +} +if (y.numCols == 1 && y.numRows == 1){ + elementWiseOperateScalarInPlace(f, y.toArray(0)) +} else { + if (y.numCols==1) { +elementWiseOperateOnColumnsInPlace(f, y) + }else if (y.numRows==1){ +elementWiseOperateOnRowsInPlace(f, y) + }else{ +elementWiseOperateInPlace(f, y) + } +} + } + + private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnColumnsInPlace(f, y) + } + + private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnRowsInPlace(f, y) + } + + private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateInPlace(f, y) + } + + private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateScalarInPlace(f, y) + } + + private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.operateInPlace(f, y) + } + + def map(f: Double => D
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17807001 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -93,9 +1000,310 @@ object Matrices { require(dm.majorStride == dm.rows, "Do not support stride size different from the number of rows.") new DenseMatrix(dm.rows, dm.cols, dm.data) + case sm: BSM[Double] => +new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data) case _ => throw new UnsupportedOperationException( s"Do not support conversion from type ${breeze.getClass.getName}.") } } + + /** + * Generate a `DenseMatrix` consisting of zeros. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @return `Matrix` with size `numRows` x `numCols` and values of zeros + */ + def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols) + + /** + * Generate a `DenseMatrix` consisting of ones. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @return `Matrix` with size `numRows` x `numCols` and values of ones + */ + def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols) + + /** + * Generate an Identity Matrix in `DenseMatrix` format. + * @param n number of rows and columns of the matrix + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal + */ + def eye(n: Int): Matrix = DenseMatrix.eye(n) + + /** + * Generate an Identity Matrix in `SparseMatrix` format. + * @param n number of rows and columns of the matrix + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal + */ + def speye(n: Int): Matrix = SparseMatrix.speye(n) + + /** + * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def rand(numRows: Int, numCols: Int): Matrix = DenseMatrix.rand(numRows, numCols) + + /** + * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def randn(numRows: Int, numCols: Int): Matrix = DenseMatrix.randn(numRows, numCols) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param seed the seed for the random generator + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def sprand( + numRows: Int, + numCols: Int, + density: Double, + seed: Long = Utils.random.nextLong()): Matrix = +SparseMatrix.sprand(numRows, numCols, density, seed) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. 
+ * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param seed the seed for the random generator + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def sprandn( + numRows: Int, + numCols: Int, + density: Double, + seed: Long = Utils.random.nextLong()): Matrix = +SparseMatrix.sprandn(numRows, numCols, density, seed) + + /** + * Generate a diagonal matrix in `DenseMatrix` format from the supplied values. Use + * [[org.apache.spark.mllib.linalg.SparseMatrix.diag()]] in order to generate the matrix in + * `SparseMatrix` format. + * @param vector a `Vector` that will form the values on the diagonal of the matrix + * @return Square `Matrix` with size `values.length` x `values.length` and `values` + * on the diagonal + */ + def diag(vector: Vector): Matrix = DenseMatrix.diag(vector) + + /** + * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format + * the matrices are supplied in. Supplying a mix of dense and sparse matrices is not supported. + * @param matrices sequence of matrices + * @return a single `Matrix` composed of the matrices that were horizont
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17806894 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -93,9 +1000,310 @@ object Matrices { require(dm.majorStride == dm.rows, "Do not support stride size different from the number of rows.") new DenseMatrix(dm.rows, dm.cols, dm.data) + case sm: BSM[Double] => +new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data) case _ => throw new UnsupportedOperationException( s"Do not support conversion from type ${breeze.getClass.getName}.") } } + + /** + * Generate a `DenseMatrix` consisting of zeros. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @return `Matrix` with size `numRows` x `numCols` and values of zeros + */ + def zeros(numRows: Int, numCols: Int): Matrix = DenseMatrix.zeros(numRows, numCols) + + /** + * Generate a `DenseMatrix` consisting of ones. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @return `Matrix` with size `numRows` x `numCols` and values of ones + */ + def ones(numRows: Int, numCols: Int): Matrix = DenseMatrix.ones(numRows, numCols) + + /** + * Generate an Identity Matrix in `DenseMatrix` format. + * @param n number of rows and columns of the matrix + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal + */ + def eye(n: Int): Matrix = DenseMatrix.eye(n) + + /** + * Generate an Identity Matrix in `SparseMatrix` format. + * @param n number of rows and columns of the matrix + * @return `Matrix` with size `n` x `n` and values of ones on the diagonal + */ + def speye(n: Int): Matrix = SparseMatrix.speye(n) + + /** + * Generate a `DenseMatrix` consisting of i.i.d. uniform random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def rand(numRows: Int, numCols: Int): Matrix = DenseMatrix.rand(numRows, numCols) + + /** + * Generate a `DenseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def randn(numRows: Int, numCols: Int): Matrix = DenseMatrix.randn(numRows, numCols) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. + * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param seed the seed for the random generator + * @return `Matrix` with size `numRows` x `numCols` and values in U(0, 1) + */ + def sprand( + numRows: Int, + numCols: Int, + density: Double, + seed: Long = Utils.random.nextLong()): Matrix = +SparseMatrix.sprand(numRows, numCols, density, seed) + + /** + * Generate a `SparseMatrix` consisting of i.i.d. gaussian random numbers. 
+ * @param numRows number of rows of the matrix + * @param numCols number of columns of the matrix + * @param density the desired density for the matrix + * @param seed the seed for the random generator + * @return `Matrix` with size `numRows` x `numCols` and values in N(0, 1) + */ + def sprandn( + numRows: Int, + numCols: Int, + density: Double, + seed: Long = Utils.random.nextLong()): Matrix = +SparseMatrix.sprandn(numRows, numCols, density, seed) + + /** + * Generate a diagonal matrix in `DenseMatrix` format from the supplied values. Use + * [[org.apache.spark.mllib.linalg.SparseMatrix.diag()]] in order to generate the matrix in + * `SparseMatrix` format. + * @param vector a `Vector` that will form the values on the diagonal of the matrix + * @return Square `Matrix` with size `values.length` x `values.length` and `values` + * on the diagonal + */ + def diag(vector: Vector): Matrix = DenseMatrix.diag(vector) + + /** + * Horizontally concatenate a sequence of matrices. The returned matrix will be in the format + * the matrices are supplied in. Supplying a mix of dense and sparse matrices is not supported. + * @param matrices sequence of matrices + * @return a single `Matrix` composed of the matrices that were horizont
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17806667 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- (quotes the same Matrices object hunk shown earlier)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17806620 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- (quotes the same Matrices object hunk shown earlier)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17806514 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- (quotes the same Matrices object hunk shown earlier)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17806367 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- (quotes the same Matrices object hunk shown earlier)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17806323 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- (quotes the same Matrices object hunk shown earlier)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17806308 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- (quotes the same Matrices object hunk shown earlier)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17806143 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -37,11 +44,197 @@ trait Matrix extends Serializable { private[mllib] def toBreeze: BM[Double] /** Gets the (i, j)-th element. */ - private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j) + private[mllib] def apply(i: Int, j: Int): Double + + /** Return the index for the (i, j)-th element in the backing array. */ + private[mllib] def index(i: Int, j: Int): Int + + /** Update element at (i, j) */ + private[mllib] def update(i: Int, j: Int, v: Double): Unit + + /** Get a deep copy of the matrix. */ + def copy: Matrix + /** Convenience method for `Matrix`-`Matrix` multiplication. +* Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */ + def multiply(y: Matrix): DenseMatrix = { +val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols) +BLAS.gemm(false, false, 1.0, this, y, 0.0, C) +C + } + + /** Convenience method for `Matrix`-`DenseVector` multiplication. */ + def multiply(y: DenseVector): DenseVector = { +val output = new DenseVector(new Array[Double](numRows)) +BLAS.gemv(1.0, this, y, 0.0, output) +output + } + + /** Convenience method for `Matrix`^T^-`Matrix` multiplication. +* Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */ + def transposeMultiply(y: Matrix): DenseMatrix = { --- End diff -- We could implement transposed versions of other functions in a lazy manner. For most functions, we could add a one-line transposeIfNeeded() call. I'm OK with the current state, but as this API becomes more public, I think a lazy transpose will become more important.
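To make the lazy-transpose suggestion above concrete, here is a rough, hypothetical sketch, not code from this PR: a flag records that the matrix is logically transposed, element access accounts for it, and a `transposeIfNeeded()` step materializes column-major data only for paths (such as BLAS calls) that need a contiguous layout. The class name and the `isTransposed` field are illustrative only.

```scala
// Hypothetical sketch of a lazily transposed, column-major dense matrix.
// `values` always holds the data of the original (untransposed) matrix.
class LazyDenseMatrix(
    val numRows: Int,
    val numCols: Int,
    val values: Array[Double],
    val isTransposed: Boolean = false) {

  // O(1) transpose: swap the logical dimensions and flip the flag.
  def transpose: LazyDenseMatrix =
    new LazyDenseMatrix(numCols, numRows, values, !isTransposed)

  // Element lookup honors the flag, so callers never observe the difference.
  def apply(i: Int, j: Int): Double =
    if (isTransposed) values(j + numCols * i) else values(i + numRows * j)

  // The "one-line transposeIfNeeded() call": only materialize a contiguous
  // column-major copy when a downstream routine actually requires one.
  def transposeIfNeeded(): LazyDenseMatrix = {
    if (!isTransposed) {
      this
    } else {
      val out = new Array[Double](values.length)
      var j = 0
      while (j < numCols) {
        var i = 0
        while (i < numRows) {
          out(i + numRows * j) = this(i, j)
          i += 1
        }
        j += 1
      }
      new LazyDenseMatrix(numRows, numCols, out)
    }
  }
}
```

With a layout like this, `multiply` and the other operations could call `transposeIfNeeded()` once at the top instead of each one growing a hand-written transposed twin.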
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17804577 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- (quotes the same Matrices object hunk shown earlier)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17804191 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- (quotes the same Matrices object hunk shown earlier)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17803825 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < len){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(i)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateOnRowsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +require(y_vals.length == numCols) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < numRows){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(j)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val y_val = y.toArray +val len = values.length +require(y_val.length == values.length) +var j = 0 +while (j < len){ + values(j) = f(values(j), y_val(j)) + j += 1 +} +this + } + + private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix = { +var j = 0 +val len = values.length +while (j < len){ + values(j) = f(values(j), y) + j += 1 +} +this + } + + private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +if (y.numCols==1 || y.numRows == 1){ + require(numCols != numRows, "Operation is ambiguous. 
Please use elementWiseOperateOnRows " + +"or elementWiseOperateOnColumns instead") +} +if (y.numCols == 1 && y.numRows == 1){ + elementWiseOperateScalarInPlace(f, y.toArray(0)) +} else { + if (y.numCols==1) { +elementWiseOperateOnColumnsInPlace(f, y) + }else if (y.numRows==1){ +elementWiseOperateOnRowsInPlace(f, y) + }else{ +elementWiseOperateInPlace(f, y) + } +} + } + + private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnColumnsInPlace(f, y) + } + + private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnRowsInPlace(f, y) + } + + private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateInPlace(f, y) + } + + private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateScalarInPlace(f, y) + } + + private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.operateInPlace(f, y) + } + + def map(f: Double => Doub
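The shape-based dispatch in `operateInPlace` quoted above is easier to follow with a small, self-contained example. The sketch below reimplements only the broadcasting rule on plain column-major arrays; it is illustrative rather than the PR's actual class, and it omits the ambiguity guard for square matrices.

```scala
// Illustrative sketch of operateInPlace's broadcasting rule on column-major arrays:
// y is applied as a scalar, per column, per row, or element-wise depending on its shape.
object BroadcastDemo {
  def operate(
      f: (Double, Double) => Double,
      xRows: Int, xCols: Int, x: Array[Double],
      yRows: Int, yCols: Int, y: Array[Double]): Array[Double] = {
    val out = x.clone()
    if (yRows == 1 && yCols == 1) {                // 1 x 1: treat y as a scalar
      for (k <- out.indices) out(k) = f(out(k), y(0))
    } else if (yCols == 1 && yRows == xRows) {     // column vector: broadcast across columns
      for (j <- 0 until xCols; i <- 0 until xRows)
        out(i + xRows * j) = f(out(i + xRows * j), y(i))
    } else if (yRows == 1 && yCols == xCols) {     // row vector: broadcast down the rows
      for (j <- 0 until xCols; i <- 0 until xRows)
        out(i + xRows * j) = f(out(i + xRows * j), y(j))
    } else {                                       // matching shape: plain element-wise
      for (k <- out.indices) out(k) = f(out(k), y(k))
    }
    out
  }

  def main(args: Array[String]): Unit = {
    // x is the 2 x 3 matrix [[1, 3, 5], [2, 4, 6]] in column-major order
    val x = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
    val col = Array(10.0, 20.0)                    // column vector [10, 20]
    println(operate(_ + _, 2, 3, x, 2, 1, col).mkString(", "))
    // prints: 11.0, 22.0, 13.0, 24.0, 15.0, 26.0
  }
}
```

Calling it with a 1 x 3 row vector instead would hit the third branch, combining y(j) with every entry of column j.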
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17803754 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- (quotes the same Matrices object hunk shown earlier)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17803601 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- (quotes the same DenseMatrix hunk shown earlier)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17803546 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- (quotes the same DenseMatrix hunk shown earlier)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17803482 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- (quotes the same DenseMatrix hunk shown earlier)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17803390 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < len){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(i)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateOnRowsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +require(y_vals.length == numCols) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < numRows){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(j)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val y_val = y.toArray +val len = values.length +require(y_val.length == values.length) +var j = 0 +while (j < len){ + values(j) = f(values(j), y_val(j)) + j += 1 +} +this + } + + private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix = { +var j = 0 +val len = values.length +while (j < len){ + values(j) = f(values(j), y) + j += 1 +} +this + } + + private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +if (y.numCols==1 || y.numRows == 1){ + require(numCols != numRows, "Operation is ambiguous. 
Please use elementWiseOperateOnRows " + +"or elementWiseOperateOnColumns instead") +} +if (y.numCols == 1 && y.numRows == 1){ + elementWiseOperateScalarInPlace(f, y.toArray(0)) +} else { + if (y.numCols==1) { +elementWiseOperateOnColumnsInPlace(f, y) + }else if (y.numRows==1){ +elementWiseOperateOnRowsInPlace(f, y) + }else{ +elementWiseOperateInPlace(f, y) + } +} + } + + private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnColumnsInPlace(f, y) + } + + private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnRowsInPlace(f, y) + } + + private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateInPlace(f, y) + } + + private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateScalarInPlace(f, y) + } + + private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.operateInPlace(f, y) + } + + def map(f: Double => D
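A standalone sketch of the broadcasting rules the quoted operateInPlace encodes (the Mat type and names below are illustrative only, not the PR's API): a 1 x 1 operand acts as a scalar, a single-column operand is broadcast across the columns, a single-row operand across the rows, and matching shapes combine element by element. As the quoted require shows, the PR additionally refuses vector operands when the target matrix is square and asks callers to use elementWiseOperateOnRows or elementWiseOperateOnColumns explicitly.

object BroadcastSketch {
  // column-major storage, as in MLlib's DenseMatrix
  final case class Mat(numRows: Int, numCols: Int, values: Array[Double]) {
    def index(i: Int, j: Int): Int = i + numRows * j
  }

  def operateInPlace(f: (Double, Double) => Double, x: Mat, y: Mat): Mat = {
    if (y.numRows == 1 && y.numCols == 1) {
      // 1 x 1 operand: treat as a scalar
      val s = y.values(0)
      var k = 0
      while (k < x.values.length) { x.values(k) = f(x.values(k), s); k += 1 }
    } else if (y.numCols == 1) {
      // single column: apply the same vector to every column of x
      require(y.values.length == x.numRows, "column operand must have numRows entries")
      for (j <- 0 until x.numCols; i <- 0 until x.numRows) {
        val idx = x.index(i, j)
        x.values(idx) = f(x.values(idx), y.values(i))
      }
    } else if (y.numRows == 1) {
      // single row: apply the same vector to every row of x
      require(y.values.length == x.numCols, "row operand must have numCols entries")
      for (j <- 0 until x.numCols; i <- 0 until x.numRows) {
        val idx = x.index(i, j)
        x.values(idx) = f(x.values(idx), y.values(j))
      }
    } else {
      // same shape: plain element-wise combination
      require(y.values.length == x.values.length, "shapes must match")
      var k = 0
      while (k < x.values.length) { x.values(k) = f(x.values(k), y.values(k)); k += 1 }
    }
    x
  }
}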
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17803218 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < len){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(i)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateOnRowsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +require(y_vals.length == numCols) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < numRows){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(j)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val y_val = y.toArray +val len = values.length +require(y_val.length == values.length) +var j = 0 +while (j < len){ + values(j) = f(values(j), y_val(j)) + j += 1 +} +this + } + + private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix = { +var j = 0 +val len = values.length +while (j < len){ + values(j) = f(values(j), y) + j += 1 +} +this + } + + private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +if (y.numCols==1 || y.numRows == 1){ + require(numCols != numRows, "Operation is ambiguous. 
Please use elementWiseOperateOnRows " + +"or elementWiseOperateOnColumns instead") +} +if (y.numCols == 1 && y.numRows == 1){ + elementWiseOperateScalarInPlace(f, y.toArray(0)) +} else { + if (y.numCols==1) { +elementWiseOperateOnColumnsInPlace(f, y) + }else if (y.numRows==1){ +elementWiseOperateOnRowsInPlace(f, y) + }else{ +elementWiseOperateInPlace(f, y) + } +} + } + + private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnColumnsInPlace(f, y) + } + + private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnRowsInPlace(f, y) + } + + private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateInPlace(f, y) + } + + private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateScalarInPlace(f, y) + } + + private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.operateInPlace(f, y) + } + + def map(f: Double => D
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17803169 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < len){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(i)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateOnRowsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +require(y_vals.length == numCols) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < numRows){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(j)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val y_val = y.toArray +val len = values.length +require(y_val.length == values.length) +var j = 0 +while (j < len){ + values(j) = f(values(j), y_val(j)) + j += 1 +} +this + } + + private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix = { +var j = 0 +val len = values.length +while (j < len){ + values(j) = f(values(j), y) + j += 1 +} +this + } + + private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +if (y.numCols==1 || y.numRows == 1){ + require(numCols != numRows, "Operation is ambiguous. 
Please use elementWiseOperateOnRows " + +"or elementWiseOperateOnColumns instead") +} +if (y.numCols == 1 && y.numRows == 1){ + elementWiseOperateScalarInPlace(f, y.toArray(0)) +} else { + if (y.numCols==1) { +elementWiseOperateOnColumnsInPlace(f, y) + }else if (y.numRows==1){ +elementWiseOperateOnRowsInPlace(f, y) + }else{ +elementWiseOperateInPlace(f, y) + } +} + } + + private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnColumnsInPlace(f, y) + } + + private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnRowsInPlace(f, y) + } + + private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateInPlace(f, y) + } + + private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateScalarInPlace(f, y) + } + + private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.operateInPlace(f, y) + } + + def map(f: Double => Doub
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17803111 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -37,11 +44,197 @@ trait Matrix extends Serializable { private[mllib] def toBreeze: BM[Double] /** Gets the (i, j)-th element. */ - private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j) + private[mllib] def apply(i: Int, j: Int): Double + + /** Return the index for the (i, j)-th element in the backing array. */ + private[mllib] def index(i: Int, j: Int): Int + + /** Update element at (i, j) */ + private[mllib] def update(i: Int, j: Int, v: Double): Unit + + /** Get a deep copy of the matrix. */ + def copy: Matrix + /** Convenience method for `Matrix`-`Matrix` multiplication. +* Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */ + def multiply(y: Matrix): DenseMatrix = { +val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols) +BLAS.gemm(false, false, 1.0, this, y, 0.0, C) +C + } + + /** Convenience method for `Matrix`-`DenseVector` multiplication. */ + def multiply(y: DenseVector): DenseVector = { +val output = new DenseVector(new Array[Double](numRows)) +BLAS.gemv(1.0, this, y, 0.0, output) +output + } + + /** Convenience method for `Matrix`^T^-`Matrix` multiplication. +* Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */ + def transposeMultiply(y: Matrix): DenseMatrix = { --- End diff -- That was a discussion issue. I'm happy to do it as such, but the problem is for every single function we add, we're going to have to implement the transposed versions as well. The number of functions are currently getting out of hand... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
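For intuition, transposeMultiply(y: Matrix) simply returns A^T * B; a sketch with Breeze (which Matrices.scala already imports) rather than the PR's own API:

import breeze.linalg.{DenseMatrix => BDM}

val a = new BDM(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))  // 2 x 3, column-major
val b = new BDM(2, 2, Array(1.0, 0.0, 0.0, 1.0))            // 2 x 2 identity
val atb = a.t * b  // 3 x 2, the result transposeMultiply(b) produces via gemm(transA = true, ...)

One way to address the concern about transposed twins of every routine would be a single multiply that takes the transA/transB flags, which is how the private gemm in BLAS.scala is already parameterized; that is only a suggestion, not what the PR does.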
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17803143 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < len){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(i)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateOnRowsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +require(y_vals.length == numCols) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < numRows){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(j)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val y_val = y.toArray +val len = values.length +require(y_val.length == values.length) +var j = 0 +while (j < len){ + values(j) = f(values(j), y_val(j)) + j += 1 +} +this + } + + private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix = { +var j = 0 +val len = values.length +while (j < len){ + values(j) = f(values(j), y) + j += 1 +} +this + } + + private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +if (y.numCols==1 || y.numRows == 1){ + require(numCols != numRows, "Operation is ambiguous. 
Please use elementWiseOperateOnRows " + +"or elementWiseOperateOnColumns instead") +} +if (y.numCols == 1 && y.numRows == 1){ + elementWiseOperateScalarInPlace(f, y.toArray(0)) +} else { + if (y.numCols==1) { +elementWiseOperateOnColumnsInPlace(f, y) + }else if (y.numRows==1){ +elementWiseOperateOnRowsInPlace(f, y) + }else{ +elementWiseOperateInPlace(f, y) + } +} + } + + private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnColumnsInPlace(f, y) + } + + private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnRowsInPlace(f, y) + } + + private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateInPlace(f, y) + } + + private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateScalarInPlace(f, y) + } + + private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.operateInPlace(f, y) + } + + def map(f: Double => D
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17802982 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < len){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(i)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateOnRowsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +require(y_vals.length == numCols) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < numRows){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(j)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val y_val = y.toArray +val len = values.length +require(y_val.length == values.length) +var j = 0 +while (j < len){ + values(j) = f(values(j), y_val(j)) + j += 1 +} +this + } + + private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix = { +var j = 0 +val len = values.length +while (j < len){ + values(j) = f(values(j), y) + j += 1 +} +this + } + + private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +if (y.numCols==1 || y.numRows == 1){ + require(numCols != numRows, "Operation is ambiguous. 
Please use elementWiseOperateOnRows " + +"or elementWiseOperateOnColumns instead") +} +if (y.numCols == 1 && y.numRows == 1){ + elementWiseOperateScalarInPlace(f, y.toArray(0)) +} else { + if (y.numCols==1) { +elementWiseOperateOnColumnsInPlace(f, y) + }else if (y.numRows==1){ +elementWiseOperateOnRowsInPlace(f, y) + }else{ +elementWiseOperateInPlace(f, y) + } +} + } + + private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnColumnsInPlace(f, y) + } + + private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnRowsInPlace(f, y) + } + + private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateInPlace(f, y) + } + + private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateScalarInPlace(f, y) + } + + private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.operateInPlace(f, y) + } + + def map(f: Double => Doub
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user anantasty commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17802806 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala --- @@ -157,3 +157,221 @@ class HingeGradient extends Gradient { } } } + +/** + * :: DeveloperApi :: + * Class used to compute the gradient for a loss function, given a series of data points. + */ +@DeveloperApi +abstract class MultiModelGradient extends Serializable { + /** + * Compute the gradient and loss given the features of all data points. + * + * @param data features for one data point + * @param label label for this data point + * @param weights weights/coefficients corresponding to features + * + * @return (gradient: DenseMatrix, loss: Double) + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) + + /** + * Compute the gradient and loss given the features of a series of data point, + * add the gradient to a provided matrix to avoid creating new objects, and return loss. + * + * @param data features for the data points + * @param label label for the data points + * @param weights weights/coefficients corresponding to features + * @param cumGradient the computed gradient will be added to this matrix + * + * @return loss + */ + def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix, cumGradient: DenseMatrix): Matrix +} + +/** + * :: DeveloperApi :: + * Compute gradient and loss for a logistic loss function, as used in binary classification. + * See also the documentation for the precise formulation. + */ +@DeveloperApi +class MultiModelLogisticGradient extends MultiModelGradient { + + private def sigmoid(p: DenseMatrix): DenseMatrix = { +def takeSigmoid(p: Double): Double = { + 1.0 / (math.exp(-p) + 1.0) +} +p.map(takeSigmoid) + } + + override def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) = { +val margin = data transposeMultiply weights +val gradient = DenseMatrix.zeros(weights.numRows, weights.numCols) + +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 0.0, gradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). + elementWiseOperateInPlace(_ + _, addMargin) + +val lossVector = + if (data.isInstanceOf[DenseMatrix]) { +val numFeatures = data.numRows +val zeroEntries = data.compare(0.0, _ == _) +val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) +loss.colSums(false, shouldSkip) + } else { +loss.colSums + } +(gradient, lossVector) + } + + override def compute(data: Matrix, + label: DenseMatrix, + weights: DenseMatrix, + cumGradient: DenseMatrix): Matrix = { +val margin = data transposeMultiply weights +gemm(false, false, 1.0, data, sigmoid(margin).elementWiseOperateOnColumnsInPlace(_ - _, label), + 1.0, cumGradient) + +val negativeLabels = label.compare(0.0, _ == _) +val addMargin = margin.elementWiseOperateOnColumns(_ * _, negativeLabels) + +val loss = margin.update(v => math.log1p(math.exp(-v))). 
+ elementWiseOperateInPlace(_ + _, addMargin) + +if (data.isInstanceOf[DenseMatrix]) { + val numFeatures = data.numRows + val zeroEntries = data.compare(0.0, _ == _) + val shouldSkip = zeroEntries.colSums.compareInPlace(numFeatures, _ == _) + loss.colSums(false, shouldSkip) +} else { + loss.colSums +} + } +} + +/** + * :: DeveloperApi :: + * Compute gradient and loss for a Least-squared loss function, as used in linear regression. + * This is correct for the averaged least squares loss function (mean squared error) + * L = 1/n ||A weights-y||^2 + * See also the documentation for the precise formulation. + */ +@DeveloperApi +class MultiModelLeastSquaresGradient extends MultiModelGradient { + override def compute(data: Matrix, label: DenseMatrix, + weights: DenseMatrix): (DenseMatrix, Matrix) = { + +val diff = (data transposeMultiply weights).elementWiseOperateOnColumnsInPlace(_ - _, label) + +v
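MultiModelLogisticGradient above runs the same computation for every model (each column of the weights matrix) at once through gemm and the element-wise helpers. The per-example, single-model calculation it vectorizes looks roughly like this (a sketch with plain arrays, labels assumed to be 0 or 1; not the PR's code):

def logisticGradient(
    x: Array[Double],
    label: Double,
    w: Array[Double]): (Array[Double], Double) = {
  // margin = -w^T x, so 1 / (1 + exp(margin)) is sigmoid(w^T x)
  val margin = -x.zip(w).map { case (xi, wi) => xi * wi }.sum
  val multiplier = 1.0 / (1.0 + math.exp(margin)) - label
  val gradient = x.map(_ * multiplier)
  val loss =
    if (label > 0) math.log1p(math.exp(margin))
    else math.log1p(math.exp(margin)) - margin
  (gradient, loss)
}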
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17802391 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < len){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(i)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateOnRowsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +require(y_vals.length == numCols) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < numRows){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(j)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val y_val = y.toArray +val len = values.length +require(y_val.length == values.length) +var j = 0 +while (j < len){ + values(j) = f(values(j), y_val(j)) + j += 1 +} +this + } + + private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix = { +var j = 0 +val len = values.length +while (j < len){ + values(j) = f(values(j), y) + j += 1 +} +this + } + + private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +if (y.numCols==1 || y.numRows == 1){ + require(numCols != numRows, "Operation is ambiguous. 
Please use elementWiseOperateOnRows " + +"or elementWiseOperateOnColumns instead") +} +if (y.numCols == 1 && y.numRows == 1){ + elementWiseOperateScalarInPlace(f, y.toArray(0)) +} else { + if (y.numCols==1) { +elementWiseOperateOnColumnsInPlace(f, y) + }else if (y.numRows==1){ +elementWiseOperateOnRowsInPlace(f, y) + }else{ +elementWiseOperateInPlace(f, y) + } +} + } + + private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnColumnsInPlace(f, y) + } + + private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnRowsInPlace(f, y) + } + + private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateInPlace(f, y) + } + + private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateScalarInPlace(f, y) + } + + private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.operateInPlace(f, y) + } + + def map(f: Double => D
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17802344 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < len){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(i)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateOnRowsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +require(y_vals.length == numCols) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < numRows){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(j)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val y_val = y.toArray +val len = values.length +require(y_val.length == values.length) +var j = 0 +while (j < len){ + values(j) = f(values(j), y_val(j)) + j += 1 +} +this + } + + private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix = { +var j = 0 +val len = values.length +while (j < len){ + values(j) = f(values(j), y) + j += 1 +} +this + } + + private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +if (y.numCols==1 || y.numRows == 1){ + require(numCols != numRows, "Operation is ambiguous. 
Please use elementWiseOperateOnRows " + +"or elementWiseOperateOnColumns instead") +} +if (y.numCols == 1 && y.numRows == 1){ + elementWiseOperateScalarInPlace(f, y.toArray(0)) +} else { + if (y.numCols==1) { +elementWiseOperateOnColumnsInPlace(f, y) + }else if (y.numRows==1){ +elementWiseOperateOnRowsInPlace(f, y) + }else{ +elementWiseOperateInPlace(f, y) + } +} + } + + private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnColumnsInPlace(f, y) + } + + private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateOnRowsInPlace(f, y) + } + + private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateInPlace(f, y) + } + + private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix = { +val dup = this.copy +dup.elementWiseOperateScalarInPlace(f, y) + } + + private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val dup = this.copy +dup.operateInPlace(f, y) + } + + def map(f: Double => D
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17802293 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < len){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(i)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateOnRowsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +require(y_vals.length == numCols) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < numRows){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(j)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +val y_val = y.toArray +val len = values.length +require(y_val.length == values.length) +var j = 0 +while (j < len){ + values(j) = f(values(j), y_val(j)) + j += 1 +} +this + } + + private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix = { +var j = 0 +val len = values.length +while (j < len){ + values(j) = f(values(j), y) + j += 1 +} +this + } + + private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { +if (y.numCols==1 || y.numRows == 1){ + require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " + +"or elementWiseOperateOnColumns instead") +} +if (y.numCols == 1 && y.numRows == 1){ + elementWiseOperateScalarInPlace(f, y.toArray(0)) +} else { + if (y.numCols==1) { +elementWiseOperateOnColumnsInPlace(f, y) + }else if (y.numRows==1){ --- End diff -- style: Maybe search for "}else" too. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
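With the brace spacing the review above is pointing at ("} else if (...) {" rather than "}else if (...){", and spaces around "=="), the dispatch would read:

if (y.numCols == 1) {
  elementWiseOperateOnColumnsInPlace(f, y)
} else if (y.numRows == 1) {
  elementWiseOperateOnRowsInPlace(f, y)
} else {
  elementWiseOperateInPlace(f, y)
}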
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17802140 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) +var j = 0 +while (j < numCols){ --- End diff -- ditto below; check please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17802211 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < len){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(i)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateOnRowsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +require(y_vals.length == numCols) +var j = 0 +while (j < numCols){ + var i = 0 + while (i < numRows){ +val idx = index(i, j) +values(idx) = f(values(idx), y_vals(j)) +i += 1 + } + j += 1 +} +this + } + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = { --- End diff -- long line (run dev/scalastyle to check all) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
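One possible wrapping of the flagged declaration that stays under the 100-character scalastyle limit, following the style the other *InPlace methods in the same diff already use (the yValues rename is only a suggestion):

private[mllib] def elementWiseOperateInPlace(
    f: (Double, Double) => Double,
    y: Matrix): DenseMatrix = {
  val yValues = y.toArray
  require(yValues.length == values.length, "Matrices must have the same number of entries.")
  var j = 0
  while (j < values.length) {
    values(j) = f(values(j), yValues(j))
    j += 1
  }
  this
}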
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user anantasty commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17802195 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -17,12 +17,19 @@ package org.apache.spark.mllib.linalg -import breeze.linalg.{Matrix => BM, DenseMatrix => BDM} +import breeze.linalg.{Matrix => BM, DenseMatrix => BDM, CSCMatrix => BSM} + +import org.apache.spark.rdd.RDD +import org.apache.spark.util.random.XORShiftRandom +import org.apache.spark.util.Utils + +import scala.collection.mutable.ArrayBuffer +import java.util.Arrays /** * Trait for a local matrix. */ -trait Matrix extends Serializable { +sealed trait Matrix extends Serializable { --- End diff -- Good use of sealed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
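A small illustration of what sealed buys (made-up types below, not the PR's): since every subclass must live in the same file, the compiler can warn about non-exhaustive pattern matches over the matrix hierarchy.

sealed trait Shape
final case class Dense(values: Array[Double]) extends Shape
final case class Sparse(values: Array[Double], rowIndices: Array[Int]) extends Shape

def storedNonZeros(s: Shape): Int = s match {
  case Dense(vs)     => vs.count(_ != 0.0)
  case Sparse(vs, _) => vs.length  // dropping this case would draw a compiler warning
}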
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17802108 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) --- End diff -- Please add warning messages (here and in other require statements). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
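One possible message for the flagged require, mirroring the wording style the constructor's require already uses (purely a suggestion, not the PR's text):

require(y_vals.length == numRows,
  s"The length of the column operand doesn't match the number of rows of the matrix! " +
    s"y.length: ${y_vals.length}, numRows: $numRows")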
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17802128 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param numCols number of columns * @param values matrix entries in column major */ -class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix { +class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable { - require(values.length == numRows * numCols) + require(values.length == numRows * numCols, "The number of values supplied doesn't match the " + +s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}") override def toArray: Array[Double] = values - private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values) + + private[mllib] def apply(i: Int): Double = values(i) + + private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + + private[mllib] def index(i: Int, j: Int): Int = i + numRows * j + + private[mllib] def update(i: Int, j: Int, v: Double): Unit = { +values(index(i, j)) = v + } + + override def copy = new DenseMatrix(numRows, numCols, values.clone()) + + private[mllib] def elementWiseOperateOnColumnsInPlace( + f: (Double, Double) => Double, + y: Matrix): DenseMatrix = { +val y_vals = y.toArray +val len = y_vals.length +require(y_vals.length == numRows) +var j = 0 +while (j < numCols){ --- End diff -- "){" --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user anantasty commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17801907 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable { throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.") } } + + // For level-3 routines, we use the native BLAS. + private def nativeBLAS: NetlibBLAS = { +if (_nativeBLAS == null) { + _nativeBLAS = NativeBLAS +} +_nativeBLAS + } + + /** + * C := alpha * A * B + beta * C + * @param transA whether to use the transpose of matrix A (true), or A itself (false). + * @param transB whether to use the transpose of matrix B (true), or B itself (false). + * @param alpha a scalar to scale the multiplication A * B. + * @param A the matrix A that will be left multiplied to B. Size of m x k. + * @param B the matrix B that will be left multiplied by A. Size of k x n. + * @param beta a scalar that can be used to scale matrix C. + * @param C the resulting matrix C. Size of m x n. + */ + def gemm( + transA: Boolean, + transB: Boolean, + alpha: Double, + A: Matrix, + B: Matrix, + beta: Double, + C: DenseMatrix): Unit = { +if (alpha == 0.0) { + logDebug("gemm: alpha is equal to 0. Returning C.") +} else { + A match { +case sparse: SparseMatrix => + B match { +case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C) +case sB: SparseMatrix => + throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " + +s"multiplication") +case _ => + throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.") + } +case dense: DenseMatrix => + B match { +case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C) +case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C) +case _ => + throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.") + } +case _ => + throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.") + } +} + } + + /** + * C := alpha * A * B + beta * C + * + * @param alpha a scalar to scale the multiplication A * B. + * @param A the matrix A that will be left multiplied to B. Size of m x k. + * @param B the matrix B that will be left multiplied by A. Size of k x n. + * @param beta a scalar that can be used to scale matrix C. + * @param C the resulting matrix C. Size of m x n. + */ + def gemm( + alpha: Double, + A: Matrix, + B: Matrix, + beta: Double, + C: DenseMatrix): Unit = { +gemm(false, false, alpha, A, B, beta, C) + } + + /** + * C := alpha * A * B + beta * C + * For `DenseMatrix` A. + */ + private def gemm( + transA: Boolean, + transB: Boolean, + alpha: Double, + A: DenseMatrix, + B: DenseMatrix, + beta: Double, + C: DenseMatrix): Unit = { +val mA: Int = if (!transA) A.numRows else A.numCols +val nB: Int = if (!transB) B.numCols else B.numRows +val kA: Int = if (!transA) A.numCols else A.numRows +val kB: Int = if (!transB) B.numRows else B.numCols +val tAstr = if (!transA) "N" else "T" +val tBstr = if (!transB) "N" else "T" + +require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB") +require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA") +require(nB == C.numCols, + s"The columns of C don't match the columns of B. 
C: ${C.numCols}, A: $nB") + +nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows, + beta, C.values, C.numRows) + } + + /** + * C := alpha * A * B + beta * C + * For `SparseMatrix` A. + */ + private def gemm( + transA: Boolean, + transB: Boolean, + alpha: Double, + A: SparseMatrix, + B: DenseMatrix, + beta: Double, + C: DenseMatrix): Unit = { +val mA: Int = if (!transA) A.numRows else A.numCols +val nB: Int = if (!transB) B.numCols else B.numRows +val kA: Int = if (!transA) A.numCols else A.numRows +val kB: Int = if (!transB) B.numRows else B.numCols + +r
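The shape bookkeeping behind those checks, as a standalone sketch: for C := alpha * op(A) * op(B) + beta * C, op(A) is m x k, op(B) is k x n, and C must be m x n (the helper name and form below are illustrative only):

def gemmShapes(
    transA: Boolean,
    transB: Boolean,
    aRows: Int, aCols: Int,
    bRows: Int, bCols: Int): (Int, Int, Int) = {
  val m  = if (!transA) aRows else aCols   // rows of op(A), and of C
  val kA = if (!transA) aCols else aRows   // inner dimension seen from A's side
  val kB = if (!transB) bRows else bCols   // inner dimension seen from B's side
  val n  = if (!transB) bCols else bRows   // columns of op(B), and of C
  require(kA == kB, s"The columns of op(A) don't match the rows of op(B): $kA vs $kB")
  (m, n, kA)
}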
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17801756 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -37,11 +44,197 @@ trait Matrix extends Serializable { private[mllib] def toBreeze: BM[Double] /** Gets the (i, j)-th element. */ - private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j) + private[mllib] def apply(i: Int, j: Int): Double + + /** Return the index for the (i, j)-th element in the backing array. */ + private[mllib] def index(i: Int, j: Int): Int + + /** Update element at (i, j) */ + private[mllib] def update(i: Int, j: Int, v: Double): Unit + + /** Get a deep copy of the matrix. */ + def copy: Matrix + /** Convenience method for `Matrix`-`Matrix` multiplication. +* Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */ + def multiply(y: Matrix): DenseMatrix = { +val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols) +BLAS.gemm(false, false, 1.0, this, y, 0.0, C) +C + } + + /** Convenience method for `Matrix`-`DenseVector` multiplication. */ + def multiply(y: DenseVector): DenseVector = { +val output = new DenseVector(new Array[Double](numRows)) +BLAS.gemv(1.0, this, y, 0.0, output) +output + } + + /** Convenience method for `Matrix`^T^-`Matrix` multiplication. +* Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */ + def transposeMultiply(y: Matrix): DenseMatrix = { +val C: DenseMatrix = DenseMatrix.zeros(numCols, y.numCols) +BLAS.gemm(true, false, 1.0, this, y, 0.0, C) +C + } + + /** Convenience method for `Matrix`^T^-`DenseVector` multiplication. */ + def transposeMultiply(y: DenseVector): DenseVector = { +val output = new DenseVector(new Array[Double](numCols)) +BLAS.gemv(true, 1.0, this, y, 0.0, output) +output + } + + /** A human readable representation of the matrix */ override def toString: String = toBreeze.toString() + + private[mllib] def map(f: Double => Double): Matrix + + private[mllib] def update(f: Double => Double): Matrix + + private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double, +y: Matrix): Matrix + + private[mllib] def elementWiseOperateOnRowsInPlace(f: (Double, Double) => Double, + y: Matrix): Matrix + + private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, + y: Double): Matrix + + private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix + + private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix + + private[mllib] def *=(y: Matrix) = operateInPlace(_ * _, y) + + private[mllib] def *(y: Matrix) = operate(_ * _, y) + + private[mllib] def +=(y: Matrix) = operateInPlace(_ + _, y) + + private[mllib] def +(y: Matrix) = operate(_ + _, y) + + private[mllib] def -=(y: Matrix) = operateInPlace(_ - _, y) + + private[mllib] def -(y: Matrix) = operate(_ - _, y) + + private[mllib] def /=(y: Matrix) = operateInPlace(_ / _, y) + + private[mllib] def /(y: Matrix) = operate(_ / _, y) + + private[mllib] def *=(y: Double) = 
elementWiseOperateScalarInPlace(_ * _, y) + + private[mllib] def +=(y: Double) = elementWiseOperateScalarInPlace(_ + _, y) + + private[mllib] def -=(y: Double) = elementWiseOperateScalarInPlace(_ - _, y) + + private[mllib] def /=(y: Double) = elementWiseOperateScalarInPlace(_ / _, y) + + private[mllib] def *(y: Double) = elementWiseOperateScalar(_ * _, y) + + private[mllib] def +(y: Double) = elementWiseOperateScalar(_ + _, y) + + private[mllib] def -(y: Double) = elementWiseOperateScalar(_ - _, y) + + private[mllib] def /(y: Double) = elementWiseOperateScalar(_ / _, y) + + private[mllib] def neg: Matrix + + private[mllib] def negInPlace: Matrix + + /** Less-than-or-equal-to check. Outputs
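The pattern behind those operator aliases, in standalone form (the Cells type below is illustrative, not the PR's API): each symbolic operator is a one-line delegate to a single generic element-wise routine, so adding another operator never adds new looping code.

final class Cells(val values: Array[Double]) {
  private def operateInPlace(f: (Double, Double) => Double, y: Cells): Cells = {
    require(y.values.length == values.length, "operands must have the same length")
    var i = 0
    while (i < values.length) { values(i) = f(values(i), y.values(i)); i += 1 }
    this
  }
  def +=(y: Cells): Cells = operateInPlace(_ + _, y)
  def -=(y: Cells): Cells = operateInPlace(_ - _, y)
  def *=(y: Cells): Cells = operateInPlace(_ * _, y)
  def /=(y: Cells): Cells = operateInPlace(_ / _, y)
}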
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17801735 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -37,11 +44,197 @@ trait Matrix extends Serializable {

   private[mllib] def toBreeze: BM[Double]

   /** Gets the (i, j)-th element. */
-  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
+  private[mllib] def apply(i: Int, j: Int): Double
+
+  /** Return the index for the (i, j)-th element in the backing array. */
+  private[mllib] def index(i: Int, j: Int): Int
+
+  /** Update element at (i, j) */
+  private[mllib] def update(i: Int, j: Int, v: Double): Unit
+
+  /** Get a deep copy of the matrix. */
+  def copy: Matrix
+  /** Convenience method for `Matrix`-`Matrix` multiplication.
+    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
+  def multiply(y: Matrix): DenseMatrix = {
+    val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols)
+    BLAS.gemm(false, false, 1.0, this, y, 0.0, C)
+    C
+  }
+
+  /** Convenience method for `Matrix`-`DenseVector` multiplication. */
+  def multiply(y: DenseVector): DenseVector = {
+    val output = new DenseVector(new Array[Double](numRows))
+    BLAS.gemv(1.0, this, y, 0.0, output)
+    output
+  }
+
+  /** Convenience method for `Matrix`^T^-`Matrix` multiplication.
+    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
+  def transposeMultiply(y: Matrix): DenseMatrix = {
+    val C: DenseMatrix = DenseMatrix.zeros(numCols, y.numCols)
+    BLAS.gemm(true, false, 1.0, this, y, 0.0, C)
+    C
+  }
+
+  /** Convenience method for `Matrix`^T^-`DenseVector` multiplication. */
+  def transposeMultiply(y: DenseVector): DenseVector = {
+    val output = new DenseVector(new Array[Double](numCols))
+    BLAS.gemv(true, 1.0, this, y, 0.0, output)
+    output
+  }
+
+  /** A human readable representation of the matrix */
   override def toString: String = toBreeze.toString()
+
+  private[mllib] def map(f: Double => Double): Matrix
+
+  private[mllib] def update(f: Double => Double): Matrix
+
+  private[mllib] def elementWiseOperateOnColumnsInPlace(f: (Double, Double) => Double,
+    y: Matrix): Matrix
+
+  private[mllib] def elementWiseOperateOnRowsInPlace(f: (Double, Double) => Double,
+    y: Matrix): Matrix
+
+  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
+
+  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double,
+    y: Double): Matrix
+
+  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): Matrix
+
+  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): Matrix
+
+  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): Matrix
+
+  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): Matrix
+
+  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): Matrix
+
+  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): Matrix
+
+  private[mllib] def *=(y: Matrix) = operateInPlace(_ * _, y)
+
+  private[mllib] def *(y: Matrix) = operate(_ * _, y)
+
+  private[mllib] def +=(y: Matrix) = operateInPlace(_ + _, y)
+
+  private[mllib] def +(y: Matrix) = operate(_ + _, y)
+
+  private[mllib] def -=(y: Matrix) = operateInPlace(_ - _, y)
+
+  private[mllib] def -(y: Matrix) = operate(_ - _, y)
+
+  private[mllib] def /=(y: Matrix) = operateInPlace(_ / _, y)
+
+  private[mllib] def /(y: Matrix) = operate(_ / _, y)
+
+  private[mllib] def *=(y: Double) = elementWiseOperateScalarInPlace(_ * _, y)
+
+  private[mllib] def +=(y: Double) = elementWiseOperateScalarInPlace(_ + _, y)
+
+  private[mllib] def -=(y: Double) = elementWiseOperateScalarInPlace(_ - _, y)
+
+  private[mllib] def /=(y: Double) = elementWiseOperateScalarInPlace(_ / _, y)
+
+  private[mllib] def *(y: Double) = elementWiseOperateScalar(_ * _, y)
+
+  private[mllib] def +(y: Double) = elementWiseOperateScalar(_ + _, y)
+
+  private[mllib] def -(y: Double) = elementWiseOperateScalar(_ - _, y)
+
+  private[mllib] def /(y: Double) = elementWiseOperateScalar(_ / _, y)
+
+  private[mllib] def neg: Matrix
+
+  private[mllib] def negInPlace: Matrix
+
+  /** Less-than-or-equal-to check. Outputs
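Before the quoted hunk breaks off, it adds `multiply` and `transposeMultiply` convenience methods that delegate to BLAS.gemm/gemv. A minimal usage sketch of that API follows; it assumes the `DenseMatrix` constructor and `DenseMatrix.zeros` shown in this diff and is illustrative only, not code from the PR.

import org.apache.spark.mllib.linalg.{DenseMatrix, DenseVector, Matrix}

object MultiplyExample {
  def main(args: Array[String]): Unit = {
    // 2x3 matrix in column-major order: [[1, 3, 5], [2, 4, 6]]
    val A: Matrix = new DenseMatrix(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
    // 3x2 matrix whose columns are the first two standard basis vectors
    val B: Matrix = new DenseMatrix(3, 2, Array(1.0, 0.0, 0.0, 0.0, 1.0, 0.0))

    val C = A.multiply(B)            // 2x2 DenseMatrix, computed via BLAS.gemm
    val x = new DenseVector(Array(1.0, 1.0, 1.0))
    val y = A.multiply(x)            // length-2 DenseVector, computed via BLAS.gemv

    val D = A.transposeMultiply(A)   // 3x3 DenseMatrix, A^T * A without forming A^T
    println(s"C = $C\ny = $y\nD = $D")
  }
}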
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17801649 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17801574 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17801515 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user anantasty commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56217049 @brkyvz I will get on it --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17801264 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -57,13 +250,709 @@ trait Matrix extends Serializable {
  * @param numCols number of columns
  * @param values matrix entries in column major
  */
-class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
+class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {
--- End diff --
long line
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56216806 Also, is it odd that the user can't access the matrix data, except via toArray (or maybe side effects of the function given to map)? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
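A small sketch of the workaround this comment alludes to: with apply(i, j) being private[mllib], outside callers have to go through the public toArray and redo the column-major index arithmetic themselves. The snippet is illustrative only and assumes the layout shown in the diffs.

import org.apache.spark.mllib.linalg.DenseMatrix

object ElementAccessExample {
  def main(args: Array[String]): Unit = {
    // 2x3 matrix in column-major order: [[1, 3, 5], [2, 4, 6]]
    val m = new DenseMatrix(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

    // User-side element access: copy out the values and index by hand.
    def elem(i: Int, j: Int): Double = m.toArray(i + m.numRows * j)

    println(elem(1, 2)) // prints 6.0, the entry in row 1, column 2
  }
}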
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56216573 Could the methods be ordered in the file (grouped by public, private[mllib], private, etc.)? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17801072 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -37,11 +44,197 @@ trait Matrix extends Serializable {

   private[mllib] def toBreeze: BM[Double]

   /** Gets the (i, j)-th element. */
-  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
+  private[mllib] def apply(i: Int, j: Int): Double
+
+  /** Return the index for the (i, j)-th element in the backing array. */
+  private[mllib] def index(i: Int, j: Int): Int
+
+  /** Update element at (i, j) */
+  private[mllib] def update(i: Int, j: Int, v: Double): Unit
+
+  /** Get a deep copy of the matrix. */
+  def copy: Matrix
+  /** Convenience method for `Matrix`-`Matrix` multiplication.
+    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
--- End diff --
Just wondering (not sure myself): Which is preferred: `SparseMatrix` or [[SparseMatrix]] in docs?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
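For context, the practical difference between the two Scaladoc styles, in a minimal compilable form: text in backticks is rendered as monospace but not linked, while a [[...]] reference is resolved by scaladoc and rendered as a link (with a warning if the target cannot be found). The List target below is used only so the link resolves; it stands in for SparseMatrix.

object ScaladocLinkStyles {
  /**
   * Doubles `x`.
   *
   * `SparseMatrix` written with backticks shows up as inline code only, whereas
   * [[scala.collection.immutable.List]] becomes a hyperlink to the documented entity.
   */
  def timesTwo(x: Double): Double = 2.0 * x

  def main(args: Array[String]): Unit = println(timesTwo(21.0))
}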
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17800735 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -57,13 +250,709 @@ trait Matrix extends Serializable {
  * @param numCols number of columns
  * @param values matrix entries in column major
  */
-class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix {
+class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable {

-  require(values.length == numRows * numCols)
+  require(values.length == numRows * numCols, "The number of values supplied doesn't match the " +
+    s"size of the matrix! values.length: ${values.length}, numRows * numCols: ${numRows * numCols}")

   override def toArray: Array[Double] = values

-  private[mllib] override def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
+  private[mllib] def toBreeze: BM[Double] = new BDM[Double](numRows, numCols, values)
+
+  private[mllib] def apply(i: Int): Double = values(i)
+
+  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
+
+  private[mllib] def index(i: Int, j: Int): Int = i + numRows * j
+
+  private[mllib] def update(i: Int, j: Int, v: Double): Unit = {
+    values(index(i, j)) = v
+  }
+
+  override def copy = new DenseMatrix(numRows, numCols, values.clone())
+
+  private[mllib] def elementWiseOperateOnColumnsInPlace(
+      f: (Double, Double) => Double,
+      y: Matrix): DenseMatrix = {
+    val y_vals = y.toArray
+    val len = y_vals.length
+    require(y_vals.length == numRows)
+    var j = 0
+    while (j < numCols){
+      var i = 0
+      while (i < len){
+        val idx = index(i, j)
+        values(idx) = f(values(idx), y_vals(i))
+        i += 1
+      }
+      j += 1
+    }
+    this
+  }
+
+  private[mllib] def elementWiseOperateOnRowsInPlace(
+      f: (Double, Double) => Double,
+      y: Matrix): DenseMatrix = {
+    val y_vals = y.toArray
+    require(y_vals.length == numCols)
+    var j = 0
+    while (j < numCols){
+      var i = 0
+      while (i < numRows){
+        val idx = index(i, j)
+        values(idx) = f(values(idx), y_vals(j))
+        i += 1
+      }
+      j += 1
+    }
+    this
+  }
+
+  private[mllib] def elementWiseOperateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
+    val y_val = y.toArray
+    val len = values.length
+    require(y_val.length == values.length)
+    var j = 0
+    while (j < len){
+      values(j) = f(values(j), y_val(j))
+      j += 1
+    }
+    this
+  }
+
+  private[mllib] def elementWiseOperateScalarInPlace(f: (Double, Double) => Double, y: Double): DenseMatrix = {
+    var j = 0
+    val len = values.length
+    while (j < len){
+      values(j) = f(values(j), y)
+      j += 1
+    }
+    this
+  }
+
+  private[mllib] def operateInPlace(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
+    if (y.numCols==1 || y.numRows == 1){
+      require(numCols != numRows, "Operation is ambiguous. Please use elementWiseOperateOnRows " +
+        "or elementWiseOperateOnColumns instead")
+    }
+    if (y.numCols == 1 && y.numRows == 1){
+      elementWiseOperateScalarInPlace(f, y.toArray(0))
+    } else {
+      if (y.numCols==1) {
+        elementWiseOperateOnColumnsInPlace(f, y)
+      }else if (y.numRows==1){
+        elementWiseOperateOnRowsInPlace(f, y)
+      }else{
+        elementWiseOperateInPlace(f, y)
+      }
+    }
+  }
+
+  private[mllib] def elementWiseOperateOnColumns(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
+    val dup = this.copy
+    dup.elementWiseOperateOnColumnsInPlace(f, y)
+  }
+
+  private[mllib] def elementWiseOperateOnRows(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
+    val dup = this.copy
+    dup.elementWiseOperateOnRowsInPlace(f, y)
+  }
+
+  private[mllib] def elementWiseOperate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
+    val dup = this.copy
+    dup.elementWiseOperateInPlace(f, y)
+  }
+
+  private[mllib] def elementWiseOperateScalar(f: (Double, Double) => Double, y: Double): DenseMatrix = {
+    val dup = this.copy
+    dup.elementWiseOperateScalarInPlace(f, y)
+  }
+
+  private[mllib] def operate(f: (Double, Double) => Double, y: Matrix): DenseMatrix = {
+    val dup = this.copy
+    dup.operateInPlace(f, y)
+  }
+
+  def map(f: Double => D
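The operateInPlace dispatch quoted above broadcasts a single-column operand down every column and a single-row operand across every row. Since those methods are private[mllib], here is a standalone re-creation of the column-broadcast loop over a plain column-major array, just to make the behaviour concrete; none of these names exist in the PR.

object ColumnBroadcastSketch {
  // Re-creation of the loop in elementWiseOperateOnColumnsInPlace on a bare array.
  def broadcastColumns(values: Array[Double], numRows: Int, numCols: Int,
      y: Array[Double], f: (Double, Double) => Double): Unit = {
    require(y.length == numRows)
    var j = 0
    while (j < numCols) {
      var i = 0
      while (i < numRows) {
        val idx = i + numRows * j // column-major index, as in DenseMatrix.index
        values(idx) = f(values(idx), y(i))
        i += 1
      }
      j += 1
    }
  }

  def main(args: Array[String]): Unit = {
    val values = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0) // 2x3, columns (1,2) (3,4) (5,6)
    broadcastColumns(values, 2, 3, Array(10.0, 100.0), _ + _)
    println(values.mkString(", ")) // 11.0, 102.0, 13.0, 104.0, 15.0, 106.0
  }
}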
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17800699 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17800687 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -37,11 +44,197 @@ trait Matrix extends Serializable {

   private[mllib] def toBreeze: BM[Double]

   /** Gets the (i, j)-th element. */
-  private[mllib] def apply(i: Int, j: Int): Double = toBreeze(i, j)
+  private[mllib] def apply(i: Int, j: Int): Double
+
+  /** Return the index for the (i, j)-th element in the backing array. */
+  private[mllib] def index(i: Int, j: Int): Int
+
+  /** Update element at (i, j) */
+  private[mllib] def update(i: Int, j: Int, v: Double): Unit
+
+  /** Get a deep copy of the matrix. */
+  def copy: Matrix
+  /** Convenience method for `Matrix`-`Matrix` multiplication.
+    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
+  def multiply(y: Matrix): DenseMatrix = {
+    val C: DenseMatrix = DenseMatrix.zeros(numRows, y.numCols)
+    BLAS.gemm(false, false, 1.0, this, y, 0.0, C)
+    C
+  }
+
+  /** Convenience method for `Matrix`-`DenseVector` multiplication. */
+  def multiply(y: DenseVector): DenseVector = {
+    val output = new DenseVector(new Array[Double](numRows))
+    BLAS.gemv(1.0, this, y, 0.0, output)
+    output
+  }
+
+  /** Convenience method for `Matrix`^T^-`Matrix` multiplication.
+    * Note: `SparseMatrix`-`SparseMatrix` multiplication is not supported */
+  def transposeMultiply(y: Matrix): DenseMatrix = {
--- End diff --
How hard would it be to have matrices store a transpose bit indicating whether they are transposed (without the data being moved)? I envision:
* a transpose() function which sets this bit (so transpose is a lazy operation)
* eliminating transposeMultiply
* perhaps a transposePhysical or transpose(physical: Boolean) method which forces data movement
I'm also OK with adding that support later on.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
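A rough sketch of what the suggested transpose flag could look like. FlaggedMatrix and everything in it are illustrative names, not the PR's code: transpose flips a bit and swaps the logical dimensions, while index consults the bit, so no data moves until a physical transpose is requested.

class FlaggedMatrix(val numRows: Int, val numCols: Int,
    val values: Array[Double], val isTransposed: Boolean = false) {

  /** Column-major index into `values`, honoring the transpose flag. */
  private def index(i: Int, j: Int): Int =
    if (!isTransposed) i + numRows * j else j + numCols * i

  def apply(i: Int, j: Int): Double = values(index(i, j))

  /** Lazy transpose: O(1), no data movement. */
  def transpose: FlaggedMatrix =
    new FlaggedMatrix(numCols, numRows, values, !isTransposed)

  /** Physical transpose: actually rearranges the backing array. */
  def transposePhysical: FlaggedMatrix = {
    val out = new Array[Double](values.length)
    var j = 0
    while (j < numCols) {
      var i = 0
      while (i < numRows) {
        out(j + numCols * i) = apply(i, j)
        i += 1
      }
      j += 1
    }
    new FlaggedMatrix(numCols, numRows, out)
  }
}

object FlaggedMatrixDemo {
  def main(args: Array[String]): Unit = {
    val m = new FlaggedMatrix(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)) // [[1, 3, 5], [2, 4, 6]]
    val t = m.transpose                                                   // 3x2 view, no copy
    println(t(0, 1)) // 2.0, i.e. m(1, 0)
  }
}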
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17800664 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17800692 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56202513 @anantasty: If you could look through the code and mark places where you're like "What the heck is going on here", it would be easier for me to write up proper comments. I'm going to add a lot today, I can incorporate yours as well. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17769270 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
         throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
     }
   }
+
+  // For level-3 routines, we use the native BLAS.
+  private def nativeBLAS: NetlibBLAS = {
+    if (_nativeBLAS == null) {
+      _nativeBLAS = NativeBLAS
+    }
+    _nativeBLAS
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
+   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
+   * @param alpha a scalar to scale the multiplication A * B.
+   * @param A the matrix A that will be left multiplied to B. Size of m x k.
+   * @param B the matrix B that will be left multiplied by A. Size of k x n.
+   * @param beta a scalar that can be used to scale matrix C.
+   * @param C the resulting matrix C. Size of m x n.
+   */
+  def gemm(
+      transA: Boolean,
+      transB: Boolean,
+      alpha: Double,
+      A: Matrix,
+      B: Matrix,
+      beta: Double,
+      C: DenseMatrix): Unit = {
+    if (alpha == 0.0) {
+      logDebug("gemm: alpha is equal to 0. Returning C.")
+    } else {
+      A match {
+        case sparse: SparseMatrix =>
+          B match {
+            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
+            case sB: SparseMatrix =>
+              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
+                s"multiplication")
+            case _ =>
+              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
+          }
+        case dense: DenseMatrix =>
+          B match {
+            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
+            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
+            case _ =>
+              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
+          }
+        case _ =>
+          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
+      }
+    }
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   *
+   * @param alpha a scalar to scale the multiplication A * B.
+   * @param A the matrix A that will be left multiplied to B. Size of m x k.
+   * @param B the matrix B that will be left multiplied by A. Size of k x n.
+   * @param beta a scalar that can be used to scale matrix C.
+   * @param C the resulting matrix C. Size of m x n.
+   */
+  def gemm(
+      alpha: Double,
+      A: Matrix,
+      B: Matrix,
+      beta: Double,
+      C: DenseMatrix): Unit = {
+    gemm(false, false, alpha, A, B, beta, C)
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   * For `DenseMatrix` A.
+   */
+  private def gemm(
+      transA: Boolean,
+      transB: Boolean,
+      alpha: Double,
+      A: DenseMatrix,
+      B: DenseMatrix,
+      beta: Double,
+      C: DenseMatrix): Unit = {
+    val mA: Int = if (!transA) A.numRows else A.numCols
+    val nB: Int = if (!transB) B.numCols else B.numRows
+    val kA: Int = if (!transA) A.numCols else A.numRows
+    val kB: Int = if (!transB) B.numRows else B.numCols
+    val tAstr = if (!transA) "N" else "T"
+    val tBstr = if (!transB) "N" else "T"
+
+    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
+    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
+    require(nB == C.numCols,
+      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
+
+    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
+      beta, C.values, C.numRows)
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   * For `SparseMatrix` A.
+   */
+  private def gemm(
+      transA: Boolean,
+      transB: Boolean,
+      alpha: Double,
+      A: SparseMatrix,
+      B: DenseMatrix,
+      beta: Double,
+      C: DenseMatrix): Unit = {
+    val mA: Int = if (!transA) A.numRows else A.numCols
+    val nB: Int = if (!transB) B.numCols else B.numRows
+    val kA: Int = if (!transA) A.numCols else A.numRows
+    val kB: Int = if (!transB) B.numRows else B.numCols
+
+    r
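For readers following the dimension bookkeeping above (mA, nB, kA, kB under the transpose flags), here is a plain-Scala reference, not MLlib code, that spells out what C := alpha * op(A) * op(B) + beta * C computes on column-major arrays.

object GemmReference {
  // Naive reference for C := alpha * op(A) * op(B) + beta * C, where op(X) is X or X^T
  // depending on the flag; all matrices are stored as column-major arrays.
  def gemmRef(transA: Boolean, transB: Boolean, alpha: Double,
      a: Array[Double], aRows: Int, aCols: Int,
      b: Array[Double], bRows: Int, bCols: Int,
      beta: Double, c: Array[Double], cRows: Int, cCols: Int): Unit = {
    def opA(i: Int, k: Int): Double = if (!transA) a(i + aRows * k) else a(k + aRows * i)
    def opB(k: Int, j: Int): Double = if (!transB) b(k + bRows * j) else b(j + bRows * k)
    val m = if (!transA) aRows else aCols
    val n = if (!transB) bCols else bRows
    val kDim = if (!transA) aCols else aRows
    require(kDim == (if (!transB) bRows else bCols), "inner dimensions must agree")
    require(m == cRows && n == cCols, "C must be m x n")
    for (j <- 0 until n; i <- 0 until m) {
      var s = 0.0
      var k = 0
      while (k < kDim) { s += opA(i, k) * opB(k, j); k += 1 }
      c(i + cRows * j) = alpha * s + beta * c(i + cRows * j)
    }
  }

  def main(args: Array[String]): Unit = {
    // A is 2x3 and B is 2x3; with transB = true this computes the 2x2 product A * B^T.
    val a = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0) // A = [[1, 3, 5], [2, 4, 6]]
    val b = Array(1.0, 0.0, 0.0, 1.0, 0.0, 0.0) // B = [[1, 0, 0], [0, 1, 0]]
    val c = new Array[Double](4)
    gemmRef(false, true, 1.0, a, 2, 3, b, 2, 3, 0.0, c, 2, 2)
    println(c.mkString(", ")) // 1.0, 2.0, 3.0, 4.0
  }
}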
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765442 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/BLASSuite.scala ---
@@ -126,4 +126,142 @@ class BLASSuite extends FunSuite {
       }
     }
   }
+
+  test("gemm") {
--- End diff --
Shouldn't this test all 4 options for transA,transB?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
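One way such a test could look, exercising all four flag combinations of the gemm signature quoted in this thread. The expected matrices are hand-computed; the DenseMatrix helpers and package-private BLAS access are assumptions based on this PR, so treat the sketch as illustrative rather than a drop-in suite.

import org.apache.spark.mllib.linalg.{BLAS, DenseMatrix}
import org.scalatest.FunSuite

class GemmTransposeSuite extends FunSuite {

  test("gemm handles all four transA/transB combinations") {
    // A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], both stored column-major.
    val A = new DenseMatrix(2, 2, Array(1.0, 3.0, 2.0, 4.0))
    val B = new DenseMatrix(2, 2, Array(5.0, 7.0, 6.0, 8.0))

    // Hand-computed op(A) * op(B) for each flag pair, again column-major.
    val expected = Map(
      (false, false) -> Array(19.0, 43.0, 22.0, 50.0), // A   * B
      (false, true)  -> Array(17.0, 39.0, 23.0, 53.0), // A   * B^T
      (true,  false) -> Array(26.0, 38.0, 30.0, 44.0), // A^T * B
      (true,  true)  -> Array(23.0, 34.0, 31.0, 46.0)  // A^T * B^T
    )

    for (((transA, transB), want) <- expected) {
      val C = DenseMatrix.zeros(2, 2)
      BLAS.gemm(transA, transB, 1.0, A, B, 0.0, C)
      assert(C.values.sameElements(want), s"transA=$transA, transB=$transB")
    }
  }
}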
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765188 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765187 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765178 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765175 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala ---
@@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {
        throw new IllegalArgumentException(s"scal doesn't support vector type ${x.getClass}.")
     }
   }
+
+  // For level-3 routines, we use the native BLAS.
+  private def nativeBLAS: NetlibBLAS = {
+    if (_nativeBLAS == null) {
+      _nativeBLAS = NativeBLAS
+    }
+    _nativeBLAS
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   * @param transA whether to use the transpose of matrix A (true), or A itself (false).
+   * @param transB whether to use the transpose of matrix B (true), or B itself (false).
+   * @param alpha a scalar to scale the multiplication A * B.
+   * @param A the matrix A that will be left multiplied to B. Size of m x k.
+   * @param B the matrix B that will be left multiplied by A. Size of k x n.
+   * @param beta a scalar that can be used to scale matrix C.
+   * @param C the resulting matrix C. Size of m x n.
+   */
+  def gemm(
+      transA: Boolean,
+      transB: Boolean,
+      alpha: Double,
+      A: Matrix,
+      B: Matrix,
+      beta: Double,
+      C: DenseMatrix): Unit = {
+    if (alpha == 0.0) {
+      logDebug("gemm: alpha is equal to 0. Returning C.")
+    } else {
+      A match {
+        case sparse: SparseMatrix =>
+          B match {
+            case dB: DenseMatrix => gemm(transA, transB, alpha, sparse, dB, beta, C)
+            case sB: SparseMatrix =>
+              throw new IllegalArgumentException(s"gemm doesn't support sparse-sparse matrix " +
+                s"multiplication")
+            case _ =>
+              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
+          }
+        case dense: DenseMatrix =>
+          B match {
+            case dB: DenseMatrix => gemm(transA, transB, alpha, dense, dB, beta, C)
+            case sB: SparseMatrix => gemm(transA, transB, alpha, dense, sB, beta, C)
+            case _ =>
+              throw new IllegalArgumentException(s"gemm doesn't support matrix type ${B.getClass}.")
+          }
+        case _ =>
+          throw new IllegalArgumentException(s"gemm doesn't support matrix type ${A.getClass}.")
+      }
+    }
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   *
+   * @param alpha a scalar to scale the multiplication A * B.
+   * @param A the matrix A that will be left multiplied to B. Size of m x k.
+   * @param B the matrix B that will be left multiplied by A. Size of k x n.
+   * @param beta a scalar that can be used to scale matrix C.
+   * @param C the resulting matrix C. Size of m x n.
+   */
+  def gemm(
+      alpha: Double,
+      A: Matrix,
+      B: Matrix,
+      beta: Double,
+      C: DenseMatrix): Unit = {
+    gemm(false, false, alpha, A, B, beta, C)
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   * For `DenseMatrix` A.
+   */
+  private def gemm(
+      transA: Boolean,
+      transB: Boolean,
+      alpha: Double,
+      A: DenseMatrix,
+      B: DenseMatrix,
+      beta: Double,
+      C: DenseMatrix): Unit = {
+    val mA: Int = if (!transA) A.numRows else A.numCols
+    val nB: Int = if (!transB) B.numCols else B.numRows
+    val kA: Int = if (!transA) A.numCols else A.numRows
+    val kB: Int = if (!transB) B.numRows else B.numCols
+    val tAstr = if (!transA) "N" else "T"
+    val tBstr = if (!transB) "N" else "T"
+
+    require(kA == kB, s"The columns of A don't match the rows of B. A: $kA, B: $kB")
+    require(mA == C.numRows, s"The rows of C don't match the rows of A. C: ${C.numRows}, A: $mA")
+    require(nB == C.numCols,
+      s"The columns of C don't match the columns of B. C: ${C.numCols}, A: $nB")
+
+    nativeBLAS.dgemm(tAstr, tBstr, mA, nB, kA, alpha, A.values, A.numRows, B.values, B.numRows,
+      beta, C.values, C.numRows)
+  }
+
+  /**
+   * C := alpha * A * B + beta * C
+   * For `SparseMatrix` A.
+   */
+  private def gemm(
+      transA: Boolean,
+      transB: Boolean,
+      alpha: Double,
+      A: SparseMatrix,
+      B: DenseMatrix,
+      beta: Double,
+      C: DenseMatrix): Unit = {
+    val mA: Int = if (!transA) A.numRows else A.numCols
+    val nB: Int = if (!transB) B.numCols else B.numRows
+    val kA: Int = if (!transA) A.numCols else A.numRows
+    val kB: Int = if (!transB) B.numRows else B.numCols
+
+    r
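For orientation, a minimal sketch of how the public gemm overload quoted above could be exercised. It assumes the DenseMatrix(numRows, numCols, values) constructor listed later in this thread, column-major storage (inferred from the dgemm call passing A.values with lda = A.numRows), and calling code inside the mllib package, since BLAS is private[mllib]; it is illustrative only and not part of the PR.

```scala
// Hypothetical snippet: assumes it compiles somewhere under org.apache.spark.mllib
// (BLAS is private[mllib]) and that DenseMatrix.values is column-major.
import org.apache.spark.mllib.linalg.{BLAS, DenseMatrix}

// A is 2 x 3, B is 3 x 2, C is 2 x 2 and preallocated to the result size.
val A = new DenseMatrix(2, 3, Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0))
val B = new DenseMatrix(3, 2, Array(7.0, 9.0, 11.0, 8.0, 10.0, 12.0))
val C = new DenseMatrix(2, 2, new Array[Double](4))

BLAS.gemm(1.0, A, B, 0.0, C)  // C := 1.0 * A * B + 0.0 * C
// C.values should now hold Array(58.0, 139.0, 64.0, 154.0) in column-major order.
```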
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765173 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- (quoted diff context omitted; identical to the BLAS.scala excerpt above)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765167 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- (quoted diff context omitted; identical to the BLAS.scala excerpt above)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765077 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- (quoted diff context omitted; identical to the BLAS.scala excerpt above)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765001 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- (quoted diff context omitted; identical to the BLAS.scala excerpt above)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17764905 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- (quoted diff context omitted; identical to the BLAS.scala excerpt above)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17764836 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- (quoted diff context omitted; identical to the BLAS.scala excerpt above)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17764833 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- (quoted diff context omitted; identical to the BLAS.scala excerpt above)
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56122908 @brkyvz Just wondering: Which reference library are you using to determine the order of arguments for BLAS routines? E.g., it's different from [Netlib LAPACK](http://www.netlib.org/lapack/explore-html/d7/d2b/dgemm_8f.html).
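For reference, the Netlib argument order the comment links to, as exposed through the netlib-java binding that the quoted wrapper delegates to, looks roughly like the sketch below; the matrices and dimensions here are made up for illustration.

```scala
import com.github.fommil.netlib.{BLAS => NetlibBLAS}

object DgemmArgOrder {
  def main(args: Array[String]): Unit = {
    // C := alpha * A * B + beta * C, with A: 2 x 3, B: 3 x 2, C: 2 x 2, all column-major.
    val m = 2; val n = 2; val k = 3
    val a = Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0)    // 2 x 3
    val b = Array(7.0, 9.0, 11.0, 8.0, 10.0, 12.0) // 3 x 2
    val c = Array(0.0, 0.0, 0.0, 0.0)              // 2 x 2, overwritten in place

    // Standard Netlib/LAPACK DGEMM argument order:
    // (transa, transb, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc)
    NetlibBLAS.getInstance().dgemm("N", "N", m, n, k, 1.0, a, m, b, k, 0.0, c, m)

    println(c.mkString(", "))  // expected: 58.0, 139.0, 64.0, 154.0
  }
}
```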
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user anantasty commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56108815 With some guidance I could help you with the docs
[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56106639 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20553/consoleFull) for PR 2451 at commit [`5e7d744`](https://github.com/apache/spark/commit/5e7d74408fd5f4e521f4e3a7e94a289d59454913).
  * This patch **fails** unit tests.
  * This patch merges cleanly.
  * This patch adds the following public classes _(experimental)_:
    * `sealed trait Matrix extends Serializable`
    * `class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Matrix with Serializable`
    * `class SparseMatrix(`
    * `sealed trait Vector extends Serializable`
    * `abstract class MultiModelGradient extends Serializable`
    * `class MultiModelLogisticGradient extends MultiModelGradient`
    * `class MultiModelLeastSquaresGradient extends MultiModelGradient`
    * `class MultiModelHingeGradient extends MultiModelGradient`
    * `trait Optimizer[V] extends Serializable`
    * `abstract class MultiModelUpdater extends Serializable`
    * `class MultiModelSimpleUpdater extends MultiModelUpdater`
    * `class MultiModelL1Updater extends MultiModelUpdater`
    * `class MultiModelSquaredL2Updater extends MultiModelUpdater`