[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-65179216 LGTM. Merged into master and branch-1.2. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3462
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-65043520 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23984/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-65043512 [Test build #23984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23984/consoleFull) for PR 3462 at commit [`63c7165`](https://github.com/apache/spark/commit/63c71659ab7aa3bbea1a505f872dceeca5d3ab2f).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-65035512 [Test build #23984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23984/consoleFull) for PR 3462 at commit [`63c7165`](https://github.com/apache/spark/commit/63c71659ab7aa3bbea1a505f872dceeca5d3ab2f).
* This patch merges cleanly.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r21076434

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---

```scala
  /**
   * Returns the p-norm of this vector.
   * @param vector input vector.
   * @param p norm.
   * @return norm in L^p^ space.
   */
  private[spark] def norm(vector: Vector, p: Double): Double = {
    require(p >= 1.0)
    val values = vector match {
      case dv: DenseVector => dv.values
      case sv: SparseVector => sv.values
      case v => throw new IllegalArgumentException("Do not support vector type " + v.getClass)
    }
    val size = values.size

    if (p == 1) {
```

End diff -- Yeah, but this will not work here unless `p` has type `Int`.
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r21075718

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala, on the line `if (p == 1) {` (full context quoted above) ---

End diff -- It is an interesting discussion ~ :) But maybe more people are familiar with the `if ... else if ... else` statement. And this is not on the critical path.
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r21075634

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---

```scala
  test("vector p-norm") {
    val dv = Vectors.dense(0.0, -1.2, 3.1, 0.0, -4.5, 1.9)
    val sv = Vectors.sparse(6, Seq((1, -1.2), (2, 3.1), (3, 0.0), (4, -4.5), (5, 1.9)))

    assert(Vectors.norm(dv, 1.0) ~== dv.toArray.foldLeft(0.0)((a, v) =>
      a + math.abs(v)) relTol 1E-8)
    assert(Vectors.norm(sv, 1.0) ~== sv.toArray.foldLeft(0.0)((a, v) =>
      a + math.abs(v)) relTol 1E-8)

    assert(Vectors.norm(dv, 2.0) ~== math.sqrt(dv.toArray.foldLeft(0.0)((a, v) =>
      a + v * v)) relTol 1E-8)
    assert(Vectors.norm(sv, 2.0) ~== math.sqrt(sv.toArray.foldLeft(0.0)((a, v) =>
      a + v * v)) relTol 1E-8)

    assert(Vectors.norm(dv, Double.PositiveInfinity) ~== dv.toArray.map(math.abs).max relTol 1E-8)
    assert(Vectors.norm(sv, Double.PositiveInfinity) ~== sv.toArray.map(math.abs).max relTol 1E-8)

    assert(Vectors.norm(dv, 3.7) ~== math.pow(dv.toArray.foldLeft(0.0)((a, v) =>
      a + math.pow(math.abs(v), 3.7)), 1.0 / 3.7) relTol 1E-8)
    assert(Vectors.norm(sv, 3.7) ~== math.pow(dv.toArray.foldLeft(0.0)((a, v) =>
```

End diff -- `dv` -> `sv`
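The specialized branches exercised above (p = 1, 2, infinity, and general p) should all agree with a single generic definition of the p-norm. A minimal self-contained sketch of such a reference check; `referenceNorm` is an illustrative helper, not part of the PR:

```scala
// Generic reference p-norm, used only for cross-checking the special cases.
// Hypothetical helper, not the MLlib implementation under review.
def referenceNorm(values: Array[Double], p: Double): Double = {
  require(p >= 1.0)
  if (p.isPosInfinity) values.map(math.abs).max
  else math.pow(values.map(v => math.pow(math.abs(v), p)).sum, 1.0 / p)
}

val xs = Array(0.0, -1.2, 3.1, 0.0, -4.5, 1.9)
// The specialized forms are just the general formula evaluated at p = 1 and p = 2.
assert(math.abs(referenceNorm(xs, 1.0) - xs.map(math.abs).sum) < 1e-8)
assert(math.abs(referenceNorm(xs, 2.0) - math.sqrt(xs.map(v => v * v).sum)) < 1e-8)
assert(referenceNorm(xs, Double.PositiveInfinity) == 4.5)
```
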
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20970806

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala, on the line `if (p == 1) {` (full context quoted above) ---

End diff -- Ha~ It only works if I change the type from `Double` to `Int`. See the Oracle doc you referenced: "The Java Virtual Machine's tableswitch and lookupswitch instructions operate only on int data. Because operations on byte, char, or short values are internally promoted to int, a switch whose expression evaluates to one of those types is compiled as though it evaluated to type int." With

```scala
def fun1(p: Int) = {
  (p: @switch) match {
    case 1 => 1
    case 2 => 2
    case _ => p
  }
}
```

I got

```
public fun1(I)I
 L0
  LINENUMBER 147 L0
  ILOAD 1
  ISTORE 2
  ILOAD 2
  TABLESWITCH
    1: L1
    2: L2
    default: L3
 L3
  LINENUMBER 150 L3
  FRAME APPEND [I]
  ILOAD 1
  GOTO L4
 L2
  LINENUMBER 149 L2
  FRAME SAME
  ICONST_2
  GOTO L4
 L1
  LINENUMBER 148 L1
  FRAME SAME
  ICONST_1
 L4
  LINENUMBER 147 L4
  FRAME SAME1 I
  IRETURN
 L5
  LOCALVARIABLE this Lorg/apache/spark/mllib/stat/Test$; L0 L5 0
  LOCALVARIABLE p I L0 L5 1
  MAXSTACK = 1
  MAXLOCALS = 3
```
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20970221

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala, on the line `if (p == 1) {` (full context quoted above) ---

End diff -- This is an interesting tangent. What happens if you add `@switch`? http://www.scala-lang.org/api/current/index.html#scala.annotation.switch Bytecode should have instructions for `switch` statements that aren't just conditionals, like `tableswitch`: https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-3.html#jvms-3.10
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20968353

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala, on the line `if (p == 1) {` (full context quoted above) ---

End diff -- In bytecode there is no direct `switch` operation on `Double`, so the `switch` or pattern matching will be compiled into `if` statements in the bytecode. See the following example:

```scala
def fun1(p: Double) = {
  p match {
    case 1.0 => 1.0
    case 2.0 => 2.0
    case _ => p
  }
}

def fun2(p: Double) = {
  if (p == 1.0) 1.0
  else if (p == 2.0) 2.0
  else p
}
```

will be compiled to

```
// access flags 0x1
public fun1(D)D
 L0
  LINENUMBER 145 L0
  DLOAD 1
  DSTORE 3
 L1
  LINENUMBER 146 L1
  DCONST_1
  DLOAD 3
  DCMPL
  IFNE L2
  DCONST_1
  DSTORE 5
  GOTO L3
 L2
  LINENUMBER 147 L2
  FRAME APPEND [D]
  LDC 2.0
  DLOAD 3
  DCMPL
  IFNE L4
  LDC 2.0
  DSTORE 5
  GOTO L3
 L4
  LINENUMBER 148 L4
  FRAME SAME
  DLOAD 1
  DSTORE 5
 L3
  LINENUMBER 145 L3
  FRAME APPEND [D]
  DLOAD 5
  DRETURN
 L5
  LOCALVARIABLE this Lorg/apache/spark/mllib/stat/Test$; L0 L5 0
  LOCALVARIABLE p D L0 L5 1
  MAXSTACK = 4
  MAXLOCALS = 7

// access flags 0x1
public fun2(D)D
 L0
  LINENUMBER 153 L0
  DLOAD 1
  DCONST_1
  DCMPL
  IFNE L1
  DCONST_1
  GOTO L2
 L1
  LINENUMBER 154 L1
  FRAME SAME
  DLOAD 1
  LDC 2.0
  DCMPL
  IFNE L3
  LDC 2.0
  GOTO L2
 L3
  LINENUMBER 155 L3
  FRAME SAME
  DLOAD 1
 L2
  LINENUMBER 153 L2
  FRAME SAME1 D
  DRETURN
 L4
  LOCALVARIABLE this Lorg/apache/spark/mllib/stat/Test$; L0 L4 0
  LOCALVARIABLE p D L0 L4 1
  MAXSTACK = 4
  MAXLOCALS = 3
```
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20967444

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala (continuing the `norm` context quoted above) ---

```scala
    if (p == 1) {
      var sum = 0.0
      var i = 0
      while (i < size) {
        sum += math.abs(values(i))
        i += 1
      }
      sum
    } else if (p == 2) {
      var sum = 0.0
      var i = 0
      while (i < size) {
        sum += values(i) * values(i)
```

End diff -- @mengxr Having `var value` outside will generate the same bytecode; it makes no difference, since the bytecode just stores the value on the stack, and there is no difference between the two versions. @srowen I'm very curious about this myself. :P I'm using the ASM Bytecode Outline plugin in IntelliJ, https://plugins.jetbrains.com/plugin/5918, so I can generate the bytecode with a simple right click. We used it to find a couple of boxing/unboxing performance issues at Alpine, and it's very useful.
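The diff truncates before the remaining branches of `norm`. As a rough picture of the overall shape such a specialized p-norm takes, here is a hedged, self-contained sketch over a raw values array (an assumption following the visible branches, not the merged MLlib code):

```scala
// Sketch: specialized p-norm with fast paths for p = 1, 2, and infinity,
// and a math.pow fallback for general p. Illustrative, not the PR's code.
def pNorm(values: Array[Double], p: Double): Double = {
  require(p >= 1.0)
  val size = values.length
  if (p == 1) {
    var sum = 0.0
    var i = 0
    while (i < size) { sum += math.abs(values(i)); i += 1 }
    sum
  } else if (p == 2) {
    var sum = 0.0
    var i = 0
    while (i < size) { sum += values(i) * values(i); i += 1 }
    math.sqrt(sum)
  } else if (p == Double.PositiveInfinity) {
    var max = 0.0
    var i = 0
    while (i < size) { val a = math.abs(values(i)); if (a > max) max = a; i += 1 }
    max
  } else {
    var sum = 0.0
    var i = 0
    while (i < size) { sum += math.pow(math.abs(values(i)), p); i += 1 }
    math.pow(sum, 1.0 / p)
  }
}

assert(math.abs(pNorm(Array(3.0, -4.0), 2.0) - 5.0) < 1e-12)
assert(pNorm(Array(3.0, -4.0), Double.PositiveInfinity) == 4.0)
```

The point of the specialization is that the common p = 1 and p = 2 cases avoid `math.pow` entirely, which is what makes the tight `while` loops above worth the duplication.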
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20961526

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala, on the line `sum += values(i) * values(i)` (full context quoted above) ---

End diff -- Just curious, what if you define `var value` outside the while loop?
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20961460

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala, on the line `sum += values(i) * values(i)` (full context quoted above) ---

End diff -- Oh OK, didn't mean to make you spend so much time investigating. It is not optimized I imagine but may be inconsequential even in a tight loop.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20961188

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala, on the line `sum += values(i) * values(i)` (full context quoted above) ---

End diff -- They generate different bytecode, but I don't see a performance difference. Maybe using `DSTORE` to store the `value` in a local variable is as expensive as looking it up twice, or maybe the JVM just optimizes it internally. I don't have a preference.

```scala
sum += values(i) * values(i)
```

will translate to

```
 L24
  LINENUMBER 292 L24
  DLOAD 13
  ALOAD 4
  ILOAD 15
  DALOAD
  ALOAD 4
  ILOAD 15
  DALOAD
  DMUL
  DADD
  DSTORE 13
```

while

```scala
val value = values(i)
sum += value * value
```

will translate to

```
 L24
  LINENUMBER 292 L24
  ALOAD 4
  ILOAD 15
  DALOAD
  DSTORE 16
 L25
  LINENUMBER 293 L25
  DLOAD 13
  DLOAD 16
  DLOAD 16
  DMUL
  DADD
  DSTORE 13
```
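The two loop bodies being compared differ only in stack traffic; they are semantically identical. A quick standalone equivalence sketch of both forms (illustrative function names, not from the PR):

```scala
// Duplicated array lookup, as written in the PR.
def sumSquaresDup(values: Array[Double]): Double = {
  var sum = 0.0
  var i = 0
  while (i < values.length) {
    sum += values(i) * values(i)
    i += 1
  }
  sum
}

// Cached lookup, as suggested in review.
def sumSquaresCached(values: Array[Double]): Double = {
  var sum = 0.0
  var i = 0
  while (i < values.length) {
    val value = values(i)
    sum += value * value
    i += 1
  }
  sum
}

val xs = Array(1.5, -2.0, 3.0)
assert(sumSquaresDup(xs) == sumSquaresCached(xs))  // both 15.25
```
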
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-64537183 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23886/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-64537174 [Test build #23886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23886/consoleFull) for PR 3462 at commit [`0c3637f`](https://github.com/apache/spark/commit/0c3637f05fafc4be89a46816eabfc30274be7759).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-64535777 [Test build #23885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23885/consoleFull) for PR 3462 at commit [`6fa616c`](https://github.com/apache/spark/commit/6fa616cd6e68e0e1a8c87551925e8b2d2af7c11b).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-64535783 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23885/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-64532433 [Test build #23881 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23881/consoleFull) for PR 3462 at commit [`9b7cb56`](https://github.com/apache/spark/commit/9b7cb563d0c0521d2e6797786a9256a431abb941).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-64532440 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23881/
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20921916

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala, on the line `if (p == 1) {` (full context quoted above) ---

End diff -- Interesting, will try tomorrow. But I don't expect too much difference.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20921892

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala, on the line `sum += values(i) * values(i)` (full context quoted above) ---

End diff -- Will try to see tomorrow if Scala generates the same bytecode. Maybe the Scala compiler optimizes it.
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20921838

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala, on the line `sum += values(i) * values(i)` (full context quoted above) ---

End diff -- I tried that, and the JVM will optimize it, so there is no performance difference.
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20921760

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---

@@ -261,6 +261,57 @@ object Vectors {
       sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
     }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+    require(p >= 1.0)
+    val values = vector match {
+      case dv: DenseVector => dv.values
+      case sv: SparseVector => sv.values
+      case v => throw new IllegalArgumentException("Do not support vector type " + v.getClass)
+    }
+    val size = values.size
+
+    if (p == 1) {

--- End diff --

How about `p match { ...` here, with `@switch` to ensure it's just a lookup? It should be faster even than `if`s.
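For context on the `@switch` suggestion: the annotation only asserts that a match compiles down to a JVM `tableswitch`/`lookupswitch`, which the compiler can do for `Int`-like selectors but not for a `Double` such as `p`. A minimal illustrative sketch of the annotation on an `Int` match (hypothetical object and case labels, not the PR's code):

```scala
import scala.annotation.switch

object NormDispatch {
  // @switch makes the compiler warn if this match cannot be compiled
  // down to a single tableswitch/lookupswitch instruction.
  def label(p: Int): String = (p: @switch) match {
    case 0 => "zero norm"
    case 1 => "L1 norm"
    case 2 => "L2 norm"
    case _ => "general p-norm"
  }
}
```

Matching directly on the `Double` parameter would fall back to chained equality tests, so in this code the annotation would buy nothing over the `if`/`else if` chain.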
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20921781

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---

@@ -261,6 +261,57 @@ object Vectors {
       sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
     }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+    require(p >= 1.0)
+    val values = vector match {
+      case dv: DenseVector => dv.values
+      case sv: SparseVector => sv.values
+      case v => throw new IllegalArgumentException("Do not support vector type " + v.getClass)
+    }
+    val size = values.size
+
+    if (p == 1) {
+      var sum = 0.0
+      var i = 0
+      while (i < size) {
+        sum += math.abs(values(i))
+        i += 1
+      }
+      sum
+    } else if (p == 2) {
+      var sum = 0.0
+      var i = 0
+      while (i < size) {
+        sum += values(i) * values(i)

--- End diff --

Since this is a common case and a tight loop, avoid the duplicated value lookup?
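The two variants under discussion can be sketched side by side: indexing the array twice per iteration (as in the diff) versus hoisting the element into a local first (srowen's suggestion). The commenters report the JIT makes them equivalent in practice; the object and method names below are hypothetical:

```scala
object SumOfSquares {
  // Variant from the diff: values(i) is read twice per iteration.
  def direct(values: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < values.length) {
      sum += values(i) * values(i)
      i += 1
    }
    sum
  }

  // Suggested variant: hoist the element into a local before squaring.
  def hoisted(values: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < values.length) {
      val v = values(i)
      sum += v * v
      i += 1
    }
    sum
  }
}
```

Both compute the same result; any difference would only show up in the emitted bytecode or after JIT compilation, which is what the thread set out to check.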
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-64527790 [Test build #23886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23886/consoleFull) for PR 3462 at commit [`0c3637f`](https://github.com/apache/spark/commit/0c3637f05fafc4be89a46816eabfc30274be7759). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-64526555 [Test build #23885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23885/consoleFull) for PR 3462 at commit [`6fa616c`](https://github.com/apache/spark/commit/6fa616cd6e68e0e1a8c87551925e8b2d2af7c11b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-64524976 [Test build #23881 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23881/consoleFull) for PR 3462 at commit [`9b7cb56`](https://github.com/apache/spark/commit/9b7cb563d0c0521d2e6797786a9256a431abb941). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20919934

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---

@@ -85,6 +85,52 @@ sealed trait Vector extends Serializable {
   * with type `Double`.
   */
  private[spark] def foreachActive(f: (Int, Double) => Unit)
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(p: Double): Double
+
+  protected def norm(p: Double, values: Array[Double]): Double = {

--- End diff --

The parent `Vector` trait doesn't have a `values` member variable, so this won't work unless we do another pattern match. I think it's okay to keep `norm` as a member function of `Vector`. What do you think?
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-64514649 [Test build #23869 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23869/consoleFull) for PR 3462 at commit [`0b632e6`](https://github.com/apache/spark/commit/0b632e601c6c49d20f3b2eac6bea9642bff39ae9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3462#issuecomment-64514652 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23869/ Test PASSed.
[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3462#discussion_r20915769

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---

@@ -85,6 +85,52 @@ sealed trait Vector extends Serializable {
   * with type `Double`.
   */
  private[spark] def foreachActive(f: (Int, Double) => Unit)
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(p: Double): Double
+
+  protected def norm(p: Double, values: Array[Double]): Double = {

--- End diff --

I think we should move this method to `Vectors` because it is a static method, like the following:

~~~
object Vectors {
  private def norm(values: Array[Double], p: Double): Double = {
    ...
  }

  private[mllib] def norm(v: Vector, p: Double): Double = {
    norm(v.values, p)
  }
}
~~~
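Filling in mengxr's elided `norm(values, p)` body, a self-contained sketch of the static-helper layout he proposes is below. This is illustrative only: the object name is hypothetical, the dispatch on `p` follows the special cases shown in the diff (`p == 1`, `p == 2`), and the general-`p` branch (not shown in the diff) is assumed to use `math.pow`:

```scala
object NormSketch {
  // Static helper over the backing array, as in mengxr's proposed layout.
  // Callers would extract the array from a DenseVector/SparseVector first.
  def norm(values: Array[Double], p: Double): Double = {
    require(p >= 1.0, "p-norm is only defined here for p >= 1")
    if (p == 1.0) {
      values.map(math.abs).sum
    } else if (p == 2.0) {
      math.sqrt(values.map(v => v * v).sum)
    } else {
      // General case, assumed: (sum |v|^p)^(1/p).
      math.pow(values.map(v => math.pow(math.abs(v), p)).sum, 1.0 / p)
    }
  }
}
```

Note that for a `SparseVector` this only touches the stored (active) values, which is sufficient for any p-norm with p >= 1 because the implicit zeros contribute nothing to the sum.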