[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-65179216
  
LGTM. Merged into master and branch-1.2. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3462





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-65043520
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23984/





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-65043512
  
  [Test build #23984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23984/consoleFull) for PR 3462 at commit [`63c7165`](https://github.com/apache/spark/commit/63c71659ab7aa3bbea1a505f872dceeca5d3ab2f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-65035512
  
  [Test build #23984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23984/consoleFull) for PR 3462 at commit [`63c7165`](https://github.com/apache/spark/commit/63c71659ab7aa3bbea1a505f872dceeca5d3ab2f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r21076434
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector 
type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
--- End diff --

Yeah, but this will not work here unless `p` has type `Int`.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r21075718
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector 
type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
--- End diff --

It is an interesting discussion :) But more people are probably familiar with the `if ... else if ... else` statement, and this is not on the critical path.
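
The diff quoted above is truncated at the `p == 1` branch. A hedged sketch of what such an `if ... else if ... else` implementation plausibly looks like — not the exact merged code — with the special cases for p = 1, p = 2, and p = infinity discussed in this thread, and a `math.pow` fallback for general p:

```scala
// Hedged sketch of the p-norm under discussion; the merged code may differ.
// Uses while loops over a raw Array[Double] to avoid boxing overhead.
object NormSketch {
  def norm(values: Array[Double], p: Double): Double = {
    require(p >= 1.0)
    val size = values.length
    if (p == 1) {
      // L1 norm: sum of absolute values
      var sum = 0.0
      var i = 0
      while (i < size) { sum += math.abs(values(i)); i += 1 }
      sum
    } else if (p == 2) {
      // L2 norm: square root of the sum of squares
      var sum = 0.0
      var i = 0
      while (i < size) { sum += values(i) * values(i); i += 1 }
      math.sqrt(sum)
    } else if (p == Double.PositiveInfinity) {
      // L-infinity norm: maximum absolute value
      var max = 0.0
      var i = 0
      while (i < size) {
        val v = math.abs(values(i))
        if (v > max) max = v
        i += 1
      }
      max
    } else {
      // General case: (sum |x_i|^p)^(1/p)
      var sum = 0.0
      var i = 0
      while (i < size) { sum += math.pow(math.abs(values(i)), p); i += 1 }
      math.pow(sum, 1.0 / p)
    }
  }
}
```

For example, `NormSketch.norm(Array(3.0, -4.0), 2.0)` returns 5.0.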





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-12-01 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r21075634
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---
@@ -197,4 +198,27 @@ class VectorsSuite extends FunSuite {
 assert(svMap.get(2) === Some(3.1))
 assert(svMap.get(3) === Some(0.0))
   }
+
+  test("vector p-norm") {
+val dv = Vectors.dense(0.0, -1.2, 3.1, 0.0, -4.5, 1.9)
+val sv = Vectors.sparse(6, Seq((1, -1.2), (2, 3.1), (3, 0.0), (4, 
-4.5), (5, 1.9)))
+
+assert(Vectors.norm(dv, 1.0) ~== dv.toArray.foldLeft(0.0)((a, v) =>
+  a + math.abs(v)) relTol 1E-8)
+assert(Vectors.norm(sv, 1.0) ~== sv.toArray.foldLeft(0.0)((a, v) =>
+  a + math.abs(v)) relTol 1E-8)
+
+assert(Vectors.norm(dv, 2.0) ~== 
math.sqrt(dv.toArray.foldLeft(0.0)((a, v) =>
+  a + v * v)) relTol 1E-8)
+assert(Vectors.norm(sv, 2.0) ~== 
math.sqrt(sv.toArray.foldLeft(0.0)((a, v) =>
+  a + v * v)) relTol 1E-8)
+
+assert(Vectors.norm(dv, Double.PositiveInfinity) ~== 
dv.toArray.map(math.abs).max relTol 1E-8)
+assert(Vectors.norm(sv, Double.PositiveInfinity) ~== 
sv.toArray.map(math.abs).max relTol 1E-8)
+
+assert(Vectors.norm(dv, 3.7) ~== math.pow(dv.toArray.foldLeft(0.0)((a, 
v) =>
+  a + math.pow(math.abs(v), 3.7)), 1.0 / 3.7) relTol 1E-8)
+assert(Vectors.norm(sv, 3.7) ~== math.pow(dv.toArray.foldLeft(0.0)((a, 
v) =>
--- End diff --

`dv` -> `sv`





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20970806
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector 
type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
--- End diff --

Ha, it only works if I change the type from `Double` to `Int`. See the Oracle doc you referenced: `The Java Virtual Machine's tableswitch and lookupswitch instructions operate only on int data. Because operations on byte, char, or short values are internally promoted to int, a switch whose expression evaluates to one of those types is compiled as though it evaluated to type int.`

With 
```scala
  def fun1(p: Int) = {
(p: @switch) match {
  case 1 => 1
  case 2 => 2
  case _ => p
}
  }
```
I got
```
  public fun1(I)I
   L0
LINENUMBER 147 L0
ILOAD 1
ISTORE 2
ILOAD 2
TABLESWITCH
  1: L1
  2: L2
  default: L3
   L3
LINENUMBER 150 L3
   FRAME APPEND [I]
ILOAD 1
GOTO L4
   L2
LINENUMBER 149 L2
   FRAME SAME
ICONST_2
GOTO L4
   L1
LINENUMBER 148 L1
   FRAME SAME
ICONST_1
   L4
LINENUMBER 147 L4
   FRAME SAME1 I
IRETURN
   L5
LOCALVARIABLE this Lorg/apache/spark/mllib/stat/Test$; L0 L5 0
LOCALVARIABLE p I L0 L5 1
MAXSTACK = 1
MAXLOCALS = 3
```





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20970221
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector 
type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
--- End diff --

This is an interesting tangent. What happens if you add `@switch`? 
http://www.scala-lang.org/api/current/index.html#scala.annotation.switch  
Bytecode should have instructions for `switch` statements that aren't just 
conditionals, like `tableswitch`: 
https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-3.html#jvms-3.10





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20968353
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector 
type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
--- End diff --

In bytecode, there is no direct `switch` operation on doubles. As a result, the `switch` or pattern matching will be compiled into `if` statements in the bytecode. See the following example:
```scala
  def fun1(p: Double) = {
p match {
  case 1.0 => 1.0
  case 2.0 => 2.0
  case _ => p
}
  }

  def fun2(p: Double) = {
if (p == 1.0) 1.0
else if (p == 2.0) 2.0
else p
  }
```
will be compiled to
```
  // access flags 0x1
  public fun1(D)D
   L0
LINENUMBER 145 L0
DLOAD 1
DSTORE 3
   L1
LINENUMBER 146 L1
DCONST_1
DLOAD 3
DCMPL
IFNE L2
DCONST_1
DSTORE 5
GOTO L3
   L2
LINENUMBER 147 L2
   FRAME APPEND [D]
LDC 2.0
DLOAD 3
DCMPL
IFNE L4
LDC 2.0
DSTORE 5
GOTO L3
   L4
LINENUMBER 148 L4
   FRAME SAME
DLOAD 1
DSTORE 5
   L3
LINENUMBER 145 L3
   FRAME APPEND [D]
DLOAD 5
DRETURN
   L5
LOCALVARIABLE this Lorg/apache/spark/mllib/stat/Test$; L0 L5 0
LOCALVARIABLE p D L0 L5 1
MAXSTACK = 4
MAXLOCALS = 7

  // access flags 0x1
  public fun2(D)D
   L0
LINENUMBER 153 L0
DLOAD 1
DCONST_1
DCMPL
IFNE L1
DCONST_1
GOTO L2
   L1
LINENUMBER 154 L1
   FRAME SAME
DLOAD 1
LDC 2.0
DCMPL
IFNE L3
LDC 2.0
GOTO L2
   L3
LINENUMBER 155 L3
   FRAME SAME
DLOAD 1
   L2
LINENUMBER 153 L2
   FRAME SAME1 D
DRETURN
   L4
LOCALVARIABLE this Lorg/apache/spark/mllib/stat/Test$; L0 L4 0
LOCALVARIABLE p D L0 L4 1
MAXSTACK = 4
MAXLOCALS = 3
```





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20967444
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector 
type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += math.abs(values(i))
+i += 1
+  }
+  sum
+} else if (p == 2) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += values(i) * values(i)
--- End diff --

@mengxr Having `var value` outside the loop generates the same bytecode, so it doesn't make a difference: in the bytecode it just stores the value to the stack, and there is no difference between the two versions.

@srowen I'm very curious about this myself. :P I'm using the ASM Bytecode Outline plugin in IntelliJ, https://plugins.jetbrains.com/plugin/5918, so I can generate the bytecode with a simple right click. We used it to find a couple of boxing/unboxing performance issues at Alpine, and it's very useful.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20961526
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector 
type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += math.abs(values(i))
+i += 1
+  }
+  sum
+} else if (p == 2) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += values(i) * values(i)
--- End diff --

Just curious, what if you define `var value` outside the while loop?





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20961460
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector 
type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += math.abs(values(i))
+i += 1
+  }
+  sum
+} else if (p == 2) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += values(i) * values(i)
--- End diff --

Oh OK, I didn't mean to make you spend so much time investigating. I imagine it is not optimized, but it may be inconsequential even in a tight loop.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20961188
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector 
type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += math.abs(values(i))
+i += 1
+  }
+  sum
+} else if (p == 2) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += values(i) * values(i)
--- End diff --

They will generate different bytecode, but I don't see a performance difference. Maybe the `DSTORE` that stores `value` into a local variable is as expensive as looking the array element up twice, or maybe the JVM just optimizes it internally. I don't have a preference.

```scala
sum += values(i) * values(i)
```
will translate to
```
   L24
LINENUMBER 292 L24
DLOAD 13
ALOAD 4
ILOAD 15
DALOAD
ALOAD 4
ILOAD 15
DALOAD
DMUL
DADD
DSTORE 13
```
while
```scala
val value = values(i)
sum += value * value
```
will translate to
```
   L24
LINENUMBER 292 L24
ALOAD 4
ILOAD 15
DALOAD
DSTORE 16
   L25
LINENUMBER 293 L25
DLOAD 13
DLOAD 16
DLOAD 16
DMUL
DADD
DSTORE 13
```
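
The two source-level variants compile to slightly different bytecode, as shown above, but compute the same result. A minimal sketch to make the comparison concrete (the `sumSquares*` names are illustrative, not from the patch):

```scala
// Two equivalent loop bodies for summing squares over an Array[Double]:
// one indexes the array twice per iteration, the other caches the element
// in a local val. The JIT typically makes them perform alike.
object SumSquaresSketch {
  def sumSquaresIndexed(values: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < values.length) {
      sum += values(i) * values(i) // two DALOADs per iteration
      i += 1
    }
    sum
  }

  def sumSquaresCached(values: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < values.length) {
      val value = values(i) // one DALOAD, then DSTORE/DLOADs
      sum += value * value
      i += 1
    }
    sum
  }
}
```

Both return 25.0 for `Array(3.0, 4.0)`.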





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-64537183
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23886/





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-64537174
  
  [Test build #23886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23886/consoleFull) for PR 3462 at commit [`0c3637f`](https://github.com/apache/spark/commit/0c3637f05fafc4be89a46816eabfc30274be7759).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-64535777
  
  [Test build #23885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23885/consoleFull) for PR 3462 at commit [`6fa616c`](https://github.com/apache/spark/commit/6fa616cd6e68e0e1a8c87551925e8b2d2af7c11b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-64535783
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23885/





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-64532433
  
  [Test build #23881 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23881/consoleFull) for PR 3462 at commit [`9b7cb56`](https://github.com/apache/spark/commit/9b7cb563d0c0521d2e6797786a9256a431abb941).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-64532440
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23881/





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20921916
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector 
type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
--- End diff --

Interesting, will try tomorrow. But I don't expect too much difference.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20921892
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector 
type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += math.abs(values(i))
+i += 1
+  }
+  sum
+} else if (p == 2) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += values(i) * values(i)
--- End diff --

Will try to see if Scala generates the same bytecode tomorrow. Maybe the Scala compiler optimizes it.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20921838
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += math.abs(values(i))
+i += 1
+  }
+  sum
+} else if (p == 2) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += values(i) * values(i)
--- End diff --

I tried that, and the JVM optimizes it, so there is no performance difference.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20921760
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
--- End diff --

How about `p match { ...` here, with `@switch` to ensure it's just a lookup? It should be faster even than `if`s.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20921781
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
@@ -261,6 +261,57 @@ object Vectors {
 sys.error("Unsupported Breeze vector type: " + v.getClass.getName)
 }
   }
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param vector input vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(vector: Vector, p: Double): Double = {
+require(p >= 1.0)
+val values = vector match {
+  case dv: DenseVector => dv.values
+  case sv: SparseVector => sv.values
+  case v => throw new IllegalArgumentException("Do not support vector type " + v.getClass)
+}
+val size = values.size
+
+if (p == 1) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += math.abs(values(i))
+i += 1
+  }
+  sum
+} else if (p == 2) {
+  var sum = 0.0
+  var i = 0
+  while (i < size) {
+sum += values(i) * values(i)
--- End diff --

Since this is a common case and a tight loop, avoid the duplicated value lookup?
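The suggestion, sketched in standalone form (the name `sumOfSquares` is hypothetical): hoist the element into a local so the array is indexed once per iteration — though, as noted elsewhere in the thread, the JIT typically performs this common-subexpression elimination itself:

```scala
def sumOfSquares(values: Array[Double]): Double = {
  var sum = 0.0
  var i = 0
  while (i < values.length) {
    val v = values(i) // single array read per iteration
    sum += v * v
    i += 1
  }
  sum
}
```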





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-64527790
  
  [Test build #23886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23886/consoleFull) for PR 3462 at commit [`0c3637f`](https://github.com/apache/spark/commit/0c3637f05fafc4be89a46816eabfc30274be7759).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-64526555
  
  [Test build #23885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23885/consoleFull) for PR 3462 at commit [`6fa616c`](https://github.com/apache/spark/commit/6fa616cd6e68e0e1a8c87551925e8b2d2af7c11b).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-64524976
  
  [Test build #23881 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23881/consoleFull) for PR 3462 at commit [`9b7cb56`](https://github.com/apache/spark/commit/9b7cb563d0c0521d2e6797786a9256a431abb941).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-25 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20919934
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
@@ -85,6 +85,52 @@ sealed trait Vector extends Serializable {
*  with type `Double`.
*/
   private[spark] def foreachActive(f: (Int, Double) => Unit)
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(p: Double): Double
+
+  protected def norm(p: Double, values: Array[Double]): Double = {
--- End diff --

The parent `Vector` class doesn't have a `values` member variable, so this will not work unless we do another pattern match. I think it's okay to keep `norm` as a member function of `Vector`. What do you think?





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-64514649
  
  [Test build #23869 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23869/consoleFull) for PR 3462 at commit [`0b632e6`](https://github.com/apache/spark/commit/0b632e601c6c49d20f3b2eac6bea9642bff39ae9).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3462#issuecomment-64514652
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23869/
Test PASSed.





[GitHub] spark pull request: [SPARK-4611][MLlib] Implement the efficient ve...

2014-11-25 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3462#discussion_r20915769
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
@@ -85,6 +85,52 @@ sealed trait Vector extends Serializable {
*  with type `Double`.
*/
   private[spark] def foreachActive(f: (Int, Double) => Unit)
+
+  /**
+   * Returns the p-norm of this vector.
+   * @param p norm.
+   * @return norm in L^p^ space.
+   */
+  private[spark] def norm(p: Double): Double
+
+  protected def norm(p: Double, values: Array[Double]): Double = {
--- End diff --

I think we should move this method to `Vectors` because it is a static 
method, like the following:

~~~
object Vectors {
  private def norm(values: Array[Double], p: Double): Double = {
...
  }

  private[mllib] def norm(v: Vector, p: Double): Double = {
norm(v.values, p)
  }
}
~~~
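Filling in that outline as a hedged sketch: since the `Vector` trait itself exposes no `values` field, the static method has to pattern-match on the concrete types first. The case classes below are simplified stand-ins for the real MLlib vector classes, for illustration only:

```scala
// Simplified stand-ins for the MLlib vector types (not the real classes).
sealed trait Vector
final case class DenseVector(values: Array[Double]) extends Vector
final case class SparseVector(size: Int, indices: Array[Int], values: Array[Double]) extends Vector

object Vectors {
  // Static entry point: extract the backing array, then delegate.
  def norm(v: Vector, p: Double): Double = v match {
    case DenseVector(values)        => norm(values, p)
    case SparseVector(_, _, values) => norm(values, p) // zeros contribute nothing for p >= 1
  }

  private def norm(values: Array[Double], p: Double): Double = {
    require(p >= 1.0)
    if (p == 1.0) values.map(v => math.abs(v)).sum
    else if (p == 2.0) math.sqrt(values.map(v => v * v).sum)
    else math.pow(values.map(v => math.pow(math.abs(v), p)).sum, 1.0 / p)
  }
}
```

For a sparse vector this only touches the stored non-zeros, which is correct for p >= 1 because zero entries contribute nothing to the sum.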

