Repository: spark
Updated Branches:
  refs/heads/master d6894b1c5 -> 7b0ed7979


[SPARK-5419][Mllib] Fix the logic in Vectors.sqdist

The current implementation of Vectors.sqdist is inefficient because it 
allocates temporary arrays. There is also a bug in the guard 
`v1.indices.length / v1.size < 0.5`: both operands are Ints, so the division 
truncates before the comparison. This PR fixes the bug and refactors sqdist so 
that it no longer allocates new arrays.
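
A minimal sketch of how the guard quoted above behaves, assuming plain Int 
arguments in place of `v1.indices.length` and `v1.size`. The object and method 
names are hypothetical and are not part of Spark:

```scala
object IntDivisionDemo {
  // Hypothetical stand-in for the removed guard: Int / Int truncates,
  // so the quotient is 0 whenever nnz < size.
  def buggyIsSparse(nnz: Int, size: Int): Boolean = nnz / size < 0.5

  // One possible intended form: convert to Double before dividing.
  def intendedIsSparse(nnz: Int, size: Int): Boolean =
    nnz.toDouble / size < 0.5

  def main(args: Array[String]): Unit = {
    // 9 stored entries out of 10 is nearly dense, yet the buggy guard
    // still reports "sparse" because 9 / 10 == 0 in Int arithmetic.
    println(buggyIsSparse(9, 10))
    println(intendedIsSparse(9, 10))
  }
}
```

Note that the commit itself does not adopt the Double form; it removes the 
guard entirely, as the diff shows.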

Author: Liang-Chi Hsieh <vii...@gmail.com>

Closes #4217 from viirya/fix_sqdist and squashes the following commits:

e8b0b3d [Liang-Chi Hsieh] For review comments.
314c424 [Liang-Chi Hsieh] Fix sqdist bug.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7b0ed797
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7b0ed797
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7b0ed797

Branch: refs/heads/master
Commit: 7b0ed797958a91cda73baa7aa49ce66bfcb6b64b
Parents: d6894b1
Author: Liang-Chi Hsieh <vii...@gmail.com>
Authored: Tue Jan 27 01:29:14 2015 -0800
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Tue Jan 27 01:29:14 2015 -0800

----------------------------------------------------------------------
 .../org/apache/spark/mllib/linalg/Vectors.scala  | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/7b0ed797/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
index b3022ad..2834ea7 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
@@ -371,18 +371,23 @@ object Vectors {
           squaredDistance += score * score
         }
 
-      case (v1: SparseVector, v2: DenseVector) if v1.indices.length / v1.size < 0.5 =>
+      case (v1: SparseVector, v2: DenseVector) =>
         squaredDistance = sqdist(v1, v2)
 
-      case (v1: DenseVector, v2: SparseVector) if v2.indices.length / v2.size < 0.5 =>
+      case (v1: DenseVector, v2: SparseVector) =>
         squaredDistance = sqdist(v2, v1)
 
-      // When a SparseVector is approximately dense, we treat it as a DenseVector
-      case (v1, v2) =>
-        squaredDistance = v1.toArray.zip(v2.toArray).foldLeft(0.0){ (distance, elems) =>
-          val score = elems._1 - elems._2
-          distance + score * score
+      case (DenseVector(vv1), DenseVector(vv2)) =>
+        var kv = 0
+        val sz = vv1.size
+        while (kv < sz) {
+          val score = vv1(kv) - vv2(kv)
+          squaredDistance += score * score
+          kv += 1
         }
+      case _ =>
+        throw new IllegalArgumentException("Do not support vector type " + v1.getClass +
+          " and " + v2.getClass)
     }
     squaredDistance
   }
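
For comparison, the removed fallback and the new dense/dense branch can be 
sketched on plain `Array[Double]` inputs. This is an illustration only: the 
object and method names are hypothetical, and the `require` length check is an 
assumption of the sketch rather than something shown in this diff.

```scala
object SqdistSketch {
  // Modeled on the removed fallback: zip builds an intermediate array
  // of tuples before the fold runs.
  def sqdistZip(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).foldLeft(0.0) { (distance, elems) =>
      val score = elems._1 - elems._2
      distance + score * score
    }

  // Modeled on the added DenseVector/DenseVector branch: a while loop
  // over the indices with no intermediate collection.
  def sqdistLoop(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "arrays must have the same length")
    var squaredDistance = 0.0
    var kv = 0
    val sz = a.length
    while (kv < sz) {
      val score = a(kv) - b(kv)
      squaredDistance += score * score
      kv += 1
    }
    squaredDistance
  }
}
```

Both functions accumulate the same terms in the same order, so they should 
agree on equal-length inputs; the loop version simply avoids the temporary 
array of pairs.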


