zhengruifeng commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514035619

I tested the performance of the current impl (`apply`), a direct binary search (`apply2`), and a binary search with an extra range check (`apply3`):

```
def apply2(i: Int): Double = {
  if (i < 0 || i >= size) {
    throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
  }
  val j = java.util.Arrays.binarySearch(indices, i)
  if (j < 0) 0.0 else values(j)
}

def apply3(i: Int): Double = {
  if (i < 0 || i >= size) {
    throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
  }
  if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) {
    0.0
  } else {
    val j = java.util.Arrays.binarySearch(indices, i)
    if (j < 0) 0.0 else values(j)
  }
}
```

The test suite is similar to the one above:

```
import scala.util.Random
import org.apache.spark.ml.linalg._

val size = 10000000
for (nnz <- Seq(100, 10000, 1000000)) {
  val rng = new Random(123)
  val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.sorted.take(nnz)
  val values = Array.fill(nnz)(rng.nextDouble)
  val vec = Vectors.sparse(size, indices, values).toSparse

  val tic1 = System.currentTimeMillis
  (0 until 100).foreach { round =>
    var i = 0; var sum = 0.0
    while (i < size) { sum += vec(i); i += 1 }
  }
  val toc1 = System.currentTimeMillis

  val tic2 = System.currentTimeMillis
  (0 until 100).foreach { round =>
    var i = 0; var sum = 0.0
    while (i < size) { sum += vec.apply2(i); i += 1 }
  }
  val toc2 = System.currentTimeMillis

  val tic3 = System.currentTimeMillis
  (0 until 100).foreach { round =>
    var i = 0; var sum = 0.0
    while (i < size) { sum += vec.apply3(i); i += 1 }
  }
  val toc3 = System.currentTimeMillis

  println((size, nnz, toc1 - tic1, toc2 - tic2, toc3 - tic3))
}
```

All times are in milliseconds:

| size | nnz | apply (old) | apply2 | apply3 |
|------|-----|-------------|--------|--------|
| 10000000 | 100 | 75294 | 13744 | 15318 |
| 10000000 | 10000 | 75616 | 33543 | 21112 |
| 10000000 | 1000000 | 92949 | 52483 | 34215 |
`apply2` is faster only when `nnz` is 100 in this suite. I guess that is because the search range is so small that the cost of the two extra range checks cannot be ignored. But when `nnz` is relatively larger (though still small compared to `size`), `apply3` is faster than `apply2`. I guess that is because it avoids calling `Arrays.binarySearch`: `Arrays.binarySearch` does not check the range internally, so it still performs about `log(nnz)` comparisons even when the key is out of range, while the range check rejects such keys in O(1). In sum, I tend to keep the current PR with the extra range check.
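The behavior claimed above is easy to verify in isolation: `java.util.Arrays.binarySearch` returns `-(insertionPoint) - 1` for a missing key rather than short-circuiting on out-of-range input, so the O(1) boundary test in `apply3` answers those misses without any probing. A minimal standalone sketch (the array contents here are made up for illustration and are not from the benchmark):

```scala
import java.util.Arrays

object RangeCheckDemo {
  def main(args: Array[String]): Unit = {
    // A sorted index array, as SparseVector.indices would be (values made up).
    val indices = Array(3, 7, 42, 99)

    // A key below the smallest index: binarySearch still probes the array
    // and returns -(insertionPoint) - 1, here -(0) - 1.
    println(Arrays.binarySearch(indices, 1))    // -1

    // A key above the largest index: insertion point is indices.length, so -5.
    println(Arrays.binarySearch(indices, 100))  // -5

    // The boundary check from apply3 rejects both misses in O(1):
    for (i <- Seq(1, 100)) {
      val outOfRange = indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)
      println(outOfRange)  // true
    }
  }
}
```

For keys that do fall inside `[indices(0), indices.last]`, `apply3` pays both the range check and the full binary search, which matches the table: its advantage grows as more probes hit the sparse gaps outside the stored index range.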