zhengruifeng commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514035619
 
 
   I tested the performance of the current impl (`apply`), a direct binary search (`apply2`), and a binary search with an extra range check (`apply3`):
   ```
     def apply2(i: Int): Double = {
       if (i < 0 || i >= size) {
        throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
       }
   
       val j = util.Arrays.binarySearch(indices, i)
       if (j < 0) 0.0 else values(j)
     }
   
     def apply3(i: Int): Double = {
       if (i < 0 || i >= size) {
        throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
       }
   
       if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) {
         0.0
       } else {
         val j = util.Arrays.binarySearch(indices, i)
         if (j < 0) 0.0 else values(j)
       }
     }
   ```
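   
   As a quick sanity check that the two variants agree (a standalone sketch with local `lookup2`/`lookup3` helpers standing in for `apply2`/`apply3`, since the two only differ in the short-circuit):
   ```
   import java.util
   
   val size = 10
   val indices = Array(1, 4, 7)
   val values = Array(1.0, 2.0, 3.0)
   
   // lookup2 mirrors apply2: plain binary search over the sorted index array
   def lookup2(i: Int): Double = {
     val j = util.Arrays.binarySearch(indices, i)
     if (j < 0) 0.0 else values(j)
   }
   
   // lookup3 mirrors apply3: same search, guarded by an index-range short-circuit
   def lookup3(i: Int): Double =
     if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) 0.0
     else lookup2(i)
   
   assert((0 until size).forall(i => lookup2(i) == lookup3(i)))
   ```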
   
   The test suite is similar to the one above (note that `vec.apply2` and `vec.apply3` assume a locally patched `SparseVector` build):
   ```
   import scala.util.Random
   import org.apache.spark.ml.linalg._
   
   val size = 10000000
   for (nnz <- Seq(100, 10000, 1000000)) {
        val rng = new Random(123)
     val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.sorted.take(nnz)
        val values = Array.fill(nnz)(rng.nextDouble)
        val vec = Vectors.sparse(size, indices, values).toSparse
   
        val tic1 = System.currentTimeMillis;
     (0 until 100).foreach{ round => var i = 0; var sum = 0.0; while(i < size) { sum += vec(i); i += 1 } };
        val toc1 = System.currentTimeMillis;
   
        val tic2 = System.currentTimeMillis;
     (0 until 100).foreach{ round => var i = 0; var sum = 0.0; while(i < size) { sum += vec.apply2(i); i += 1 } };
        val toc2 = System.currentTimeMillis;
   
        val tic3 = System.currentTimeMillis;
     (0 until 100).foreach{ round => var i = 0; var sum = 0.0; while(i < size) { sum += vec.apply3(i); i += 1 } };
        val toc3 = System.currentTimeMillis;
        
        println((size, nnz, toc1 - tic1, toc2 - tic2, toc3 - tic3))
   }
   ```
   
   
   
   | size | nnz | apply (old), ms | apply2, ms | apply3, ms |
   |------|-----|-----------------|------------|------------|
   |10000000|100|75294|13744|15318|
   |10000000|10000|75616|33543|21112|
   |10000000|1000000|92949|52483|34215|
   
   `apply2` is faster only when `nnz` is 100 in this suite; I guess that is because the search is then really cheap, so the cost of the two extra range checks cannot be ignored.
   But when `nnz` is relatively larger (though still small compared to `size`), `apply3` is faster than `apply2`. I guess that is because the range check avoids calling `Arrays.binarySearch` for out-of-range keys: `Arrays.binarySearch` does not check the range internally and still performs about `log(nnz)` comparisons even when the input key is out of range.
   In sum, I tend to keep the current PR with the extra range check.
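   
   To make the reasoning above concrete, here is a small standalone sketch (illustration only, not part of the PR, using a made-up index array): a hand-rolled binary search with a probe counter, showing that an out-of-range key still costs roughly `log2(nnz)` probes unless the range check short-circuits it first.
   ```
   val indices = Array.tabulate(10000)(i => i * 2 + 100)  // toy sorted index array, nnz = 10000
   val key = 1                                            // out of range: smaller than indices(0)
   
   // plain binary search with a probe counter
   var lo = 0
   var hi = indices.length - 1
   var probes = 0
   var found = false
   while (lo <= hi && !found) {
     val mid = (lo + hi) >>> 1
     probes += 1
     if (indices(mid) == key) found = true
     else if (indices(mid) < key) lo = mid + 1
     else hi = mid - 1
   }
   
   // apply3-style guard: answers without any probe
   val shortCircuited = key < indices(0) || key > indices(indices.length - 1)
   println(s"binary-search probes: $probes, range check short-circuits: $shortCircuited")
   // expect roughly 13-14 probes (~log2(10000)) vs. an immediate short-circuit
   ```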
   
