Github user KyleLi1985 commented on the issue:

    https://github.com/apache/spark/pull/22893
  
    > I don't think BLAS matters here as these are all vector-vector operations 
and f2jblas is used directly (i.e. stays in the JVM).
    > 
    > Are all the vectors dense? I suppose I'm still surprised if sqdist is 
faster than dot here as it ought to be a little more math. The sparse-dense 
case might come out differently, note.
    > 
    > And I suppose I have a hard time believing that the sparse-sparse case is 
faster after this change (when the precision bound is met) because now it's 
handled in the sparse-sparse if case in this code, which definitely does a dot 
plus more work.
    > 
    > (If you did remove this check you could remove some other values that get 
computed to check this bound, like precision1)
    
    We use only "Vectors Dense", here is the test file
    
[SparkMLlibTest.txt](https://github.com/apache/spark/files/2538030/SparkMLlibTest.txt)
    I extract the relevant part from code and compare the performance, The 
result show in Vectors Dense situation the sqdist is faster。
    And for End-to-End test, I consider the worst situation, input vector are 
all dense and the precision is not OK!
    
    And other situation definitive like your saying, if the precision is ok to 
use sqdist is not a good way.
    So, this part logic should be 
    `
        if (precisionBound1 < precision && v1.isInstanceOf[SparseVector] && 
v2.isInstanceOf[SparseVector]) 
       {
           sqDist = sumSquaredNorm - 2.0 * dot(v1, v2)
        } else if (v1.isInstanceOf[SparseVector] || 
v2.isInstanceOf[SparseVector]) {
          val dotValue = dot(v1, v2)
          sqDist = math.max(sumSquaredNorm - 2.0 * dotValue, 0.0)
          val precisionBound2 = EPSILON * (sumSquaredNorm + 2.0 * 
math.abs(dotValue)) /
            (sqDist + EPSILON)
          if (precisionBound2 > precision) {
            sqDist = Vectors.sqdist(v1, v2)
          }
        } else {
          sqDist = Vectors.sqdist(v1, v2)
        }
    `
    only use sqdist to calculate distance when the logic like above
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to