Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > I don't think BLAS matters here as these are all vector-vector operations and f2jblas is used directly (i.e. stays in the JVM). > > Are all the vectors dense? I suppose I'm still surprised if sqdist is faster than dot here as it ought to be a little more math. The sparse-dense case might come out differently, note. > > And I suppose I have a hard time believing that the sparse-sparse case is faster after this change (when the precision bound is met) because now it's handled in the sparse-sparse if case in this code, which definitely does a dot plus more work. > > (If you did remove this check you could remove some other values that get computed to check this bound, like precision1) We use only "Vectors Dense", here is the test file [SparkMLlibTest.txt](https://github.com/apache/spark/files/2538356/SparkMLlibTest.txt) I extract the relevant part from code and compare the performance, The result show in Vectors Dense situation the sqdist is fasterã And for End-to-End test, I consider the worst situation, input vector are all dense and the precision is not OK! ` if (precisionBound1 < precision && !v1.isInstanceOf[DenseVector] && !v2.isInstanceOf[DenseVector]) { sqDist = sumSquaredNorm - 2.0 * dot(v1, v2) } else if (v1.isInstanceOf[SparseVector] || v2.isInstanceOf[SparseVector]) { val dotValue = dot(v1, v2) sqDist = math.max(sumSquaredNorm - 2.0 * dotValue, 0.0) val precisionBound2 = EPSILON * (sumSquaredNorm + 2.0 * math.abs(dotValue)) / (sqDist + EPSILON) if (precisionBound2 > precision) { sqDist = Vectors.sqdist(v1, v2) } } else { sqDist = Vectors.sqdist(v1, v2) } ` only use sqdist to calculate distance when the logic is presented above
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org