Github user KyleLi1985 commented on the issue:
https://github.com/apache/spark/pull/22893
> I don't think BLAS matters here as these are all vector-vector operations
and f2jblas is used directly (i.e. stays in the JVM).
>
> Are all the vectors dense? I suppose I'm still surprised if sqdist is
faster than dot here as it ought to be a little more math. The sparse-dense
case might come out differently, note.
>
> And I suppose I have a hard time believing that the sparse-sparse case is
faster after this change (when the precision bound is met) because now it's
handled in the sparse-sparse if case in this code, which definitely does a dot
plus more work.
>
> (If you did remove this check you could remove some other values that get
computed to check this bound, like precision1)
We use only dense vectors; here is the test file:
[SparkMLlibTest.txt](https://github.com/apache/spark/files/2538356/SparkMLlibTest.txt)
I extracted the relevant part of the code and compared the performance. The
results show that in the dense-vector case sqdist is faster.
For the end-to-end test, I considered the worst case: the input vectors are
all dense and the precision bound is not met.
`
if (precisionBound1 < precision && !v1.isInstanceOf[DenseVector]
    && !v2.isInstanceOf[DenseVector]) {
  sqDist = sumSquaredNorm - 2.0 * dot(v1, v2)
} else if (v1.isInstanceOf[SparseVector] || v2.isInstanceOf[SparseVector]) {
  val dotValue = dot(v1, v2)
  sqDist = math.max(sumSquaredNorm - 2.0 * dotValue, 0.0)
  val precisionBound2 = EPSILON * (sumSquaredNorm + 2.0 * math.abs(dotValue)) /
    (sqDist + EPSILON)
  if (precisionBound2 > precision) {
    sqDist = Vectors.sqdist(v1, v2)
  }
} else {
  sqDist = Vectors.sqdist(v1, v2)
}
`
With the logic shown above, two dense vectors always fall through to the
final else branch, so only sqdist is used to calculate the distance.
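
For reference, the two ways of computing the squared distance that are being compared can be sketched outside Spark roughly as follows. This is only an illustration under assumptions: it uses plain `Array[Double]` instead of MLlib vectors, and the names `SqDistSketch`, `sqdistViaDot`, and the local `sqdist`/`dot` helpers are hypothetical stand-ins, not the MLlib implementations.

```scala
// Illustrative sketch only: plain arrays stand in for dense vectors, and
// these helpers are hypothetical, not the Spark MLlib code.
object SqDistSketch {

  // Direct squared Euclidean distance: one pass over both arrays,
  // summing the squared element-wise differences.
  def sqdist(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Plain dot product of two equally sized arrays.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  // Norm-based form: ||a||^2 + ||b||^2 - 2 * (a . b), given precomputed
  // Euclidean norms. This can lose precision when a and b are close.
  def sqdistViaDot(a: Array[Double], norm1: Double,
                   b: Array[Double], norm2: Double): Double = {
    val sumSquaredNorm = norm1 * norm1 + norm2 * norm2
    sumSquaredNorm - 2.0 * dot(a, b)
  }
}
```

For well-separated inputs such as `Array(3.0, 4.0)` (norm 5) and `Array(6.0, 8.0)` (norm 10), both forms give the same squared distance. The norm-based form is the one that needs a precision check, since it subtracts two potentially large, nearly equal quantities.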