Github user KyleLi1985 commented on the issue:

https://github.com/apache/spark/pull/22893

> So the pull request right now doesn't reflect what you tested, but you tested the version pasted above. You're saying that the optimization just never helps the dense-dense case, and sqdist is faster than a dot product. This doesn't make sense mathematically as it should be more math, but stranger things have happened.
>
> Still, I don't follow your test code here. You parallelize one vector, map it, collect it: why use Spark? and it's the same vector over and over, and it's not a big vector. Your sparse vectors aren't very sparse.
>
> How about more representative input -- larger vectors (100s of elements, probably), more sparse sparse vectors, and a large set of different inputs. I also don't see where the precision bound is changed here?
>
> This may be a good change but I'm just not yet convinced by the test methodology, and the result still doesn't make much intuitive sense.

1) Why use Spark? No special reason; it is simply the tool I use most often.
2) About the vectors: I ran a test with more representative input, and the results are shown below.
3) About the precision: it is a trick. You can steer the calculation into whichever branch you want by manually changing the precision value. As I said in my last comment, taking LOGIC2 as an example, you can manually set precision to -10000 in `(precisionbound1 < precision)` and to 10000 in `(precisionbound2 > precision)`, so the calculation goes into the LOGIC2 path. It is similar to a code-coverage exercise. In any case, our goal is to show that, for the same calculation logic, performance does not change before and after adding the enhancement for the sparse-sparse and sparse-dense cases.
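To make point 3 concrete, here is a simplified, standalone Python sketch of a precision-bound branch structure similar to the one in Spark's `MLUtils.fastSquaredDistance` as it looked before this PR (to the best of my understanding). It is not the PR's code and not a Spark API: the function name `fast_squared_distance`, the separate `prec1`/`prec2` parameters (standing in for the two places where `precision` is edited by hand), the `is_sparse` flag, and the branch labels are all hypothetical. It shows how pinning `prec1` to -10000 and `prec2` to 10000 steers the example below into the sparse norm-plus-dot path.

```python
import math
import sys

# Assumption: machine epsilon stands in for Spark's MLUtils.EPSILON.
EPSILON = sys.float_info.epsilon


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sqdist(a, b):
    # Direct squared Euclidean distance, no norm shortcut.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fast_squared_distance(v1, v2, is_sparse, prec1=1e-6, prec2=1e-6):
    """Hypothetical sketch: returns (squared distance, branch label).

    prec1 is compared in the first check and prec2 in the second, so each
    check can be forced independently, as described in point 3.
    """
    assert len(v1) == len(v2)
    norm1 = math.sqrt(dot(v1, v1))
    norm2 = math.sqrt(dot(v2, v2))
    sum_sq_norm = norm1 * norm1 + norm2 * norm2
    norm_diff = norm1 - norm2
    # Cheap error bound that needs only the norms, not the inner product.
    bound1 = 2.0 * EPSILON * sum_sq_norm / (norm_diff * norm_diff + EPSILON)
    if bound1 < prec1:
        # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
        return sum_sq_norm - 2.0 * dot(v1, v2), "dot-shortcut"
    if is_sparse:
        dot_value = dot(v1, v2)
        sq = max(sum_sq_norm - 2.0 * dot_value, 0.0)
        # Tighter bound that uses the inner product.
        bound2 = EPSILON * (sum_sq_norm + 2.0 * abs(dot_value)) / (sq + EPSILON)
        if bound2 > prec2:
            return sqdist(v1, v2), "sparse-sqdist"
        return sq, "sparse-dot"
    return sqdist(v1, v2), "dense-sqdist"


if __name__ == "__main__":
    a = [1.0, 0.0, 3.0]
    b = [0.0, 2.0, 3.0]
    print(fast_squared_distance(a, b, True))                          # default precision
    print(fast_squared_distance(a, b, True, prec1=-1e4, prec2=1e4))   # forced
```

With the default precision the example pair takes the norm-plus-dot shortcut; the forced values push it into another branch, which is how the same branch can be timed before and after a change.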
Here is my test file: [SparkMLlibTest.txt](https://github.com/apache/spark/files/2544667/SparkMLlibTest.txt)

Test data: I used the dataset at http://archive.ics.uci.edu/ml/datasets/Condition+monitoring+of+hydraulic+systems and extracted the files PS1, PS2, PS3, PS4, PS5 and PS6 to form the test data. There are 13230 instances in total, with 6000 attributes per line.

**Result for the sparse-sparse case, time cost (milliseconds)**

- Before enhancement: 7670, 7704, 7652
- After enhancement: 7634, 7729, 7645
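As a quick check of the sparse-sparse numbers, the change in mean run time can be compared with the run-to-run spread. This is only arithmetic on the six reported timings, not a statistical test, and three runs per version is a small sample.

```python
from statistics import mean

# Sparse-sparse timings in milliseconds, copied from the results above.
before = [7670, 7704, 7652]
after = [7634, 7729, 7645]

delta = mean(after) - mean(before)           # change in mean run time (ms)
relative_pct = 100.0 * delta / mean(before)  # same change as a percentage
spread_before = max(before) - min(before)    # run-to-run range, before
spread_after = max(after) - min(after)       # run-to-run range, after

print(f"mean before: {mean(before):.1f} ms, mean after: {mean(after):.1f} ms")
print(f"delta: {delta:+.1f} ms ({relative_pct:+.3f}%)")
print(f"run-to-run range: before {spread_before} ms, after {spread_after} ms")
```

The mean drops by about 6 ms (roughly 0.08%), which is well inside the 52 ms and 95 ms run-to-run ranges, so these runs show no measurable difference for this branch.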