Hi,
I'm not sure how to speed up this kind of query on vanilla Spark alone,
but you can write custom physical plans for top-k queries.
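
To make the top-k idea concrete, here is a minimal sketch in plain Scala (no Spark) of the core step: for each query vector, keep only the k nearest candidates in a bounded heap instead of materializing and sorting every pair. This is not the Hivemall implementation and not a Spark physical plan. The names TopKSketch, sqDist and topK are hypothetical, and the use of squared Euclidean distance is an assumption, since the thread does not say which distance is meant.

```scala
import scala.collection.mutable

// Hypothetical illustration only; names and distance metric are assumptions.
object TopKSketch {
  // Squared Euclidean distance; assumes both vectors have the same length.
  def sqDist(a: Array[Double], b: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Returns (id, distance) for the k candidates closest to `query`, nearest first.
  def topK(query: Array[Double],
           candidates: Seq[(Int, Array[Double])],
           k: Int): Seq[(Int, Double)] = {
    require(k > 0, "k must be positive")
    // Max-heap on distance: the head is the worst of the current best k.
    val heap = mutable.PriorityQueue.empty[(Int, Double)](
      Ordering.by[(Int, Double), Double](_._2))
    for ((id, vec) <- candidates) {
      val d = sqDist(query, vec)
      if (heap.size < k) {
        heap.enqueue((id, d))
      } else if (d < heap.head._2) {
        // Closer than the current worst: replace it.
        heap.dequeue()
        heap.enqueue((id, d))
      }
    }
    heap.toSeq.sortBy(_._2)
  }

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      1 -> Array(0.0, 0.0),
      2 -> Array(1.0, 1.0),
      3 -> Array(5.0, 5.0))
    val query = Array(0.9, 1.2)
    topK(query, candidates, k = 2).foreach { case (id, d) =>
      println(s"id=$id sqDist=$d")
    }
  }
}
```

The point of the bounded heap is that memory per query stays at k entries, whereas a cross join followed by a sort has to hold every pair. Whether this helps in your case depends on the sizes of the two DataFrames.
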
You can check the links below for reference:
benchmark: https://github.com/apache/incubator-hivemall/pull/33
manual:
Hi;
I have 2 DataFrames. I am trying to cross join them to compute vector distances,
so that I can then choose the most similar vectors.
The cross join is too slow. How can I increase the speed, or do you have any
suggestions for this comparison?
val result=myDict.join(mainDataset).map(x=>{