Hi all,
I currently have a mapPartitions job that flatMaps each value in the
iterator, and I'm running into an issue where certain executions incur major
GC costs. Some executors take 20 minutes, 15 of which are pure garbage
collection, and I believe that a lot of it has to
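
For reference, here is a minimal sketch of a partition function that flat-maps its iterator lazily, so that expanded records are not all buffered in memory at once. It is plain Python rather than the poster's actual job: `expand` and `process_partition` are hypothetical names, and whether eager buffering is the real cause of the GC time here is only an assumption.

```python
from itertools import chain


def expand(value):
    # Hypothetical per-record expansion: one input value yields several outputs.
    return (value * 10 + i for i in range(3))


def process_partition(records):
    # Lazily flat-map the partition iterator. No intermediate list is built,
    # so only the record currently being consumed needs to stay alive.
    return chain.from_iterable(map(expand, records))


# A plain iterator stands in for one partition's records.
partition = iter([1, 2])
result = list(process_partition(partition))
```

In PySpark a function of this shape could be passed to `rdd.mapPartitions(process_partition)`; the contrast is with a version that appends every output to a list before returning it.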

Hi all,
I need to compute the cosine similarity between a series of vectors (for
the sake of argument, let's say that every vector is a tweet). However, I'm
having an issue with how to prepare my data for the
columnSimilarities function. I'm receiving these vectors in row format