Hello, I have a dataset containing TF-IDF vectors for a corpus of documents. How do I perform a nearest neighbour search on the dataset, using cosine similarity?

import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

// Read the headerless CSV; the document text is in column _c2
val df = spark.read.option("header", "false").csv("data")
val tk = new Tokenizer().setInputCol("_c2").setOutputCol("words")
val tf = new HashingTF().setInputCol("words").setOutputCol("tf")
val idf = new IDF().setInputCol("tf").setOutputCol("tf-idf")
// Tokenize, hash to term frequencies, then rescale by inverse document frequency
val df1 = tf.transform(tk.transform(df))
idf.fit(df1).transform(df1).select("tf-idf").show(10)

Thank you

--
Meeraj Kunnumpurath
Director and Executive Principal
Service Symphony Ltd
00 44 7702 693597
00 971 50 409 0169
mee...@servicesymphony.com