Hi all, I'm following a TF-IDF example, but I'm having an issue that I'm not sure how to fix.
The input is the following:

    val test = sc.textFile("s3n://.../test_tfidf_products.txt")
    test.collect.mkString("\n")

which prints:

    test: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[370] at textFile at <console>:121
    res241: String =
    a a b c d e
    b c d d

After that, I compute the TF-IDF ratings by doing:

    val test2 = test.map(_.split(" ").toSeq)
    val hashingTF2 = new HashingTF()
    val tf2: RDD[Vector] = hashingTF2.transform(test2)
    tf2.cache()
    val idf2 = new IDF().fit(tf2)
    val tfidf2: RDD[Vector] = idf2.transform(tf2)
    val expandedText = idfModel.transform(tf)
    tfidf2.collect.mkString("\n")

which prints:

    (1048576,[97,98,99,100,101],[0.8109302162163288,0.0,0.0,0.0,0.4054651081081644])
    (1048576,[98,99,100],[0.0,0.0,0.0])

The numbers [97,98,99,100,101] are the indices of the non-zero-capable entries in each sparse vector of tfidf2, i.e. the hashed positions of the terms. I need to access the rating for a given term, for example "a", but the only way I have been able to do this is with the indexOf() method of the HashingTF object:

    hashingTF2.indexOf("a")
    res236: Int = 97

Is there a better way to do this?

Thank you all.
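For reference, the manual indexOf() lookup described above can be wrapped into a small term-to-weight dictionary. The following is only a minimal sketch in plain Scala, not a built-in MLlib feature: the names `TermWeights`, `indexToTerms`, `termWeights` and `fakeIndexOf` are hypothetical, and `fakeIndexOf` is a stand-in that merely reproduces the indices shown in the post. With Spark, one would presumably pass `hashingTF2.indexOf` as the index function and the `indices` and `values` arrays of each `SparseVector`.

```scala
// Sketch: map the hashed indices of one sparse TF-IDF vector back to terms.
object TermWeights {

  // Build an index -> terms dictionary for a known vocabulary.
  // `indexOf` stands in for hashingTF2.indexOf (an assumption of this sketch).
  // Several terms may share one index because of hash collisions,
  // so each index maps to a sequence of terms.
  def indexToTerms(vocab: Seq[String], indexOf: String => Int): Map[Int, Seq[String]] =
    vocab.distinct.groupBy(indexOf)

  // Pair every stored (index, value) entry of a sparse vector with the
  // term(s) that hash to that index. Indices with no known term are skipped.
  def termWeights(indices: Array[Int],
                  values: Array[Double],
                  dict: Map[Int, Seq[String]]): Map[String, Double] =
    indices.zip(values).flatMap { case (i, w) =>
      dict.getOrElse(i, Seq.empty).map(term => term -> w)
    }.toMap

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in that reproduces the indices from the post
    // (a -> 97, b -> 98, ..., e -> 101). Not the real hashing function.
    val fakeIndexOf: String => Int = _.head.toInt

    val dict = indexToTerms(Seq("a", "b", "c", "d", "e"), fakeIndexOf)

    // Indices and values copied from the first vector printed in the post.
    val weights = termWeights(
      Array(97, 98, 99, 100, 101),
      Array(0.8109302162163288, 0.0, 0.0, 0.0, 0.4054651081081644),
      dict)

    println(weights("a"))
  }
}
```

Because the dictionary is built from the vocabulary once, each later lookup is a plain map access instead of a repeated indexOf() call. Colliding terms end up with the same weight, which is a limitation of the hashing trick itself rather than of this sketch.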