Hi, I'm new to MLlib and Spark. I'm trying to compute TF-IDF and use the resulting values for term ranking. I get the TF values back as vectors, but how can I get the individual values out of a vector?
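
    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    val sc = new SparkContext(conf)

    // One document per line, split on spaces into a sequence of terms
    val documents: RDD[Seq[String]] =
      sc.textFile("/home/andrejs/Datasets/dbpedia/test.txt").map(_.split(" ").toSeq)
    documents.foreach(println(_))

    // Hash each term to an index and count occurrences per document
    val hashingTF = new HashingTF()
    val tf: RDD[Vector] = hashingTF.transform(documents)
    tf.foreach(println(_))

My output is:

    WrappedArray(a, a, b, c)
    WrappedArray(e, a, c, d)
    (1048576,[97,99,100,101],[1.0,1.0,1.0,1.0])
    (1048576,[97,98,99],[2.0,1.0,1.0])

How can I get [97,99,100,101] and [1.0,1.0,1.0,1.0] out of the vector? And how can I map an index to its value, e.g. 100 = 1.0?

Any help is greatly appreciated,
Andrejs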
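
One possible way to do this, sketched here under some assumptions: the printed form (size,[indices],[values]) suggests the vectors are MLlib SparseVector instances, which expose their `indices` and `values` arrays directly. The helper name `termWeights` below is made up for illustration, and the vector is built by hand with `Vectors.sparse` to mirror the output above rather than taken from the `tf` RDD, so the snippet can be pasted into spark-shell without the input file.

```scala
import org.apache.spark.mllib.linalg.{SparseVector, Vector, Vectors}

// Hypothetical helper (not part of MLlib): returns (index, value) pairs
// for the non-zero entries of a vector.
def termWeights(v: Vector): Seq[(Int, Double)] = v match {
  // Sparse case: indices and values are parallel arrays, so zip them.
  case sv: SparseVector => sv.indices.zip(sv.values).toSeq
  // Fallback for dense vectors: keep only the non-zero positions.
  case other =>
    other.toArray.zipWithIndex.collect { case (x, i) if x != 0.0 => (i, x) }.toSeq
}

// Hand-built stand-in for one row of the tf RDD shown above.
val v: Vector = Vectors.sparse(1048576, Array(97, 99, 100, 101), Array(1.0, 1.0, 1.0, 1.0))

// Assumption: the vector is sparse, so the cast is safe here.
val sv = v.asInstanceOf[SparseVector]
val indices: Array[Int] = sv.indices   // the [97,99,100,101] part
val values: Array[Double] = sv.values  // the [1.0,1.0,1.0,1.0] part

// Index -> value lookup, e.g. weights(100) gives the count stored at index 100.
val weights: Map[Int, Double] = termWeights(v).toMap
println(weights(100))
```

On the RDD itself, the same helper could be applied with `tf.map(termWeights)`. To go from a term to its index rather than the other way around, `hashingTF.indexOf(term)` should return the index that term was hashed to; the exact index depends on the hash function of the Spark version in use.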