Hi all,

I want to try the TF-IDF functionality in MLlib.
I can feed it words and generate the tf and tf-idf RDD[Vector]s using the code
below.
But how do I map these vectors back to the words, their counts, and their
tf-idf values for presentation?


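import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

val sentsTmp = sqlContext.sql("SELECT text FROM sentenceTable")
// getString(0) reads the text column; Row.toString would wrap the text in brackets
val documents: RDD[Seq[String]] = sentsTmp.map(_.getString(0).split(" ").toSeq)
val hashingTF = new HashingTF()
val tf: RDD[Vector] = hashingTF.transform(documents)
tf.cache()
val idf = new IDF().fit(tf)
val tfidf: RDD[Vector] = idf.transform(tf)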

It looks like I can get the indices of the terms using something like

val J = wordListRDD.map(w => hashingTF.indexOf(w))

where wordListRDD is an RDD holding the distinct words from the word sequences
used to compute tf.
But how do I do the equivalent of

Counts  = J.map(j => tf.counts(j))  ?
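
One way this lookup might be done, as an untested sketch rather than a
confirmed MLlib recipe: keep each document's words next to its tf and tf-idf
vectors with zip, then read the vector entry at hashingTF.indexOf(word). The
object name TfIdfLookup, the helper nonZeros, and the two toy sentences are
made up for illustration, and the sketch assumes the vectors come back as
SparseVector (with a dense fallback). Since the index is a hash, two different
words can collide on the same index and would then report the same values.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.{SparseVector, Vector}
import org.apache.spark.rdd.RDD

object TfIdfLookup {
  // Hypothetical helper: non-zero entries of a vector as index -> value.
  def nonZeros(v: Vector): Map[Int, Double] = v match {
    case sv: SparseVector => sv.indices.zip(sv.values).toMap
    case other =>
      other.toArray.zipWithIndex.collect { case (x, i) if x != 0.0 => i -> x }.toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("tfidf-lookup").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Toy stand-in for the rows selected from sentenceTable (made-up data).
    val documents: RDD[Seq[String]] = sc.parallelize(Seq(
      "spark makes tf idf easy",
      "spark hashes each term to an index"
    )).map(_.split(" ").toSeq)
    documents.cache()

    val hashingTF = new HashingTF()
    val tf: RDD[Vector] = hashingTF.transform(documents)
    tf.cache()
    val tfidf: RDD[Vector] = new IDF().fit(tf).transform(tf)

    // tf and tfidf are element-wise transforms of documents, so zip pairs
    // each document with its own two vectors.
    val perTerm = documents.zip(tf).zip(tfidf).flatMap {
      case ((terms, tfVec), tfidfVec) =>
        val counts = nonZeros(tfVec)
        val weights = nonZeros(tfidfVec)
        terms.distinct.map { term =>
          val j = hashingTF.indexOf(term) // same hash used to build the vectors
          (term, counts.getOrElse(j, 0.0), weights.getOrElse(j, 0.0))
        }
    }

    // One line per (document, word): word, count in that document, tf-idf weight.
    perTerm.collect().foreach { case (term, count, weight) =>
      println(s"$term\tcount=$count\ttfidf=$weight")
    }

    sc.stop()
  }
}
```

The result is per document, so a word that appears in several documents
shows up once for each of them; summing or averaging by word would be a
separate reduceByKey step.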

Thanks,
Ron
