Hi again, I found an example in Scala <https://stackoverflow.com/questions/43797758/calculate-co-occurrence-terms-with-spark-using-scala?rq=1>, but I don't have any experience with Scala. Can anyone convert it to Java, please?
Thank you,
Donni

On Fri, Mar 23, 2018 at 8:57 AM, Donni Khan <prince.don...@googlemail.com> wrote:

> Hi,
>
> I have a collection of text documents, and I extracted a list of significant
> terms from that collection.
> I want to calculate the co-occurrence matrix for the extracted terms using
> Spark.
>
> I stored the collection of text documents in a DataFrame:
>
> StructType schema = new StructType(new StructField[] {
>     new StructField("ID", DataTypes.StringType, false, Metadata.empty()),
>     new StructField("text", DataTypes.StringType, false, Metadata.empty()) });
>
> // Create a DataFrame wrt a new schema
> DataFrame preProcessedDF = sqlContext.createDataFrame(jrdd, schema);
>
> I can extract the list of terms from "preProcessedDF" into a List, RDD,
> or DataFrame.
> For each pair (term_i, term_j), I want to calculate the related frequency from
> the original dataset "preProcessedDF".
>
> Does anyone have a scalable solution?
>
> Thank you,
> Donni
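
For the question quoted above, here is a minimal plain-Java sketch of the counting logic only. It is not a conversion of the linked Scala answer and it is not Spark code. It assumes that "co-occurrence" means two terms appearing in the same document, that terms are single lowercase words, and that a pair is counted at most once per document. The class name CoOccurrence, its method names, and the "a|b" pair key format are all made up for illustration. In a Spark job, the per-document step (pairsInDocument) would most likely go inside JavaRDD.flatMapToPair, and the summing step would be handled by JavaPairRDD.reduceByKey instead of a local map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class; names are illustrative, not from any library.
public class CoOccurrence {

    // Returns the terms of interest found in one document, sorted and de-duplicated.
    // Assumption: terms are single lowercase words; tokens are split on non-word characters.
    static List<String> termsInDocument(String text, Set<String> terms) {
        TreeSet<String> found = new TreeSet<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (terms.contains(token)) {
                found.add(token);
            }
        }
        return new ArrayList<>(found);
    }

    // Emits every unordered pair of terms present in the document as "a|b" with a < b,
    // so (term_i, term_j) and (term_j, term_i) map to the same key.
    static List<String> pairsInDocument(String text, Set<String> terms) {
        List<String> found = termsInDocument(text, terms);
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < found.size(); i++) {
            for (int j = i + 1; j < found.size(); j++) {
                pairs.add(found.get(i) + "|" + found.get(j));
            }
        }
        return pairs;
    }

    // Counts, for each pair, the number of documents in which both terms occur.
    // This local loop stands in for a distributed "sum by key" step.
    static Map<String, Integer> count(List<String> documents, Set<String> terms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : documents) {
            for (String pair : pairsInDocument(doc, terms)) {
                counts.merge(pair, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("spark", "scala", "java"));
        List<String> docs = Arrays.asList(
                "Spark with Scala",
                "Spark and Java and Scala",
                "only java here");
        // Prints each pair key with its document count (iteration order is not fixed).
        count(docs, terms).forEach((pair, n) -> System.out.println(pair + " -> " + n));
    }
}
```

Sorting the terms before pairing keeps each pair key canonical, which halves the number of keys compared with emitting both orderings. If raw occurrence counts are wanted instead of document counts, the de-duplication in termsInDocument would need to change.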