Hi all,

I want to run a huge number of queries on a DataFrame in Spark. I have a large collection of text documents; I loaded them all into a Spark DataFrame and registered a temp table:
dataFrame.registerTempTable("table1");

I have more than 50,000 terms, and I want to get the document frequency of each term using "table1". Currently I run the following for each term:

DataFrame df = sqlContext.sql("select count(ID) from table1 where text like '%" + term + "%'");

but this takes a very long time to finish, because each query has to be issued from the Spark driver, one term at a time. Does anyone have an idea how I can run all of these queries in a distributed way?

Thank you && Best Regards,
Donni