Hi All, I would like to achieve the output below using Spark. I managed to write it in Hive and call that from Spark, but not in plain Spark (Scala). How do I group word counts by a particular user (column)? For example, imagine users and their tweets: I want a word count per user name.
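Before involving Hive at all, the per-user word count is just a group-by on (user, word) pairs. A minimal plain-Scala sketch of that logic, assuming each input line is a user name, a space, then a comma-separated list of words (the object and function names here are made up for illustration):

```scala
object PerUserWordCountLogic {
  // Count how many times each word appears per user.
  // Input lines look like "kali A,B,A,B,B".
  def countWords(lines: Seq[String]): Map[String, Map[String, Int]] =
    lines.map { line =>
      val Array(user, words) = line.split(" ", 2)
      user -> words.split(",").groupBy(identity).map { case (w, ws) => w -> ws.length }
    }.toMap

  def main(args: Array[String]): Unit = {
    val result = countWords(Seq("kali A,B,A,B,B", "james B,A,A,A,B"))
    // Print each user's word counts, words sorted for stable display.
    result.foreach { case (user, counts) =>
      println(user + " " + counts.toSeq.sortBy(_._1).map { case (w, n) => s"$w $n" }.mkString(" "))
    }
  }
}
```

Running this prints `kali A 2 B 3` and `james A 3 B 2`, matching the desired output shape.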
Input:-

kali A,B,A,B,B
james B,A,A,A,B

Output:-

kali A [Count] B [Count]
james A [Count] B [Count]

My Hive answer:-

CREATE EXTERNAL TABLE test (
  user_name STRING,
  comments  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE
LOCATION '/data/kali/test';
-- HDFS folder (create the folder and put a text file with the data above in it)

use default;
select user_name, COLLECT_SET(text)
from (
  select user_name, concat(sub, ' ', count(comments)) as text
  from test LATERAL VIEW explode(split(comments, ',')) subView AS sub
  group by user_name, sub
) w
group by user_name;

Spark with Hive:-

package com.examples

/**
 * Created by kalit_000 on 17/09/2015.
 */
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object HiveWordCount {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.WARN)
    Logger.getLogger("akka").setLevel(Level.WARN)

    val conf = new SparkConf().setMaster("local").setAppName("HiveWordCount").set("spark.executor.memory", "1g")
    val sc = new SparkContext(conf)
    val hc = new HiveContext(sc)

    // Note: the delimiter must reach Hive as \001, so it is escaped as "\\001" in the Scala string.
    hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS default.test (user_name string, comments string) " +
      "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\001' STORED AS TEXTFILE LOCATION '/data/kali/test'")

    val op = hc.sql("select user_name, COLLECT_SET(text) from " +
      "(select user_name, concat(sub, ' ', count(comments)) as text " +
      "from default.test LATERAL VIEW explode(split(comments, ',')) subView AS sub " +
      "group by user_name, sub) w group by user_name")

    op.collect.foreach(println)
  }
}

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/word-count-group-by-users-in-spark-tp24748.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
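For the "just Spark (Scala)" part of the question, the same logic as the Hive query (explode the comma list, count per user and word, then collect the counts back per user) translates to the RDD API with no Hive involvement. A minimal sketch, assuming Spark 1.x, the input layout above, and the same hypothetical HDFS path used by the Hive table (the object name is made up; no `<test>` is given since it needs a Spark runtime):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkPerUserWordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("SparkPerUserWordCount"))

    // Each line looks like "user word,word,word", e.g. "kali A,B,A,B,B"
    val lines = sc.textFile("/data/kali/test")

    val perUser = lines
      .map(_.split(" ", 2))
      .collect { case Array(user, words) => (user, words) } // keep only well-formed lines
      .flatMap { case (user, words) =>
        words.split(",").map(w => ((user, w), 1))           // one ((user, word), 1) per word
      }
      .reduceByKey(_ + _)                                   // count per (user, word), like the inner group by
      .map { case ((user, word), n) => (user, s"$word $n") }
      .groupByKey()                                         // gather "word count" strings per user, like COLLECT_SET

    // e.g. "kali A 2 B 3" and "james A 3 B 2"
    perUser.collect().foreach { case (user, ws) =>
      println(user + " " + ws.mkString(" "))
    }

    sc.stop()
  }
}
```

`reduceByKey` does the counting map-side before shuffling, so only the final `groupByKey` on the already-reduced (user, "word count") pairs moves data per user.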