Re: Stream group by

2016-02-21 Thread Vinti Maheshwari
Yeah, I tried with reduceByKey and was able to do it. Thanks Ayan and Jatin. For reference, the final solution:

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setAppName("HBaseStream")
  val sc = new SparkContext(conf)
  // create a StreamingContext, the main entry point for
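The snippet above is cut off mid-setup by the archive. A minimal end-to-end sketch of the reduceByKey approach it describes might look like the following; the stream source (a socket), the batch interval, and the line layout (fileName in the second field, counts after it) are assumptions taken from the rest of the thread, not from the truncated post:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HBaseStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HBaseStream")
    val sc = new SparkContext(conf)
    // create a StreamingContext, the main entry point for streaming functionality
    val ssc = new StreamingContext(sc, Seconds(5))

    // assumed source for illustration; the original may read from elsewhere
    val lines = ssc.socketTextStream("localhost", 9999)

    lines
      .map { line =>
        val f = line.split(",").map(_.trim)
        (f(1), f.drop(2).map(_.toLong)) // (fileName, count vector)
      }
      // element-wise addition of count vectors sharing the same fileName
      .reduceByKey((a, b) => a.zip(b).map { case (x, y) => x + y })
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Unlike groupByKey, reduceByKey combines values map-side before the shuffle, which is why it was recommended here.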

Re: Stream group by

2016-02-21 Thread ayan guha
I believe the best way would be to use the reduceByKey operation. On Mon, Feb 22, 2016 at 5:10 AM, Jatin Kumar <jku...@rocketfuelinc.com.invalid> wrote:

> You will need to do a collect and update a global map if you want to.
>
> myDStream.map(record => (record._2, (record._3, record._4, record._5))

Re: Stream group by

2016-02-21 Thread Jatin Kumar
You will need to do a collect and update a global map if you want to.

myDStream.map(record => (record._2, (record._3, record._4, record._5)))
  .reduceByKey((r1, r2) => (r1._1 + r2._1, r1._2 + r2._2, r1._3 + r2._3))
  .foreachRDD(rdd => {
    rdd.collect().foreach((fileName,
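Completing the truncated collect-and-update pattern, a sketch could look like the following. Note that reduceByKey is used for the merge step, since groupByKey does not take a combine function; myDStream is assumed to be a DStream of 5-tuples as described later in the thread, and collecting to a driver-side map only scales to small result sets:

```scala
import scala.collection.mutable

// Driver-side accumulator: fileName -> running count vector
val globalCounts = mutable.Map.empty[String, Seq[Long]]

myDStream
  .map(record => (record._2, (record._3, record._4, record._5)))
  .reduceByKey((r1, r2) => (r1._1 + r2._1, r1._2 + r2._2, r1._3 + r2._3))
  .foreachRDD { rdd =>
    // collect() pulls each batch's per-file sums to the driver,
    // where they are merged into the global map
    rdd.collect().foreach { case (fileName, (c1, c2, c3)) =>
      val merged = globalCounts
        .getOrElse(fileName, Seq(0L, 0L, 0L))
        .zip(Seq(c1, c2, c3))
        .map { case (a, b) => a + b }
      globalCounts(fileName) = merged
    }
  }
```

For state that must survive across batches at scale, Spark Streaming's updateStateByKey is the usual alternative to a driver-side map.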

Re: Stream group by

2016-02-21 Thread Vinti Maheshwari
Never mind, it seems an executor-level mutable map is not recommended, as stated in http://blog.guillaume-pitel.fr/2015/06/spark-trick-shot-node-centered-aggregator/ On Sun, Feb 21, 2016 at 9:54 AM, Vinti Maheshwari wrote:

> Thanks for your reply Jatin. I changed my

Re: Stream group by

2016-02-21 Thread Vinti Maheshwari
Thanks for your reply Jatin. I changed my parsing logic to what you suggested:

def parseCoverageLine(str: String) = {
  val arr = str.split(",")
  ...
  ...
  (arr(0), arr(1) :: count.toList) // (test, [file, 1, 1, 2])
}

Then in the grouping, can I use a global hash map
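The elided middle of that parser can be filled in along these lines; the derivation of `count` is an assumption (the original post elides it), but it must hold the fields after the first two to produce the commented result:

```scala
// Parse "t1, file1, 1, 1, 2" into (testName, fileName :: counts)
def parseCoverageLine(str: String): (String, List[String]) = {
  val arr = str.split(",").map(_.trim)
  val count = arr.drop(2)              // assumed: everything after test and file
  (arr(0), arr(1) :: count.toList)     // (test, [file, 1, 1, 2])
}
```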

Re: Stream group by

2016-02-21 Thread Jatin Kumar
Hello Vinti, One way to get this done is to split your input line into a key/value tuple and then simply use groupByKey and handle the values the way you want. For example, assuming you have already split the values into a 5-tuple:

myDStream.map(record => (record._2, (record._3,
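Spelling out that suggestion, a groupByKey variant might look like the sketch below. The field positions (fileName in `_2`, three counts after it) are assumed from the thread, and `myDStream` stands for the already-parsed DStream of 5-tuples:

```scala
myDStream
  .map(record => (record._2, (record._3, record._4, record._5))) // key on fileName
  .groupByKey()           // fileName -> Iterable of (c1, c2, c3) tuples
  .mapValues(_.reduce((r1, r2) =>
    (r1._1 + r2._1, r1._2 + r2._2, r1._3 + r2._3))) // sum each column
```

Since the grouped values are immediately reduced, the same result can be had more cheaply with reduceByKey, which is where the thread ends up.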

Stream group by

2016-02-21 Thread Vinti Maheshwari
Hello, I have input lines like below:

*Input*
t1, file1, 1, 1, 1
t1, file1, 1, 2, 3
t1, file2, 2, 2, 2, 2
t2, file1, 5, 5, 5
t2, file2, 1, 1, 2, 2

and I want to achieve output like the rows below, which is a vertical addition of the corresponding numbers.

*Output*
“file1” : [ 1+1+5, 1+2+5, 1+3+5
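The per-file vertical addition being asked for can be sketched in plain Scala, independent of Spark. The helper names (`parseLine`, `addCounts`) are illustrative, and lines for the same file are assumed to carry count vectors of equal length:

```scala
// Parse "t1, file1, 1, 1, 1" into (fileName, counts)
def parseLine(line: String): (String, List[Long]) = {
  val fields = line.split(",").map(_.trim)
  (fields(1), fields.drop(2).map(_.toLong).toList)
}

// Element-wise addition of two count vectors of the same length
def addCounts(a: List[Long], b: List[Long]): List[Long] =
  a.zip(b).map { case (x, y) => x + y }

val input = List(
  "t1, file1, 1, 1, 1",
  "t1, file1, 1, 2, 3",
  "t1, file2, 2, 2, 2, 2",
  "t2, file1, 5, 5, 5",
  "t2, file2, 1, 1, 2, 2")

// Group by file name, then fold each group's vectors together
val result: Map[String, List[Long]] =
  input.map(parseLine).groupBy(_._1).map {
    case (file, pairs) => file -> pairs.map(_._2).reduce(addCounts)
  }
// result("file1") == List(7, 8, 9), i.e. [1+1+5, 1+2+5, 1+3+5]
```

On a DStream, the same `addCounts` merge becomes the combine function passed to reduceByKey, as the replies in this thread show.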