Yeah, I tried with reduceByKey and was able to do it. Thanks Ayan and Jatin.
For reference, the final solution:
def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setAppName("HBaseStream")
  val sc = new SparkContext(conf)
  // create a StreamingContext, the main entry point for
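The snippet above is cut off; the heart of a reduceByKey version is a merge function that adds the per-file counts position by position. A minimal sketch (the object and function names are mine, not from the original mail; the Spark wiring appears only in a comment):

```scala
object ReduceByKeySketch {
  // Element-wise sum of two count vectors; this is the kind of function
  // you would pass to reduceByKey, e.g.
  //   parsedStream.reduceByKey(mergeCounts).foreachRDD(...)
  def mergeCounts(a: List[Int], b: List[Int]): List[Int] =
    a.zip(b).map { case (x, y) => x + y }

  def main(args: Array[String]): Unit = {
    // The two t1/file1 rows from the example input further down the thread
    val merged = mergeCounts(List(1, 1, 1), List(1, 2, 3))
    assert(merged == List(2, 3, 4))
    println(merged)
  }
}
```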
I believe the best way would be to use reduceByKey operation.
On Mon, Feb 22, 2016 at 5:10 AM, Jatin Kumar <jku...@rocketfuelinc.com.invalid> wrote:
You will need to do a collect and update a global map if you want to.
myDStream.map(record => (record._2, (record._3, record._4, record._5)))
  .reduceByKey((r1, r2) => (r1._1 + r2._1, r1._2 + r2._2, r1._3 + r2._3))
  .foreachRDD(rdd => {
    rdd.collect().foreach((fileName,
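In plain Scala (no cluster needed), the collect-and-update-a-global-map idea looks roughly like this; the map name and the `update` helper are mine, and in real Spark code the update loop would sit inside `foreachRDD` after `rdd.collect()`:

```scala
import scala.collection.mutable

object GlobalMapSketch {
  // Driver-side "global map" from file name to accumulated counts
  val totals = mutable.Map.empty[String, (Int, Int, Int)]

  def update(fileName: String, counts: (Int, Int, Int)): Unit = {
    val (a, b, c) = totals.getOrElse(fileName, (0, 0, 0))
    totals(fileName) = (a + counts._1, b + counts._2, c + counts._3)
  }

  def main(args: Array[String]): Unit = {
    // Simulates results collected over two batches
    Seq(("file1", (1, 1, 1)), ("file1", (1, 2, 3)), ("file1", (5, 5, 5)))
      .foreach { case (f, c) => update(f, c) }
    assert(totals("file1") == (7, 8, 9))
    println(totals)
  }
}
```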
Never mind, it seems an executor-level mutable map is not recommended, as
stated in
http://blog.guillaume-pitel.fr/2015/06/spark-trick-shot-node-centered-aggregator/
On Sun, Feb 21, 2016 at 9:54 AM, Vinti Maheshwari wrote:
Thanks for your reply, Jatin. I changed my parsing logic to what you
suggested:
def parseCoverageLine(str: String) = {
  val arr = str.split(",")
  ...
  ...
  (arr(0), arr(1) :: count.toList) // (test, [file, 1, 1, 2])
}
Then in the grouping, can I use a global hash map
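The body of parseCoverageLine is elided above; one possible completion (the count handling is my guess, keyed on the test id with the file name prepended, to match the comment):

```scala
object ParseSketch {
  // Hypothetical completion of parseCoverageLine: split on commas, key on
  // the test id, and keep the file name followed by the counts.
  def parseCoverageLine(str: String): (String, List[String]) = {
    val arr = str.split(",").map(_.trim)
    val count = arr.drop(2)              // the numeric fields, as strings
    (arr(0), arr(1) :: count.toList)     // (test, [file, 1, 2, 3])
  }

  def main(args: Array[String]): Unit = {
    val parsed = parseCoverageLine("t1, file1, 1, 2, 3")
    assert(parsed == ("t1", List("file1", "1", "2", "3")))
    println(parsed)
  }
}
```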
Hello Vinti,
One way to get this done is to split your input line into a key/value tuple;
then you can simply use groupByKey and handle the values the way you want.
For example:
Assuming you have already split the values into a 5-tuple:
myDStream.map(record => (record._2, (record._3,
Hello,
I have input lines like the following:
*Input*
t1, file1, 1, 1, 1
t1, file1, 1, 2, 3
t1, file2, 2, 2, 2, 2
t2, file1, 5, 5, 5
t2, file2, 1, 1, 2, 2
and I want to produce output rows like the one below, which is a vertical
addition of the corresponding numbers:
*Output*
“file1” : [ 1+1+5, 1+2+5, 1+3+5
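The expected output can be sanity-checked with plain Scala collections: groupBy plays the role of the Spark grouping, and zip-and-add does the vertical sum (the file names and numbers are the example above; the object name is mine):

```scala
object VerticalSumCheck {
  def main(args: Array[String]): Unit = {
    // The example input rows, keyed by file name (the t1/t2 field dropped)
    val rows = Seq(
      ("file1", List(1, 1, 1)),
      ("file1", List(1, 2, 3)),
      ("file2", List(2, 2, 2, 2)),
      ("file1", List(5, 5, 5)),
      ("file2", List(1, 1, 2, 2))
    )
    // Group by file, then sum the corresponding positions
    val sums = rows.groupBy(_._1).map { case (file, vs) =>
      file -> vs.map(_._2).reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
    }
    assert(sums("file1") == List(7, 8, 9))    // 1+1+5, 1+2+5, 1+3+5
    assert(sums("file2") == List(3, 3, 4, 4)) // 2+1, 2+1, 2+2, 2+2
    println(sums)
  }
}
```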