Hello,

I have input lines like the ones below:

*Input*
t1, file1, 1, 1, 1
t1, file1, 1, 2, 3
t1, file2, 2, 2, 2, 2
t2, file1, 5, 5, 5
t2, file2, 1, 1, 2, 2

I want to produce output like the rows below, which are the vertical
(element-wise) sums of the corresponding numbers for each file name.

*Output*
“file1” : [ 1+1+5, 1+2+5, 1+3+5 ]
“file2” : [ 2+1, 2+1, 2+2, 2+2 ]

I am in a Spark Streaming context and I am having a hard time figuring
out how to group by file name.

It seems I will need to use something like the line below, but I am not sure
how to get the syntax right. Any input would be helpful.

myDStream.foreachRDD(rdd => rdd.groupBy())

I know how to compute the vertical sum of a collection of number lists, but
I am not sure how to feed that function to the group by.

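  import scala.collection.mutable.ArrayBuffer
  // Column-wise sum; transpose requires all rows to have the same length.
  def compute_counters(counts : ArrayBuffer[List[Int]]) = {
      counts.toList.transpose.map(_.sum)
  }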
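
To make the goal concrete, here is a minimal sketch of the intended computation on plain Scala collections, with no Spark involved. The names VerticalSum, parse, addRows and sumByFile are made up for illustration, and it assumes the file name is always the second comma-separated field and that all rows for one file have the same length.

```scala
object VerticalSum {
  // Parse "t1, file1, 1, 1, 1" into ("file1", List(1, 1, 1)).
  // Assumption: field 0 is a tag, field 1 is the file name, the rest are Ints.
  def parse(line: String): (String, List[Int]) = {
    val fields = line.split(",").map(_.trim).toList
    (fields(1), fields.drop(2).map(_.toInt))
  }

  // Element-wise sum of two rows. zip truncates to the shorter row,
  // so rows for the same file are assumed to have equal length.
  def addRows(a: List[Int], b: List[Int]): List[Int] =
    a.zip(b).map { case (x, y) => x + y }

  // Group the parsed rows by file name and add them up column by column.
  def sumByFile(lines: Seq[String]): Map[String, List[Int]] =
    lines.map(parse).groupBy(_._1).map { case (file, rows) =>
      file -> rows.map(_._2).reduce(addRows)
    }
}
```

With the five input lines above, sumByFile should give file1 -> List(7, 8, 9) and file2 -> List(3, 3, 4, 4). Assuming myDStream is a DStream[String] of such lines, the same two helpers could probably be used per batch as myDStream.map(VerticalSum.parse).reduceByKey(VerticalSum.addRows), which would avoid groupBy altogether; I have not verified that on a running stream.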

~Thanks,
Vinti
