Re: Modify the functioning of zipWithIndex function for RDDs

2016-06-28 Thread Punit Naik
Actually I was writing a code for the Connected Components algorithm. In that I have to keep track of a variable called vertex number which keeps on getting incremented based on the number of triples it encounters in a line. This variable should be available at all the nodes and all the

Re: Modify the functioning of zipWithIndex function for RDDs

2016-06-28 Thread Ted Yu
Since the data.length is variable, I am not sure whether mixing data.length and the index makes sense. Can you describe your use case in bit more detail ? Thanks On Tue, Jun 28, 2016 at 11:34 AM, Punit Naik wrote: > Hi Ted > > So would the tuple look like: (x._1,

Re: Modify the functioning of zipWithIndex function for RDDs

2016-06-28 Thread Punit Naik
Hi Ted So would the tuple look like: (x._1, split.startIndex + x._2 + x._1.length) ? On Tue, Jun 28, 2016 at 11:09 PM, Ted Yu wrote: > Please take a look at: > core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala > > In compute() method: > val split =

Re: Modify the functioning of zipWithIndex function for RDDs

2016-06-28 Thread Ted Yu
Please take a look at: core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala In compute() method: val split = splitIn.asInstanceOf[ZippedWithIndexRDDPartition] firstParent[T].iterator(split.prev, context).zipWithIndex.map { x => (x._1, split.startIndex + x._2) You can

Modify the functioning of zipWithIndex function for RDDs

2016-06-28 Thread Punit Naik
Hi I wanted to change the functioning of the "zipWithIndex" function for spark RDDs in which the output of the function is, just for an example, "(data, prev_index+data.length)" instead of "(data,prev_index+1)". How can I do this? -- Thank You Regards Punit Naik