Re: help plz! how to use zipWithIndex to each subset of a RDD
Hi @rok, thanks, I got it.

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/help-plz-how-to-use-zipWithIndex-to-each-subset-of-a-RDD-tp24071p24080.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: help plz! how to use zipWithIndex to each subset of a RDD
RDD.zipWithIndex gives you global indices (numbered across the whole RDD), which is not what you want. You'll want to use flatMap with a function that iterates through each group's Iterable and returns a (String, Int, String) tuple for each element.

On Thu, Jul 30, 2015 at 4:13 AM, askformore [via Apache Spark User List] ml-node+s1001560n24071...@n3.nabble.com wrote:
> I have some data like this: [...]
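rok's suggestion can be sketched in plain Scala as below. This is an illustration rather than code from the thread: GroupIndexing and indexGroup are hypothetical names, and a local Seq.groupBy stands in for RDD.groupByKey so the logic runs without a cluster. The zipWithIndex called here is the ordinary Scala collection method on each group's Iterable, so the numbering restarts for every key, unlike RDD.zipWithIndex.

```scala
object GroupIndexing {
  // The per-group function passed to flatMap: given one (key, values) group,
  // as groupByKey would produce it, emit one (key, index, value) per element.
  // Indices are 1-based here to match the output asked for in the question.
  def indexGroup(group: (String, Iterable[String])): Iterable[(String, Int, String)] = {
    val (key, values) = group
    // Local Iterable.zipWithIndex: counts from 0 within this group only
    values.zipWithIndex.map { case (value, i) => (key, i + 1, value) }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(("key-1", "a"), ("key-1", "b"), ("key-2", "a"),
                   ("key-2", "c"), ("key-3", "b"), ("key-4", "d"))
    // Local stand-in for RDD.groupByKey (illustration only):
    // group the pairs by key and keep just the values for each key.
    val grouped: Seq[(String, Iterable[String])] =
      data.groupBy(_._1).toSeq.sortBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }
    grouped.flatMap(g => indexGroup(g)).foreach(println)
  }
}
```

With Spark, the same function should plug in as inputRdd.groupByKey().flatMap(GroupIndexing.indexGroup), assuming inputRdd is an RDD[(String, String)].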
Re: help plz! how to use zipWithIndex to each subset of a RDD
This may be what you want:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setMaster("local").setAppName("test")
    val sc = new SparkContext(conf)
    val inputRdd = sc.parallelize(Array(("key_1", "a"), ("key_1", "b"), ("key_2", "c"), ("key_2", "d")))
    // Group by key, then number each group's values with the local Iterable.zipWithIndex.
    // zipWithIndex counts from 0; use value._2 + 1 for the 1-based indices in the question.
    val result = inputRdd.groupByKey().flatMap(e => {
      val key = e._1
      val valuesWithIndex = e._2.zipWithIndex
      valuesWithIndex.map(value => (key, value._2, value._1))
    })
    result.collect() foreach println

    // output:
    // (key_2,0,c)
    // (key_2,1,d)
    // (key_1,0,a)
    // (key_1,1,b)

On Thu, Jul 30, 2015 at 10:19 AM, ayan guha guha.a...@gmail.com wrote:
> Is there a relationship between data and index? I.e with a,b,c to 1,2,3?
> On 30 Jul 2015 12:13, askformore askf0rm...@163.com wrote:
>> I have some data like this: [...]

-- 
Best Regards
Jeff Zhang
help plz! how to use zipWithIndex to each subset of a RDD
I have some data like this:

RDD[(String, String)] = ((key-1, a), (key-1, b), (key-2, a), (key-2, c), (key-3, b), (key-4, d))

and I want to group the data by key and, within each group, add an index field to each group member, so that in the end the data is transformed into:

RDD[(String, Int, String)] = ((key-1, 1, a), (key-1, 2, b), (key-2, 1, a), (key-2, 2, c), (key-3, 1, b), (key-4, 1, d))

I tried groupByKey first, which gave me an RDD[(String, Iterable[String])], but I don't know how to apply the zipWithIndex function to each Iterable...

Thanks.
Re: help plz! how to use zipWithIndex to each subset of a RDD
Is there a relationship between the data and the index, i.e. a, b, c to 1, 2, 3?

On 30 Jul 2015 12:13, askformore askf0rm...@163.com wrote:
> I have some data like this: [...]
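If the answer to that question is yes, i.e. the index should follow the values' own order (a before b before c) rather than the order in which they happen to arrive, one option is to sort each group before numbering it. Spark's groupByKey documentation notes that the ordering of elements within each group is not guaranteed, so sorting also makes the numbering repeatable. A minimal plain-Scala sketch; SortedGroupIndexing and indexGroupSorted are hypothetical names, and sorting by the value's natural String order is an assumption about what "relationship" means here.

```scala
object SortedGroupIndexing {
  // Number one (key, values) group in sorted value order, 1-based.
  // Assumption: the desired order is the natural String ordering of the values.
  def indexGroupSorted(group: (String, Iterable[String])): Seq[(String, Int, String)] = {
    val (key, values) = group
    values.toSeq.sorted.zipWithIndex.map { case (value, i) => (key, i + 1, value) }
  }

  def main(args: Array[String]): Unit = {
    // Values deliberately out of order, as they might be after a shuffle
    val group: (String, Iterable[String]) = ("key-2", List("c", "a"))
    indexGroupSorted(group).foreach(println)
    // prints (key-2,1,a) then (key-2,2,c)
  }
}
```

Sorting materializes each group as a Seq, which is fine for small groups; very large groups would need a different approach.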