Re: How to add a column to a spark RDD with many columns?
Thanks for your reply! It is what I am after.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-a-column-to-a-spark-RDD-with-many-columns-tp22729p22740.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: How to add a column to a spark RDD with many columns?
// Indices are 0-based: row(1) is the 2nd column, row(199) is the 200th.
// The question asks for the sum, so the two values are added.
val newRdd = myRdd.map(row => row ++ Array((row(1).toLong + row(199).toLong).toString))
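A more defensive sketch of the same map, computing the sum the question asks for. It assumes each row is an `Array[String]` whose cells hold integer values; the helper name `appendSum` and the object `AppendSumDemo` are hypothetical. Plain Scala collections stand in for the RDD so it can be tried without a cluster.

```scala
object AppendSumDemo {
  // Hypothetical helper: append the sum of columns i and j (0-based) to a row.
  // Assumes both cells parse as Long; toLong throws NumberFormatException otherwise.
  def appendSum(row: Array[String], i: Int, j: Int): Array[String] = {
    require(i >= 0 && j >= 0 && i < row.length && j < row.length,
      s"row has ${row.length} columns, cannot read columns $i and $j")
    row :+ (row(i).trim.toLong + row(j).trim.toLong).toString
  }

  // Local stand-in for the RDD: short rows instead of hundreds of columns.
  val rows: Seq[Array[String]] = Seq(
    Array("123", "523", "534", "893"),
    Array("536", "98", "1623", "98472"),
    Array("537", "89", "83640", "9265")
  )

  // Sum of the 2nd and 4th columns here; for the 2nd and 200th use indices 1 and 199.
  val withSum: Seq[Array[String]] = rows.map(row => appendSum(row, 1, 3))

  def main(args: Array[String]): Unit =
    withSum.foreach(r => println(r.mkString(", ")))
}
```

On the RDD from this thread, and assuming it is an `RDD[Array[String]]`, the call would be `myRdd.map(row => appendSum(row, 1, 199))`.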
Re: How to add a column to a spark RDD with many columns?
Do you have an RDD or a DataFrame? An RDD is essentially a collection of tuples, so you can add a new column to it with a map. RDDs are immutable, so the map returns a new RDD rather than modifying the original.

On 1 May 2015 14:59, "Carter" wrote:

> Hi all,
>
> I have a RDD with *MANY* columns (e.g., *hundreds*), how do I add one more
> column at the end of this RDD?
>
> For example, if my RDD is like below:
>
> 123, 523, 534, ..., 893
> 536, 98, 1623, ..., 98472
> 537, 89, 83640, ..., 9265
> 7297, 98364, 9, ..., 735
> ..
> 29, 94, 956, ..., 758
>
> how can I efficiently add a column to it, whose value is the sum of the 2nd
> and the 200th columns?
>
> Thank you very much.
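If the RDD holds raw comma-separated text lines rather than arrays (the question does not say which), the function passed to `map` could work on each line directly. A minimal sketch under that assumption; `appendSumToLine` and `LineDemo` are hypothetical names, and a plain `Seq[String]` stands in for the RDD.

```scala
object LineDemo {
  // Hypothetical helper: parse a comma-separated line, add columns i and j
  // (0-based), and return the original line with the sum appended.
  def appendSumToLine(line: String, i: Int, j: Int): String = {
    val cols = line.split(",").map(_.trim)
    require(i >= 0 && j >= 0 && i < cols.length && j < cols.length,
      s"line has ${cols.length} columns, cannot read columns $i and $j")
    s"$line, ${cols(i).toLong + cols(j).toLong}"
  }

  // Shortened sample rows; the real data would have hundreds of columns.
  val lines: Seq[String] = Seq("123, 523, 534, 893", "29, 94, 956, 758")

  // map builds a new collection; `lines` itself is left unchanged,
  // which mirrors how map on an immutable RDD yields a new RDD.
  val result: Seq[String] = lines.map(line => appendSumToLine(line, 1, 3))

  def main(args: Array[String]): Unit = result.foreach(println)
}
```

For the 2nd and 200th columns the indices would be 1 and 199.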
How to add a column to a spark RDD with many columns?
Hi all,

I have an RDD with *MANY* columns (e.g., *hundreds*). How do I add one more column at the end of this RDD?

For example, if my RDD looks like this:

123, 523, 534, ..., 893
536, 98, 1623, ..., 98472
537, 89, 83640, ..., 9265
7297, 98364, 9, ..., 735
..
29, 94, 956, ..., 758

how can I efficiently add a column whose value is the sum of the 2nd and the 200th columns?

Thank you very much.