RDD has methods to zip with another RDD (zip) or with an index
(zipWithIndex), but there's no equivalent for data frames. Does anyone
know a good way to do this?

I thought I could just convert to RDD, do the zip, and then convert back,
but ...

   1. I don't see a way (outside the developer API) to convert RDD[Row]
   directly back to a DataFrame. Is there really no way to do this?
   2. I don't see any way to modify Row objects or to create new rows with
   additional columns. In other words, there is no way to convert
   RDD[(Row, Row)] to RDD[Row].

It seems the only way to get what I want is to extract the data into a
case class and then convert back to a data frame. Did I miss something?
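
For reference, here is a rough sketch of the RDD round trip described above.
It is only a sketch under assumptions that are not confirmed anywhere in this
message: that createDataFrame(RDD[Row], StructType) on SQLContext is
acceptable to use even though it is marked as developer API, and that new
rows can be built with Row.fromSeq. The object name ZipDataFrames, the helper
names, and the default column name "index" are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical helpers; nothing here is part of the DataFrame API itself.
object ZipDataFrames {

  // Append a 0-based Long index column by going through RDD[Row].
  def zipWithIndex(df: DataFrame, colName: String = "index"): DataFrame = {
    // RDD[(Row, Long)] -> RDD[Row]: build a new, wider Row from the old values.
    val rows = df.rdd.zipWithIndex().map { case (row, idx) =>
      Row.fromSeq(row.toSeq :+ idx)
    }
    // Extend the original schema with the new index field.
    val schema = StructType(
      df.schema.fields :+ StructField(colName, LongType, nullable = false))
    // Assumption: the developer-API overload taking RDD[Row] and a schema.
    df.sqlContext.createDataFrame(rows, schema)
  }

  // Zip two data frames column-wise. RDD.zip requires both sides to have
  // the same number of partitions and the same number of rows per partition.
  def zip(left: DataFrame, right: DataFrame): DataFrame = {
    // RDD[(Row, Row)] -> RDD[Row]: concatenate the values of both rows.
    val rows = left.rdd.zip(right.rdd).map { case (l, r) =>
      Row.fromSeq(l.toSeq ++ r.toSeq)
    }
    // Column names are not deduplicated here; clashes are left to the caller.
    val schema = StructType(left.schema.fields ++ right.schema.fields)
    left.sqlContext.createDataFrame(rows, schema)
  }
}
```

With something like this, ZipDataFrames.zipWithIndex(df) would return df with
one extra Long column, and ZipDataFrames.zip(a, b) would put the columns of b
next to those of a. Whether relying on the developer API is reasonable is
exactly the open question.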
