Hello! Can both methods be compared in terms of performance? I tried the pull request and it felt slow compared to manual mapping.
Cheers, Jonathan

On Mon, Jul 27, 2015, 8:51 PM Reynold Xin <r...@databricks.com> wrote:
> There is this pull request: https://github.com/apache/spark/pull/5713
>
> We mean to merge it for 1.5. Maybe you can help review it too?
>
> On Mon, Jul 27, 2015 at 11:23 AM, Vyacheslav Baranov <slavik.bara...@gmail.com> wrote:
>> Hi all,
>>
>> For now it's possible to convert an RDD of a case class to a DataFrame:
>>
>> case class Person(name: String, age: Int)
>>
>> val people: RDD[Person] = ...
>> val df = sqlContext.createDataFrame(people)
>>
>> but the backward conversion is not possible with the existing API, so
>> currently the code looks like this (example from the documentation):
>>
>> teenagers.map(t => "Name: " + t.getAs[String]("name"))
>>
>> whereas it would be much more convenient to use an RDD of the case class:
>>
>> teenagers.rdd[Person].map("Name: " + _.name)
>>
>> I've implemented a proof-of-concept library that converts a DataFrame
>> to a typed RDD using the "Pimp my library" pattern. It adds some type
>> safety (the conversion fails before the distributed operation runs if
>> some fields have incompatible types), and it's much more convenient
>> when working with nested rows, for example:
>>
>> case class Room(number: Int, visitors: Seq[Person])
>>
>> roomsDf.explode[Seq[Row], Person]("visitors", "visitor")(_.map(rowToPerson))
>>
>> Would the community be interested in having this functionality in core?
>>
>> Regards,
>> Vyacheslav
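For anyone unfamiliar with the "Pimp my library" pattern mentioned above, here is a minimal, Spark-free sketch of the idea: an implicit class enriches an existing type with a typed conversion method that fails fast on incompatible fields. `Row` is modelled as a plain `Map` and the "DataFrame" as a `Seq` purely for illustration; the method name `toTyped` is hypothetical, not part of the proposed API.

```scala
case class Person(name: String, age: Int)

object TypedConversionSketch {
  // Stand-in for Spark's Row: a column-name -> value mapping.
  type Row = Map[String, Any]

  // The enrichment: any Seq[Row] gains a typed conversion method.
  implicit class TypedOps(val rows: Seq[Row]) extends AnyVal {
    // Pattern matches each field, so a missing or mistyped column
    // throws immediately rather than deep inside a transformation.
    def toTyped: Seq[Person] = rows.map { r =>
      Person(
        r("name") match { case s: String => s },
        r("age") match { case i: Int => i }
      )
    }
  }

  def main(args: Array[String]): Unit = {
    val df: Seq[Row] = Seq(
      Map("name" -> "Alice", "age" -> 14),
      Map("name" -> "Bob", "age" -> 17)
    )

    // Usage mirrors the teenagers example from the thread.
    val teenagers = df.toTyped.filter(p => p.age >= 13 && p.age <= 19)
    println(teenagers.map("Name: " + _.name))
  }
}
```

The same enrichment style is how the proof of concept would hang an `rdd[Person]`-like method off `DataFrame` without touching the Spark source.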