In 1.4, you can look a field up by column name:

    row.getAs[Int]("colName")

In 1.5, a variant of this is planned that will let you turn a DataFrame
into a typed RDD, where the case class's field names match the column
names: https://github.com/apache/spark/pull/5713
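For example, a minimal sketch against the 1.4 API (df, the column name
"colName", and the Int type are assumptions here):

    import org.apache.spark.sql.Row

    val first: Row = df.first()
    val byName = first.getAs[Int]("colName")  // look a field up by column name (new in 1.4)
    val byOrdinal = first.getInt(0)           // or by position, as in earlier releases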



On Fri, May 8, 2015 at 11:01 AM, Will Benton <wi...@redhat.com> wrote:

> This might not be the easiest way, but it's pretty easy:  you can use
> Row(field_1, ..., field_n) as a pattern in a case match.  So if you have a
> data frame with foo as an Int column and bar as a String column and you
> want to construct instances of a case class that wraps these up, you can do
> something like this:
>
>     // assuming Record is declared as case class Record(foo: Int, bar: String)
>     // and df is a DataFrame with those columns
>     import org.apache.spark.sql.Row
>
>     df.map {
>       case Row(foo: Int, bar: String) => Record(foo, bar)
>     }
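>
> One caveat: the Row(...) pattern extracts fields by position, so it
> depends on the column order in the schema. A sketch of an
> order-independent variant (untested, using the same assumed Record):
>
>     // look up each column's position once, then read by ordinal
>     val fooIdx = df.columns.indexOf("foo")
>     val barIdx = df.columns.indexOf("bar")
>
>     df.map { r => Record(r.getInt(fooIdx), r.getString(barIdx)) }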
>
>
>
> best,
> wb
>
>
> ----- Original Message -----
> > From: "Alexander Ulanov" <alexander.ula...@hp.com>
> > To: dev@spark.apache.org
> > Sent: Friday, May 8, 2015 11:50:53 AM
> > Subject: Easy way to convert Row back to case class
> >
> > Hi,
> >
> > I created a dataset RDD[MyCaseClass], converted it to a DataFrame, and
> > saved it to a Parquet file, following
> > https://spark.apache.org/docs/latest/sql-programming-guide.html#interoperating-with-rdds
> >
> > When I load this dataset with sqlContext.parquetFile, I get a DataFrame
> > with column names as in the initial case class. I want to convert this
> > DataFrame to an RDD to perform RDD operations. However, when I convert
> > it I get RDD[Row], and all information about the field names gets lost.
> > Could you suggest an easy way to convert a DataFrame to
> > RDD[MyCaseClass]?
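> >
> > For concreteness, the round trip looks roughly like this (a sketch; the
> > fields of MyCaseClass and the file path are made up here):
> >
> >     case class MyCaseClass(foo: Int, bar: String)
> >
> >     import sqlContext.implicits._
> >     val rdd = sc.parallelize(Seq(MyCaseClass(1, "a")))
> >     rdd.toDF().saveAsParquetFile("data.parquet")
> >
> >     val df = sqlContext.parquetFile("data.parquet")
> >     val rows = df.rdd   // RDD[Row]; the case class type is gone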
> >
> > Best regards, Alexander
> >
>
