Re: Easy way to convert Row back to case class

Will Benton Fri, 08 May 2015 11:02:17 -0700

This might not be the easiest way, but it's pretty easy: you can use
Row(field_1, ..., field_n) as a pattern in a case match.  So if you have a data
frame with foo as an Int column and bar as a String column, and you want to
construct instances of a case class that wraps these up, you can do something
like this:


    import org.apache.spark.sql.Row

    // assuming Record is declared as case class Record(foo: Int, bar: String)
    // and df is a data frame

    df.map {
      case Row(foo: Int, bar: String) => Record(foo, bar)
    }
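
One caveat with the match above: the Row pattern binds fields by position and
by runtime type, so a null value or a column of an unexpected type falls
through and throws scala.MatchError.  Here is a minimal sketch of a more
defensive variant.  The RowToRecord object and its toRecord helper are
hypothetical names used for illustration, not part of Spark; only Row and its
pattern-matching support come from org.apache.spark.sql.

```scala
import org.apache.spark.sql.Row

// Same shape as the Record class assumed in the example above.
case class Record(foo: Int, bar: String)

// Hypothetical helper object, not part of Spark.
object RowToRecord {
  // Returns None instead of throwing scala.MatchError when a row contains
  // nulls, has a different number of fields, or has unexpected field types.
  def toRecord(row: Row): Option[Record] = row match {
    case Row(foo: Int, bar: String) => Some(Record(foo, bar))
    case _                          => None
  }

  def main(args: Array[String]): Unit = {
    // Row(...) builds a row without a schema, which is enough to exercise
    // the pattern match without starting a SparkContext.
    println(toRecord(Row(1, "a")))    // well-formed row
    println(toRecord(Row(1, null)))   // null in the String column
    println(toRecord(Row("1", "a")))  // wrong type in the first column
  }
}
```

Assuming df is the data frame from above, df.rdd.flatMap(r =>
RowToRecord.toRecord(r)) should then give an RDD[Record] with the non-matching
rows dropped, rather than failing the job on the first bad row.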



best,
wb


----- Original Message -----
> From: "Alexander Ulanov" <[email protected]>
> To: [email protected]
> Sent: Friday, May 8, 2015 11:50:53 AM
> Subject: Easy way to convert Row back to case class
> 
> Hi,
> 
> I created a dataset RDD[MyCaseClass], converted it to a DataFrame, and saved
> it to a Parquet file, following
> https://spark.apache.org/docs/latest/sql-programming-guide.html#interoperating-with-rdds
> 
> When I load this dataset with sqlContext.parquetFile, I get a DataFrame with
> the same column names as the initial case class. I want to convert this
> DataFrame to an RDD to perform RDD operations. However, when I convert it I
> get an RDD[Row], and all information about the field names gets lost. Could
> you suggest an easy way to convert a DataFrame to an RDD[MyCaseClass]?
> 
> Best regards, Alexander
> 

