Unfortunately, I don't think there is an easy way to do this in 1.6. In Spark 2.0 we will make DataFrame = Dataset[Row], so this should work out of the box.
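To illustrate, in 2.0 that would look roughly like the sketch below. This is an untested outline, not a tested recipe; `"columnName"` and the parquet path are the placeholders from the original snippet, and I'm assuming the grouping column is a string:

```scala
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder().appName("buckets").getOrCreate()
import spark.implicits._

// In Spark 2.0, DataFrame is a type alias for Dataset[Row], so Dataset
// operations like groupByKey are available without defining a case class.
val df = spark.read.parquet("/path/to/files")

// Bucket full rows by the value of the column; no aggregate function needed.
val buckets = df.groupByKey((row: Row) => row.getAs[String]("columnName"))

// Example: reduce each bucket to (key, rowCount) -- (String, Int) has a
// built-in encoder via spark.implicits._
val counts = buckets.mapGroups((key, rows) => (key, rows.size))
```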
On Mon, Apr 25, 2016 at 11:08 PM, Brandon White <bwwintheho...@gmail.com> wrote:

> I am creating a DataFrame from parquet files. The schema is based on the
> parquet files; I do not know it beforehand. What I want to do is group the
> entire DF into buckets based on a column.
>
> val df = sqlContext.read.parquet("/path/to/files")
> val groupedBuckets: DataFrame[String, Array[Rows]] =
>   df.groupBy($"columnName")
>
> I know this does not work, because the DataFrame's groupBy is only used for
> aggregate functions. I cannot convert my DataFrame to a Dataset because I
> do not have a case class for the Dataset schema. The only thing I can do is
> convert the df to an RDD[Row] and try to deal with the types. This is ugly
> and difficult.
>
> Is there any better way? Can I convert a DataFrame to a Dataset without a
> predefined case class?
>
> Brandon
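For reference, the 1.6 RDD workaround described in the question would look roughly like this (an untested sketch; `"columnName"` and the path are the placeholders from the original snippet, and the column is assumed to be a string):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}

val sc = new SparkContext(new SparkConf().setAppName("buckets"))
val sqlContext = new SQLContext(sc)

val df = sqlContext.read.parquet("/path/to/files")

// Drop to RDD[Row] and group by the column's value. This yields
// RDD[(String, Iterable[Row])] -- each key's complete rows, no case
// class required, but all type information beyond Row is lost.
val bucketsRdd = df.rdd.groupBy(row => row.getAs[String]("columnName"))
```

Note that `groupBy` on an RDD shuffles every row and materializes each group in memory on one executor, which is part of why this route is painful for large buckets.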