Unfortunately, I don't think there is an easy way to do this in 1.6. In
Spark 2.0 we will make DataFrame = Dataset[Row], so this should work out of
the box.
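A minimal sketch of what the 2.0 API makes possible once DataFrame = Dataset[Row] (not runnable on 1.6; the column name "bucket" and the session setup are illustrative assumptions, not from the original thread):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("bucket-example").getOrCreate()
import spark.implicits._

// Schema is still inferred from the parquet files, as in 1.6.
val df = spark.read.parquet("/path/to/files")

// DataFrame is Dataset[Row] in 2.0, so the typed groupByKey works directly;
// "bucket" stands in for whichever column you want to group on.
val grouped = df.groupByKey(row => row.getAs[String]("bucket"))

// Per-bucket processing then goes through count / mapGroups / flatMapGroups.
val bucketSizes = grouped.count() // Dataset[(String, Long)]
bucketSizes.show()
```

Note that `groupByKey` returns a `KeyValueGroupedDataset`, not a materialized map of buckets, so each group is processed lazily rather than collected into memory.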
On Mon, Apr 25, 2016 at 11:08 PM, Brandon White wrote:
> I am creating a DataFrame from parquet files. The schema is based on the
> parquet files; I do not know it beforehand. What I want to do is group the
> entire DF into buckets based on a column.
>
> val df = sqlContext.read.parquet("/path/to/files")
> val groupedBuckets: DataFrame[String, Array[Rows]] =