Unfortunately, I don't think there is an easy way to do this in 1.6. In
Spark 2.0 we will make DataFrame = Dataset[Row], so this should work out of
the box.
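A minimal sketch of what the 2.0 API makes possible once DataFrame = Dataset[Row] (not runnable on 1.6; the column name "bucket" and the session setup are illustrative assumptions, not from the original thread):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("bucket-example").getOrCreate()
import spark.implicits._

// Schema is still inferred from the parquet files, as in 1.6.
val df = spark.read.parquet("/path/to/files")

// DataFrame is Dataset[Row] in 2.0, so the typed groupByKey works directly;
// "bucket" stands in for whichever column you want to group on.
val grouped = df.groupByKey(row => row.getAs[String]("bucket"))

// Per-bucket processing then goes through count / mapGroups / flatMapGroups.
val bucketSizes = grouped.count() // Dataset[(String, Long)]
bucketSizes.show()
```

Note that `groupByKey` returns a `KeyValueGroupedDataset`, not a materialized map of buckets, so each group is processed lazily rather than collected into memory.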
On Mon, Apr 25, 2016 at 11:08 PM, Brandon White wrote:
> I am creating a DataFrame from parquet files. The schema is based on the
> parquet files; I do not know it beforehand. What I want to do is group the
> entire DF into buckets based on a column.
>
> val df = sqlContext.read.parquet("/path/to/files")
> val groupedBuckets: DataFrame[String, Array[Rows]] =