Unfortunately, I don't think there is an easy way to do this in 1.6. In Spark 2.0 we will make DataFrame = Dataset[Row], so this should work out of the box.
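To illustrate, in 2.0 that would look roughly like the sketch below. This is an untested outline, not a tested recipe; `"columnName"` and the parquet path are the placeholders from the original snippet, and I'm assuming the grouping column is a string:

```scala
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder().appName("buckets").getOrCreate()
import spark.implicits._

// In Spark 2.0, DataFrame is a type alias for Dataset[Row], so Dataset
// operations like groupByKey are available without defining a case class.
val df = spark.read.parquet("/path/to/files")

// Bucket full rows by the value of the column; no aggregate function needed.
val buckets = df.groupByKey((row: Row) => row.getAs[String]("columnName"))

// Example: reduce each bucket to (key, rowCount) -- (String, Int) has a
// built-in encoder via spark.implicits._
val counts = buckets.mapGroups((key, rows) => (key, rows.size))
```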
On Mon, Apr 25, 2016 at 11:08 PM, Brandon White <bwwintheho...@gmail.com> wrote:

> I am creating a DataFrame from parquet files. The schema is based on the
> parquet files; I do not know it beforehand. What I want to do is group the
> entire DF into buckets based on a column.
>
> val df = sqlContext.read.parquet("/path/to/files")
> val groupedBuckets: DataFrame[String, Array[Rows]] =
>   df.groupBy($"columnName")
>
> I know this does not work, because the DataFrame's groupBy is only used for
> aggregate functions. I cannot convert my DataFrame to a Dataset because I
> do not have a case class for the Dataset schema. The only thing I can do is
> convert the df to an RDD[Row] and try to deal with the types. This is ugly
> and difficult.
>
> Is there any better way? Can I convert a DataFrame to a Dataset without a
> predefined case class?
>
> Brandon
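For reference, the 1.6 RDD workaround described in the question would look roughly like this (an untested sketch; `"columnName"` and the path are the placeholders from the original snippet, and the column is assumed to be a string):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}

val sc = new SparkContext(new SparkConf().setAppName("buckets"))
val sqlContext = new SQLContext(sc)

val df = sqlContext.read.parquet("/path/to/files")

// Drop to RDD[Row] and group by the column's value. This yields
// RDD[(String, Iterable[Row])] -- each key's complete rows, no case
// class required, but all type information beyond Row is lost.
val bucketsRdd = df.rdd.groupBy(row => row.getAs[String]("columnName"))
```

Note that `groupBy` on an RDD shuffles every row and materializes each group in memory on one executor, which is part of why this route is painful for large buckets.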