Pinging again on this topic.
Is there an easy way to select TopN in a RelationalGroupedDataset?
Basically, in the example below, dataSet.groupBy("Column1") returns a
RelationalGroupedDataset, which is then aggregated with
.agg(udaf("Column2", "Column3")). One way to address the data
skew would be to reduce the data per key
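
For reference, one possible way to keep only the top N rows per key before the aggregation is a window function rather than anything on RelationalGroupedDataset itself. This is only a sketch, assuming Spark 2.x; the sample rows, the ordering by Column3, the cap n = 2 and the object name TopNPerKey are made up for illustration and are not from the original job. Note that the window still brings all rows for a key onto one partition, so it trims the input to the UDAF but may not remove the skew in the shuffle itself.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object TopNPerKey {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TopNPerKey")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; only the column names follow the message above.
    val dataSet = Seq(
      ("a", "x", 5L), ("a", "y", 9L), ("a", "z", 1L),
      ("b", "x", 7L), ("b", "y", 3L)
    ).toDF("Column1", "Column2", "Column3")

    val n = 2 // assumed cap on rows kept per key

    // Rank rows inside each Column1 group by Column3, largest first.
    val byKey = Window.partitionBy(col("Column1")).orderBy(col("Column3").desc)

    // Keep at most n rows per key, then drop the helper column.
    val topN = dataSet
      .withColumn("rank", row_number().over(byKey))
      .where(col("rank") <= n)
      .drop("rank")

    // topN is an ordinary DataFrame, so the existing UDAF could be applied as before:
    // topN.groupBy("Column1").agg(udaf(col("Column2"), col("Column3")))
    topN.show()

    spark.stop()
  }
}
```

Whether dropping rows this way is acceptable depends on what the UDAF computes, so treat it as something to experiment with rather than a drop-in fix.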
Trying again. Hoping to find some help in figuring out the performance
bottleneck we are observing.
Thanks,
Bharath
On Sun, Oct 30, 2016 at 11:58 AM, Spark User wrote:
Hi All,
I have a UDAF that seems to perform poorly when its input is skewed. I have
been debugging the UDAF implementation but I don't see any code that is
causing the performance to degrade. Below are more details on the data and
the experiments I have run.
DataSet: Assume 3 columns, column1 being the