Re: Performance bug in UDAF?

2017-02-09 Thread Spark User
Pinging again on this topic. Is there an easy way to select the top N rows per group in a RelationalGroupedDataset? Basically, in the example below, dataSet.groupBy("Column1") returns a RelationalGroupedDataset, on which .agg(udaf("Column2", "Column3")) is then called. One way to address the data skew would be to reduce the data per key

Re: Performance bug in UDAF?

2016-10-31 Thread Spark User
Trying again. Hoping to find some help in figuring out the performance bottleneck we are observing.

Thanks,
Bharath

On Sun, Oct 30, 2016 at 11:58 AM, Spark User wrote:
> Hi All,
>
> I have a UDAF that seems to perform poorly when its input is skewed. I
> have been

Performance bug in UDAF?

2016-10-30 Thread Spark User
Hi All,

I have a UDAF that seems to perform poorly when its input is skewed. I have been debugging the UDAF implementation, but I don't see any code that would cause the performance to degrade. Here are more details on the data and the experiments I have run.

DataSet: Assume 3 columns, column1 being the