I have a dataset in which I want to count the distinct values of one column
within groups defined by several other columns. I do it like so:

processedDataset = processedDataset.withColumn(
    "freq",
    approx_count_distinct("col1").over(
        Window.partitionBy(groupCols.toArray(new Column[groupCols.size()]))));
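
To make the intended result concrete, here is a minimal plain-Java sketch (no Spark involved) of what an exact per-group distinct count would produce. It assumes each row is reduced to a single group key plus its col1 value. The class name ExpectedFreq and the method distinctPerGroup are hypothetical and exist only for illustration.

```java
import java.util.*;

public class ExpectedFreq {
    // Hypothetical helper: each row is {groupKey, col1Value}.
    // Returns, for every group key, the number of distinct col1 values seen.
    static Map<String, Integer> distinctPerGroup(List<String[]> rows) {
        Map<String, Set<String>> seen = new HashMap<>();
        for (String[] row : rows) {
            // A set per group drops repeated col1 values automatically.
            seen.computeIfAbsent(row[0], k -> new HashSet<>()).add(row[1]);
        }
        Map<String, Integer> freq = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            freq.put(e.getKey(), e.getValue().size());
        }
        return freq;
    }
}
```

Under this reading, a group whose col1 values are a, b, a would get 2 in "freq", since repeated values are counted once.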


But even when I have duplicate column values, I still get 1 in the "freq"
column.

Also, when I set the rsd parameter to 0, I get an ArrayIndexOutOfBounds
kind of error.
Why?
