approx_count_distinct in Spark always returns 1

marc nicole Thu, 02 Jun 2022 06:13:28 -0700

I have a dataset in which I want to count the distinct values of one column
within groups defined by several other columns. I do it like so,


processedDataset = processedDataset.withColumn("freq",
    approx_count_distinct("col1").over(
        Window.partitionBy(groupCols.toArray(new Column[groupCols.size()]))));


but even when I have duplicate column values, I still get 1 in the "freq"
column.

Also, when I set the rsd parameter to 0, I get an ArrayIndexOutOfBounds
kind of error.
Why?
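
To make the expected result concrete, here is a minimal plain-Java sketch (no Spark involved) of an exact distinct count of "col1" per group. The class name, the method name, the single string group key and the sample rows are all hypothetical and only stand in for the real dataset and groupCols.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the Spark job: exact distinct count of "col1" per group.
class DistinctPerGroup {

    // Each row is {groupKey, col1}; both names are illustrative assumptions.
    static Map<String, Integer> countDistinctPerGroup(List<String[]> rows) {
        // Collect the set of col1 values seen for each group key.
        Map<String, Set<String>> seen = new LinkedHashMap<>();
        for (String[] row : rows) {
            seen.computeIfAbsent(row[0], k -> new HashSet<>()).add(row[1]);
        }
        // The "freq" of a group is the size of its set of col1 values.
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            freq.put(e.getKey(), e.getValue().size());
        }
        return freq;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"g1", "x"},
            new String[] {"g1", "x"},   // duplicate col1 value inside g1
            new String[] {"g1", "y"},
            new String[] {"g2", "z"});
        System.out.println(countDistinctPerGroup(rows)); // prints {g1=2, g2=1}
    }
}
```

With these sample rows, group g1 has the values x, x, y and so a distinct count of 2, while g2 has a count of 1. Note that if "col1" were itself one of the grouping columns, every group would hold a single "col1" value and the count would be 1 by construction. Whether that applies here depends on the contents of groupCols, which are not shown above.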
