I have a dataset where I want to count the distinct values of a column within groups defined by several other columns. I do it like so:
processedDataset = processedDataset.withColumn("freq", approx_count_distinct("col1").over(Window.partitionBy(groupCols.toArray(new Column[groupCols.size()]))));

But even when I have duplicate column values, I still get 1 in the "freq" column. Also, when I set the rsd parameter to 0, I get some kind of ArrayIndexOutOfBounds error. Why?