Hi all,

I would like to introduce the idea of duplicate insensitive aggregate
functions.

For such functions, the aggregation result is the same whether or not the
input is deduplicated first.

For example, given a sequence of data {1, 1, 2, 2, 3, 5, 5}, the
aggregation results of MIN are the same regardless of whether we perform
data deduplication first. That is,

MIN({1, 1, 2, 2, 3, 5, 5}) = MIN({1, 2, 3, 5})

So MIN is a *duplicate insensitive function*.

On the other hand, function SUM is not duplicate insensitive, because

 SUM({1, 1, 2, 2, 3, 5, 5}) != SUM({1, 2, 3, 5})
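
The two cases above can be checked with a small standalone Java sketch. This
is only an illustration, not Calcite code; the class and method names
(DuplicateInsensitivityDemo, min, sum, dedup) are made up for this example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical demo class, not part of Calcite.
public class DuplicateInsensitivityDemo {
    // MIN over a collection of integers.
    public static int min(Collection<Integer> values) {
        return Collections.min(values);
    }

    // SUM over a collection of integers.
    public static int sum(Collection<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Deduplicate while keeping the first-seen order.
    public static Collection<Integer> dedup(Collection<Integer> values) {
        return new LinkedHashSet<>(values);
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 1, 2, 2, 3, 5, 5);
        // MIN gives the same answer with or without deduplication.
        System.out.println("MIN equal: " + (min(data) == min(dedup(data))));
        // SUM does not: 19 on the raw data versus 11 after deduplication.
        System.out.println("SUM equal: " + (sum(data) == sum(dedup(data))));
    }
}
```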

The concept of duplicate insensitivity can help us in many optimization
scenarios.

For example, the current implementation of AggregateMergeRule rules out any
aggregate call for which the isDistinct() method returns true. However, for
duplicate insensitive functions, the rule should still be applicable.
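
To make the idea concrete, here is a rough sketch of how such a check could
be relaxed. It is deliberately standalone: AggKind, AggCall and canMerge are
hypothetical stand-ins written for this email, not Calcite's actual classes or
the rule's real code. MAX is included on the assumption that it behaves like
MIN in this respect.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch, not Calcite code.
public class MergeEligibilitySketch {
    // Stand-in for the kind of aggregate function.
    public enum AggKind { MIN, MAX, SUM, COUNT }

    // Stand-in for an aggregate call carrying a DISTINCT flag.
    public static final class AggCall {
        final AggKind kind;
        final boolean distinct;

        public AggCall(AggKind kind, boolean distinct) {
            this.kind = kind;
            this.distinct = distinct;
        }

        public boolean isDistinct() {
            return distinct;
        }
    }

    // Functions whose result does not change when duplicates are removed.
    static final Set<AggKind> DUPLICATE_INSENSITIVE =
        EnumSet.of(AggKind.MIN, AggKind.MAX);

    // A distinct call no longer blocks the merge if the function is
    // duplicate insensitive, because DISTINCT has no effect on its result.
    public static boolean canMerge(AggCall call) {
        return !call.isDistinct() || DUPLICATE_INSENSITIVE.contains(call.kind);
    }
}
```

This models only the distinct check; any other preconditions of the rule are
out of scope for the sketch.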

Could you please give your valuable feedback?

Best,
Liya Fan
