dawidwys commented on PR #23411: URL: https://github.com/apache/flink/pull/23411#issuecomment-1892398690
https://github.com/apache/flink/pull/23411#pullrequestreview-1770150649 It took me a while to understand that myself, but I think it's actually ok to depend on the `Object#equals/hashcode` in `AggregateFunctions` @snuyanzin what you said is correct for `ArrayDistinctFunction`, because: 1. it is a scalar function, so it can be chained with other functions without an exchange before it 2. It gets data from two input(s) (previous operator and e.g. a literal) haystack and needle may come from different operators which may produce data in different formats e.g. `GenericRowData` and `BinaryRowData` In `AggregateFunctions` we will always have all records as `BinaryRowData` (and alike) and those `equals/hashcode` should work just fine. (It may be a different story when we support `STRUCTURED_TYPE#equals/hashcode`, but we will need to revisit most of the operators then, because all `MapView(s)` will work incorrectly). This is what we do in other aggregate functions already. Take a look at e.g. `JsonArrayAggFunction` or `FirstValueWithRetractAggFunction ` I haven't seen the previous version @Jiabao-Sun , but I believe we can remove the `equalityHandler` and make the `ArrayAggFunction` look similar to `JsonArrayAggFunction`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org