jihoonson commented on a change in pull request #9800: URL: https://github.com/apache/druid/pull/9800#discussion_r421006133
########## File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java ########## @@ -132,31 +145,40 @@ public FilterTuning getFilterTuning() @Override public byte[] getCacheKey() { - boolean hasNull = false; - for (String value : values) { - if (value == null) { - hasNull = true; - break; + if (cacheKey == null) { + final List<String> sortedValues = new ArrayList<>(values); + sortedValues.sort(Comparator.nullsFirst(Ordering.natural())); + final Hasher hasher = Hashing.sha256().newHasher(); Review comment: The issue with storing all `values` in the cacheKey is that the cache key can become huge as the size of `values` grows which can result in pollution in the cache. Regarding using SHA-256, I believe academical research is more trustable than my testing. [According to the probability table, the required number of hashed elements such that probability of at least one hash collision >= 10^−18 is 4.8×10^29 under assumption that the hash function is perfect](https://en.wikipedia.org/wiki/Birthday_problem). This is negligible when it comes to practice even though the SHA-256 hash function is not proven to be perfect. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org