jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r421006133



##########
File path: 
processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
+    if (cacheKey == null) {
+      final List<String> sortedValues = new ArrayList<>(values);
+      sortedValues.sort(Comparator.nullsFirst(Ordering.natural()));
+      final Hasher hasher = Hashing.sha256().newHasher();

Review comment:
       The issue with storing all `values` in the cacheKey is that the cache 
key can become huge as the size of `values` grows which can result in pollution 
in the cache. 
   
   Regarding using SHA-256, I believe academical research is more trustable 
than my testing. [According to the probability table, the required number of 
hashed elements such that probability of at least one hash collision >= 10^−18 
is 4.8×10^29 under assumption that the hash function is 
perfect](https://en.wikipedia.org/wiki/Birthday_problem). This is negligible 
when it comes to practice even though the SHA-256 hash function is not proven 
to be perfect.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

Reply via email to