flyrain commented on a change in pull request #3530:
URL: https://github.com/apache/iceberg/pull/3530#discussion_r749586696



##########
File path: core/src/main/java/org/apache/iceberg/deletes/Deletes.java
##########
@@ -261,7 +258,11 @@ public void close() {
 
     @Override
     protected boolean shouldKeep(T posDelete) {
-      return CHARSEQ_COMPARATOR.compare(dataLocation, (CharSequence) 
FILENAME_ACCESSOR.get(posDelete)) == 0;
+      return charSeqEquals(dataLocation, (CharSequence) 
FILENAME_ACCESSOR.get(posDelete));
+    }
+
+    private boolean charSeqEquals(CharSequence s1, CharSequence s2) {
+      return s1 == s2 || (s1.length() == s2.length() && 
s1.toString().contentEquals(s2));

Review comment:
       @rdblue, thanks for the review. My[ first 
version](https://github.com/apache/iceberg/pull/3530/commits/b880b2c9e1258dc6382eb0c187c663f25b544c83)
 is almost exactly be this way. 
   
   Then I found string `equals` is much faster than not only the original 
CharComparator, but also my first solution. It is 2 times faster when it come 
to compare 2 different strings with the same length. It is also way faster to 
know if two string are identical(see my perf test 1). My first solution has the 
similar perf with CharComparator since it has to go through each char. This is 
really interesting, which I couldn't find any clue from the source code of 
string implementation. It is probably due to some hash comparison internally. I 
was thinking if we can assume file path is a string since Iceberg lib control 
the write of delete files.
   
   Of course we need to stick to this one if the assumption isn't right since 
`toString` may be expensive. Any thought?
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to