flyrain commented on a change in pull request #3530: URL: https://github.com/apache/iceberg/pull/3530#discussion_r749586696
########## File path: core/src/main/java/org/apache/iceberg/deletes/Deletes.java ########## @@ -261,7 +258,11 @@ public void close() { @Override protected boolean shouldKeep(T posDelete) { - return CHARSEQ_COMPARATOR.compare(dataLocation, (CharSequence) FILENAME_ACCESSOR.get(posDelete)) == 0; + return charSeqEquals(dataLocation, (CharSequence) FILENAME_ACCESSOR.get(posDelete)); + } + + private boolean charSeqEquals(CharSequence s1, CharSequence s2) { + return s1 == s2 || (s1.length() == s2.length() && s1.toString().contentEquals(s2)); Review comment: @rdblue, thanks for the review. My[ first version](https://github.com/apache/iceberg/pull/3530/commits/b880b2c9e1258dc6382eb0c187c663f25b544c83) is almost exactly be this way. Then I found string `equals` is much faster than not only the original CharComparator, but also my first solution. It is 2 times faster when it come to compare 2 different strings with the same length. It is also way faster to know if two string are identical(see my perf test 1). My first solution has the similar perf with CharComparator since it has to go through each char. This is really interesting, which I couldn't find any clue from the source code of string implementation. It is probably due to some hash comparison internally. I was thinking if we can assume file path is a string since Iceberg lib control the write of delete files. Of course we need to stick to this one if the assumption isn't right since `toString` may be expensive. Any thought? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org