Hi Ivan,

Thanks for bringing this up again.

Some initial comments, not a full review.

Though the enhancement says that no consideration is given to sign characters they may produce puzzling results for strings like "-2000" and "-2001" where the strings appear to be signed numbers but the comparison will be for the "-" prefix and the rest unsigned. Including the word 'unsigned' in the description might help reinforce the semantics.

Also, I don't see test cases for strings like: "abc200123" and "abc20000123" which share a prefix part of which is numeric but differ in the number of "leading" zeros after the prefix.

What about strings that include more than 1 numeric segment: abc2003def0001 and abc02003def001
in the leading zero case?

Though the test adds the @key randomness, it would be useful to use the test library mechanisms to report the seed and be able to run the test with a seed, if case they fail.
(As might be provided by the test library jdk.test.lib.RandomFactory).


Comparator.java:
576: "the common prefix is skipped" is problematic when considering leading zeros.
   The common prefix may contain non-zero digits.
585: it is not clear whether the "different number of leading zeros" is applied regardless
   of the common prefix or only starting after the common prefix.

550, 586: for many readers, it is easier to read 'for example' than "e.g." or "i.e.".

562, 598:  editorial: "is to compare" -> "compares"

Comparators.java:

192: @param for param o is missing; (the param name "o" usually refers to an object, not a string). 200: Can skipLeadingZeros be coded to correctly work if cnt == 0; it would be more robust SkipLeading zeros works correctly only if pos is given the first numeric digit in the subsequence
     so the numStart1 and numStart2 must be first digit in each string.

compare():

Line 223: if strings typically have non-numeric prefixes, it might perform slightly better to detect the end of the common prefix and then scan back to find the beginning of the numeric part.
(Avoiding checking every char for isDigit).

Line 224: If assigned for every digit, it will hold the last equal digit in the common prefix, not the first digit.

239, 240: chars at o1(index) and o2(index) are already known to be digits, can it be avoided checking them twice
by starting at index+1?

$.02, Roger


On 7/19/2017 4:41 AM, Ivan Gerasimov wrote:
Hello!

It is a proposal to provide a String comparator, which will pay attention to the numbers embedded into the strings (should they present).

This proposal was initially discussed back in 2014 and seemed to bring some interest from the community: http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-December/030343.html

In the latest webrev two methods are added to the public API:
j.u.Comparator.comparingNumerically() and
j.u.Comparator.comparingNumericallyLeadingZerosAhead().

The regression test is extended to exercise this new comparator.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8134512
WEBREV: http://cr.openjdk.java.net/~igerasim/8134512/01/webrev/

Comments, suggestions are very welcome!


Reply via email to