On 22.01.2014 16:20, Paul Sandoz wrote:
On Jan 21, 2014, at 11:05 PM, Xueming Shen <xueming.s...@oracle.com> wrote:
On 01/20/2014 09:24 AM, Paul Sandoz wrote:
- it would be nice to get rid of the pseudo goto using the "scan" labelled
block.
webrev has been updated to remove the pseudo goto by checking "first" against
"len" after the loop break.
Much more readable :-)
I think you should compare the performance of both versions on modern and
32-bit CPUs.
- you might be able to optimize by doing (could depend on the answer to the
next point):
int c = (int)value[i];
int lc = Character.toLowerCase(c);
if (.....) {
result[i] = (char)lc;
} else {
return toLowerCaseEx(result, i, locale, localeDependent);
}
- Do you need to check ERROR for the result of toLowerCase?
2586 if (c == Character.ERROR ||
Yes, Character.toLowerCase() should never return ERROR (while the
package-private Character.toUpperCaseEx() will). In theory there is no need to
check whether the return value of Character.toUpperCase(int) >
min_supplementary_code_point in our loop, because no BMP character has a
supplementary code point as its lower case. But since it's a data-driven
mapping table, there is no guarantee that the Unicode data table is not going
to change in the "future", so I still keep the check.
In my opinion this check should be the subject of a JDK build test, not of
runtime code.
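A build-time check along those lines could be sketched roughly as follows. This
is only an illustration: the class and method names are hypothetical, and since
Character.ERROR is not accessible outside java.lang, the sketch treats any
negative result as the error sentinel (an assumption).

```java
// Hypothetical build-time check: scans the whole BMP and reports the first
// code point whose lower-case mapping does not fit in a single char.
final class LowerCaseTableCheck {

    /** Returns the first offending BMP code point, or -1 if there is none. */
    static int firstViolation() {
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            int lc = Character.toLowerCase(c);
            // Assumption: an error sentinel would show up as a negative value.
            if (lc < 0 || lc >= Character.MIN_SUPPLEMENTARY_CODE_POINT) {
                return c;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int bad = firstViolation();
        if (bad >= 0) {
            throw new AssertionError(
                    "Unexpected lower-case mapping for U+" + Integer.toHexString(bad));
        }
        System.out.println("All BMP lower-case mappings stay within the BMP");
    }
}
```

If such a check ran as part of the build, the runtime loop would not have to
re-test the mapping table on every call.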
or:
int c = (int)value[i];
int lc = Character.toLowerCase(c); // is that safe?
if (c < '\u03A3' || (c < Character.MIN_HIGH_SURROGATE && c != '\u03A3' && lc
< Character.MIN_SUPPLEMENTARY_CODE_POINT)) {
result[i] = (char)lc;
} else {
return toLowerCaseEx(result, i, locale, localeDependent);
}
FWIW I personally find those solutions easier to read, if they are safe w.r.t.
Character.toLowerCase and that annoying Greek character.
I would like the 3rd version.
-Ulf
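
For experimenting with the third variant quoted above, here is a minimal
standalone sketch. It is not the webrev code: LowerCaseSketch and slowPath are
hypothetical names, slowPath merely stands in for the private toLowerCaseEx
helper by delegating to String.toLowerCase(Locale), and the list of
locale-dependent languages is an assumption. The sketch does not settle the
"is that safe?" question; it only makes the condition easy to compare against
the library result for inputs of interest.

```java
import java.util.Locale;

final class LowerCaseSketch {

    // Hypothetical stand-in for the private toLowerCaseEx(...) helper:
    // it simply delegates to the library implementation.
    static String slowPath(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    static String toLowerCase(String s, Locale locale) {
        String lang = locale.getLanguage();
        // Assumption: only these languages need locale-sensitive handling.
        boolean localeDependent =
                lang.equals("tr") || lang.equals("az") || lang.equals("lt");
        if (localeDependent) {
            return slowPath(s, locale);
        }
        char[] value = s.toCharArray();
        char[] result = new char[value.length];
        for (int i = 0; i < value.length; i++) {
            int c = value[i];
            int lc = Character.toLowerCase(c);
            // Condition transcribed from the thread: stay on the fast path
            // unless we meet capital sigma, a surrogate, or a mapping that
            // no longer fits into a single char.
            if (c < '\u03A3'
                    || (c < Character.MIN_HIGH_SURROGATE
                        && c != '\u03A3'
                        && lc < Character.MIN_SUPPLEMENTARY_CODE_POINT)) {
                result[i] = (char) lc;
            } else {
                return slowPath(s, locale);
            }
        }
        return new String(result);
    }
}
```

Calling LowerCaseSketch.toLowerCase(s, locale) and comparing the result with
s.toLowerCase(locale) over a range of inputs is one way to probe where the
fast path and the library disagree.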