On Jan 21, 2014, at 11:05 PM, Xueming Shen <xueming.s...@oracle.com> wrote:
> Hi Paul, thanks for reviewing the changeset, comment inlined below. > > On 01/20/2014 09:24 AM, Paul Sandoz wrote: >> >> Some quick comments. >> >> In String.toLowerCase: >> >> - it would be nice to get rid of the pseudo goto using the "scan" labelled >> block. >> > > webrev has been updated to remove the pseudo goto by checking the "first" > against > "len" after the loop break. > Much for readable :-) >> - there is no need for the "localeDependent" local variable. >> > > I just tried to not throw away the result of this "localeDependent", which is > still needed > in the toXCaseEx() methods. > if (lang == "tr" || lang == "az" || lang == "lt") { // local dependent return toLowerCaseEx(result, firstUpper, locale, true); } // otherwise false is passed to subsequent calls to toLowerCaseEx ? >> - you might be able to optimize by doing (could depend on the answer to the >> next point): >> >> int c = (int)value[i]; >> int lc = Character.toLowerCase(c); >> if (.....) { result[i] = (char)lc; } else { return toLowerCaseEx(result, i, >> locale, localeDependent); } >> >> - Do you need to check ERROR for the result of toLowerCase? >> >> 2586 if (c == Character.ERROR || >> > > Yes, Character.toLowerCase() should never return ERROR (while the package > private > Character.toUpperCaseEx() will). In theory there is no need to check if the > return > value of Character.toUpperCase(int) > min_supplementary_code_point in our > loop, > because there is no bmp character returns a supplementary code point as its > lower > case. But since it's a data driven mapping table, there is no guarantee the > unicode > data table is not going to change in the "future", so I still keep the check. > The webrev > has been updated to use one single "if" inside the loop. > > I have a "single if" implementation for the toUpperCase() as well, though > don't sure > which one is better/faster :-) > OK, i suppose one could measure :-) Not sure how much it is worth obsessing over but... int c = (int)value[i]; if (c < '\u03A3') { result[i] = (char)Character.toLowerCase(c); // Is that safe? } else if (c < Character.MIN_HIGH_SURROGATE && c != 'u03A3' && (c = Character.toLowerCase(c)) < Character.MIN_SUPPLEMENTARY_CODE_POINT)) { result[i] = (char)c; } else { return toLowerCaseEx(result, i, locale, localeDependent); } or: int c = (int)value[i]; int lc = Character.toLowerCase(c); // is that safe? if (c < '\u03A3') { result[i] = (char)lc; } else if (c < Character.MIN_HIGH_SURROGATE && c != 'u03A3' && lc < Character.MIN_SUPPLEMENTARY_CODE_POINT)) { result[i] = (char)lc; } else { return toLowerCaseEx(result, i, locale, localeDependent); } or: int c = (int)value[i]; int lc = Character.toLowerCase(c); // is that safe? if (c < '\u03A3' || (c < Character.MIN_HIGH_SURROGATE && c != 'u03A3' && lc < Character.MIN_SUPPLEMENTARY_CODE_POINT))) { result[i] = (char)lc; } else { return toLowerCaseEx(result, i, locale, localeDependent); } FWIW i personally find those solutions easier to read, if they are safe w.r.t. Character.toLowerCase and that annoying greek character. Paul. > http://cr.openjdk.java.net/~sherman/8032012/ > > -Sherman >