Please review the new update: http://cr.openjdk.java.net./~xuelei/8020842/webrev.01/
With this update, "com." is valid (IDN.toASCII returns "com."); "." and
"example..com" are invalid, and an IllegalArgumentException (IAE) is thrown
for an invalid IDN.

Thanks,
Xuelei

On 8/7/2013 10:18 PM, Michael McMahon wrote:
> On 07/08/13 15:13, Xuelei Fan wrote:
>> On 8/7/2013 10:05 PM, Michael McMahon wrote:
>>> Resolvers seem to accept queries using trailing dots.
>>>
>>> e.g. nslookup www.oracle.com.
>>>
>>> or InetAddress.getByName("www.oracle.com.");
>>>
>>> The part of RFC 3490 quoted below seems to me to be saying
>>> that the empty label implied by the trailing dot is not regarded
>>> as a label so that you don't end up calling toASCII() or toUnicode()
>>> with an empty string. I don't think it's saying the trailing dot can't
>>> be there.
>>>
>> It makes sense.
>>
>> What's your preference to return for IDN.toASCII("www.oracle.com."),
>> "www.oracle.com." or "www.oracle.com"? The current returned value is
>> "www.oracle.com". I would like to preserve the behavior in this update.
>
> My opinion is to keep it as at present, i.e. "www.oracle.com."
>
> Michael
>
>> I think we will be on the same page soon.
>>
>> Thanks,
>> Xuelei
>>
>>> Michael
>>>
>>> On 07/08/13 13:44, Xuelei Fan wrote:
>>>> On 8/7/2013 12:06 AM, Matthew Hall wrote:
>>>>> Trailing dots are allowed in plain DNS (thus almost surely in IDN),
>>>>> and the single dot represents the root zone. So you have to be
>>>>> careful making this sort of change to check the DNS RFCs first.
>>>> That's the first question we need to answer: does IDN allow trailing
>>>> dots ("com."), a zero-length root label ("."), and zero-length labels
>>>> ("", for example "example..com")?
>>>>
>>>> Per the specification of IDN.toASCII():
>>>> =======================================
>>>> "ToASCII operation can fail. ToASCII fails if any step of it fails. If
>>>> ToASCII operation fails, an IllegalArgumentException will be thrown. In
>>>> this case, the input string should not be used in an internationalized
>>>> domain name.
>>>>
>>>> A label is an individual part of a domain name. The original ToASCII
>>>> operation, as defined in RFC 3490, only operates on a single label.
>>>> This method can handle both label and entire domain name, by assuming
>>>> that labels in a domain name are always separated by dots. ...
>>>>
>>>> Throws IllegalArgumentException - if the input string doesn't
>>>> conform to RFC 3490 specification"
>>>>
>>>> Per the specification of RFC 3490:
>>>> ==================================
>>>> [Section 2]
>>>> "A label is an individual part of a domain name. Labels are usually
>>>> shown separated by dots; for example, the domain name
>>>> "www.example.com" is composed of three labels: "www", "example", and
>>>> "com". (The zero-length root label described in [STD13], which can
>>>> be explicit as in "www.example.com." or implicit as in
>>>> "www.example.com", is not considered a label in this
>>>> specification.)"
>>>>
>>>> "An "internationalized label" is a label to which the ToASCII
>>>> operation (see section 4) can be applied without failing (with the
>>>> UseSTD3ASCIIRules flag unset). ...
>>>> Although most Unicode characters can appear in
>>>> internationalized labels, ToASCII will fail for some input strings,
>>>> and such strings are not valid internationalized labels."
>>>>
>>>> "An "internationalized domain name" (IDN) is a domain name in which
>>>> every label is an internationalized label."
>>>>
>>>> [Section 4.1]
>>>> "ToASCII consists of the following steps:
>>>>
>>>> ...
>>>>
>>>> 8. Verify that the number of code points is in the range 1 to 63
>>>> inclusive."
>>>>
>>>>
>>>> Here are the questions:
>>>> 1. Is "example..com" a valid IDN?
>>>>    As the dot is used as the label separator, there are three labels:
>>>>    "example", "", "com". Per RFC 3490, "" is not a valid label. Hence,
>>>>    "example..com" is not a valid IDN.
>>>>
>>>>    We need to address the issue in IDN.
>>>>
>>>> 2. Is "xyz." a valid IDN?
>>>>    It's a gray area, I think. We can treat the trailing "." as the
>>>>    root label, or as a label separator.
>>>>    If the trailing "." is treated as a label separator, "xyz." is
>>>>    invalid per RFC 3490.
>>>>    If the trailing "." is treated as the root label, what's the expected
>>>>    return value of IDN.toASCII("xyz.")? I think the return value can be
>>>>    either "xyz." or "xyz". The current implementation returns "xyz".
>>>>
>>>>    We may not need to update the implementation if the trailing "." is
>>>>    treated as the root label.
>>>>
>>>> 3. Is "." a valid IDN?
>>>>    It's a gray area again, I think.
>>>>    As above, if the trailing "." is treated as the root label, I think
>>>>    the return value can be either "." or "". The current implementation
>>>>    throws a StringIndexOutOfBoundsException.
>>>>
>>>>    However, what does an empty domain name ("") really mean? I would
>>>>    prefer to return "." for "." instead.
>>>>
>>>>    We need to address the issue in IDN.
>>>>
>>>>
>>>> Here comes the solution. IDN.toASCII() returns:
>>>> 1. "." for ".";
>>>> 2. "xyz" for "xyz.";
>>>> 3. IAE for "example..com".
>>>>
>>>> Does it make sense?
>>>>
>>>> Thanks,
>>>> Xuelei
>>>>
>>>>
>>>> On 8/7/2013 1:35 AM, Michael McMahon wrote:
>>>>> I don't really understand the reason for the restriction in
>>>>> SNIHostName.
>>>>> But, I guess that is where it should be enforced if it is required.
>>>>>
>>>>> Michael.
>>>>>
>>>>> On 06/08/13 17:43, Dmitry Samersoff wrote:
>>>>>> Xuelei,
>>>>>>
>>>>>> . (dot) is a perfectly valid domain name and it means the root
>>>>>> domain, so com. is a valid domain name as well.
>>>>>>
>>>>>> It seems to me that in the context of the methods in your change we
>>>>>> should ignore trailing dots, rather than throw an exception.
>>>>>>
>>>>>> -Dmitry
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2013-08-06 15:44, Xuelei Fan wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Please review the bug fix to tighten the illegal input checking in
>>>>>>> IDN.
>>>>>>>
>>>>>>> webrev: http://cr.openjdk.java.net./~xuelei/8020842/webrev.00/
>>>>>>>
>>>>>>> Here are two test cases, which are expected to get an IAE.
>>>>>>>
>>>>>>> Case 1:
>>>>>>> String host = IDN.toASCII(".", IDN.USE_STD3_ASCII_RULES);
>>>>>>> Exception in thread "main"
>>>>>>> java.lang.StringIndexOutOfBoundsException:
>>>>>>> String index out of range: 0
>>>>>>>     at java.lang.StringBuffer.charAt(StringBuffer.java:204)
>>>>>>>     at java.net.IDN.toASCIIInternal(IDN.java:279)
>>>>>>>     at java.net.IDN.toASCII(IDN.java:118)
>>>>>>>
>>>>>>> Case 2:
>>>>>>> String host = IDN.toASCII("com.", IDN.USE_STD3_ASCII_RULES);
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Xuelei
>>>>>>>
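
For reference, the label rules described for webrev.01 at the top of this
thread ("com." accepted and returned as "com.", "." and "example..com"
rejected with an IAE) could be sketched roughly as below. This is not the
JDK code: TrailingDotCheck and validate are hypothetical names, the sketch
only checks label structure on the input as given, and it does not perform
the Nameprep or Punycode steps of the RFC 3490 ToASCII operation. Rejecting
the empty string is also an assumption of the sketch; the thread does not
settle that case.

```java
// Hypothetical helper, NOT the java.net.IDN implementation. It checks only
// that every label has 1 to 63 code points (RFC 3490, section 4.1, step 8),
// treating a single trailing dot as the zero-length root label.
final class TrailingDotCheck {

    // Returns the input unchanged when its label structure is acceptable,
    // otherwise throws IllegalArgumentException.
    static String validate(String name) {
        // A single trailing dot stands for the root label and is not
        // itself treated as a label.
        boolean rooted = name.endsWith(".");
        String body = rooted ? name.substring(0, name.length() - 1) : name;

        // Limit -1 keeps empty strings, so "example..com" yields an empty
        // middle label and "." (body "") yields one empty label.
        String[] labels = body.split("\\.", -1);
        for (String label : labels) {
            int length = label.codePointCount(0, label.length());
            if (length < 1 || length > 63) {
                throw new IllegalArgumentException(
                        "Label length out of range 1..63 in: \"" + name + "\"");
            }
        }
        return name;
    }

    public static void main(String[] args) {
        String[] inputs = {"www.oracle.com", "com.", ".", "example..com"};
        for (String input : inputs) {
            try {
                System.out.println("accepted: " + validate(input));
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + input);
            }
        }
    }
}
```

In this sketch "www.oracle.com" and "com." are accepted, while "." and
"example..com" are rejected, which matches the outcome stated for webrev.01.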