Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
Seems fine to me Xuelei. - Michael On 19/08/13 06:56, Xuelei Fan wrote: If no objections, I will push the change by COB Monday. Thanks, Xuelei On 8/13/2013 4:29 PM, Xuelei Fan wrote: Can I get an additional code review from networking team? Thanks, Xuelei On 8/12/2013 2:07 PM, Weijun Wang wrote: new webrev: http://cr.openjdk.java.net/~xuelei/8020842/webrev.06/
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
If no objections, I will push the change by COB Monday. Thanks, Xuelei On 8/13/2013 4:29 PM, Xuelei Fan wrote: Can I get an additional code review from networking team? Thanks, Xuelei On 8/12/2013 2:07 PM, Weijun Wang wrote: new webrev: http://cr.openjdk.java.net/~xuelei/8020842/webrev.06/
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
I've been confused through this discussion as to why a trailing dot would be regarded as illegal. Historically a trailing dot has been frequently (though not universally) used to denote a fully qualified domain name. https://en.wikipedia.org/wiki/Fully_qualified_domain_name Is this use now illegal/unsupported/invalid? Does having a trailing dot conflict with other parts of the IDN specification? Mike On Aug 13 2013, at 01:29 , Xuelei Fan wrote: Can I get an additional code review from networking team? Thanks, Xuelei On 8/12/2013 2:07 PM, Weijun Wang wrote: new webrev: http://cr.openjdk.java.net/~xuelei/8020842/webrev.06/
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
On Aug 16, 2013, at 1:08, Mike Duigou mike.dui...@oracle.com wrote:

I've been confused through this discussion as to why a trailing dot would be regarded as illegal.

The discussion is too long for the final decision to be found easily. An IDN with a trailing dot should be regarded as a legal IDN, and this update tries to fix that. For example, "." and "example.com." are legal IDNs, and IDN.toASCII() should return the legal name accordingly.

However, per the Server Name Indication specification (a TLS extension), a hostname should not end with a trailing dot. So in SNIHostName, we check the return value of IDN.toASCII() to filter out hostnames with trailing dots.

This fix tries to make IDN handle trailing dots and empty labels correctly. The existing SNIHostName code will work as expected once IDN handles the trailing dot properly.

Thanks, Xuelei

Historically a trailing dot has been frequently (though not universally) used to denote a fully qualified domain name. https://en.wikipedia.org/wiki/Fully_qualified_domain_name Is this use now illegal/unsupported/invalid? Does having a trailing dot conflict with other parts of the IDN specification? Mike

On Aug 13 2013, at 01:29 , Xuelei Fan wrote: Can I get an additional code review from networking team? Thanks, Xuelei

On 8/12/2013 2:07 PM, Weijun Wang wrote: new webrev: http://cr.openjdk.java.net/~xuelei/8020842/webrev.06/
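To see what a given JDK actually does with the names mentioned above, a small probe along these lines may help. This is only a sketch: the class name IdnTrailingDotProbe and the helper describe() are made up for illustration, and the output depends on whether the JDK in use contains this fix, so no expected output is listed.

```java
import java.net.IDN;

// Sketch: probe how java.net.IDN treats a trailing dot, the root dot,
// and an empty label. Class and method names are illustrative only.
final class IdnTrailingDotProbe {

    /** Returns the ASCII form, or a short marker naming the exception type. */
    static String describe(String name) {
        try {
            return IDN.toASCII(name);
        } catch (RuntimeException e) {
            // IAE is what the IDN.toASCII() specification promises on failure;
            // anything else (for example StringIndexOutOfBoundsException) is a bug.
            return "throws " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String[] names = {"example.com", "example.com.", ".", "example..com"};
        for (String name : names) {
            System.out.println("\"" + name + "\" -> " + describe(name));
        }
    }
}
```

On a JDK that includes the change discussed in this thread, I would expect the trailing dot of "example.com." to be kept and "example..com" to be rejected, but please run it rather than take that on trust.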
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
On Thu, Aug 15, 2013 at 10:08:35AM -0700, Mike Duigou wrote: I've been confused through this discussion as to why a trailing dot would be regarded as illegal. Historically a trailing dot has been frequently (though not universally) used to denote a fully qualified domain name. https://en.wikipedia.org/wiki/Fully_qualified_domain_name Is this use now illegal/unsupported/invalid? Does having a trailing dot conflict with other parts of the IDN specification? Mike This is why some of us were protesting the code which disallowed the trailing '.', and eventually the code was changed to allow it to be present. Matthew.
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
new webrev: http://cr.openjdk.java.net/~xuelei/8020842/webrev.05/ Added a new test to test illegal hostname in SNIHostName. Xuelei On 8/10/2013 10:49 AM, Xuelei Fan wrote: Hi Michael, It is pretty hard to get the issue solved in SNIHostName in a good sharp. Here is my try to state why we should fix the issue in IDN. In SNIHostName, the following hostname are not accepted as valid hostname: 1. empty hostname 2. hostname ends with a trailing dot 3. hostname does not comply to RFC 3490. The process in SNIHostName looks like: 1. call IDN.toASCII() to convert a string hostname 2. check that the return value of #1 is an valid hostname (non-empty, non-end-with-tailing-dot). At present, the IDN cannot handle the following IDN properly. 1. returns com for com. the trailing dot is swallowed. 2. throws StringIndexOutOfBoundsException for . If . is an valid IDN that comply to RFC 3490, IDN.toASCII() should be able to handle it; otherwise, IDN.toASCII() should throw IAE as the specification suggested. However, IDN.toASCII(.) throws StringIndexOutOfBoundsException, this behavior does not comply the the specification: 3. throws StringIndexOutOfBoundsException for example...net As #2. We can address #1 and #2 in SNIHostName, but the checking is overloaded as IDN also need to address the issue. And SNIHostName has to know what's the separators (., \u3002, etc) of IDN in order to check the dot character. It is not a good encapsulation, and involved in too much about the details of domain name, I think. It is a little big hard to address #3 in SNIHostName. Both all of above issue can be easily addressed in IDN. And once IDN addressed these issues, the current SNIHostName is able to handle invalid hostname (empty, trailing dot, etc) correctly. We won't need to touch SNIHostName any more. Please consider it. 
The latest webrev is at: http://cr.openjdk.java.net/~xuelei/8020842/webrev.02/ Thanks, Xuelei On 8/10/2013 9:13 AM, Xuelei Fan wrote: Hi Michael, I plan to address this issue in SNIHostName. I have filled another two the potential bugs in IDN. Thank you, and other people, for the feedback. Thanks, Xuelei On 8/9/2013 11:25 PM, Xuelei Fan wrote: On 8/9/2013 7:31 PM, Michael McMahon wrote: I don't see how this fixes the original problem as the SNIHostName spec still doesn't like hostnames with a trailing '.' The bug description did not reflect the IDN specification correctly. If com. is a valid IDN, SNIHostName should accept it an an valid hostname. The host name in SNIHostName is nothing more or less than an standard IDN. I added a comment in the bug: com. and . are valid IDN according the IDN and domain name specifications. I will contact the bug reporter about this point. Xuelei I'd prefer to check first where that requirement is coming from, if it is actually necessary, and if not consider removing it from SNIHostName. If it is necessary, then the check should be implemented in SNIHostName. Michael On 09/08/13 05:28, Xuelei Fan wrote: Thanks for your feedback and suggestions. Here is the new webrev: http://cr.openjdk.java.net/~xuelei/8020842/webrev.02/ . is regarded as valid IDN in this update. Thanks, Xuelei On 8/9/2013 10:50 AM, Xuelei Fan wrote: On 8/9/2013 10:14 AM, Weijun Wang wrote: On 8/9/13 9:37 AM, Xuelei Fan wrote: On 8/9/2013 9:22 AM, Weijun Wang wrote: I tried nslookup. Those with .. inside are illegal, $ nslookup com.. nslookup: 'com..' is not a legal name (empty label) but $ nslookup . Server:192.168.10.1 Address:192.168.10.1#53 Non-authoritative answer: *** Can't find .: No answer Thanks for the testing. The behaviors are the same as this fix now. No exactly. It seems nslookup still regards . legal but just cannot find an IP for it. I'm not sure whether a root domain name can be stand alone. Root label is not considered as a label in IDN. 
I think it is safe to regard that . is not a valid IDN as it contains no label. Anyway, it is a corner case. There are many online IDN conversion web services, some of them can convert ., some of the cannot. In the present implementation, we cannot recognize ., and IDN.toASCII(.) throws StringIndexOutOfBoundsException. With this fix, I was wondering IAE is a better exception for IDN.toASCII(.). Learn something new today to use nslookup. Also, since this bug was originally about SNIHostName, do you need to add some extra restriction there to reject oracle.com. things? No, we cannot restrict the format of IDN in SNIHostName more than in IDN. However, we may need to rethink about the comparing of two IDN, for example, example.com. should equal to example.com. I want to consider it in another bug. Not sure. Does the spec say IDN and SNIHostName are equivalent sets? And it's not one is another's subset? Per TLS specification, host name in SNI is an IDN. The spec of SNIHostname
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
    if (q < input.length()) { // Ah, a dot!
        out.append('.');
    }
    p = q + 1;

Using if (q != input.length()) should be even better. The searchDots method clearly specifies: "or if there is no dots, return the length of input string".

--Max
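For readers without the webrev at hand, here is a rough, self-contained sketch of the loop shape being discussed. It is not the JDK source: convertLabel() is an identity placeholder standing in for the real per-label ToASCII step, and this searchDots() is a simplified stand-in that follows the contract quoted above (return the input length when no separator is found).

```java
// Sketch of the label loop under discussion. Names mirror the thread, but
// the bodies are simplified stand-ins, not the JDK implementation.
final class LabelLoopSketch {

    /** RFC 3490 section 3.1 label separators. */
    static boolean isSeparator(char c) {
        return c == '.' || c == '\u3002' || c == '\uFF0E' || c == '\uFF61';
    }

    /** Index of the next separator at or after start, or input.length() if none. */
    static int searchDots(String input, int start) {
        int i = start;
        while (i < input.length() && !isSeparator(input.charAt(i))) {
            i++;
        }
        return i;
    }

    /** Placeholder for the real per-label conversion (hypothetical). */
    static String convertLabel(String label) {
        return label;
    }

    static String convert(String input) {
        StringBuilder out = new StringBuilder();
        int p = 0;
        while (p < input.length()) {
            int q = searchDots(input, p);
            out.append(convertLabel(input.substring(p, q)));
            if (q != input.length()) {
                // q stopped on a separator, so emit a dot; this also keeps a trailing one
                out.append('.');
            }
            p = q + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("www.example.com"));
        System.out.println(convert("com."));
    }
}
```

With the q != input.length() test, a dot is appended exactly when searchDots() stopped on a separator, which is why the trailing dot of "com." survives.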
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
new webrev: http://cr.openjdk.java.net/~xuelei/8020842/webrev.06/

Lines 280 and 333: How about we call them steps 8a and 8b?

"Step 8" refers to the steps in RFC 3490. Let's use step 8.

Thanks, Xuelei

On 8/12/2013 11:11 AM, Weijun Wang wrote: I think the fix is adequate and necessary. One problem: lines 367-373 add a new IAE to ToUnicode, but that method should never fail. And some small comments on style etc.

On 8/12/13 9:09 AM, Xuelei Fan wrote: new webrev: http://cr.openjdk.java.net/~xuelei/8020842/webrev.05/

Lines 123 and 185:

184     p = q + 1;
185     if (p < input.length() || q == (input.length() - 1)) {
186         // has more labels, or keep the trailing dot as at present
187         out.append('.');
188     }

I prefer

    if (q < input.length()) { // Ah, a dot!
        out.append('.');
    }
    p = q + 1;

Lines 282, 335, 270: Insert a blank after "if".

Lines 284 and 372: nslookup uses "empty label", which I like better.

Lines 453 and 460: Personally I don't like the parentheses around the whole return value, but you have your choice.

Lines 280 and 333: How about we call them steps 8a and 8b?

Added a new test to test illegal hostname in SNIHostName.

Excellent. Otherwise I would be wondering how the fix in IDN could solve the problem as described in the bug description.

Thanks Max

Xuelei

On 8/10/2013 10:49 AM, Xuelei Fan wrote: Hi Michael, It is pretty hard to get the issue solved in SNIHostName in a good sharp. Here is my try to state why we should fix the issue in IDN. In SNIHostName, the following hostname are not accepted as valid hostname: 1. empty hostname 2. hostname ends with a trailing dot 3. hostname does not comply to RFC 3490. The process in SNIHostName looks like: 1. call IDN.toASCII() to convert a string hostname 2. check that the return value of #1 is an valid hostname (non-empty, non-end-with-tailing-dot). At present, the IDN cannot handle the following IDN properly. 1. returns com for com. the trailing dot is swallowed. 2. throws StringIndexOutOfBoundsException for . If . 
is an valid IDN that comply to RFC 3490, IDN.toASCII() should be able to handle it; otherwise, IDN.toASCII() should throw IAE as the specification suggested. However, IDN.toASCII(.) throws StringIndexOutOfBoundsException, this behavior does not comply the the specification: 3. throws StringIndexOutOfBoundsException for example...net As #2. We can address #1 and #2 in SNIHostName, but the checking is overloaded as IDN also need to address the issue. And SNIHostName has to know what's the separators (., \u3002, etc) of IDN in order to check the dot character. It is not a good encapsulation, and involved in too much about the details of domain name, I think. It is a little big hard to address #3 in SNIHostName. Both all of above issue can be easily addressed in IDN. And once IDN addressed these issues, the current SNIHostName is able to handle invalid hostname (empty, trailing dot, etc) correctly. We won't need to touch SNIHostName any more. Please consider it. The latest webrev is at: http://cr.openjdk.java.net/~xuelei/8020842/webrev.02/ Thanks, Xuelei On 8/10/2013 9:13 AM, Xuelei Fan wrote: Hi Michael, I plan to address this issue in SNIHostName. I have filled another two the potential bugs in IDN. Thank you, and other people, for the feedback. Thanks, Xuelei On 8/9/2013 11:25 PM, Xuelei Fan wrote: On 8/9/2013 7:31 PM, Michael McMahon wrote: I don't see how this fixes the original problem as the SNIHostName spec still doesn't like hostnames with a trailing '.' The bug description did not reflect the IDN specification correctly. If com. is a valid IDN, SNIHostName should accept it an an valid hostname. The host name in SNIHostName is nothing more or less than an standard IDN. I added a comment in the bug: com. and . are valid IDN according the IDN and domain name specifications. I will contact the bug reporter about this point. 
Xuelei I'd prefer to check first where that requirement is coming from, if it is actually necessary, and if not consider removing it from SNIHostName. If it is necessary, then the check should be implemented in SNIHostName. Michael On 09/08/13 05:28, Xuelei Fan wrote: Thanks for your feedback and suggestions. Here is the new webrev: http://cr.openjdk.java.net/~xuelei/8020842/webrev.02/ . is regarded as valid IDN in this update. Thanks, Xuelei On 8/9/2013 10:50 AM, Xuelei Fan wrote: On 8/9/2013 10:14 AM, Weijun Wang wrote: On 8/9/13 9:37 AM, Xuelei Fan wrote: On 8/9/2013 9:22 AM, Weijun Wang wrote: I tried nslookup. Those with .. inside are illegal, $ nslookup com.. nslookup: 'com..' is not a legal name (empty label) but $ nslookup . Server:192.168.10.1 Address:192.168.10.1#53 Non-authoritative answer: *** Can't find .: No answer
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
Xuelei,

119     p = q + 1;
120     if (p < input.length() || q == (input.length() - 1)) {

Could be simplified to: q <= input.length()-1

-Dmitry

On 2013-08-09 04:41, Xuelei Fan wrote: Ping. Thanks, Xuelei

On 8/7/2013 11:17 PM, Xuelei Fan wrote: Please review the new update: http://cr.openjdk.java.net./~xuelei/8020842/webrev.01/ With this update, "com." is valid (returns "com."); "." and "example..com" are invalid. And IAE will be thrown for an invalid IDN. Thanks, Xuelei

On 8/7/2013 10:18 PM, Michael McMahon wrote: On 07/08/13 15:13, Xuelei Fan wrote: On 8/7/2013 10:05 PM, Michael McMahon wrote: Resolvers seem to accept queries using trailing dots, e.g. nslookup www.oracle.com. or InetAddress.getByName("www.oracle.com."); The part of RFC3490 quoted below seems to me to be saying that the empty label implied by the trailing dot is not regarded as a label so that you don't end up calling toAscii() or toUnicode() with an empty string. I don't think it's saying the trailing dot can't be there.

It makes sense. What's your preference to return for IDN.toASCII("www.oracle.com."), "www.oracle.com." or "www.oracle.com"? The current returned value is www.oracle.com. I would like to preserve the behavior in this update.

My opinion is to keep it as at present ie. www.oracle.com. Michael

I think we are on same page soon. Thanks, Xuelei

Michael On 07/08/13 13:44, Xuelei Fan wrote: On 8/7/2013 12:06 AM, Matthew Hall wrote: Trailing dots are allowed in plain DNS (thus almost surely in IDN), and the single dot represents the root zone. So you have to be careful making this sort of change to check the DNS RFCs first.

That's the first question we need to answer: whether IDN allows trailing dots ("com."), a zero-length root label ("."), and zero-length labels ("", for example "example..com")? Per the specification of IDN.toASCII(): === ToASCII operation can fail. ToASCII fails if any step of it fails. If ToASCII operation fails, an IllegalArgumentException will be thrown. 
In this case, the input string should not be used in an internationalized domain name. A label is an individual part of a domain name. The original ToASCII operation, as defined in RFC 3490, only operates on a single label. This method can handle both label and entire domain name, by assuming that labels in a domain name are always separated by dots. ... Throws IllegalArgumentException - if the input string doesn't conform to RFC 3490 specification Per the specification of RFC 3490: == [section 2] A label is an individual part of a domain name. Labels are usually shown separated by dots; for example, the domain name www.example.com is composed of three labels: www, example, and com. (The zero-length root label described in [STD13], which can be explicit as in www.example.com. or implicit as in www.example.com, is not considered a label in this specification.) An internationalized label is a label to which the ToASCII operation (see section 4) can be applied without failing (with the UseSTD3ASCIIRules flag unset). ... Although most Unicode characters can appear in internationalized labels, ToASCII will fail for some input strings, and such strings are not valid internationalized labels. An internationalized domain name (IDN) is a domain name in which every label is an internationalized label. [Section 4.1] ToASCII consists of the following steps: ... 8. Verify that the number of code points is in the range 1 to 63 inclusive. Here are the questions: 1. whether example..com is an valid IDN? As dot is used as label separators, there are three labels, example, , com. Per RFC 3490, is not a valid label. Hence, example..com is not a valid IDN. We need to address the issue in IDN. 2. whether xyz. is an valid IDN? It's an gray area, I think. We can treat the trailing . as root label, or a label separator. If the trailing . is treated as label separator, xyz. is invalid per RFC 3490. if the trailing . 
is treated as root label, what's the expected return value of IDN.toASCII(xyz.)? I think the return value can be either xyz. or xyz. The current implementation returns xyz. We may need not to update the implementation if tailing . is treated as root label. 3. whether . is an valid IDN? It's an gray area again, I think. As above, if the trailing . is treated as root label, I think the return value can be either . or . The current implementation throws a StringIndexOutOfBoundsException. However, what empty domain name () really means? I would prefer to return . for . instead. We need to address the issue in IDN. Here comes the solution, the IDN.toASCII() returns: 1. . for .; 2. xyz for xyz.; 3. IAE for example..com. Does it make sense? Thanks, Xuelei On 8/7/2013
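The RFC 3490 reasoning quoted above (step 8: a label has 1 to 63 code points, and the root label is not counted as a label) can be sketched as a small check. This is an illustration only, under several assumptions: the input is already ASCII, only '.' is used as separator, a single trailing dot is read as the root, and a bare "." is rejected because it has no label at all (one of the readings debated in this thread). The class and method names are made up.

```java
// Sketch: apply the "1 to 63 code points" rule of RFC 3490 step 8 to every
// label of an already-ASCII name. Names and policy choices are illustrative.
final class LabelLengthCheck {
    static final int MAX_LABEL_LENGTH = 63;

    /** Step 8 of ToASCII: the label must have 1 to 63 code points. */
    static boolean isValidLength(String asciiLabel) {
        int n = asciiLabel.codePointCount(0, asciiLabel.length());
        return n >= 1 && n <= MAX_LABEL_LENGTH;
    }

    /** Treats one trailing dot as the root label and checks the remaining labels. */
    static boolean allLabelsValid(String name) {
        String body = name.endsWith(".")
                ? name.substring(0, name.length() - 1)
                : name;
        // limit -1 keeps trailing empty strings, so "com.." is not silently accepted
        for (String label : body.split("\\.", -1)) {
            if (!isValidLength(label)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] names = {"www.example.com", "xyz.", "example..com", "com..", "."};
        for (String name : names) {
            System.out.println("\"" + name + "\" valid: " + allLabelsValid(name));
        }
    }
}
```

Under these assumptions "xyz." passes while "example..com" and "com.." fail on the empty label, which matches the nslookup behaviour reported elsewhere in the thread.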
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
On Aug 9, 2013, at 14:08, Dmitry Samersoff dmitry.samers...@oracle.com wrote:

Xuelei,

119     p = q + 1;
120     if (p < input.length() || q == (input.length() - 1)) {

Could be simplified to: q <= input.length()-1

It's cool!

Xuelei

-Dmitry

On 2013-08-09 04:41, Xuelei Fan wrote: Ping. Thanks, Xuelei

On 8/7/2013 11:17 PM, Xuelei Fan wrote: Please review the new update: http://cr.openjdk.java.net./~xuelei/8020842/webrev.01/ With this update, "com." is valid (returns "com."); "." and "example..com" are invalid. And IAE will be thrown for an invalid IDN. Thanks, Xuelei

On 8/7/2013 10:18 PM, Michael McMahon wrote: On 07/08/13 15:13, Xuelei Fan wrote: On 8/7/2013 10:05 PM, Michael McMahon wrote: Resolvers seem to accept queries using trailing dots, e.g. nslookup www.oracle.com. or InetAddress.getByName("www.oracle.com."); The part of RFC3490 quoted below seems to me to be saying that the empty label implied by the trailing dot is not regarded as a label so that you don't end up calling toAscii() or toUnicode() with an empty string. I don't think it's saying the trailing dot can't be there.

It makes sense. What's your preference to return for IDN.toASCII("www.oracle.com."), "www.oracle.com." or "www.oracle.com"? The current returned value is www.oracle.com. I would like to preserve the behavior in this update.

My opinion is to keep it as at present ie. www.oracle.com. Michael

I think we are on same page soon. Thanks, Xuelei

Michael On 07/08/13 13:44, Xuelei Fan wrote: On 8/7/2013 12:06 AM, Matthew Hall wrote: Trailing dots are allowed in plain DNS (thus almost surely in IDN), and the single dot represents the root zone. So you have to be careful making this sort of change to check the DNS RFCs first.

That's the first question we need to answer: whether IDN allows trailing dots ("com."), a zero-length root label ("."), and zero-length labels ("", for example "example..com")? Per the specification of IDN.toASCII(): === ToASCII operation can fail. ToASCII fails if any step of it fails. 
If ToASCII operation fails, an IllegalArgumentException will be thrown. In this case, the input string should not be used in an internationalized domain name. A label is an individual part of a domain name. The original ToASCII operation, as defined in RFC 3490, only operates on a single label. This method can handle both label and entire domain name, by assuming that labels in a domain name are always separated by dots. ... Throws IllegalArgumentException - if the input string doesn't conform to RFC 3490 specification Per the specification of RFC 3490: == [section 2] A label is an individual part of a domain name. Labels are usually shown separated by dots; for example, the domain name www.example.com is composed of three labels: www, example, and com. (The zero-length root label described in [STD13], which can be explicit as in www.example.com. or implicit as in www.example.com, is not considered a label in this specification.) An internationalized label is a label to which the ToASCII operation (see section 4) can be applied without failing (with the UseSTD3ASCIIRules flag unset). ... Although most Unicode characters can appear in internationalized labels, ToASCII will fail for some input strings, and such strings are not valid internationalized labels. An internationalized domain name (IDN) is a domain name in which every label is an internationalized label. [Section 4.1] ToASCII consists of the following steps: ... 8. Verify that the number of code points is in the range 1 to 63 inclusive. Here are the questions: 1. whether example..com is an valid IDN? As dot is used as label separators, there are three labels, example, , com. Per RFC 3490, is not a valid label. Hence, example..com is not a valid IDN. We need to address the issue in IDN. 2. whether xyz. is an valid IDN? It's an gray area, I think. We can treat the trailing . as root label, or a label separator. If the trailing . is treated as label separator, xyz. is invalid per RFC 3490. if the trailing . 
is treated as root label, what's the expected return value of IDN.toASCII(xyz.)? I think the return value can be either xyz. or xyz. The current implementation returns xyz. We may need not to update the implementation if tailing . is treated as root label. 3. whether . is an valid IDN? It's an gray area again, I think. As above, if the trailing . is treated as root label, I think the return value can be either . or . The current implementation throws a StringIndexOutOfBoundsException. However, what empty domain name () really means? I would prefer to return . for . instead. We need to address the issue in IDN. Here comes the solution, the IDN.toASCII()
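On the simplification itself: reading the original condition as p < input.length() || q == input.length() - 1 (with p = q + 1) and the proposed one as q <= input.length() - 1, the two can be compared by brute force over every value that q can take. The sketch below assumes q is produced by searchDots() and therefore lies between 0 and input.length() inclusive; the class and method names are made up. It also includes the q != input.length() form suggested elsewhere in the review.

```java
// Sketch: brute-force comparison of three ways to write the "append a dot?" test.
// Assumes 0 <= q <= length, which is what searchDots() is documented to return.
final class ConditionEquivalence {

    /** The condition as written in the webrev, with p = q + 1. */
    static boolean original(int q, int length) {
        int p = q + 1;
        return p < length || q == length - 1;
    }

    /** The simplification proposed in the review. */
    static boolean simplified(int q, int length) {
        return q <= length - 1;
    }

    /** The "q stopped before the end of input" form. */
    static boolean notAtEnd(int q, int length) {
        return q != length;
    }

    /** True if all three forms agree for every length up to maxLength. */
    static boolean agreeUpTo(int maxLength) {
        for (int length = 0; length <= maxLength; length++) {
            for (int q = 0; q <= length; q++) {
                boolean a = original(q, length);
                if (a != simplified(q, length) || a != notAtEnd(q, length)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("forms agree: " + agreeUpTo(100));
    }
}
```

Within that range of q the three forms are interchangeable; outside it (q greater than the input length) the != form would differ, so the equivalence leans on the searchDots() contract.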
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
I don't see how this fixes the original problem as the SNIHostName spec still doesn't like hostnames with a trailing '.' I'd prefer to check first where that requirement is coming from, if it is actually necessary, and if not consider removing it from SNIHostName. If it is necessary, then the check should be implemented in SNIHostName. Michael On 09/08/13 05:28, Xuelei Fan wrote: Thanks for your feedback and suggestions. Here is the new webrev: http://cr.openjdk.java.net/~xuelei/8020842/webrev.02/ . is regarded as valid IDN in this update. Thanks, Xuelei On 8/9/2013 10:50 AM, Xuelei Fan wrote: On 8/9/2013 10:14 AM, Weijun Wang wrote: On 8/9/13 9:37 AM, Xuelei Fan wrote: On 8/9/2013 9:22 AM, Weijun Wang wrote: I tried nslookup. Those with .. inside are illegal, $ nslookup com.. nslookup: 'com..' is not a legal name (empty label) but $ nslookup . Server:192.168.10.1 Address:192.168.10.1#53 Non-authoritative answer: *** Can't find .: No answer Thanks for the testing. The behaviors are the same as this fix now. No exactly. It seems nslookup still regards . legal but just cannot find an IP for it. I'm not sure whether a root domain name can be stand alone. Root label is not considered as a label in IDN. I think it is safe to regard that . is not a valid IDN as it contains no label. Anyway, it is a corner case. There are many online IDN conversion web services, some of them can convert ., some of the cannot. In the present implementation, we cannot recognize ., and IDN.toASCII(.) throws StringIndexOutOfBoundsException. With this fix, I was wondering IAE is a better exception for IDN.toASCII(.). Learn something new today to use nslookup. Also, since this bug was originally about SNIHostName, do you need to add some extra restriction there to reject oracle.com. things? No, we cannot restrict the format of IDN in SNIHostName more than in IDN. However, we may need to rethink about the comparing of two IDN, for example, example.com. should equal to example.com. 
I want to consider it in another bug. Not sure. Does the spec say IDN and SNIHostName are equivalent sets? And it's not one is another's subset? Per TLS specification, host name in SNI is an IDN. The spec of SNIHostname says, hostname is not a valid Internationalized Domain Name (IDN) compliant with the RFC 3490 specification. The spec in SNIHostName has the same means as IDN. I won't want to add additional restrict beyond the specification of an IDN. Xuelei Can I push the changeset? I think it's better to ask someone in the networking team to make the suggestion. From what I read Michael in this thread, he does not seem totally agreed with your code changes (at least not the 00 version). Thanks Max Thanks, Xuelei Thanks Max On 8/9/13 8:41 AM, Xuelei Fan wrote: Ping. Thanks, Xuelei On 8/7/2013 11:17 PM, Xuelei Fan wrote: Please review the new update: http://cr.openjdk.java.net./~xuelei/8020842/webrev.01/ With this update, com. is valid (return com.); . and example..com are invalid. And IAE will be thrown for invalid IDN. Thanks, Xuelei
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
On 8/9/2013 7:31 PM, Michael McMahon wrote: I don't see how this fixes the original problem as the SNIHostName spec still doesn't like hostnames with a trailing '.'

The bug description did not reflect the IDN specification correctly. If "com." is a valid IDN, SNIHostName should accept it as a valid hostname. The host name in SNIHostName is nothing more or less than a standard IDN. I added a comment in the bug: "com." and "." are valid IDNs according to the IDN and domain name specifications. I will contact the bug reporter about this point.

Xuelei

I'd prefer to check first where that requirement is coming from, if it is actually necessary, and if not consider removing it from SNIHostName. If it is necessary, then the check should be implemented in SNIHostName. Michael

On 09/08/13 05:28, Xuelei Fan wrote: Thanks for your feedback and suggestions. Here is the new webrev: http://cr.openjdk.java.net/~xuelei/8020842/webrev.02/ "." is regarded as a valid IDN in this update. Thanks, Xuelei

On 8/9/2013 10:50 AM, Xuelei Fan wrote: On 8/9/2013 10:14 AM, Weijun Wang wrote: On 8/9/13 9:37 AM, Xuelei Fan wrote: On 8/9/2013 9:22 AM, Weijun Wang wrote: I tried nslookup. Those with ".." inside are illegal,

$ nslookup com..
nslookup: 'com..' is not a legal name (empty label)

but

$ nslookup .
Server: 192.168.10.1
Address: 192.168.10.1#53
Non-authoritative answer:
*** Can't find .: No answer

Thanks for the testing. The behaviors are the same as this fix now.

Not exactly. It seems nslookup still regards "." as legal but just cannot find an IP for it.

I'm not sure whether a root domain name can stand alone. The root label is not considered a label in IDN. I think it is safe to regard "." as not a valid IDN, as it contains no label. Anyway, it is a corner case. There are many online IDN conversion web services; some of them can convert ".", some of them cannot. In the present implementation, we cannot recognize ".", and IDN.toASCII(".") throws StringIndexOutOfBoundsException. 
With this fix, I was wondering IAE is a better exception for IDN.toASCII(.). Learn something new today to use nslookup. Also, since this bug was originally about SNIHostName, do you need to add some extra restriction there to reject oracle.com. things? No, we cannot restrict the format of IDN in SNIHostName more than in IDN. However, we may need to rethink about the comparing of two IDN, for example, example.com. should equal to example.com. I want to consider it in another bug. Not sure. Does the spec say IDN and SNIHostName are equivalent sets? And it's not one is another's subset? Per TLS specification, host name in SNI is an IDN. The spec of SNIHostname says, hostname is not a valid Internationalized Domain Name (IDN) compliant with the RFC 3490 specification. The spec in SNIHostName has the same means as IDN. I won't want to add additional restrict beyond the specification of an IDN. Xuelei Can I push the changeset? I think it's better to ask someone in the networking team to make the suggestion. From what I read Michael in this thread, he does not seem totally agreed with your code changes (at least not the 00 version). Thanks Max Thanks, Xuelei Thanks Max On 8/9/13 8:41 AM, Xuelei Fan wrote: Ping. Thanks, Xuelei On 8/7/2013 11:17 PM, Xuelei Fan wrote: Please review the new update: http://cr.openjdk.java.net./~xuelei/8020842/webrev.01/ With this update, com. is valid (return com.); . and example..com are invalid. And IAE will be thrown for invalid IDN. Thanks, Xuelei
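On the side remark that "example.com." should compare equal to "example.com": that question was deferred to another bug, so nothing here reflects what the JDK does. Purely as an illustration, a caller who wants that behaviour today could normalise both names before comparing. The helper names canonical() and sameHost() are hypothetical, and the choice to lower-case and strip a single trailing dot is an assumption of this sketch.

```java
import java.net.IDN;
import java.util.Locale;

// Sketch: compare two host names while ignoring case and one trailing root dot.
// This is caller-side normalisation, not JDK behaviour.
final class HostNameCompare {

    /** ASCII form, lower-cased, with a single trailing dot removed. */
    static String canonical(String name) {
        String ascii = IDN.toASCII(name).toLowerCase(Locale.ROOT);
        if (ascii.length() > 1 && ascii.endsWith(".")) {
            return ascii.substring(0, ascii.length() - 1);
        }
        return ascii;
    }

    static boolean sameHost(String a, String b) {
        return canonical(a).equals(canonical(b));
    }

    public static void main(String[] args) {
        System.out.println(sameHost("example.com.", "example.com"));
        System.out.println(sameHost("example.com", "example.net"));
    }
}
```

Because the trailing dot is stripped after conversion, the comparison gives the same answer whether or not IDN.toASCII() itself preserves the dot.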
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
Hi Michael,

I plan to address this issue in SNIHostName. I have filed another two potential bugs in IDN.

Thank you, and other people, for the feedback.

Thanks, Xuelei

On 8/9/2013 11:25 PM, Xuelei Fan wrote: On 8/9/2013 7:31 PM, Michael McMahon wrote: I don't see how this fixes the original problem as the SNIHostName spec still doesn't like hostnames with a trailing '.'

The bug description did not reflect the IDN specification correctly. If "com." is a valid IDN, SNIHostName should accept it as a valid hostname. The host name in SNIHostName is nothing more or less than a standard IDN. I added a comment in the bug: "com." and "." are valid IDNs according to the IDN and domain name specifications. I will contact the bug reporter about this point.

Xuelei

I'd prefer to check first where that requirement is coming from, if it is actually necessary, and if not consider removing it from SNIHostName. If it is necessary, then the check should be implemented in SNIHostName. Michael

On 09/08/13 05:28, Xuelei Fan wrote: Thanks for your feedback and suggestions. Here is the new webrev: http://cr.openjdk.java.net/~xuelei/8020842/webrev.02/ "." is regarded as a valid IDN in this update. Thanks, Xuelei

On 8/9/2013 10:50 AM, Xuelei Fan wrote: On 8/9/2013 10:14 AM, Weijun Wang wrote: On 8/9/13 9:37 AM, Xuelei Fan wrote: On 8/9/2013 9:22 AM, Weijun Wang wrote: I tried nslookup. Those with ".." inside are illegal,

$ nslookup com..
nslookup: 'com..' is not a legal name (empty label)

but

$ nslookup .
Server: 192.168.10.1
Address: 192.168.10.1#53
Non-authoritative answer:
*** Can't find .: No answer

Thanks for the testing. The behaviors are the same as this fix now.

Not exactly. It seems nslookup still regards "." as legal but just cannot find an IP for it.

I'm not sure whether a root domain name can stand alone. The root label is not considered a label in IDN. I think it is safe to regard "." as not a valid IDN, as it contains no label. Anyway, it is a corner case. 
There are many online IDN conversion web services, some of them can convert ., some of the cannot. In the present implementation, we cannot recognize ., and IDN.toASCII(.) throws StringIndexOutOfBoundsException. With this fix, I was wondering IAE is a better exception for IDN.toASCII(.). Learn something new today to use nslookup. Also, since this bug was originally about SNIHostName, do you need to add some extra restriction there to reject oracle.com. things? No, we cannot restrict the format of IDN in SNIHostName more than in IDN. However, we may need to rethink about the comparing of two IDN, for example, example.com. should equal to example.com. I want to consider it in another bug. Not sure. Does the spec say IDN and SNIHostName are equivalent sets? And it's not one is another's subset? Per TLS specification, host name in SNI is an IDN. The spec of SNIHostname says, hostname is not a valid Internationalized Domain Name (IDN) compliant with the RFC 3490 specification. The spec in SNIHostName has the same means as IDN. I won't want to add additional restrict beyond the specification of an IDN. Xuelei Can I push the changeset? I think it's better to ask someone in the networking team to make the suggestion. From what I read Michael in this thread, he does not seem totally agreed with your code changes (at least not the 00 version). Thanks Max Thanks, Xuelei Thanks Max On 8/9/13 8:41 AM, Xuelei Fan wrote: Ping. Thanks, Xuelei On 8/7/2013 11:17 PM, Xuelei Fan wrote: Please review the new update: http://cr.openjdk.java.net./~xuelei/8020842/webrev.01/ With this update, com. is valid (return com.); . and example..com are invalid. And IAE will be thrown for invalid IDN. Thanks, Xuelei
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
Hi Michael,

It is pretty hard to solve the issue cleanly in SNIHostName. Here is my attempt to explain why we should fix it in IDN.

SNIHostName does not accept the following as valid hostnames:
1. an empty hostname
2. a hostname that ends with a trailing dot
3. a hostname that does not comply with RFC 3490

The process in SNIHostName looks like this:
1. call IDN.toASCII() to convert the hostname string
2. check that the return value of step 1 is a valid hostname (non-empty, no trailing dot)

At present, IDN does not handle the following inputs properly:
1. It returns "com" for "com."; the trailing dot is swallowed.
2. It throws StringIndexOutOfBoundsException for ".". If "." is a valid IDN that complies with RFC 3490, IDN.toASCII() should be able to handle it; otherwise, IDN.toASCII() should throw IAE, as the specification suggests. Throwing StringIndexOutOfBoundsException does not comply with the specification.
3. It throws StringIndexOutOfBoundsException for example...net, for the same reason as #2.

We can address #1 and #2 in SNIHostName, but the check would be duplicated, because IDN also needs to address the issue. SNIHostName would also have to know the IDN label separators (., \u3002, etc.) in order to check for the dot character. I think that is poor encapsulation and involves SNIHostName too much in the details of domain names. It is a little hard to address #3 in SNIHostName.

All of the above issues can be easily addressed in IDN. Once IDN addresses them, the current SNIHostName handles invalid hostnames (empty, trailing dot, etc.) correctly, and we won't need to touch SNIHostName any more.

Please consider it. The latest webrev is at: http://cr.openjdk.java.net/~xuelei/8020842/webrev.02/

Thanks,
Xuelei

On 8/10/2013 9:13 AM, Xuelei Fan wrote: Hi Michael, I plan to address this issue in SNIHostName. I have filed another two potential bugs in IDN. Thank you, and other people, for the feedback.
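The two-step SNIHostName process described in the message above (convert with IDN.toASCII(), then check the converted value) can be sketched roughly as follows. This is only an illustration under stated assumptions: `SniHostnameCheck` and `isAcceptable` are hypothetical names, not the JDK's SNIHostName code, and the sketch assumes IDN.toASCII() behaves as the thread proposes (it keeps a trailing dot and throws IllegalArgumentException for non-conforming input).

```java
import java.net.IDN;

/** Hypothetical helper for illustration; not the JDK's SNIHostName implementation. */
final class SniHostnameCheck {

    /** Returns true if the hostname passes the two-step check sketched in the thread. */
    static boolean isAcceptable(String hostname) {
        final String ascii;
        try {
            // Step 1: convert to the ASCII form; IAE signals a non-conforming name.
            ascii = IDN.toASCII(hostname, IDN.USE_STD3_ASCII_RULES);
        } catch (IllegalArgumentException e) {
            return false;
        }
        // Step 2: inspect only the converted value. Assumption: the converted form
        // uses the ASCII '.' as its separator, so the caller does not need to know
        // the other IDN separators such as \u3002.
        return !ascii.isEmpty() && !ascii.endsWith(".");
    }

    public static void main(String[] args) {
        for (String h : new String[] {"example.com", "example.com.", ""}) {
            System.out.println("\"" + h + "\" -> " + isAcceptable(h));
        }
    }
}
```

The point of the design is the one made in the message: if IDN reports trailing dots and empty labels faithfully, the caller's check stays a simple string test on the converted value.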
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
Ping. Thanks, Xuelei On 8/7/2013 11:17 PM, Xuelei Fan wrote: Please review the new update: http://cr.openjdk.java.net./~xuelei/8020842/webrev.01/ With this update, com. is valid (return com.); . and example..com are invalid. And IAE will be thrown for invalid IDN. Thanks, Xuelei On 8/7/2013 10:18 PM, Michael McMahon wrote: On 07/08/13 15:13, Xuelei Fan wrote: On 8/7/2013 10:05 PM, Michael McMahon wrote: Resolvers seem to accept queries using trailing dots. eg nslookup www.oracle.com. or InetAddress.getByName(www.oracle.com.); The part of RFC3490 quoted below seems to me to be saying that the empty label implied by the trailing dot is not regarded as a label so that you don't end up calling toAscii() or toUnicode() with an empty string. I don't think it's saying the trailing dot can't be there. It makes sense. What's your preference to return for IDN.toASCII(www.oracle.com.), www.oracle.com. or www.oracle.com? The current returned value is www.oracle.com. I would like to reserve the behavior in this update. My opinion is to keep it as at present ie. www.oracle.com. Michael I think we are on same page soon. Thanks, Xuelei Michael On 07/08/13 13:44, Xuelei Fan wrote: On 8/7/2013 12:06 AM, Matthew Hall wrote: Trailing dots are allowed in plain DNS (thus almost surely in IDN), and the single dot represents the root zone. So you have to be careful making this sort of change to check the DNS RFCs first. That's the first question we need to answer, whether IDN allow tailling dots (com.), zero-length root label (.), and zero-length label (, for example example..com)? Per the specification of IDN.toASCII(): === ToASCII operation can fail. ToASCII fails if any step of it fails. If ToASCII operation fails, an IllegalArgumentException will be thrown. In this case, the input string should not be used in an internationalized domain name. A label is an individual part of a domain name. The original ToASCII operation, as defined in RFC 3490, only operates on a single label. 
This method can handle both label and entire domain name, by assuming that labels in a domain name are always separated by dots. ... Throws IllegalArgumentException - if the input string doesn't conform to RFC 3490 specification Per the specification of RFC 3490: == [section 2] A label is an individual part of a domain name. Labels are usually shown separated by dots; for example, the domain name www.example.com is composed of three labels: www, example, and com. (The zero-length root label described in [STD13], which can be explicit as in www.example.com. or implicit as in www.example.com, is not considered a label in this specification.) An internationalized label is a label to which the ToASCII operation (see section 4) can be applied without failing (with the UseSTD3ASCIIRules flag unset). ... Although most Unicode characters can appear in internationalized labels, ToASCII will fail for some input strings, and such strings are not valid internationalized labels. An internationalized domain name (IDN) is a domain name in which every label is an internationalized label. [Section 4.1] ToASCII consists of the following steps: ... 8. Verify that the number of code points is in the range 1 to 63 inclusive. Here are the questions: 1. whether example..com is an valid IDN? As dot is used as label separators, there are three labels, example, , com. Per RFC 3490, is not a valid label. Hence, example..com is not a valid IDN. We need to address the issue in IDN. 2. whether xyz. is an valid IDN? It's an gray area, I think. We can treat the trailing . as root label, or a label separator. If the trailing . is treated as label separator, xyz. is invalid per RFC 3490. if the trailing . is treated as root label, what's the expected return value of IDN.toASCII(xyz.)? I think the return value can be either xyz. or xyz. The current implementation returns xyz. We may need not to update the implementation if tailing . is treated as root label. 3. whether . is an valid IDN? 
It's an gray area again, I think. As above, if the trailing . is treated as root label, I think the return value can be either . or . The current implementation throws a StringIndexOutOfBoundsException. However, what empty domain name () really means? I would prefer to return . for . instead. We need to address the issue in IDN. Here comes the solution, the IDN.toASCII() returns: 1. . for .; 2. xyz for xyz.; 3. IAE for example..com. Does it make sense? Thanks, Xuelei On 8/7/2013 1:35 AM, Michael McMahon wrote: I don't really understand the reason for the restriction in SNIHostName But, I guess that is where it should be enforced if it is required. Michael. On 06/08/13 17:43,
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
I tried nslookup. Names with ".." inside are illegal:

    $ nslookup com..
    nslookup: 'com..' is not a legal name (empty label)

but

    $ nslookup .
    Server:   192.168.10.1
    Address:  192.168.10.1#53

    Non-authoritative answer:
    *** Can't find .: No answer

Also, since this bug was originally about SNIHostName, do you need to add some extra restriction there to reject things like "oracle.com."?

Thanks
Max

On 8/9/13 8:41 AM, Xuelei Fan wrote:
Ping.

Thanks,
Xuelei

On 8/7/2013 11:17 PM, Xuelei Fan wrote:
Please review the new update: http://cr.openjdk.java.net./~xuelei/8020842/webrev.01/

With this update, "com." is valid (returns "com."); "." and "example..com" are invalid. An IAE will be thrown for an invalid IDN.

Thanks,
Xuelei
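The distinction nslookup draws above (a lone "." is the root, one trailing dot is fine, but an empty label such as the one in "com.." is illegal) can be written down as a small classifier. This is a minimal sketch, not JDK or resolver code: `LabelCheck`, `Kind`, and `classify` are hypothetical names, and only the ASCII dot is treated as a separator.

```java
/** Hypothetical illustration of the empty-label rule; not JDK or resolver code. */
final class LabelCheck {

    enum Kind { ROOT, VALID, EMPTY_LABEL }

    static Kind classify(String name) {
        if (name.equals(".")) {
            return Kind.ROOT; // the root zone on its own
        }
        // Drop one trailing dot: it only spells out the root label explicitly.
        String body = name.endsWith(".") ? name.substring(0, name.length() - 1) : name;
        // The -1 limit keeps trailing empty strings, so "com.." still shows an empty label.
        for (String label : body.split("\\.", -1)) {
            if (label.isEmpty()) {
                return Kind.EMPTY_LABEL;
            }
        }
        return Kind.VALID;
    }

    public static void main(String[] args) {
        for (String n : new String[] {".", "com.", "com..", "example..com", "www.oracle.com"}) {
            System.out.println("\"" + n + "\" -> " + classify(n));
        }
    }
}
```

Whether ROOT should count as a valid IDN is exactly the open question in this thread; the sketch only separates the cases.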
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
On 8/9/2013 9:22 AM, Weijun Wang wrote: I tried nslookup. Those with .. inside are illegal, $ nslookup com.. nslookup: 'com..' is not a legal name (empty label) but $ nslookup . Server:192.168.10.1 Address:192.168.10.1#53 Non-authoritative answer: *** Can't find .: No answer Thanks for the testing. The behaviors are the same as this fix now. Learn something new today to use nslookup. Also, since this bug was originally about SNIHostName, do you need to add some extra restriction there to reject oracle.com. things? No, we cannot restrict the format of IDN in SNIHostName more than in IDN. However, we may need to rethink about the comparing of two IDN, for example, example.com. should equal to example.com. I want to consider it in another bug. Can I push the changeset? Thanks, Xuelei Thanks Max On 8/9/13 8:41 AM, Xuelei Fan wrote: Ping. Thanks, Xuelei On 8/7/2013 11:17 PM, Xuelei Fan wrote: Please review the new update: http://cr.openjdk.java.net./~xuelei/8020842/webrev.01/ With this update, com. is valid (return com.); . and example..com are invalid. And IAE will be thrown for invalid IDN. Thanks, Xuelei
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
On 8/9/13 9:37 AM, Xuelei Fan wrote:
> On 8/9/2013 9:22 AM, Weijun Wang wrote:
>> I tried nslookup. Those with ".." inside are illegal,
>>
>>     $ nslookup com..
>>     nslookup: 'com..' is not a legal name (empty label)
>>
>> but
>>
>>     $ nslookup .
>>     Server:   192.168.10.1
>>     Address:  192.168.10.1#53
>>
>>     Non-authoritative answer:
>>     *** Can't find .: No answer
>
> Thanks for the testing. The behaviors are the same as this fix now.

Not exactly. It seems nslookup still regards "." as legal but just cannot find an IP for it.

> Learn something new today to use nslookup.
>
>> Also, since this bug was originally about SNIHostName, do you need to add some extra restriction there to reject oracle.com. things?
>
> No, we cannot restrict the format of IDN in SNIHostName more than in IDN. However, we may need to rethink the comparison of two IDNs; for example, "example.com." should equal "example.com". I want to consider it in another bug.

Not sure. Does the spec say IDN and SNIHostName are equivalent sets? Or is one a subset of the other?

> Can I push the changeset?

I think it's better to ask someone in the networking team to make the suggestion. From what I read of Michael's comments in this thread, he does not seem to totally agree with your code changes (at least not the 00 version).

Thanks
Max

> Thanks,
> Xuelei
>
>> Thanks
>> Max
>>
>> On 8/9/13 8:41 AM, Xuelei Fan wrote:
>>> Ping.
>>>
>>> Thanks,
>>> Xuelei
>>>
>>> On 8/7/2013 11:17 PM, Xuelei Fan wrote:
>>>> Please review the new update: http://cr.openjdk.java.net./~xuelei/8020842/webrev.01/
>>>>
>>>> With this update, "com." is valid (returns "com."); "." and "example..com" are invalid. An IAE will be thrown for an invalid IDN.
>>>>
>>>> Thanks,
>>>> Xuelei
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
On 8/9/2013 10:14 AM, Weijun Wang wrote: On 8/9/13 9:37 AM, Xuelei Fan wrote: On 8/9/2013 9:22 AM, Weijun Wang wrote: I tried nslookup. Those with .. inside are illegal, $ nslookup com.. nslookup: 'com..' is not a legal name (empty label) but $ nslookup . Server:192.168.10.1 Address:192.168.10.1#53 Non-authoritative answer: *** Can't find .: No answer Thanks for the testing. The behaviors are the same as this fix now. No exactly. It seems nslookup still regards . legal but just cannot find an IP for it. I'm not sure whether a root domain name can be stand alone. Root label is not considered as a label in IDN. I think it is safe to regard that . is not a valid IDN as it contains no label. Anyway, it is a corner case. There are many online IDN conversion web services, some of them can convert ., some of the cannot. In the present implementation, we cannot recognize ., and IDN.toASCII(.) throws StringIndexOutOfBoundsException. With this fix, I was wondering IAE is a better exception for IDN.toASCII(.). Learn something new today to use nslookup. Also, since this bug was originally about SNIHostName, do you need to add some extra restriction there to reject oracle.com. things? No, we cannot restrict the format of IDN in SNIHostName more than in IDN. However, we may need to rethink about the comparing of two IDN, for example, example.com. should equal to example.com. I want to consider it in another bug. Not sure. Does the spec say IDN and SNIHostName are equivalent sets? And it's not one is another's subset? Per TLS specification, host name in SNI is an IDN. The spec of SNIHostname says, hostname is not a valid Internationalized Domain Name (IDN) compliant with the RFC 3490 specification. The spec in SNIHostName has the same means as IDN. I won't want to add additional restrict beyond the specification of an IDN. Xuelei Can I push the changeset? I think it's better to ask someone in the networking team to make the suggestion. 
From what I read Michael in this thread, he does not seem totally agreed with your code changes (at least not the 00 version). Thanks Max Thanks, Xuelei Thanks Max On 8/9/13 8:41 AM, Xuelei Fan wrote: Ping. Thanks, Xuelei On 8/7/2013 11:17 PM, Xuelei Fan wrote: Please review the new update: http://cr.openjdk.java.net./~xuelei/8020842/webrev.01/ With this update, com. is valid (return com.); . and example..com are invalid. And IAE will be thrown for invalid IDN. Thanks, Xuelei
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
But, DNS considers "." as the valid root zone...
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
On 8/9/2013 11:24 AM, Matthew Hall wrote:
> But, DNS considers . as the valid root zone...

Good! It looks like IDN.toASCII(".") should return ".", so that any general domain name can go through the IDN.toASCII() conversion without triggering a runtime exception.

Xuelei
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
Thanks for your feedback and suggestions. Here is the new webrev: http://cr.openjdk.java.net/~xuelei/8020842/webrev.02/

"." is regarded as a valid IDN in this update.

Thanks,
Xuelei
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
On 8/7/2013 12:06 AM, Matthew Hall wrote:
> Trailing dots are allowed in plain DNS (thus almost surely in IDN), and the single dot represents the root zone. So you have to be careful making this sort of change to check the DNS RFCs first.

That's the first question we need to answer: does IDN allow trailing dots ("com."), a zero-length root label ("."), and zero-length labels ("", for example in "example..com")?

Per the specification of IDN.toASCII():

    ToASCII operation can fail. ToASCII fails if any step of it fails. If ToASCII operation fails, an IllegalArgumentException will be thrown. In this case, the input string should not be used in an internationalized domain name.

    A label is an individual part of a domain name. The original ToASCII operation, as defined in RFC 3490, only operates on a single label. This method can handle both label and entire domain name, by assuming that labels in a domain name are always separated by dots.
    ...
    Throws IllegalArgumentException - if the input string doesn't conform to RFC 3490 specification

Per the specification of RFC 3490:

    [Section 2]
    A label is an individual part of a domain name. Labels are usually shown separated by dots; for example, the domain name "www.example.com" is composed of three labels: "www", "example", and "com". (The zero-length root label described in [STD13], which can be explicit as in "www.example.com." or implicit as in "www.example.com", is not considered a label in this specification.)

    An internationalized label is a label to which the ToASCII operation (see section 4) can be applied without failing (with the UseSTD3ASCIIRules flag unset). ... Although most Unicode characters can appear in internationalized labels, ToASCII will fail for some input strings, and such strings are not valid internationalized labels.

    An internationalized domain name (IDN) is a domain name in which every label is an internationalized label.

    [Section 4.1]
    ToASCII consists of the following steps:
    ...
    8. Verify that the number of code points is in the range 1 to 63 inclusive.

Here are the questions:

1. Is "example..com" a valid IDN?
   As the dot is used as the label separator, there are three labels: "example", "", and "com". Per RFC 3490, "" is not a valid label. Hence, "example..com" is not a valid IDN. We need to address the issue in IDN.

2. Is "xyz." a valid IDN?
   It's a gray area, I think. We can treat the trailing "." as the root label or as a label separator. If the trailing "." is treated as a label separator, "xyz." is invalid per RFC 3490. If the trailing "." is treated as the root label, what is the expected return value of IDN.toASCII("xyz.")? I think the return value can be either "xyz." or "xyz". The current implementation returns "xyz". We may not need to update the implementation if the trailing "." is treated as the root label.

3. Is "." a valid IDN?
   It's a gray area again, I think. As above, if the trailing "." is treated as the root label, I think the return value can be either "." or "". The current implementation throws a StringIndexOutOfBoundsException. However, what does an empty domain name ("") really mean? I would prefer to return "." for "." instead. We need to address the issue in IDN.

Here comes the solution. IDN.toASCII() returns:
1. "." for ".";
2. "xyz" for "xyz.";
3. IAE for "example..com".

Does it make sense?

Thanks,
Xuelei

On 8/7/2013 1:35 AM, Michael McMahon wrote:
I don't really understand the reason for the restriction in SNIHostName. But I guess that is where it should be enforced if it is required.

Michael.

On 06/08/13 17:43, Dmitry Samersoff wrote:
Xuelei,

"." (dot) is a perfectly valid domain name and it means the root domain, so "com." is a valid domain name as well. It seems to me that, in the context of the methods in your change, we should ignore trailing dots rather than throw an exception.

-Dmitry

On 2013-08-06 15:44, Xuelei Fan wrote:
Hi,

Please review the bug fix to tighten the illegal-input checking in IDN.

webrev: http://cr.openjdk.java.net./~xuelei/8020842/webrev.00/

Here are two test cases, which are expected to get IAE.

Case 1:
    String host = IDN.toASCII(".", IDN.USE_STD3_ASCII_RULES);

    Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
        at java.lang.StringBuffer.charAt(StringBuffer.java:204)
        at java.net.IDN.toASCIIInternal(IDN.java:279)
        at java.net.IDN.toASCII(IDN.java:118)

Case 2:
    String host = IDN.toASCII("com.", IDN.USE_STD3_ASCII_RULES);

Thanks,
Xuelei
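The test cases in the original review request can be tried against whatever JDK is at hand with a small probe like the one below. It deliberately asserts nothing about the edge cases, since the thread shows the behavior differing between the unfixed code and the proposed webrevs; `IdnProbe` and `describe` are hypothetical names chosen for this sketch.

```java
import java.net.IDN;

/** Hypothetical probe for illustration; it reports what the running JDK does. */
final class IdnProbe {

    /** Describes the outcome of IDN.toASCII(input, USE_STD3_ASCII_RULES). */
    static String describe(String input) {
        try {
            return "returns \"" + IDN.toASCII(input, IDN.USE_STD3_ASCII_RULES) + "\"";
        } catch (RuntimeException e) {
            // IllegalArgumentException is the documented failure mode; any other
            // runtime exception here is the kind of behavior this review is about.
            return "throws " + e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        for (String input : new String[] {".", "com.", "example..com", "www.example.com"}) {
            System.out.println("IDN.toASCII(\"" + input + "\") " + describe(input));
        }
    }
}
```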
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
Xuelei,

The root label is an empty label [1] and the dot is a label separator, so in printed form a domain name is dot-terminated.

Please also see my comments inline below.

[1] RFC 1034 (rfc1034.txt):

    Internally, programs that manipulate domain names should represent them as sequences of labels, where each label is a length octet followed by an octet string. Because all domain names end at the root, *which has a null string for a label*, these internal representations can use a length byte of zero to terminate a domain name.

On 2013-08-07 16:44, Xuelei Fan wrote:
> On 8/7/2013 12:06 AM, Matthew Hall wrote:
>> Trailing dots are allowed in plain DNS (thus almost surely in IDN), and the single dot represents the root zone. So you have to be careful making this sort of change to check the DNS RFCs first.
>
> That's the first question we need to answer: does IDN allow trailing dots ("com."), a zero-length root label ("."), and zero-length labels ("", for example in "example..com")?
>
> Per the specification of IDN.toASCII():
>
>     ToASCII operation can fail. ToASCII fails if any step of it fails. If ToASCII operation fails, an IllegalArgumentException will be thrown. In this case, the input string should not be used in an internationalized domain name.
>
>     A label is an individual part of a domain name. The original ToASCII operation, as defined in RFC 3490, only operates on a single label. This method can handle both label and entire domain name, by assuming that labels in a domain name are always separated by dots.
>     ...
>     Throws IllegalArgumentException - if the input string doesn't conform to RFC 3490 specification
>
> Per the specification of RFC 3490:
>
>     [Section 2]
>     A label is an individual part of a domain name. Labels are usually shown separated by dots; for example, the domain name "www.example.com" is composed of three labels: "www", "example", and "com". (The zero-length root label described in [STD13], which can be explicit as in "www.example.com." or implicit as in "www.example.com", is not considered a label in this specification.)
>
>     An internationalized label is a label to which the ToASCII operation (see section 4) can be applied without failing (with the UseSTD3ASCIIRules flag unset). ... Although most Unicode characters can appear in internationalized labels, ToASCII will fail for some input strings, and such strings are not valid internationalized labels.
>
>     An internationalized domain name (IDN) is a domain name in which every label is an internationalized label.
>
>     [Section 4.1]
>     ToASCII consists of the following steps:
>     ...
>     8. Verify that the number of code points is in the range 1 to 63 inclusive.
>
> Here are the questions:
>
> 1. Is "example..com" a valid IDN?
>    As the dot is used as the label separator, there are three labels: "example", "", and "com". Per RFC 3490, "" is not a valid label. Hence, "example..com" is not a valid IDN. We need to address the issue in IDN.

The root label can't appear in the middle of a domain name, so "example..com" is an invalid domain name and an appropriate exception has to be thrown.

> 2. Is "xyz." a valid IDN?
>    It's a gray area, I think. We can treat the trailing "." as the root label or as a label separator. If the trailing "." is treated as a label separator, "xyz." is invalid per RFC 3490. If the trailing "." is treated as the root label, what is the expected return value of IDN.toASCII("xyz.")? I think the return value can be either "xyz." or "xyz". The current implementation returns "xyz". We may not need to update the implementation if the trailing "." is treated as the root label.

An empty label at the end of a domain name is valid per RFC 1034 and means the root label. So we should process this name and return all non-empty labels.

> 3. Is "." a valid IDN?
>    It's a gray area again, I think. As above, if the trailing "." is treated as the root label, I think the return value can be either "." or "". The current implementation throws a StringIndexOutOfBoundsException. However, what does an empty domain name ("") really mean? I would prefer to return "." for "." instead. We need to address the issue in IDN.

As the dot is a label separator and the root (empty) label can't appear in the middle of a domain name, "." (dot) is not a valid name and this case is similar to case (1): we should throw an appropriate exception.

-Dmitry

> Here comes the solution. IDN.toASCII() returns:
> 1. "." for ".";
> 2. "xyz" for "xyz.";
> 3. IAE for "example..com".
>
> Does it make sense?
>
> Thanks,
> Xuelei
>
> On 8/7/2013 1:35 AM, Michael McMahon wrote:
>> I don't really understand the reason for the restriction in SNIHostName. But I guess that is where it should be enforced if it is required.
>>
>> Michael.
>>
>> On 06/08/13 17:43, Dmitry Samersoff wrote:
>>> Xuelei,
>>>
>>> "." (dot) is a perfectly valid domain name and it means the root domain, so "com." is a valid domain name as well. It seems to me that, in the context of the methods in your change, we should ignore
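The RFC 1034 passage quoted in the message above describes the internal form of a name: a sequence of length-prefixed labels, closed by the zero-length root label. A rough sketch of that encoding is shown below. It is an illustration only, with hypothetical names (`WireName`, `encode`), ASCII-only input, and the ASCII dot as the sole separator; it is not taken from the JDK or from any resolver.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the RFC 1034 label sequence; not JDK code. */
final class WireName {

    /** Encodes an ASCII domain name as length-prefixed labels ending in a zero byte. */
    static byte[] encode(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A single trailing dot only spells out the root label explicitly.
        String body = name.endsWith(".") ? name.substring(0, name.length() - 1) : name;
        if (!body.isEmpty()) {
            for (String label : body.split("\\.", -1)) {
                byte[] bytes = label.getBytes(StandardCharsets.US_ASCII);
                // A zero length byte would end the name early, so an empty label
                // cannot appear before the root; 63 is the label length limit.
                if (bytes.length == 0 || bytes.length > 63) {
                    throw new IllegalArgumentException("bad label length in: " + name);
                }
                out.write(bytes.length);
                out.write(bytes, 0, bytes.length);
            }
        }
        out.write(0); // root label: the null string, written as length byte zero
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(encode("com.")));
        System.out.println(java.util.Arrays.toString(encode(".")));
    }
}
```

In this form "com" and "com." encode to the same bytes and "." is just the terminating zero byte, while "example..com" has no representation at all, which is one way to read the positions taken in the thread.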
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
Resolvers seem to accept queries using trailing dots, e.g. nslookup www.oracle.com. or InetAddress.getByName("www.oracle.com."); The part of RFC 3490 quoted below seems to me to be saying that the empty label implied by the trailing dot is not regarded as a label, so that you don't end up calling toAscii() or toUnicode() with an empty string. I don't think it's saying the trailing dot can't be there. Michael On 07/08/13 13:44, Xuelei Fan wrote: On 8/7/2013 12:06 AM, Matthew Hall wrote: Trailing dots are allowed in plain DNS (thus almost surely in IDN), and the single dot represents the root zone. So you have to be careful making this sort of change to check the DNS RFCs first. That's the first question we need to answer: whether IDN allows trailing dots ("com."), a zero-length root label ("."), and zero-length labels ("", for example "example..com"). Per the specification of IDN.toASCII(): === ToASCII operation can fail. ToASCII fails if any step of it fails. If ToASCII operation fails, an IllegalArgumentException will be thrown. In this case, the input string should not be used in an internationalized domain name. A label is an individual part of a domain name. The original ToASCII operation, as defined in RFC 3490, only operates on a single label. This method can handle both label and entire domain name, by assuming that labels in a domain name are always separated by dots. ... Throws IllegalArgumentException - if the input string doesn't conform to RFC 3490 specification Per the specification of RFC 3490: == [section 2] A label is an individual part of a domain name. Labels are usually shown separated by dots; for example, the domain name "www.example.com" is composed of three labels: "www", "example", and "com". (The zero-length root label described in [STD13], which can be explicit as in "www.example.com." or implicit as in "www.example.com", is not considered a label in this specification.)
An internationalized label is a label to which the ToASCII operation (see section 4) can be applied without failing (with the UseSTD3ASCIIRules flag unset). ... Although most Unicode characters can appear in internationalized labels, ToASCII will fail for some input strings, and such strings are not valid internationalized labels. An internationalized domain name (IDN) is a domain name in which every label is an internationalized label. [Section 4.1] ToASCII consists of the following steps: ... 8. Verify that the number of code points is in the range 1 to 63 inclusive. Here are the questions: 1. Is "example..com" a valid IDN? As dot is used as the label separator, there are three labels: "example", "", and "com". Per RFC 3490, "" is not a valid label. Hence, "example..com" is not a valid IDN. We need to address the issue in IDN. 2. Is "xyz." a valid IDN? It's a gray area, I think. We can treat the trailing "." as the root label or as a label separator. If the trailing "." is treated as a label separator, "xyz." is invalid per RFC 3490. If the trailing "." is treated as the root label, what's the expected return value of IDN.toASCII("xyz.")? I think the return value can be either "xyz." or "xyz". The current implementation returns "xyz.". We may not need to update the implementation if the trailing "." is treated as the root label. 3. Is "." a valid IDN? It's a gray area again, I think. As above, if the trailing "." is treated as the root label, I think the return value can be either "." or "". The current implementation throws a StringIndexOutOfBoundsException. However, what does an empty domain name ("") really mean? I would prefer to return "." for "." instead. We need to address the issue in IDN. Here is the proposed solution. IDN.toASCII() returns: 1. "." for "."; 2. "xyz" for "xyz."; 3. IAE for "example..com". Does it make sense? Thanks, Xuelei On 8/7/2013 1:35 AM, Michael McMahon wrote: I don't really understand the reason for the restriction in SNIHostName. But I guess that is where it should be enforced if it is required.
Michael. On 06/08/13 17:43, Dmitry Samersoff wrote: Xuelei, "." (dot) is a perfectly valid domain name and it means the root domain, so "com." is a valid domain name as well. It seems to me that, in the context of the methods you change, we should ignore trailing dots rather than throw an exception. -Dmitry On 2013-08-06 15:44, Xuelei Fan wrote: Hi, Please review the bug fix to tighten the illegal input checking in IDN. webrev: http://cr.openjdk.java.net./~xuelei/8020842/webrev.00/ Here are two test cases, which are expected to get IAE. Case 1: String host = IDN.toASCII(".", IDN.USE_STD3_ASCII_RULES); Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0 at java.lang.StringBuffer.charAt(StringBuffer.java:204) at java.net.IDN.toASCIIInternal(IDN.java:279) at java.net.IDN.toASCII(IDN.java:118) Case 2: String host = IDN.toASCII("com.",
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
On 8/7/2013 10:05 PM, Michael McMahon wrote: Resolvers seem to accept queries using trailing dots, e.g. nslookup www.oracle.com. or InetAddress.getByName("www.oracle.com."); The part of RFC 3490 quoted below seems to me to be saying that the empty label implied by the trailing dot is not regarded as a label, so that you don't end up calling toAscii() or toUnicode() with an empty string. I don't think it's saying the trailing dot can't be there. It makes sense. What's your preference for the return value of IDN.toASCII("www.oracle.com."): "www.oracle.com." or "www.oracle.com"? The current returned value is "www.oracle.com.". I would like to preserve that behavior in this update. I think we will be on the same page soon. Thanks, Xuelei
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
On 07/08/13 15:13, Xuelei Fan wrote: On 8/7/2013 10:05 PM, Michael McMahon wrote: Resolvers seem to accept queries using trailing dots, e.g. nslookup www.oracle.com. or InetAddress.getByName("www.oracle.com."); The part of RFC 3490 quoted below seems to me to be saying that the empty label implied by the trailing dot is not regarded as a label, so that you don't end up calling toAscii() or toUnicode() with an empty string. I don't think it's saying the trailing dot can't be there. It makes sense. What's your preference for the return value of IDN.toASCII("www.oracle.com."): "www.oracle.com." or "www.oracle.com"? The current returned value is "www.oracle.com.". I would like to preserve that behavior in this update. My opinion is to keep it as at present, i.e. "www.oracle.com.". Michael
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
Please review the new update: http://cr.openjdk.java.net./~xuelei/8020842/webrev.01/ With this update, "com." is valid (returns "com."); "." and "example..com" are invalid, and an IAE will be thrown for an invalid IDN. Thanks, Xuelei
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
I am not sure if IDN.java is the correct place to change. At least I've seen trailing dots in DNS entries. So maybe it's not so illegal. --Max On 8/6/13 7:44 PM, Xuelei Fan wrote: Hi, Please review the bug fix to tighten the illegal input checking in IDN. webrev: http://cr.openjdk.java.net./~xuelei/8020842/webrev.00/ Here are two test cases, which are expected to get IAE. Case 1: String host = IDN.toASCII(".", IDN.USE_STD3_ASCII_RULES); Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0 at java.lang.StringBuffer.charAt(StringBuffer.java:204) at java.net.IDN.toASCIIInternal(IDN.java:279) at java.net.IDN.toASCII(IDN.java:118) Case 2: String host = IDN.toASCII("com.", IDN.USE_STD3_ASCII_RULES); Thanks, Xuelei
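The two cases quoted above can be reproduced with a small harness. This is only a minimal sketch: the class name IdnTrailingDotProbe and the method describe are hypothetical, and only java.net.IDN.toASCII and IDN.USE_STD3_ASCII_RULES come from the thread. What it prints for the dotted inputs depends on the JDK build in use, so no output is shown.

```java
import java.net.IDN;

class IdnTrailingDotProbe {
    // Returns the converted name, or the class name of the exception that was thrown.
    static String describe(String input) {
        try {
            return "ok: " + IDN.toASCII(input, IDN.USE_STD3_ASCII_RULES);
        } catch (RuntimeException e) {
            // Covers IllegalArgumentException as well as the
            // StringIndexOutOfBoundsException reported in the thread.
            return "threw: " + e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        String[] inputs = {".", "com.", "example..com", "www.example.com"};
        for (String input : inputs) {
            System.out.println("\"" + input + "\" -> " + describe(input));
        }
    }
}
```

Catching RuntimeException rather than IllegalArgumentException is deliberate here, so that the probe reports which exception type a given build actually throws.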
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
On Aug 6, 2013, at 23:08, Weijun Wang weijun.w...@oracle.com wrote: I am not sure if IDN.java is the correct place to change. At least I've seen trailing dots in DNS entries. So maybe it's not so illegal. Per RFC 1034, a domain name cannot end with a dot. I will check other related specifications. In what case did you see trailing dots? Thanks, Xuelei --Max On 8/6/13 7:44 PM, Xuelei Fan wrote: Hi, Please review the bug fix to tighten the illegal input checking in IDN. webrev: http://cr.openjdk.java.net./~xuelei/8020842/webrev.00/ Here are two test cases, which are expected to get IAE. Case 1: String host = IDN.toASCII(".", IDN.USE_STD3_ASCII_RULES); Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0 at java.lang.StringBuffer.charAt(StringBuffer.java:204) at java.net.IDN.toASCIIInternal(IDN.java:279) at java.net.IDN.toASCII(IDN.java:118) Case 2: String host = IDN.toASCII("com.", IDN.USE_STD3_ASCII_RULES); Thanks, Xuelei
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
Trailing dots are allowed in plain DNS (thus almost surely in IDN), and the single dot represents the root zone. So you have to be careful making this sort of change to check the DNS RFCs first. Matthew. -- Sent from my mobile device. Weijun Wang weijun.w...@oracle.com wrote: I am not sure if IDN.java is the correct place to change. At least I've seen trailing dots in DNS entries. So maybe it's not so illegal. --Max On 8/6/13 7:44 PM, Xuelei Fan wrote: Hi, Please review the bug fix to tighten the illegal input checking in IDN. webrev: http://cr.openjdk.java.net./~xuelei/8020842/webrev.00/ Here are two test cases, which are expected to get IAE. Case 1: String host = IDN.toASCII(".", IDN.USE_STD3_ASCII_RULES); Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0 at java.lang.StringBuffer.charAt(StringBuffer.java:204) at java.net.IDN.toASCIIInternal(IDN.java:279) at java.net.IDN.toASCII(IDN.java:118) Case 2: String host = IDN.toASCII("com.", IDN.USE_STD3_ASCII_RULES); Thanks, Xuelei
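The rule described above (one trailing dot stands for the root, while an empty label anywhere else is an error) could be sketched as follows. The class DomainLabelSketch and its labels method are hypothetical names made up for illustration, not the code under review. As simplifying assumptions, only the ASCII '.' is treated as a separator and label length is counted in Java chars.

```java
import java.util.ArrayList;
import java.util.List;

class DomainLabelSketch {
    /**
     * Splits a name on '.', treating one trailing dot as the explicit root
     * rather than as an empty label. Throws IllegalArgumentException for an
     * empty label anywhere else, or for a label longer than 63 characters.
     */
    static List<String> labels(String name) {
        List<String> out = new ArrayList<>();
        if (name.equals(".")) {
            return out; // root only: there are no ordinary labels
        }
        // Drop a single trailing dot (the explicit root).
        String body = name.endsWith(".")
                ? name.substring(0, name.length() - 1)
                : name;
        // A limit of -1 keeps trailing empty strings, so "a.." is still rejected.
        for (String label : body.split("\\.", -1)) {
            if (label.isEmpty() || label.length() > 63) {
                throw new IllegalArgumentException("bad label length in: " + name);
            }
            out.add(label);
        }
        return out;
    }
}
```

Under these assumptions "www.example.com." and "www.example.com" yield the same three labels, "." yields none, and "example..com" is rejected.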
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
Take a look here for more clarity: http://en.wikipedia.org/wiki/Fully_qualified_domain_name -- Sent from my mobile device. Matthew Hall mh...@mhcomputing.net wrote: Trailing dots are allowed in plain DNS (thus almost surely in IDN), and the single dot represents the root zone. So you have to be careful making this sort of change to check the DNS RFCs first. Matthew.
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
Xuelei, "." (dot) is a perfectly valid domain name and it means the root domain, so "com." is a valid domain name as well. It seems to me that, in the context of the methods you change, we should ignore trailing dots rather than throw an exception. -Dmitry On 2013-08-06 15:44, Xuelei Fan wrote: Hi, Please review the bug fix to tighten the illegal input checking in IDN. webrev: http://cr.openjdk.java.net./~xuelei/8020842/webrev.00/ Here are two test cases, which are expected to get IAE. Case 1: String host = IDN.toASCII(".", IDN.USE_STD3_ASCII_RULES); Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0 at java.lang.StringBuffer.charAt(StringBuffer.java:204) at java.net.IDN.toASCIIInternal(IDN.java:279) at java.net.IDN.toASCII(IDN.java:118) Case 2: String host = IDN.toASCII("com.", IDN.USE_STD3_ASCII_RULES); Thanks, Xuelei -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources.
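One way to read the suggestion above, ignoring the trailing dot instead of throwing, is to strip it before conversion so that the empty root label never reaches ToASCII. The following is a rough sketch of that idea only. TrailingDotNormalizer and both of its methods are hypothetical helpers, and this is not what the webrev does.

```java
import java.net.IDN;

class TrailingDotNormalizer {
    // Removes one trailing dot; the bare root name "." is returned unchanged.
    static String stripTrailingDot(String name) {
        if (name.length() > 1 && name.endsWith(".")) {
            return name.substring(0, name.length() - 1);
        }
        return name;
    }

    // Converts to ASCII after dropping the trailing dot.
    static String toAsciiIgnoringTrailingDot(String name) {
        if (name.equals(".")) {
            return name; // root domain: nothing to convert
        }
        return IDN.toASCII(stripTrailingDot(name), IDN.USE_STD3_ASCII_RULES);
    }
}
```

The trade-off is that the caller can no longer tell whether the input was written as a fully qualified name, which is part of what the rest of the thread debates.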
Re: Code review request, 8020842 IDN do not throw IAE when hostname ends with a trailing dot
I don't really understand the reason for the restriction in SNIHostName. But I guess that is where it should be enforced if it is required. Michael. On 06/08/13 17:43, Dmitry Samersoff wrote: Xuelei, "." (dot) is a perfectly valid domain name and it means the root domain, so "com." is a valid domain name as well. It seems to me that, in the context of the methods you change, we should ignore trailing dots rather than throw an exception. -Dmitry On 2013-08-06 15:44, Xuelei Fan wrote: Hi, Please review the bug fix to tighten the illegal input checking in IDN. webrev: http://cr.openjdk.java.net./~xuelei/8020842/webrev.00/ Here are two test cases, which are expected to get IAE. Case 1: String host = IDN.toASCII(".", IDN.USE_STD3_ASCII_RULES); Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0 at java.lang.StringBuffer.charAt(StringBuffer.java:204) at java.net.IDN.toASCIIInternal(IDN.java:279) at java.net.IDN.toASCII(IDN.java:118) Case 2: String host = IDN.toASCII("com.", IDN.USE_STD3_ASCII_RULES); Thanks, Xuelei
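Enforcing the restriction at the caller, as suggested above, might look roughly like this. It is a sketch under the assumption that the restriction in question is "an SNI host name must not end with a dot". SniNameCheckSketch and checkedSniName are hypothetical names, and this is not the actual javax.net.ssl.SNIHostName implementation.

```java
import java.net.IDN;

class SniNameCheckSketch {
    // Converts the name with IDN.toASCII, then applies the caller-side rule:
    // the resulting ASCII host name must be non-empty and must not end with a dot.
    static String checkedSniName(String hostname) {
        String ascii = IDN.toASCII(hostname, IDN.USE_STD3_ASCII_RULES);
        if (ascii.isEmpty() || ascii.endsWith(".")) {
            throw new IllegalArgumentException(
                    "Host name must be non-empty and must not end with a dot: " + hostname);
        }
        return ascii;
    }
}
```

With the check placed here, IDN itself stays free to accept fully qualified names such as "www.example.com.", and only the SNI-specific caller rejects them.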