Hi Paul,

I agree, it did look quite messy, didn't it :) Keeping a hack that generates an encoded URL and yet does not support IPv6 addresses in URLs that contain international characters?

Removing the escapeNonUSAscii hack IS the preferable solution, so that's what I'll do :)

Thanks,
Joe

On 7/10/2012 12:26 PM, Paul Sandoz wrote:
Hi Joe,

If you are going to fix things to support IPv6 addresses in URLs, I really think 
you need to make it work for URLs with international characters too.

On Jul 10, 2012, at 8:50 PM, Joe Wang wrote:
    if (reader == null) {
        stream = xmlInputSource.getByteStream();
        if (stream == null) {
            URL location = new URL(escapeNonUSAscii(expandedSystemId));
            URLConnection connect = location.openConnection();
            if (!(connect instanceof HttpURLConnection)) {
                stream = connect.getInputStream();
            }

If this is really about supporting non-percent encoded international characters 
in the system ID, then you can make a simple fix to support IPv6-based URLs in 
general: do not percent-encode *any* ASCII characters.
When encoding a URL, aren't reserved characters supposed to be encoded as well?

Your fix does not do that:

    protected static String escapeNonUSAscii(String str) {
        if (str == null) {
            return str;
        }
        int len = str.length(), i = 0, ch;
        for (; i < len; i++) {
            ch = str.charAt(i);
            // if it's not an ASCII 7 character, break here, and use UTF-8 encoding
            if (ch >= 128)
                break;
        }

        // we saw no non-ascii-7 character
        if (i == len) {
            return str;       // <--- reserved characters are not percent-encoded
        }

I know it is a deliberate attempt to avoid '[' and ']' characters being percent 
encoded, whether the URL has an IPv6 address or not, even though '[' and ']' 
are reserved characters outside of an IPv6 address.

My point is: if you don't encode reserved ASCII characters when all the characters in the 
URL are ASCII, then don't encode them either when international characters that will be 
encoded are also present. Then "escapeNonUSAscii" actually does what its name says :-)
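
To make the suggestion concrete, here is a rough sketch of a method that leaves every ASCII character alone (including '[' and ']') and percent-encodes only the UTF-8 bytes of non-ASCII characters. The class and method names are hypothetical and this is not the code in the webrev; it only illustrates the behaviour described above.

```java
import java.nio.charset.StandardCharsets;

public class EscapeNonAsciiSketch {
    // Hypothetical helper, not the JDK's escapeNonUSAscii: percent-encodes only
    // characters outside US-ASCII and copies every ASCII character unchanged,
    // so an IPv6 literal such as [::1] survives as-is.
    public static String escapeNonAsciiOnly(String str) {
        if (str == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            int n = Character.charCount(cp);
            if (cp < 128) {
                // ASCII, reserved or not: keep as-is
                out.append((char) cp);
            } else {
                // Non-ASCII: emit each UTF-8 byte as %XX
                byte[] utf8 = str.substring(i, i + n).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
            i += n;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints http://[::1]/caf%C3%A9.xml
        System.out.println(escapeNonAsciiOnly("http://[::1]/caf\u00e9.xml"));
    }
}
```

With this shape, an all-ASCII system ID and one containing international characters are treated the same way as far as reserved ASCII characters are concerned.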

Otherwise the other solution, preferable IMHO, is to remove the hack of 
using escapeNonUSAscii.

Paul.
