Tim Julien wrote:
> 
> For example, suppose I want to produce this URL:
> 
> http://foo.com/bar?a=b&c=jon%26doe
> 
> // %26 is the encoded value of &
> // %25 is the encoded value of %
> 
> uri = new URI("http", null, "foo.com", -1, "/bar", "a=b&c=jon%26doe",
> null);
> uri.toASCIIString() -> http://foo.com/bar?a=b&c=jon%2526doe
> 
> // java.net.URI encodes the incoming "%" as %25

which is, by the way, incorrect. It should be + according to W3C standards. That's
because this name/value encoding is an HTML thing that is specified not by the
URI spec but by the W3C (being the HTML authority):

http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

That W3C spec just references the URI spec for all characters other than space.

NB: URI encoding and HTML (!= HTTP) query string encoding are two DISTINCT
algorithms that must not be confused. They are almost identical, but not
completely. Note that a URI-encoded string can be decoded with an HTML query
decoder, but not vice versa (a URI decoder does not, and should not, know how
to decode a +).
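
To illustrate the difference, here is a minimal sketch (not a definitive
implementation). It assumes Java 10 or later for the Charset overload of
URLDecoder.decode; the class name PlusDecoding and the sample strings are made
up for this example. java.net.URLDecoder does the HTML form decoding, while
URI.getQuery() only undoes the URI percent-escapes:

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, only for illustration.
final class PlusDecoding {

    // HTML form decoding: '+' becomes a space and %XX escapes are decoded.
    static String htmlDecode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    // URI decoding of the query part: %XX escapes are decoded, '+' is left alone.
    static String uriDecodeQuery(String uri) {
        return URI.create(uri).getQuery();
    }

    public static void main(String[] args) {
        // prints: jon doe jr
        System.out.println(htmlDecode("jon+doe%20jr"));
        // prints: c=jon+doe jr
        System.out.println(uriDecodeQuery("http://foo.com/bar?c=jon+doe%20jr"));
    }
}
```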

Please also note that the W3C defines that & and ; are BOTH to be treated as
separators in HTML query strings:

http://www.w3.org/TR/html4/appendix/notes.html#ampersands-in-uris
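
A decoder that follows that recommendation could look roughly like the sketch
below. The class QueryParser and its parse method are hypothetical names of
mine, not an existing API, and I assume Java 10 or later for the Charset
overload of URLDecoder.decode. The important point is the order: split on the
separators first, decode each name and value afterwards.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, only for illustration.
final class QueryParser {

    // Splits a raw (still encoded) HTML query string on '&' and ';',
    // then decodes every name and value separately.
    // Simplification: if a name occurs twice, the last value wins.
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("[&;]")) {
            if (pair.isEmpty()) {
                continue; // tolerate doubled or trailing separators
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        // prints: {a=b, c=jon&doe, d=x y}
        System.out.println(parse("a=b;c=jon%26doe&d=x+y"));
    }
}
```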

> uri = new URI("http", null, "foo.com", -1, "/bar", "a=b&c=jon&doe", null);
> uri.toASCIIString() -> http://foo.com/bar?a=b&c=jon&doe
> 
> // java.net.URI has no way of knowing that the un-escaped "&" is
> //actually a value in the URI

Tim, the API doc of the multi-arg constructor says: "Any character that is not a
legal URI character is quoted." Obviously the implementation does not do that
correctly: it does not escape the ampersand and equals characters. Instead it
tries to be "smart", which leads to the nonsensical behaviour that you're
observing.

Just accept that the multi-arg constructor is broken and don't use it.

NB: Whenever you have an HTML query string, that is, a list of names and values
separated by = and &, the names AND values MUST already have been encoded. In
general you CAN NOT apply the encoding afterwards in an unambiguous way. You
can only apply the encoding as long as you still have the names and values
separately (in a HashMap for instance).
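
As a rough sketch of what I mean (the class QueryBuilder and the method
encodeQuery are names I made up, and I assume Java 10 or later for the Charset
overload of URLEncoder.encode): encode each name and value on its own, join
them, and hand the finished string to the single-argument parsing route
(URI.create), which keeps existing escapes as they are.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class, only for illustration.
final class QueryBuilder {

    // Encodes names and values separately (HTML form encoding),
    // then joins the pairs with '&'.
    static String encodeQuery(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("a", "b");
        params.put("c", "jon&doe");

        // The string is already fully encoded, so it is only parsed here.
        URI uri = URI.create("http://foo.com/bar?" + encodeQuery(params));

        // prints: http://foo.com/bar?a=b&c=jon%26doe
        System.out.println(uri.toASCIIString());
    }
}
```

Note that URLEncoder writes a space as +, which is the HTML form convention
and not plain URI encoding.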

Odi
