Matthias Keller created HTTPCLIENT-2029:
-------------------------------------------

             Summary: URIBuilder cannot parse non-UTF8 URIs
                 Key: HTTPCLIENT-2029
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2029
             Project: HttpComponents HttpClient
          Issue Type: Bug
    Affects Versions: 4.5.10
            Reporter: Matthias Keller


URIBuilder always parses the query of a URI passed to its constructor as UTF-8. For example, given the
following URI, which still uses Latin-1:

{{http://host/?x=%E4}}

{{%E4}} is the percent-encoded "ä" character in Latin-1 (ISO-8859-1).

{{new URIBuilder("http://host/?x=%E4").setCharset(ISO_8859_1).getQueryParams().get(0).getValue()}}
outputs {{"�"}} (the Unicode replacement character) instead of {{"ä"}}.
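The replacement character is what results from decoding the single byte 0xE4 as UTF-8, where it is only the start of an incomplete multi-byte sequence. A minimal sketch using only the JDK's {{java.net.URLDecoder}} (not HttpClient) shows the same value decoding differently per charset; the class name {{PercentDecodeDemo}} is made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: decodes one percent-encoded value with different charsets.
class PercentDecodeDemo {

    // Percent-decodes `encoded` using the named charset. Byte sequences that
    // are invalid in that charset are replaced with U+FFFD by the JDK.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // 0xE4 is a complete character ("ä", U+00E4) in Latin-1
        System.out.println(decode("%E4", "ISO-8859-1").equals("\u00E4"));
        // A lone 0xE4 is not valid UTF-8, so it becomes U+FFFD
        System.out.println(decode("%E4", "UTF-8").equals("\uFFFD"));
        // "ä" encoded as UTF-8 takes two bytes: %C3%A4
        System.out.println(decode("%C3%A4", "UTF-8").equals("\u00E4"));
    }
}
```

All three lines print {{true}}.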

This is because the URIBuilder constructor parses the given URI immediately, while the
charset is still null, so the UTF-8 default is used. Calling {{setCharset()}} afterwards does not re-parse the query parameters.
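The ordering problem can be modelled with a hypothetical, heavily simplified builder. {{EagerQueryBuilder}} and its methods are invented for illustration and are not HttpClient's implementation; the sketch only assumes what is described above: the query is decoded inside the constructor with a UTF-8 fallback, so a later {{setCharset()}} call cannot affect values that were already parsed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for URIBuilder (NOT HttpClient code) that parses eagerly.
class EagerQueryBuilder {
    private Charset charset;  // still null while the constructor runs
    private final List<String[]> queryParams = new ArrayList<>();

    EagerQueryBuilder(String uri) {
        String rawQuery = URI.create(uri).getRawQuery();
        // Fallback as described in the report: null charset means UTF-8
        Charset effective = charset != null ? charset : StandardCharsets.UTF_8;
        if (rawQuery != null) {
            for (String pair : rawQuery.split("&")) {
                String[] nv = pair.split("=", 2);
                queryParams.add(new String[] {
                    decode(nv[0], effective),
                    nv.length > 1 ? decode(nv[1], effective) : null});
            }
        }
    }

    // Only stores the charset; the parameters were already decoded above
    EagerQueryBuilder setCharset(Charset charset) {
        this.charset = charset;
        return this;
    }

    String getFirstValue() {
        return queryParams.get(0)[1];
    }

    private static String decode(String s, Charset cs) {
        try {
            return URLDecoder.decode(s, cs.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String value = new EagerQueryBuilder("http://host/?x=%E4")
                .setCharset(StandardCharsets.ISO_8859_1)
                .getFirstValue();
        // Prints true: the value was decoded as UTF-8 despite setCharset()
        System.out.println(value.equals("\uFFFD"));
    }
}
```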

Proposed fix:
Provide overloaded constructors that also allow specifying the charset, for
example:

{code}
    public URIBuilder(final String string, final Charset charset) throws URISyntaxException {
        // Assign the charset first so digestURI() decodes the query with it
        this.charset = charset;
        digestURI(new URI(string));
    }
{code}
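The effect of the proposed ordering can be sketched on another hypothetical, simplified class; {{CharsetAwareQueryBuilder}} and its {{digest}} method are illustrative names, not part of HttpClient. The charset is assigned before the URI is digested, and the one-argument constructor delegates with {{null}} so the existing UTF-8 default is kept.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder (NOT HttpClient code) following the proposed pattern.
class CharsetAwareQueryBuilder {
    private final Charset charset;
    private final List<String[]> queryParams = new ArrayList<>();

    // Old-style constructor: keeps the UTF-8 default by passing null
    CharsetAwareQueryBuilder(String uri) {
        this(uri, null);
    }

    // Proposed overload: charset is set BEFORE the URI is parsed
    CharsetAwareQueryBuilder(String uri, Charset charset) {
        this.charset = charset;
        digest(URI.create(uri));
    }

    private void digest(URI uri) {
        String rawQuery = uri.getRawQuery();
        if (rawQuery == null) {
            return;
        }
        Charset cs = charset != null ? charset : StandardCharsets.UTF_8;
        for (String pair : rawQuery.split("&")) {
            String[] nv = pair.split("=", 2);
            queryParams.add(new String[] {
                decode(nv[0], cs),
                nv.length > 1 ? decode(nv[1], cs) : null});
        }
    }

    String getFirstValue() {
        return queryParams.get(0)[1];
    }

    private static String decode(String s, Charset cs) {
        try {
            return URLDecoder.decode(s, cs.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String latin1 = new CharsetAwareQueryBuilder(
                "http://host/?x=%E4", StandardCharsets.ISO_8859_1).getFirstValue();
        String utf8 = new CharsetAwareQueryBuilder("http://host/?x=%E4").getFirstValue();
        System.out.println(latin1.equals("\u00E4"));  // true: decoded as Latin-1
        System.out.println(utf8.equals("\uFFFD"));    // true: default UTF-8 unchanged
    }
}
```

Setting the field before parsing keeps the builder's state consistent from construction onward, and existing callers of the one-argument constructor see no change in behaviour.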



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
