[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074493#comment-18074493
 ] 

ASF subversion and git services commented on HTTPCLIENT-2418:
-------------------------------------------------------------

Commit f75d2833e6d5f6127c13b5c6911bc07830bb9382 in httpcomponents-client's 
branch refs/heads/master from Arturo Bernal
[ https://gitbox.apache.org/repos/asf?p=httpcomponents-client.git;h=f75d2833e ]

HTTPCLIENT-2418 - Fix default charset handling in SimpleBody for JSON 
content. Use UTF-8 instead of US-ASCII when no charset parameter is present.
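
For illustration only, the fallback described in the commit message could be sketched with plain JDK classes as below. This is not the committed code: the class name CharsetFallbackSketch and the helper resolveCharset are hypothetical, the header parsing is deliberately naive, and keeping US-ASCII for non-JSON types is an assumption based on the behaviour described in the report.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetFallbackSketch {

    // Hypothetical helper: picks the charset used to decode a response body.
    // An explicit charset parameter wins; otherwise application/json falls
    // back to UTF-8 and everything else to US-ASCII (assumed default).
    static Charset resolveCharset(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return StandardCharsets.US_ASCII;
        }
        String[] parts = contentTypeHeader.split(";");
        String mimeType = parts[0].trim().toLowerCase(Locale.ROOT);
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim().replace("\"", "");
                // Throws for illegal or unsupported charset names.
                return Charset.forName(name);
            }
        }
        return "application/json".equals(mimeType)
                ? StandardCharsets.UTF_8
                : StandardCharsets.US_ASCII;
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset("application/json"));
        System.out.println(resolveCharset("application/json; charset=ISO-8859-1"));
        System.out.println(resolveCharset("text/plain"));
    }
}
```

A real implementation would more likely rely on the library's own ContentType parsing rather than splitting the header by hand.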


> Another case of invalid handling of charset content type parameter on the 
> Simple Async API
> ------------------------------------------------------------------------------------------
>
>                 Key: HTTPCLIENT-2418
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2418
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (async)
>    Affects Versions: 5.4.1, 5.6
>         Environment: java 21 on macOS 26.4.1
>            Reporter: Gilles Compienne CFX
>            Priority: Minor
>             Fix For: 5.7-alpha1
>
>         Attachments: json-utf8-issue.zip
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hello,
> Ticket 2159 (https://issues.apache.org/jira/browse/HTTPCLIENT-2159) has 
> resolved several issues around the handling of the content type parameter, 
> but there is still one that I can see:
> If I have a small JSON payload in the response body (and the content type is 
> set by the server to be "Content-Type: application/json"), then it is still 
> decoded using US-ASCII when UTF-8 should have been used...
> It works fine when we create a JSON payload for a request (as the ContentType 
> class now specifies UTF-8 as being the default for "application/json") but 
> the code fails if the JSON payload is in the response.
> This is caused by the line `final Charset charset = (contentType 
> != null ? contentType : ContentType.DEFAULT_TEXT).getCharset();` in the 
> `getBodyText()` method of the `SimpleBody` class, which sets the `charset` 
> variable to null. That in turn causes StandardCharsets.US_ASCII to be used 
> in the line that follows (the one that decodes the string), wrecking any 
> emojis or non-English symbols the string could contain.
> In my humble opinion, it is reasonable to be using SimpleBody (and the Simple 
> API in general) when we know we are expecting small payloads and all we are 
> doing is passing the "string" along to some other clients. In those cases, we 
> don't need the advanced capabilities of httpcomponents-jackson or similar (we 
> are not even parsing the JSON, just treating it as a string)...
> But we still need the 'getBodyText()' to handle the charset properly, even 
> when it is "assumed", and not wreck the string (or only "getBodyBytes()" 
> should be offered and 'getBodyText()' removed).
> I suspect an improved variant of that code would use the (currently 
> deprecated) `ContentType.getByMimeType()` method to find out if the mime type 
> is known and if it has an associated default encoding (and if so, use it if 
> the charset is not present).
> I have attached a code sample highlighting the problem in the associated ZIP 
> file.
> It can be run with maven:
> {noformat}
> mvn clean compile assembly:single
> java -jar 
> target/httpclient5-demo-1.0-SNAPSHOT-jar-with-dependencies.jar{noformat}
> And it will need a dummy server on localhost:12345 that returns a UTF-8 JSON 
> payload with some emojis or similar.
> If you have a tool like `dummyhttp` and a terminal console set to use UTF-8 
> then you can set up the dummy server with the following command:
> {noformat}
> dummyhttp -p 12345 -v -c 200 -b "{\"msg\": \"Test emoji 👋\"}" -H 
> Content-Type:application/json{noformat}
> Running the test client app will then cause this to appear (again assuming 
> your terminal is set to the UTF-8 locale):
>  
> {noformat}
> Fetching: http://localhost:12345/
> -----------------------------
> Status code : 200
> Reason      : OK
> Content-Type: application/json
> Body (via getBodyText):
> {"msg": "Test emoji ����"}
> Body (via proper UTF-8 decoding):
> {"msg": "Test emoji 👋"}
> {noformat}
>  
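
The symptom in the quoted output can be reproduced without HttpClient or a server. The following is a minimal sketch using only the JDK; the class name CharsetDecodeDemo and its two helper methods are hypothetical and only mimic the two decoding paths described in the report. The waving-hand emoji is written as a surrogate-pair escape so the source file encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

public class CharsetDecodeDemo {

    // Mimics the reported behaviour: the body bytes are decoded as US-ASCII.
    static String decodeAsAscii(byte[] body) {
        return new String(body, StandardCharsets.US_ASCII);
    }

    // Decodes the same bytes as UTF-8, which is what the server actually sent.
    static String decodeAsUtf8(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same payload as in the report; \uD83D\uDC4B is U+1F44B.
        byte[] body = "{\"msg\": \"Test emoji \uD83D\uDC4B\"}"
                .getBytes(StandardCharsets.UTF_8);

        // Each of the four UTF-8 bytes of the emoji becomes U+FFFD.
        System.out.println(decodeAsAscii(body));
        // The emoji survives intact.
        System.out.println(decodeAsUtf8(body));
    }
}
```

The emoji occupies four bytes in UTF-8, which matches the four replacement characters shown in the getBodyText output above.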



--
This message was sent by Atlassian Jira
(v8.20.10#820010)