[jira] (SUREFIRE-1137) Problem with Umlauts in stdout

Andreas Gudian (JIRA) Sun, 01 Feb 2015 02:46:13 -0800

    [ 
https://jira.codehaus.org/browse/SUREFIRE-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=362273#comment-362273
 ]


Andreas Gudian commented on SUREFIRE-1137:
------------------------------------------

I should have answered last night already: I was able to reproduce the 
problem on my local Windows machine by encoding the test java file as UTF-8 and 
using UTF-8 in the pom. Stacktraces and error messages are correctly encoded in 
the output XML, but the sysout doesn't survive the journey, just as Jürgen 
describes.

In my main Maven process, {{Charset.defaultCharset()}} is windows-1252, whereas 
in the forked VM {{Charset.defaultCharset()}} is UTF-8. The current 
implementation relies on the default charset being the same in both the main 
process and the forked process, hence the garbled output.
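
To illustrate the mismatch, here is a minimal sketch that is not taken from the Surefire code base: it encodes a string with one charset and decodes it with another, the way the fork and the main process currently disagree. The class and method names are made up for this example, and the charsets are hard-coded to the two values I observed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Hypothetical helper: the "writer" side turns text into bytes,
    // the "reader" side turns the same bytes back into text.
    static String transfer(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        Charset fork = StandardCharsets.UTF_8;           // default charset seen in the forked VM
        Charset main = Charset.forName("windows-1252");  // default charset seen in the main process

        // \u00fc is the umlaut u; the escape keeps this source file encoding-independent.
        String name = "J\u00fcrgen";

        // Mismatched charsets: the two UTF-8 bytes of the umlaut are read as two characters.
        System.out.println(transfer(name, fork, main));
        // Matching charsets: the text survives unchanged.
        System.out.println(transfer(name, fork, fork));
    }
}
```

With mismatched charsets the single umlaut comes back as two unrelated characters, which is the kind of garbage that shows up in the sysout part of the report.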

* If I don't pass file.encoding to the forked VM, then the forked VM also uses 
windows-1252.
* If I pass -Dfile.encoding=UTF-8 in MAVEN_OPTS to the main process, then 
System.getProperty("file.encoding") is "UTF-8", but 
{{Charset.defaultCharset()}} _remains windows-1252_ - I was not able to 
change the defaultCharset of the main process with a system property. 
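
A small check along these lines can be used to compare the two values in any JVM. It is only a sketch with a made-up class name; what it prints depends on the platform, the JDK version and how the JVM was started, so I am not listing an expected output.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Reports the file.encoding system property next to the charset
    // the JVM actually uses as its default.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running the same check in the main process and in the fork shows whether the two sides agree on the default charset.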

But the documentation is quite clear on that: you're not supposed to change the 
defaultCharset by using file.encoding, but instead change the system's locale / 
language settings. Meh.

I'm not really sure yet what to make of this. I could pass the fork's 
defaultCharset back to the main process so that it can properly recode the 
stream into UTF-8. I could pass the main process's defaultCharset to the fork 
and use that one for encoding the String in PrintStream's print(String) method 
(although that may cause strange side effects with other ways of using that 
print stream). Or I could convert any print stream activity in the fork to 
UTF-16 (although not every charset can transform all its characters to UTF-16 
and back again, which is why I tried to rely on the default charset in the 
first place)... 
So I might go with the first option, but I still need to think about it (to see 
if it really is the right thing to do). 
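
A rough sketch of what the first option could look like on the main-process side, assuming the fork reports the name of its default charset together with the raw bytes. This is not the existing Surefire implementation; the class name, the method names and the way the charset name reaches the main process are all hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ForkOutputDecoder {

    // Decodes bytes received from the fork with the charset the fork
    // says it used, instead of the main process's own default charset.
    static String decode(byte[] raw, String forkCharsetName) {
        Charset forkCharset = Charset.forName(forkCharsetName);
        return new String(raw, forkCharset);
    }

    // Recodes the fork's bytes into UTF-8, e.g. for writing the report.
    static byte[] recodeToUtf8(byte[] raw, String forkCharsetName) {
        return decode(raw, forkCharsetName).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFC is the umlaut u in windows-1252.
        byte[] fromFork = { (byte) 0xFC };
        byte[] utf8 = recodeToUtf8(fromFork, "windows-1252");
        System.out.println(utf8.length + " byte(s) after recoding");
    }
}
```

The point of this variant is that the main process never has to guess: it decodes with whatever charset the fork actually used, regardless of its own default.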

If you guys have an idea here, let me know.

> Problem with Umlauts in stdout
> ------------------------------
>
>                 Key: SUREFIRE-1137
>                 URL: https://jira.codehaus.org/browse/SUREFIRE-1137
>             Project: Maven Surefire
>          Issue Type: Bug
>          Components: Maven Surefire Plugin
>    Affects Versions: 2.18
>         Environment: Linux
>            Reporter: Jürgen Zeller
>            Assignee: Andreas Gudian
>         Attachments: surefire-test.zip
>
>
> When using Cp1252 as file encoding, the generated Surefire stdout report 
> contains invalid characters when run on Linux. When running the same test on 
> Windows, everything is fine.
> A similar problem was reported in SUREFIRE-998.



--
This message was sent by Atlassian JIRA
(v6.1.6#6162)
