stevedlawrence commented on code in PR #1236:
URL: https://github.com/apache/daffodil/pull/1236#discussion_r1596615336


##########
.github/workflows/main.yml:
##########
@@ -39,7 +39,7 @@ jobs:
       fail-fast: false
       matrix:
         java_distribution: [ temurin ]
-        java_version: [ 8, 11, 17 ]
+        java_version: [ 8, 11, 17, 21 ]

Review Comment:
   I think I've tracked down the issue and solution. Patch to fix it at the end 
of this comment.
   
   In the CLI we are good about specifying a UTF-8 encoding when we create 
things like a 
[PrintStream](https://docs.oracle.com/javase/8/docs/api/java/io/PrintStream.html).
 UTF-8 is needed because our CLI debugger can output special characters and 
some tests look for those characters. For example, non-integration threaded CLI 
tests do this to create the `PrintSteam`'s for stdin/stdout:
   
   ```scala
   val psOut = new PrintStream(out, false, StandardCharsets.UTF_8.name)
   val psErr = new PrintStream(err, false, StandardCharsets.UTF_8.name)
   ```
   
   If we didn't specify the encoding here Java uses the value of the 
`file.encoding` property for these `PrintStream`'s.
   
   But that's only in the threaded case. In the case where we fork tests like 
in these failing integration tests, Java is responsible for creating the 
stdin/stdout `PrintStream`'s. And because Java does this and not our code, it 
just uses the `PrintStream` constructor that gets the encoding from 
`file.encoding`.
   
   So for our integration tests, we must provide `file.encoding` to ensure our 
forked processes have stdin/out with the correct UTF-8 encodings. And we do 
currently do that here:
   
   
https://github.com/apache/daffodil/blob/main/daffodil-test-integration/src/test/scala/org/apache/daffodil/cliTest/TestCLIDebugger.scala#L61
   
   However, Java 18 changed things with 
[JEP-400](https://openjdk.org/jeps/400), which made it so Java always uses 
UTF-8 when creating `PrintStream`'s (and other similar classes) when an 
encoding is not specified. It explicitly says this change should not affect 
`System.err` and `System.out`, but Java 18 does actually change the behavior, 
with their encodings now coming from the `sun.stdout.encoding` and 
`sun.stderr.encoding` properties instead of `file.encoding`. And then Java 19+ 
changed it again so that the encoding comes from the `stdout.encoding` and 
`stderr.encoding` properties.
   
   Note that on GitHub windows CI VMs, these properties are set to CP1252 
encoding, which cause these tests to fail since they need UTF-8. So all we need 
to do is set these properties to UTF-8 for our debugger tests and it should fix 
things. Here's  patch: 
   
   ```diff
   --- 
a/daffodil-test-integration/src/test/scala/org/apache/daffodil/cliTest/TestCLIDebugger.scala
   +++ 
b/daffodil-test-integration/src/test/scala/org/apache/daffodil/cliTest/TestCLIDebugger.scala
   @@ -58,7 +58,9 @@ class TestCLIDebugger {
        "-Dorg.jline.terminal.provider=dumb",
        "-Dorg.jline.terminal.dumb=true",
        "-Dorg.jline.terminal.dumb.color=false",
   -    "-Dfile.encoding=UTF-8"
   +    "-Dfile.encoding=UTF-8",
   +    "-Dstderr.encoding=UTF-8",
   +    "-Dstdout.encoding=UTF-8"
      )
    
      val envs = Map(
   ```
   
   We still need to specify `file.encoding` for Java 17 and older, and we now 
need `stderr.encoding` and `stdout.encoding` for Java 19 and newer. I don't 
think we should worry about supporting Java 18 since it's EOL.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to