stevedlawrence commented on code in PR #1236:
URL: https://github.com/apache/daffodil/pull/1236#discussion_r1596615336
##########
.github/workflows/main.yml:
##########
@@ -39,7 +39,7 @@ jobs:
fail-fast: false
matrix:
java_distribution: [ temurin ]
- java_version: [ 8, 11, 17 ]
+ java_version: [ 8, 11, 17, 21 ]
Review Comment:
I think I've tracked down the issue and solution. Patch to fix it at the end
of this comment.
In the CLI we are good about specifying a UTF-8 encoding when we create
things like a
[PrintStream](https://docs.oracle.com/javase/8/docs/api/java/io/PrintStream.html).
UTF-8 is needed because our CLI debugger can output special characters and
some tests look for those characters. For example, non-integration threaded CLI
tests do this to create the `PrintSteam`'s for stdin/stdout:
```scala
val psOut = new PrintStream(out, false, StandardCharsets.UTF_8.name)
val psErr = new PrintStream(err, false, StandardCharsets.UTF_8.name)
```
If we didn't specify the encoding here Java uses the value of the
`file.encoding` property for these `PrintStream`'s.
But that's only in the threaded case. In the case where we fork tests like
in these failing integration tests, Java is responsible for creating the
stdin/stdout `PrintStream`'s. And because Java does this and not our code, it
just uses the `PrintStream` constructor that gets the encoding from
`file.encoding`.
So for our integration tests, we must provide `file.encoding` to ensure our
forked processes have stdin/out with the correct UTF-8 encodings. And we do
currently do that here:
https://github.com/apache/daffodil/blob/main/daffodil-test-integration/src/test/scala/org/apache/daffodil/cliTest/TestCLIDebugger.scala#L61
However, Java 18 changed things with
[JEP-400](https://openjdk.org/jeps/400), which made it so Java always uses
UTF-8 when creating `PrintStream`'s (and other similar classes) when an
encoding is not specified. It explicitly says this change should not affect
`System.err` and `System.out`, but Java 18 does actually change the behavior,
with their encodings now coming from the `sun.stdout.encoding` and
`sun.stderr.encoding` properties instead of `file.encoding`. And then Java 19+
changed it again so that the encoding comes from the `stdout.encoding` and
`stderr.encoding` properties.
Note that on GitHub windows CI VMs, these properties are set to CP1252
encoding, which cause these tests to fail since they need UTF-8. So all we need
to do is set these properties to UTF-8 for our debugger tests and it should fix
things. Here's patch:
```diff
---
a/daffodil-test-integration/src/test/scala/org/apache/daffodil/cliTest/TestCLIDebugger.scala
+++
b/daffodil-test-integration/src/test/scala/org/apache/daffodil/cliTest/TestCLIDebugger.scala
@@ -58,7 +58,9 @@ class TestCLIDebugger {
"-Dorg.jline.terminal.provider=dumb",
"-Dorg.jline.terminal.dumb=true",
"-Dorg.jline.terminal.dumb.color=false",
- "-Dfile.encoding=UTF-8"
+ "-Dfile.encoding=UTF-8",
+ "-Dstderr.encoding=UTF-8",
+ "-Dstdout.encoding=UTF-8"
)
val envs = Map(
```
We still need to specify `file.encoding` for Java 17 and older, and we now
need `stderr.encoding` and `stdout.encoding` for Java 19 and newer. I don't
think we should worry about supporting Java 18 since it's EOL.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]