[ https://issues.apache.org/jira/browse/LUCENE-10447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499420#comment-17499420 ]
Dawid Weiss commented on LUCENE-10447:
--------------------------------------

Sorry, you're right - we don't:
{code:java}
  private static Supplier<Charset> forkedProcessCharset =
      () -> {
        // The default charset for a forked java process could be computed for the current
        // platform but it adds more complexity. For now, assume it's just parseable ascii.
        return StandardCharsets.US_ASCII;
      };
{code}
I see the following options:
1) try to detect the default charset (not easy but doable),
2) use the test JVM's default charset, assuming it's the same as the forked JVM's charset (we'd have to force it for the subprocess because we randomize tests),
3) use a byte-identity-mapping codepage like ISO-8859-1, which never fails on any bytes.

Given the complexity of (1), perhaps (2) would be better? Explicitly pass the charset Luke should run with.

> Charset issue in TestScripts#testLukeCanBeLaunched()
> ----------------------------------------------------
>
>                 Key: LUCENE-10447
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10447
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: luke
>            Reporter: Lu Xugang
>            Assignee: Dawid Weiss
>            Priority: Major
>         Attachments: 1.png, process-10536545874299101128.out
>
>
> When TestScripts#testLukeCanBeLaunched() runs, a temp file is created under
> lucene/distribution.tests/build/tmp/tests-tmp/process-*.out. Depending on the
> operating system language, this process-*.out file may contain content that
> is not valid StandardCharsets.US_ASCII, and an exception is then thrown
> because the test later reads the temp file with StandardCharsets.US_ASCII.
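
To illustrate option (3) from the comment above, here is a minimal sketch of why a strict US-ASCII read can fail on localized process output while ISO-8859-1 cannot. The class name ForkedOutputDecoding, the helper decodeStrict, and the sample bytes are hypothetical and are not taken from the Lucene test code. The sketch also assumes the test decodes in strict (error-reporting) mode.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class ForkedOutputDecoding {

  // Decodes bytes with the given charset, throwing instead of silently
  // substituting replacement characters for bytes it cannot decode.
  static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
    return charset
        .newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT)
        .decode(ByteBuffer.wrap(bytes))
        .toString();
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for process output: three non-ASCII bytes followed by "ok".
    byte[] output = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD, 'o', 'k'};

    try {
      decodeStrict(output, StandardCharsets.US_ASCII);
      System.out.println("US_ASCII: decoded");
    } catch (CharacterCodingException e) {
      // Bytes >= 0x80 are not valid US-ASCII, so strict decoding reports an error.
      System.out.println("US_ASCII: failed with " + e.getClass().getSimpleName());
    }

    // ISO-8859-1 maps every byte value to exactly one char, so this cannot fail,
    // and encoding the result again gives back the original bytes.
    String text = decodeStrict(output, StandardCharsets.ISO_8859_1);
    byte[] roundTrip = text.getBytes(StandardCharsets.ISO_8859_1);
    System.out.println("ISO_8859_1 round trip identical: " + Arrays.equals(output, roundTrip));
  }
}
```

The trade-off is that ISO-8859-1 never throws but may show non-ASCII text as the wrong characters. That is fine if the test only searches the output for ASCII markers, whereas option (2) would keep the text readable.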