[jira] [Commented] (LUCENE-10447) Charset issue in TestScripts#testLukeCanBeLaunched()

Dawid Weiss (Jira) Tue, 01 Mar 2022 01:53:06 -0800


    [ https://issues.apache.org/jira/browse/LUCENE-10447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499420#comment-17499420 ]


Dawid Weiss commented on LUCENE-10447:
--------------------------------------

Sorry, you're right - we don't:
{code:java}
  private static Supplier<Charset> forkedProcessCharset =
      () -> {
        // The default charset for a forked java process could be computed for
        // the current platform but it adds more complexity. For now, assume
        // it's just parseable ascii.
        return StandardCharsets.US_ASCII;
      };
{code}
I see the following options:

1) try to detect the default charset (not easy but doable),

2) use the test JVM's default charset, assuming it's the same as the forked 
JVM's charset (we'd have to force it for the subprocess because we randomize 
tests),

3) use a byte-identity-mapping code page such as ISO-8859-1, which never fails 
on any byte sequence.
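
To illustrate the difference between the current behavior and option (3), here 
is a minimal sketch. It assumes the test reads the process output with a strict 
decoder; the class name, helper methods, and sample bytes are hypothetical and 
are not taken from the Lucene code base.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Lucene.
final class DecodeSketch {
  /** Decodes strictly: malformed or unmappable input raises an exception. */
  static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
    return charset
        .newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT)
        .decode(ByteBuffer.wrap(bytes))
        .toString();
  }

  /** Returns true if the bytes decode without error in the given charset. */
  static boolean decodesCleanly(byte[] bytes, Charset charset) {
    try {
      decodeStrict(bytes, charset);
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Assumed sample: localized process output containing a non-ASCII character.
    byte[] output = "Caf\u00e9".getBytes(StandardCharsets.UTF_8);

    // Strict US-ASCII rejects any byte >= 0x80.
    System.out.println("US-ASCII ok:   " + decodesCleanly(output, StandardCharsets.US_ASCII));
    // ISO-8859-1 maps every byte value to a character, so it cannot fail.
    System.out.println("ISO-8859-1 ok: " + decodesCleanly(output, StandardCharsets.ISO_8859_1));
  }
}
```

Note that ISO-8859-1 only guarantees that decoding does not fail; non-ASCII 
text from the forked process may still come out garbled if the process used a 
different charset.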

Given the complexity of (1), perhaps (2) would be better? We would explicitly 
pass the charset Luke should run with.
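
A rough sketch of what (2) could look like, under several assumptions: the 
class and method names are hypothetical, the child here is a plain `java 
-version` rather than the Luke launch script, and it assumes the forked JVM 
honors `-Dfile.encoding`, which is not guaranteed on every JDK version. How the 
option would actually reach Luke through its script is not shown.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.util.List;

// Hypothetical illustration class, not part of Lucene.
final class ForkWithCharset {
  /** Builds a command line that asks the child JVM to use the given default charset. */
  static List<String> buildCommand(Charset charset) {
    // Assumption: reuse the java binary of the currently running JVM.
    String java = Path.of(System.getProperty("java.home"), "bin", "java").toString();
    return List.of(java, "-Dfile.encoding=" + charset.name(), "-version");
  }

  public static void main(String[] args) throws IOException, InterruptedException {
    // Use the test JVM's default charset for both the child and the reader.
    Charset charset = Charset.defaultCharset();

    Process process =
        new ProcessBuilder(buildCommand(charset))
            .redirectErrorStream(true) // "java -version" writes to stderr
            .start();

    // Read the raw bytes first, then decode with the same charset we passed down.
    byte[] raw = process.getInputStream().readAllBytes();
    int exitCode = process.waitFor();

    System.out.println("exit code: " + exitCode);
    System.out.print(new String(raw, charset));
  }
}
```

The point of the design is that the charset is chosen once in the parent and 
used on both sides, so the reader no longer has to guess what the subprocess 
wrote.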

> Charset issue in TestScripts#testLukeCanBeLaunched()
> ----------------------------------------------------
>
>                 Key: LUCENE-10447
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10447
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: luke
>            Reporter: Lu Xugang
>            Assignee: Dawid Weiss
>            Priority: Major
>         Attachments: 1.png, process-10536545874299101128.out
>
>
> When TestScripts#testLukeCanBeLaunched() runs, a temp file is created at 
> lucene/distribution.tests/build/tmp/tests-tmp/process-*.out. Depending on the 
> operating system language, this process-*.out file may contain content that 
> is not valid StandardCharsets.US_ASCII, and an exception is then thrown 
> because the test later reads the file with StandardCharsets.US_ASCII.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
