FYI, here's how you can create a list of all available text
encodings in the JVM you're running in.  This can lead to a
very long combo box, though :-)

    Map<String, Charset> charsetMap = Charset.availableCharsets();
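
As a rough sketch of how the combo box could be filled from that map
(the class and method names below are made up for illustration, this is
not the CasEditor code): the keys of the map are the canonical charset
names, so they can be used as the entries directly, and the platform
default can be moved to the top so it is the preselected item.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;

// Hypothetical helper, for illustration only.
public class CharsetChoices {

    /** Canonical names of all charsets in this JVM, platform default first. */
    public static List<String> comboBoxItems() {
        // availableCharsets() returns a map sorted by canonical name.
        SortedMap<String, Charset> available = Charset.availableCharsets();
        String platformDefault = Charset.defaultCharset().name();

        List<String> items = new ArrayList<String>(available.size() + 1);
        items.add(platformDefault);
        for (String name : available.keySet()) {
            // Skip the default here so it is not listed twice.
            if (!name.equals(platformDefault)) {
                items.add(name);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        for (String item : comboBoxItems()) {
            System.out.println(item);
        }
    }
}
```

How many entries you get depends on the JVM and the installed charset
providers, so the length of the list will vary between installations.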

--Thilo

On 5/18/2010 01:40, Jörn Kottmann (JIRA) wrote:
> 
>     [ 
> https://issues.apache.org/jira/browse/UIMA-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868448#action_12868448
>  ] 
> 
> Jörn Kottmann commented on UIMA-1782:
> -------------------------------------
> 
> There is now an option to specify the encoding of the text import files. It 
> is always preset to the default platform encoding. The combo box displays the 
> Java standard charsets (see here: 
> http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html).
> To use a non-standard Java charset (these are usually available as well), 
> the user has to type in its name. The name is validated as it is typed: if 
> the charset is available, the user can proceed with the import; otherwise 
> the "Apply" button simply remains disabled. 
> 
> It would be nice to add a warning to tell the user that the "Apply" button is 
> disabled because of an invalid charset name or an unsupported charset.
> 
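
A minimal sketch of how such a warning text could be derived, assuming
the check is done with Charset.isSupported (the class and method names
and the message wording are hypothetical, not taken from the CasEditor).
It distinguishes a syntactically illegal name from a legal name that
this JVM simply does not support:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper, for illustration only.
public class CharsetNameCheck {

    /** Returns null if the name is usable, otherwise a message for the user. */
    public static String problemWith(String typedName) {
        String name = typedName == null ? "" : typedName.trim();
        if (name.length() == 0) {
            return "Please enter a charset name.";
        }
        try {
            if (Charset.isSupported(name)) {
                return null; // fine, "Apply" can be enabled
            }
            return "Charset \"" + name + "\" is not supported by this JVM.";
        } catch (IllegalCharsetNameException e) {
            // Thrown when the name contains characters that are not allowed.
            return "\"" + name + "\" is not a legal charset name.";
        }
    }

    public static void main(String[] args) {
        System.out.println(problemWith("UTF-8"));
        System.out.println(problemWith("X-NO-SUCH-CHARSET-123"));
        System.out.println(problemWith("not a charset!"));
    }
}
```

The returned message could be shown next to the text field whenever
"Apply" is disabled.
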
>> Encoding of text files during import should be configurable
>> -----------------------------------------------------------
>>
>>                 Key: UIMA-1782
>>                 URL: https://issues.apache.org/jira/browse/UIMA-1782
>>             Project: UIMA
>>          Issue Type: Improvement
>>          Components: CasEditor
>>    Affects Versions: 2.3
>>            Reporter: Thomas Hampp
>>            Assignee: Jörn Kottmann
>>             Fix For: 2.3.1
>>
>>
>> During import of text files into a corpus it seems to be impossible to 
>> control the encoding used. Looks like the default platform encoding is used 
>> (Latin 1 on Western Windows systems). The Eclipse default encoding settings 
>> for text files don't seem to affect import encoding. That makes it 
>> impossible to import documents with international characters in UTF-8.
>> Ideally the encoding should be selectable in a drop-down field in the import 
>> wizard.
> 
