[ https://issues.apache.org/jira/browse/UIMA-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868448#action_12868448 ]

Jörn Kottmann commented on UIMA-1782:
-------------------------------------

There is now an option to specify the encoding of the text import files. It is 
always preset to the default platform encoding. The combo box displays the Java 
standard charsets (see here: 
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html).
If the user wants to use a non-standard Java charset (these are usually 
available as well), he has to type in the name of the charset. While the name 
is being typed, it is validated: if the charset is available, he can proceed 
with the import; otherwise the "Apply" button remains disabled.
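
A minimal sketch of this validation, assuming it is done with the standard 
java.nio.charset.Charset API. The class and method names (EncodingValidator, 
isAvailable, defaultEncodingName) are hypothetical and not taken from the 
CasEditor code.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper, not the actual CasEditor implementation.
public class EncodingValidator {

    // Value the encoding field is preset to: the platform default charset.
    public static String defaultEncodingName() {
        return Charset.defaultCharset().name();
    }

    // Returns true if the typed name refers to a charset this JVM supports.
    // Charset.isSupported throws IllegalCharsetNameException for syntactically
    // illegal names (e.g. names containing spaces), so that case counts as invalid.
    public static boolean isAvailable(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // In the wizard, the "Apply" button would be enabled only when isAvailable(...) is true.
        for (String typed : new String[] {"UTF-8", "ISO-8859-1", "no such charset"}) {
            System.out.println(typed + " -> apply enabled: " + isAvailable(typed));
        }
    }
}
```

Re-running the check on every keystroke is cheap, since Charset.isSupported 
only looks the name up; no file is read until the import starts.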

It would be nice to add a warning that tells the user the "Apply" button is 
disabled because of an invalid charset name or an unsupported charset.
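
One possible shape for such a warning, sketched under the assumption that the 
two cases are told apart the way java.nio.charset.Charset does it: 
isSupported returns false for a legal but unknown name and throws 
IllegalCharsetNameException for a syntactically illegal one. EncodingWarning 
and warningFor are hypothetical names; in an Eclipse wizard page the returned 
text could, for example, be passed to setErrorMessage.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical sketch of the suggested warning, not existing CasEditor code.
public class EncodingWarning {

    // Returns null when the charset can be used, otherwise a message
    // explaining why the "Apply" button is disabled.
    public static String warningFor(String name) {
        if (name == null || name.isEmpty()) {
            return "Enter the name of a charset.";
        }
        try {
            if (Charset.isSupported(name)) {
                return null;
            }
            // Legal name, but this Java runtime does not provide the charset.
            return "The charset \"" + name + "\" is not supported by this Java runtime.";
        } catch (IllegalCharsetNameException e) {
            // The name itself violates the charset naming rules.
            return "\"" + name + "\" is not a legal charset name.";
        }
    }

    public static void main(String[] args) {
        for (String typed : new String[] {"UTF-8", "X-NOT-A-REAL-CHARSET", "bad name"}) {
            String warning = warningFor(typed);
            System.out.println(typed + ": " + (warning == null ? "OK" : warning));
        }
    }
}
```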

> Encoding of text files during import should be configurable
> -----------------------------------------------------------
>
>                 Key: UIMA-1782
>                 URL: https://issues.apache.org/jira/browse/UIMA-1782
>             Project: UIMA
>          Issue Type: Improvement
>          Components: CasEditor
>    Affects Versions: 2.3
>            Reporter: Thomas Hampp
>            Assignee: Jörn Kottmann
>             Fix For: 2.3.1
>
>
> During import of text files into a corpus it seems to be impossible to 
> control the encoding used. Looks like the default platform encoding is used 
> (Latin 1 on Western Windows systems). The Eclipse default encoding settings 
> for text files don't seem to affect import encoding. That makes it impossible 
> to import documents with international characters in UTF8.
> Ideally the encoding should be selectable in a drop down field in the import 
> wizard.
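
The effect described in the report can be sketched by decoding the same 
UTF-8 bytes once with an explicit UTF-8 charset and once with ISO-8859-1, used 
here as a stand-in for the "Latin 1" platform default mentioned above. The 
class and method names (ImportDecoding, decode, readText) are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration, not the CasEditor import code.
public class ImportDecoding {

    // Decodes raw file content with an explicitly chosen charset.
    public static String decode(byte[] bytes, Charset encoding) {
        return new String(bytes, encoding);
    }

    // Reads a whole text file using the given charset instead of the platform default.
    public static String readText(Path file, Charset encoding) throws IOException {
        return decode(Files.readAllBytes(file), encoding);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("import-demo", ".txt");
        try {
            // "J\u00f6rn" written as UTF-8: the o-umlaut becomes the two bytes 0xC3 0xB6.
            Files.write(file, "J\u00f6rn".getBytes(StandardCharsets.UTF_8));
            String correct = readText(file, StandardCharsets.UTF_8);     // decodes to "Jörn"
            String garbled = readText(file, StandardCharsets.ISO_8859_1); // decodes to "JÃ¶rn"
            System.out.println(correct.equals("J\u00f6rn") + " " + garbled.equals("J\u00c3\u00b6rn"));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```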

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
