[ 
https://issues.apache.org/jira/browse/TIKA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827387#comment-15827387
 ] 

Hudson commented on TIKA-2231:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1182
(See [https://builds.apache.org/job/Tika-trunk/1182/])
TIKA-2231: Improved param validation of TesseractOCRConfig.setLanguage()
(graham: rev 5c51534a5731dba0ed22bc04b7da9d95adfb6f50)
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRConfigTest.java
TIKA-2231 -- update changes.txt. This closes #147
(tallison: rev c978a1195b0b910f523468d51d73e54caba535c0)
* (edit) CHANGES.txt


> Invalid language code exception
> -------------------------------
>
>                 Key: TIKA-2231
>                 URL: https://issues.apache.org/jira/browse/TIKA-2231
>             Project: Tika
>          Issue Type: Bug
>          Components: ocr
>    Affects Versions: 1.14
>            Reporter: Peter Weiss
>            Priority: Minor
>              Labels: beginner, easyfix, easytest, newbie
>             Fix For: 2.0, 1.15
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The regex in TesseractOCRConfig.setLanguage(String language) that validates
> the language being set rejects some language codes that are valid for
> Tesseract.
> For example:
> TesseractOCRConfig config = new TesseractOCRConfig();
> config.setLanguage("chi_tra");
> This throws an IllegalArgumentException because of the '_' in the language
> name, even though "chi_tra" is a valid Tesseract language code.
> The regex needs to be updated to allow the '_' character.
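
For illustration, here is a minimal sketch of the kind of validation the issue asks for. It is not the code committed to Tika: the class name LanguageCodeCheck, its methods, and the pattern itself are assumptions made up for this example. The pattern accepts letters and underscores, and allows several languages joined with '+', the way Tesseract accepts them on its command line.

```java
import java.util.regex.Pattern;

public class LanguageCodeCheck {
    // Hypothetical pattern, not the one used in TesseractOCRConfig:
    // one or more groups of letters/underscores, joined by '+'.
    private static final Pattern LANG =
            Pattern.compile("[A-Za-z_]+(\\+[A-Za-z_]+)*");

    // Returns true when the whole string matches the pattern.
    static boolean isValid(String language) {
        return language != null && LANG.matcher(language).matches();
    }

    // Mirrors the behaviour described in the issue: reject bad input
    // with an IllegalArgumentException, otherwise return the value.
    static String requireValid(String language) {
        if (!isValid(language)) {
            throw new IllegalArgumentException(
                    "Invalid language code: " + language);
        }
        return language;
    }

    public static void main(String[] args) {
        System.out.println(isValid("chi_tra"));      // underscore accepted
        System.out.println(isValid("eng+chi_tra"));  // combined languages
        System.out.println(isValid("chi-tra"));      // hyphen rejected
    }
}
```

Because the underscore is part of the character class, "chi_tra" passes, while anything outside letters, '_' and the '+' separator is still rejected.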



