Advokat created SOLR-11773:
------------------------------

             Summary: configurable language config for tesseract ocr
                 Key: SOLR-11773
                 URL: https://issues.apache.org/jira/browse/SOLR-11773
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
    Affects Versions: 7.1
            Reporter: Advokat
            Priority: Minor


Currently, to change the language for Tesseract, I have to modify
\org\apache\tika\parser\ocr\TesseractOCRConfig.properties inside
tika-parsers-1.16.jar.
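A minimal sketch of the value that currently has to be edited by hand. It
assumes, as in the default file bundled with Tika 1.x, that the key is
`language` and that several languages are combined with `+` (Tesseract's own
syntax, e.g. `eng+deu`). The class and method names are hypothetical. The code
only renders and re-reads the properties text; it does not change where Tika
loads the file from.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: builds the text of a TesseractOCRConfig.properties
// override with only the (assumed) "language" key set.
public class TesseractLanguageProps {

    // Joins Tesseract language codes with '+', e.g. ["eng", "deu"] -> "eng+deu".
    static String joinLanguages(List<String> codes) {
        if (codes.isEmpty()) {
            throw new IllegalArgumentException("at least one language code is required");
        }
        return String.join("+", codes);
    }

    // Renders properties text; all other keys are assumed to keep their defaults.
    static String render(List<String> codes) {
        Properties props = new Properties();
        props.setProperty("language", joinLanguages(codes));
        StringWriter out = new StringWriter();
        try {
            props.store(out, "override for TesseractOCRConfig.properties");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Parses properties text and returns the "language" value (null if absent).
    static String parseLanguage(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("language");
    }

    public static void main(String[] args) {
        String text = render(List.of("eng", "deu"));
        System.out.print(text);
        // Reading the text back confirms the value round-trips unchanged.
        System.out.println("language = " + parseLanguage(text));
    }
}
```

Because the value is a single global string baked into the jar, every document
indexed through that Solr instance gets the same language list.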

There is no way to set the language in solrconfig.xml or per request to the
ExtractingRequestHandler.

If someone has documents in different languages, it is impossible to configure
this. Tesseract does not work as well as it could when the language is not set
correctly.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
