Advokat created SOLR-11773:
------------------------------

             Summary: configurable language config for tesseract ocr
                 Key: SOLR-11773
                 URL: https://issues.apache.org/jira/browse/SOLR-11773
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
    Affects Versions: 7.1
            Reporter: Advokat
            Priority: Minor

Currently, to change the language used by Tesseract, I have to edit \org\apache\tika\parser\ocr\TesseractOCRConfig.properties inside tika-parsers-1.16.jar. There is no way to set the language in solrconfig.xml or per request to the ExtractingRequestHandler. For anyone whose documents are in different languages, this is impossible to configure, and Tesseract will not work as well as it could with the language set correctly.
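For reference, the workaround described above (editing the properties file inside the jar) could be scripted roughly as follows. This is only a sketch, not part of Solr or Tika: the helper names `set_language` and `patch_jar` are hypothetical, the entry path is the one quoted in the report written with forward slashes, and the assumption that the file holds a `language=<code>` line is mine and should be checked against the actual file.

```python
import zipfile

# Entry path inside the jar, taken from the report (jar entries use "/").
PROPS_ENTRY = "org/apache/tika/parser/ocr/TesseractOCRConfig.properties"


def set_language(props_text: str, language: str) -> str:
    """Return props_text with the `language` key set to `language`.

    Assumption: the file uses a `language=<code>` line. If no such line
    exists, the key is appended at the end.
    """
    lines = props_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        key = line.split("=", 1)[0].strip()
        # Skip comment lines; only rewrite the real `language` key.
        if not line.lstrip().startswith("#") and key == "language":
            lines[i] = f"language={language}"
            replaced = True
    if not replaced:
        lines.append(f"language={language}")
    return "\n".join(lines) + "\n"


def patch_jar(src_jar: str, dst_jar: str, language: str) -> None:
    """Copy src_jar to dst_jar, rewriting only the Tesseract properties entry."""
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == PROPS_ENTRY:
                # latin-1 round-trips every byte, so nothing else is altered.
                text = data.decode("latin-1")
                data = set_language(text, language).encode("latin-1")
            dst.writestr(item, data)
```

A call such as `patch_jar("tika-parsers-1.16.jar", "tika-parsers-1.16-deu.jar", "deu")` would write a copy of the jar with German selected (the output file name is only an example). The setting still applies to every document, so this does not solve the mixed-language case that motivates the issue.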