I poked around at other Tika parsers that require additional installation
steps, like the GrobidNERecogniser class, to see how they warn the user. It
turns out they handle it by NOT having a unit test at all ;-(
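
For the messaging idea in my earlier mail, here is a rough sketch of what I
had in mind, not something that exists in Tika today. The class name
TesseractTestHint, the "tesseract.path" system property, and the
/usr/local/bin fallback are all assumptions made up for illustration. It only
checks whether an executable named "tesseract" sits in the configured
directory and builds a hint telling the user how to point the tests at their
own install.

```java
import java.io.File;

public class TesseractTestHint {

    // Hypothetical system property name; Tika does not define this.
    static final String PATH_PROPERTY = "tesseract.path";

    /** Returns true if an executable file named "tesseract" exists in dir. */
    static boolean hasTesseract(String dir) {
        if (dir == null || dir.isEmpty()) {
            return false;
        }
        File exe = new File(dir, "tesseract");
        return exe.isFile() && exe.canExecute();
    }

    /** Builds the hint to print when the OCR tests are about to be skipped. */
    static String skipMessage(String dir) {
        String shown = (dir == null || dir.isEmpty()) ? "<unset>" : dir;
        return "Skipping Tesseract tests: no executable 'tesseract' found in "
                + shown + ". Re-run with -D" + PATH_PROPERTY
                + "=/path/to/tesseract/bin";
    }

    public static void main(String[] args) {
        // Assumed fallback: the directory Homebrew used on my machine.
        String dir = System.getProperty(PATH_PROPERTY, "/usr/local/bin");
        if (hasTesseract(dir)) {
            System.out.println("Tesseract found in " + dir);
        } else {
            System.err.println(skipMessage(dir));
        }
    }
}
```

In a JUnit 4 test, the same message could presumably be passed to
Assume.assumeTrue(String, boolean) so the skip reason shows up in the test
report instead of the test being silently ignored.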

 

> On Aug 20, 2019, at 10:46 AM, Eric Pugh <ep...@opensourceconnections.com> 
> wrote:
> 
> In order to get the TesseractOCRParserTest to run, having installed Tesseract 
> on OSX using “brew install tesseract”, I had to be explicit about the paths.
> 
> Any thoughts on how we could convey to a user that they might need to tweak 
> the path to run the unit tests?  I was thinking about adding some sort of 
> messaging, but I don’t know if that is a pattern that we have in Tika with 
> these external dependencies?
> 
> Thoughts?
> 
> diff --git a/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java b/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
> index 9ebcee068..32db2c442 100644
> --- a/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
> +++ b/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
> @@ -51,6 +51,7 @@ public class TesseractOCRParserTest extends TikaTest {
>  
>      public static boolean canRun() {
>          TesseractOCRConfig config = new TesseractOCRConfig();
> +        config.setTesseractPath("/usr/local/bin");
>          TesseractOCRParserTest tesseractOCRTest = new TesseractOCRParserTest();
>          return tesseractOCRTest.canRun(config);
>      }
> @@ -164,6 +165,8 @@ public class TesseractOCRParserTest extends TikaTest {
>                            BasicContentHandlerFactory.HANDLER_TYPE handlerType,
>                            TesseractOCRConfig.OUTPUT_TYPE outputType) throws Exception {
>          TesseractOCRConfig config = new TesseractOCRConfig();
> +        config.setTesseractPath("/usr/local/bin");
> +        config.setTessdataPath("/usr/local/Cellar/tesseract/4.1.0/share/tessdata");
>          config.setOutputType(outputType);
>          
>          Parser parser = new RecursiveParserWrapper(new AutoDetectParser(),
> _______________________
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
> http://www.opensourceconnections.com <http://www.opensourceconnections.com/> 
> | My Free/Busy <http://tinyurl.com/eric-cal>  
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>   
> This e-mail and all contents, including attachments, is considered to be 
> Company Confidential unless explicitly stated otherwise, regardless of 
> whether attachments are marked as such.
> 

