[ https://issues.apache.org/jira/browse/NIFI-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15580785#comment-15580785 ]
ASF GitHub Bot commented on NIFI-1815:
--------------------------------------

Github user trixpan commented on the issue:

    https://github.com/apache/nifi/pull/397

    @olegz @jdye64 is this PR still active / plan to review?

    Cheers

> Tesseract OCR Processor
> -----------------------
>
>                 Key: NIFI-1815
>                 URL: https://issues.apache.org/jira/browse/NIFI-1815
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Jeremy Dyer
>            Assignee: Jeremy Dyer
>         Attachments: 0006-changes-to-the-OCR-processor.patch,
>                      nifi_1815_1.x_patch.zip
>
>
> This ticket is a follow-up to NIFI-1718, minus the use of the Tika library.
> Expose OCR capabilities through a new processor which uses the Tesseract
> library. Use of this processor would require that Tesseract be installed on
> the NiFi host. Since the processor will have a system dependency, care must be
> taken to ensure that the overall NiFi cluster continues to function properly
> in the absence of the Tesseract system dependency, even though the OCR
> processor itself will be unable to perform its duties. In the event that the
> system dependencies are not detected, the processor should display a
> validation warning rather than failing or preventing the NiFi instance from
> booting properly.
> Properties exposed to configure Tesseract:
> tesseractPath - Path to tesseract installation folder, if not on system path.
> language - Language ID (e.g. "eng"); language dictionary to be used.
> pageSegMode - Tesseract page segmentation mode, defaults to 1.
> minFileSizeToOcr - Minimum file size to submit file to OCR, defaults to 0.
> maxFileSizeToOcr - Maximum file size to submit file to OCR, defaults to
> Integer.MAX_VALUE.
> timeout - Maximum time (in seconds) to wait for the OCR process termination;
> defaults to 120.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
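
The properties and the warn-rather-than-fail behaviour described in the ticket can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the PR or the attached patches. The names `TesseractSettings`, `find_tesseract`, `validation_warnings` and `should_ocr` are hypothetical. The sketch also assumes that the executable is called `tesseract`, that the size limits are inclusive and measured in bytes, and that `language` has no default (the ticket only gives "eng" as an example).

```python
import shutil
from dataclasses import dataclass
from typing import List, Optional

# Java's Integer.MAX_VALUE, the default the ticket gives for maxFileSizeToOcr.
INT_MAX = 2**31 - 1


@dataclass
class TesseractSettings:
    """Hypothetical container mirroring the properties listed in the ticket."""
    tesseract_path: Optional[str] = None  # install folder; None means "use PATH"
    language: Optional[str] = None        # e.g. "eng"; no default is stated
    page_seg_mode: int = 1
    min_file_size: int = 0                # assumed to be bytes
    max_file_size: int = INT_MAX          # assumed to be bytes
    timeout_seconds: int = 120


def find_tesseract(settings: TesseractSettings) -> Optional[str]:
    """Return the full path of the tesseract executable, or None if absent."""
    # shutil.which searches the given directory, or PATH when path is None.
    return shutil.which("tesseract", path=settings.tesseract_path)


def validation_warnings(settings: TesseractSettings) -> List[str]:
    """Collect problems as warnings instead of raising, so the host keeps running."""
    warnings = []
    if find_tesseract(settings) is None:
        warnings.append("Tesseract executable not found; OCR is unavailable")
    if settings.min_file_size > settings.max_file_size:
        warnings.append("minFileSizeToOcr is larger than maxFileSizeToOcr")
    return warnings


def should_ocr(settings: TesseractSettings, size_bytes: int) -> bool:
    """Apply the min/max size filter (inclusive bounds are an assumption)."""
    return settings.min_file_size <= size_bytes <= settings.max_file_size


if __name__ == "__main__":
    settings = TesseractSettings(language="eng")
    for message in validation_warnings(settings):
        print("WARNING:", message)
    print("OCR a 4 KiB file?", should_ocr(settings, 4096))
```

Returning a list of warnings, rather than raising an exception, matches the ticket's requirement that a missing Tesseract installation must not stop the NiFi instance from starting; only the OCR step itself is disabled.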