[ https://issues.apache.org/jira/browse/NIFI-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339897#comment-15339897 ]
Joseph Witt commented on NIFI-1815:
-----------------------------------

BSD is fine. The (L)GPL ones are the problem. [~jeremy.dyer] could you find out if those three are critical? If not, we could exclude them.

> Tesseract OCR Processor
> -----------------------
>
>                 Key: NIFI-1815
>                 URL: https://issues.apache.org/jira/browse/NIFI-1815
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Jeremy Dyer
>            Assignee: Jeremy Dyer
>         Attachments: 0006-changes-to-the-OCR-processor.patch,
>                      nifi_1815_1.x_patch.zip
>
>
> This ticket is a follow-up to NIFI-1718, minus the use of the Tika library.
> Expose OCR capabilities through a new processor which uses the Tesseract
> library. Use of this processor would require that Tesseract be installed on
> the NiFi host. Since the processor will have a system dependency, care must be
> taken to ensure that the overall NiFi cluster continues to function properly
> in the absence of the Tesseract system dependency, even though the OCR
> processor itself will be unable to perform its duties. In the event that the
> system dependencies are not detected, the processor should display a
> validation warning rather than failing or preventing the NiFi instance from
> booting properly.
> Properties exposed to configure Tesseract:
> tesseractPath - Path to the Tesseract installation folder, if not on the system path.
> language - Language ID (e.g. "eng"); language dictionary to be used.
> pageSegMode - Tesseract page segmentation mode; defaults to 1.
> minFileSizeToOcr - Minimum file size to submit a file to OCR; defaults to 0.
> maxFileSizeToOcr - Maximum file size to submit a file to OCR; defaults to
> Integer.MAX_VALUE.
> timeout - Maximum time (in seconds) to wait for the OCR process to terminate;
> defaults to 120.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)