Github user jskora commented on the pull request:

    https://github.com/apache/nifi/pull/252#issuecomment-211868734
  
    @joewitt On [NIFI-1717|https://issues.apache.org/jira/browse/NIFI-1717] and 
[NIFI-1718|https://issues.apache.org/jira/browse/NIFI-1718] Dmitry Goldenberg 
and I discussed using Tika to extract content (via OCR) from documents and images.  
@markap14 also suggested removing the filters.
    
    I don't know where the OCR changes stand; those tickets have been quiet for 
a couple of weeks.  I think that's a tougher capability to test, and as pointed 
out on [NIFI-1717|https://issues.apache.org/jira/browse/NIFI-1717] and 
[NIFI-1718|https://issues.apache.org/jira/browse/NIFI-1718] it is an expensive 
process that may need special consideration.
    
    As for the filters, I like having them in the processor, especially since 
this one includes filename and mimetype filters.  If consensus is to remove 
them, I can update the PR for that, but I think they are effective for this 
purpose as it currently is.
    
    I don't think we should hold this for the OCR, but if you want the filters 
removed let me know.  It'd be nice to get the metadata functionality in.


