Chetan Mehrotra created OAK-2468:
------------------------------------

             Summary: Index binary only if some Tika parser can support the 
binary's mimeType
                 Key: OAK-2468
                 URL: https://issues.apache.org/jira/browse/OAK-2468
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: oak-lucene
            Reporter: Chetan Mehrotra
            Assignee: Chetan Mehrotra
            Priority: Minor
             Fix For: 1.2


Currently all binaries are passed to Tika for text extraction. However, Tika 
can only parse those for which it has a supporting parser. Therefore, the 
extraction logic should parse a binary only if its mimeType is supported by 
Tika. 

With this change, {{jcr:mimeType}} would become a mandatory property.
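
A minimal sketch of the proposed check, not the actual oak-lucene code. The 
class name MimeTypeGate and the method shouldExtract are hypothetical, and the 
set of supported types is passed in as plain strings; in a real setup it would 
presumably be derived from the configured Tika parser (for example via 
Parser.getSupportedTypes). A binary with no mimeType is skipped, which is why 
{{jcr:mimeType}} effectively becomes mandatory for text extraction.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides whether a binary should be handed to Tika.
final class MimeTypeGate {
    // Stand-in for the media types the Tika parsers report as supported (assumption).
    private final Set<String> supportedTypes = new HashSet<>();

    MimeTypeGate(Collection<String> supported) {
        for (String type : supported) {
            supportedTypes.add(normalize(type));
        }
    }

    // Returns true only if a mimeType is present and a parser supports it.
    boolean shouldExtract(String mimeType) {
        if (mimeType == null || mimeType.trim().isEmpty()) {
            return false; // no jcr:mimeType, so the binary is not parsed
        }
        return supportedTypes.contains(normalize(mimeType));
    }

    // Drops parameters such as "; charset=UTF-8" and lower-cases the base type.
    private static String normalize(String mimeType) {
        int semi = mimeType.indexOf(';');
        String base = semi >= 0 ? mimeType.substring(0, semi) : mimeType;
        return base.trim().toLowerCase(Locale.ROOT);
    }
}
```

With such a gate, the indexer would call shouldExtract with the value of 
{{jcr:mimeType}} before invoking Tika, and index only the node's other 
properties when it returns false.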

JR2 had a similar check [1]

[1] 
https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/NodeIndexer.java#L932



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
