Andriy Binetsky created SOLR-8166:
-------------------------------------

             Summary: Introduce possibility to configure ParseContext in ExtractingRequestHandler/ExtractingDocumentLoader
                 Key: SOLR-8166
                 URL: https://issues.apache.org/jira/browse/SOLR-8166
             Project: Solr
          Issue Type: Improvement
          Components: contrib - Solr Cell (Tika extraction)
    Affects Versions: 5.3
            Reporter: Andriy Binetsky


Currently there is no way to pass additional configuration to document 
extraction with ExtractingRequestHandler/ExtractingDocumentLoader.

For example, I need to put an org.apache.tika.parser.pdf.PDFParserConfig with 
"extractInlineImages" set to true into the ParseContext to trigger extraction 
and OCR of images embedded in PDF files.

It would be nice to be able to configure the created ParseContext through an 
XML config file, as TikaConfig does.

I would suggest the following:

solrconfig.xml:
  <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <str name="parseContext.config">parseContext.config</str>
  </requestHandler>

parseContext.config:

<entries>
  <entry class="org.apache.tika.parser.pdf.PDFParserConfig" value="org.apache.tika.parser.pdf.PDFParserConfig">
    <property name="extractInlineImages" value="true"/>
  </entry>
</entries>
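
To illustrate the idea, here is a minimal sketch of how such a file could be read. It is not existing Solr code: the class name ParseContextConfigSketch, the load method, and the DemoPdfConfig stand-in (used in place of PDFParserConfig so the sketch runs without Tika on the classpath) are all hypothetical. It assumes that "class" names the key under which the object is registered, that "value" names the class to instantiate, and that each property maps to a bean-style setter.

```java
import java.io.InputStream;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical loader for the proposed parseContext.config format. */
final class ParseContextConfigSketch {

    /** Stand-in for PDFParserConfig so the sketch runs without Tika. */
    public static class DemoPdfConfig {
        private boolean extractInlineImages;
        public void setExtractInlineImages(boolean v) { extractInlineImages = v; }
        public boolean isExtractInlineImages() { return extractInlineImages; }
    }

    /** Reads each entry, instantiates its value class, and applies its properties. */
    static Map<Class<?>, Object> load(InputStream in) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in).getDocumentElement();
            Map<Class<?>, Object> entries = new LinkedHashMap<>();
            NodeList entryNodes = root.getElementsByTagName("entry");
            for (int i = 0; i < entryNodes.getLength(); i++) {
                Element entry = (Element) entryNodes.item(i);
                Class<?> key = Class.forName(entry.getAttribute("class"));
                Object value = Class.forName(entry.getAttribute("value"))
                        .getDeclaredConstructor().newInstance();
                if (!key.isInstance(value)) {
                    throw new IllegalArgumentException(
                            value.getClass().getName() + " is not a " + key.getName());
                }
                NodeList props = entry.getElementsByTagName("property");
                for (int j = 0; j < props.getLength(); j++) {
                    Element p = (Element) props.item(j);
                    setProperty(value, p.getAttribute("name"), p.getAttribute("value"));
                }
                entries.put(key, value);
            }
            return entries;
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid parseContext config", e);
        }
    }

    /** Calls the first single-argument setter whose parameter type we can convert to. */
    private static void setProperty(Object bean, String name, String raw) throws Exception {
        String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        for (Method m : bean.getClass().getMethods()) {
            if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                Object arg = convert(m.getParameterTypes()[0], raw);
                if (arg != null) {
                    m.invoke(bean, arg);
                    return;
                }
            }
        }
        throw new IllegalArgumentException(
                "No usable setter " + setter + " on " + bean.getClass().getName());
    }

    /** Supports only boolean, int, and String values in this sketch. */
    private static Object convert(Class<?> type, String raw) {
        if (type == boolean.class || type == Boolean.class) return Boolean.valueOf(raw);
        if (type == int.class || type == Integer.class) return Integer.valueOf(raw);
        if (type == String.class) return raw;
        return null;
    }
}
```

In the handler, each resulting pair could then be copied into the Tika ParseContext with ParseContext.set(key, value) before parsing. With the real classes, the example entry above would amount to calling setExtractInlineImages(true) on a new PDFParserConfig and registering it under PDFParserConfig.class.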







