Andriy Binetsky created SOLR-8166:
-------------------------------------
Summary: Introduce possibility to configure ParseContext in
ExtractingRequestHandler/ExtractingDocumentLoader
Key: SOLR-8166
URL: https://issues.apache.org/jira/browse/SOLR-8166
Project: Solr
Issue Type: Improvement
Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 5.3
Reporter: Andriy Binetsky
Currently there is no way to pass additional configuration to document
extraction performed by ExtractingRequestHandler/ExtractingDocumentLoader.
For example, I need to put an org.apache.tika.parser.pdf.PDFParserConfig with
"extractInlineImages" set to true into the ParseContext to trigger extraction
and OCR of images embedded in PDFs.
It would be nice to be able to configure the created ParseContext through an
XML config file, as TikaConfig does.
I would suggest the following:
solrconfig.xml:
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <str name="parseContext.config">parseContext.config</str>
</requestHandler>
parseContext.config:
<entries>
  <entry class="org.apache.tika.parser.pdf.PDFParserConfig"
         value="org.apache.tika.parser.pdf.PDFParserConfig">
    <property name="extractInlineImages" value="true"/>
  </entry>
</entries>
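To illustrate how such a file might be read, here is a minimal sketch using only
the JDK. It is not the Solr or Tika implementation. The class
ParseContextConfigSketch, its load and sampleXml methods, and the
PdfConfigStandIn class are all hypothetical names introduced for this example;
PdfConfigStandIn stands in for org.apache.tika.parser.pdf.PDFParserConfig so the
code runs without Tika on the classpath. The sketch assumes that "class" names
the key type, "value" names the class to instantiate, and each "property" maps
to a JavaBean-style setter.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ParseContextConfigSketch {

    /** Hypothetical stand-in for org.apache.tika.parser.pdf.PDFParserConfig. */
    public static class PdfConfigStandIn {
        private boolean extractInlineImages;
        public void setExtractInlineImages(boolean v) { extractInlineImages = v; }
        public boolean getExtractInlineImages() { return extractInlineImages; }
    }

    /** Builds a sample config in the proposed format, pointing at the stand-in. */
    static String sampleXml() {
        String cls = PdfConfigStandIn.class.getName();
        return "<entries>"
             + "<entry class=\"" + cls + "\" value=\"" + cls + "\">"
             + "<property name=\"extractInlineImages\" value=\"true\"/>"
             + "</entry>"
             + "</entries>";
    }

    /** Parses the XML and returns key class -> configured instance. */
    static Map<Class<?>, Object> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<Class<?>, Object> entries = new LinkedHashMap<>();
            NodeList entryNodes = doc.getElementsByTagName("entry");
            for (int i = 0; i < entryNodes.getLength(); i++) {
                Element entry = (Element) entryNodes.item(i);
                Class<?> key = Class.forName(entry.getAttribute("class"));
                Object value = Class.forName(entry.getAttribute("value"))
                    .getDeclaredConstructor().newInstance();
                NodeList props = entry.getElementsByTagName("property");
                for (int j = 0; j < props.getLength(); j++) {
                    Element p = (Element) props.item(j);
                    applyProperty(value, p.getAttribute("name"), p.getAttribute("value"));
                }
                // With Tika available, this is where one would call
                // parseContext.set(key, value) instead of filling a map.
                entries.put(key, value);
            }
            return entries;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid parseContext config", e);
        }
    }

    /** Finds a one-argument setter named after the property and invokes it. */
    private static void applyProperty(Object target, String name, String raw) throws Exception {
        String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                m.invoke(target, convert(raw, m.getParameterTypes()[0]));
                return;
            }
        }
        throw new IllegalArgumentException("No setter " + setter + " on " + target.getClass());
    }

    /** Converts the attribute string to the setter's parameter type (a few types only). */
    private static Object convert(String raw, Class<?> type) {
        if (type == boolean.class || type == Boolean.class) return Boolean.parseBoolean(raw);
        if (type == int.class || type == Integer.class) return Integer.parseInt(raw);
        return raw;
    }
}
```

Calling ParseContextConfigSketch.load(ParseContextConfigSketch.sampleXml())
returns a map with one entry whose value has extractInlineImages enabled. In
ExtractingDocumentLoader the same loop could populate the ParseContext directly;
a reflection-based setter lookup keeps the file format independent of any
particular Tika config class.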
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)