Andriy Binetsky created SOLR-8166:
-------------------------------------

             Summary: Introduce possibility to configure ParseContext in ExtractingRequestHandler/ExtractingDocumentLoader
                 Key: SOLR-8166
                 URL: https://issues.apache.org/jira/browse/SOLR-8166
             Project: Solr
          Issue Type: Improvement
          Components: contrib - Solr Cell (Tika extraction)
    Affects Versions: 5.3
            Reporter: Andriy Binetsky

Currently there is no way to pass additional configuration to the parser when extracting documents with ExtractingRequestHandler/ExtractingDocumentLoader.

For example, I need to put an org.apache.tika.parser.pdf.PDFParserConfig with "extractInlineImages" set to true into the ParseContext in order to trigger extraction and OCR of images embedded in PDF files.

It would be nice to be able to configure the ParseContext that gets created through an XML config file, the way TikaConfig does.

I would suggest the following:

solrconfig.xml:

    <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
      <str name="parseContext.config">parseContext.config</str>
    </requestHandler>

parseContext.config:

    <entries>
      <entry class="org.apache.tika.parser.pdf.PDFParserConfig" value="org.apache.tika.parser.pdf.PDFParserConfig">
        <property name="extractInlineImages" value="true"/>
      </entry>
    </entries>
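
To make the proposed file format concrete, here is a rough sketch of how a loader might read parseContext.config and build the configured objects by reflection. This is only an illustration under assumptions, not Solr or Tika code: the class ParseContextConfigSketch, its load/setProperty/sampleXml methods, and the stand-in DemoPdfConfig bean are all hypothetical, and a plain Map takes the place of Tika's ParseContext so the example runs with the JDK alone. The meaning assumed for the attributes (class is the lookup key, value is the class to instantiate, each property maps to a bean setter) is my reading of the example above.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ParseContextConfigSketch {

    /** Hypothetical stand-in for org.apache.tika.parser.pdf.PDFParserConfig. */
    public static class DemoPdfConfig {
        private boolean extractInlineImages;
        public void setExtractInlineImages(boolean v) { this.extractInlineImages = v; }
        public boolean getExtractInlineImages() { return extractInlineImages; }
    }

    /** Builds a sample config in the proposed format, pointing at the stand-in bean. */
    public static String sampleXml() {
        String cls = DemoPdfConfig.class.getName();
        return "<entries><entry class=\"" + cls + "\" value=\"" + cls + "\">"
             + "<property name=\"extractInlineImages\" value=\"true\"/>"
             + "</entry></entries>";
    }

    /** Parses the XML and returns key class -> configured instance (assumed semantics). */
    public static Map<Class<?>, Object> load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<Class<?>, Object> context = new LinkedHashMap<>();
        NodeList entries = doc.getElementsByTagName("entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            // "class" is the key the parser would look up, "value" the class to create.
            Class<?> key = Class.forName(entry.getAttribute("class"));
            Object value = Class.forName(entry.getAttribute("value"))
                    .getDeclaredConstructor().newInstance();
            NodeList props = entry.getElementsByTagName("property");
            for (int j = 0; j < props.getLength(); j++) {
                Element p = (Element) props.item(j);
                setProperty(value, p.getAttribute("name"), p.getAttribute("value"));
            }
            context.put(key, value);
        }
        return context;
    }

    /** Calls the one-argument setter for the property; handles boolean, int and String only. */
    static void setProperty(Object bean, String name, String raw) throws Exception {
        String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        for (Method m : bean.getClass().getMethods()) {
            if (!m.getName().equals(setter) || m.getParameterCount() != 1) continue;
            Class<?> t = m.getParameterTypes()[0];
            if (t == boolean.class) { m.invoke(bean, Boolean.parseBoolean(raw)); return; }
            if (t == int.class) { m.invoke(bean, Integer.parseInt(raw)); return; }
            if (t == String.class) { m.invoke(bean, raw); return; }
        }
        throw new IllegalArgumentException("No supported setter for property: " + name);
    }

    public static void main(String[] args) throws Exception {
        Map<Class<?>, Object> context = load(sampleXml());
        DemoPdfConfig cfg = (DemoPdfConfig) context.get(DemoPdfConfig.class);
        System.out.println("extractInlineImages = " + cfg.getExtractInlineImages());
    }
}
```

In an actual patch, the Map would presumably be replaced by calls to Tika's ParseContext.set(key, value) where ExtractingDocumentLoader creates its ParseContext, and the entry would name the real PDFParserConfig class as in the example above.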