[
https://issues.apache.org/jira/browse/SOLR-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960399#comment-14960399
]
Uwe Schindler edited comment on SOLR-8166 at 10/16/15 9:09 AM:
---------------------------------------------------------------
Hi,
we disallow using setAccessible inside reflection throughout Lucene/Solr (the reason
is Java 9, where this is very restricted), so your patch would not pass the code
quality checks (forbidden-apis).
I would suggest adding a ParseContextFactory that you can specify in your
config and that the user supplies as native Java code (using Solr's plugin
mechanism).
Alternatively, add setters for all ParseContext methods in your parser.
was (Author: thetaphi):
Hi,
we disallow using setAccessible inside reflection throughout Lucene/Solr (the reason
is Java 9, where this is very restricted), so your patch would not pass the code
quality checks (forbidden-apis).
I would suggest adding a ParseContextFactory that you can specify in your
config and that the user supplies as native Java code (using Solr's plugin
mechanism).
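A minimal Java sketch of the suggested factory approach, assuming a plugin interface along these lines. `ParseContextFactory`, `InlineImagesFactory`, `ContextStandIn`, `PdfConfigStandIn` and `loadFactory` are all hypothetical names: the two stand-ins only mimic the class-keyed `set`/`get` of Tika's `ParseContext` and the `extractInlineImages` option of `PDFParserConfig`, so the example compiles without Tika on the classpath. Solr's actual plugin loading is not shown; here the factory is instantiated by class name through its public no-arg constructor, which needs no `setAccessible`.

```java
import java.util.HashMap;
import java.util.Map;

public class ParseContextFactoryDemo {

    /** Stand-in for Tika's ParseContext: values keyed by their class (hypothetical). */
    public static final class ContextStandIn {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        public <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        public <T> T get(Class<T> key) {
            return key.cast(entries.get(key));
        }
    }

    /** Stand-in for PDFParserConfig, reduced to the one option from the issue (hypothetical). */
    public static final class PdfConfigStandIn {
        private boolean extractInlineImages;

        public void setExtractInlineImages(boolean value) {
            this.extractInlineImages = value;
        }

        public boolean getExtractInlineImages() {
            return extractInlineImages;
        }
    }

    /** Hypothetical plugin interface; a handler could call create(), e.g. once per request. */
    public interface ParseContextFactory {
        ContextStandIn create();
    }

    /** User-supplied factory: plain Java code, no reflection on non-public members. */
    public static final class InlineImagesFactory implements ParseContextFactory {
        @Override
        public ContextStandIn create() {
            PdfConfigStandIn pdfConfig = new PdfConfigStandIn();
            pdfConfig.setExtractInlineImages(true);
            ContextStandIn context = new ContextStandIn();
            context.set(PdfConfigStandIn.class, pdfConfig);
            return context;
        }
    }

    /** Instantiate the configured factory via its public no-arg constructor (no setAccessible). */
    static ParseContextFactory loadFactory(String className) throws ReflectiveOperationException {
        return Class.forName(className)
                .asSubclass(ParseContextFactory.class)
                .getConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // In a real handler the class name would come from solrconfig.xml (hypothetical).
        ParseContextFactory factory = loadFactory(InlineImagesFactory.class.getName());
        ContextStandIn context = factory.create();
        System.out.println(context.get(PdfConfigStandIn.class).getExtractInlineImages());
    }
}
```

The design point is that all configuration logic lives in ordinary, compiled user code, so the handler never needs to poke at private fields or methods.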
> Introduce possibility to configure ParseContext in
> ExtractingRequestHandler/ExtractingDocumentLoader
> ----------------------------------------------------------------------------------------------------
>
> Key: SOLR-8166
> URL: https://issues.apache.org/jira/browse/SOLR-8166
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction)
> Affects Versions: 5.3
> Reporter: Andriy Binetsky
>
> Currently there is no way to pass additional configuration when extracting
> documents with ExtractingRequestHandler/ExtractingDocumentLoader.
> For example, I need to put an org.apache.tika.parser.pdf.PDFParserConfig with
> "extractInlineImages" set to true into the ParseContext to trigger extraction
> and OCR of images embedded in PDFs.
> It would be nice to be able to configure the created ParseContext via an
> XML config file, as TikaConfig does.
> I would suggest the following:
> solrconfig.xml:
> <requestHandler name="/update/extract"
>     class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>   <str name="parseContext.config">parseContext.config</str>
> </requestHandler>
> parseContext.config:
> <entries>
>   <entry class="org.apache.tika.parser.pdf.PDFParserConfig"
>       value="org.apache.tika.parser.pdf.PDFParserConfig">
>     <property name="extractInlineImages" value="true"/>
>   </entry>
> </entries>
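One way the proposed `<property name="..." value="..."/>` entries could be applied without `setAccessible` is to route each one through a public JavaBeans setter. The Java sketch below assumes that mapping; `BeanPropertyConfig`, `configure`, `convert` and `PdfConfigStandIn` are hypothetical names, and the stand-in only mirrors the shape of the `extractInlineImages` option rather than Tika's real `PDFParserConfig`. It uses `java.beans.Introspector` from the JDK to find the write method and only ever invokes public methods.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;

public class BeanPropertyConfig {

    /** Stand-in for PDFParserConfig with a public getter/setter pair (hypothetical). */
    public static class PdfConfigStandIn {
        private boolean extractInlineImages;

        public boolean getExtractInlineImages() {
            return extractInlineImages;
        }

        public void setExtractInlineImages(boolean value) {
            this.extractInlineImages = value;
        }
    }

    /** Convert an XML attribute string to the setter's parameter type. */
    static Object convert(String raw, Class<?> type) {
        if (type == boolean.class || type == Boolean.class) {
            return Boolean.parseBoolean(raw);
        }
        if (type == int.class || type == Integer.class) {
            return Integer.parseInt(raw);
        }
        if (type == String.class) {
            return raw;
        }
        throw new IllegalArgumentException("Unsupported property type: " + type.getName());
    }

    /** Create the bean and apply each name/value pair through its public setter. */
    static <T> T configure(Class<T> type, Map<String, String> properties)
            throws ReflectiveOperationException, IntrospectionException {
        T bean = type.getConstructor().newInstance();
        BeanInfo info = Introspector.getBeanInfo(type);
        for (Map.Entry<String, String> property : properties.entrySet()) {
            Method setter = null;
            for (PropertyDescriptor descriptor : info.getPropertyDescriptors()) {
                if (descriptor.getName().equals(property.getKey())) {
                    setter = descriptor.getWriteMethod(); // null if the property is read-only
                    break;
                }
            }
            if (setter == null) {
                throw new IllegalArgumentException("No public setter for: " + property.getKey());
            }
            // Only public methods are invoked, so setAccessible is never called.
            setter.invoke(bean, convert(property.getValue(), setter.getParameterTypes()[0]));
        }
        return bean;
    }

    public static void main(String[] args) throws Exception {
        // Mirrors <property name="extractInlineImages" value="true"/> from the proposal.
        PdfConfigStandIn config =
                configure(PdfConfigStandIn.class, Map.of("extractInlineImages", "true"));
        System.out.println(config.getExtractInlineImages());
    }
}
```

This only works for options the target class already exposes as public setters; anything else would still need the user-supplied factory approach.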
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)