[ 
https://issues.apache.org/jira/browse/TIKA-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187205#comment-15187205
 ] 

Nick Burch commented on TIKA-1508:
----------------------------------

> I think that's exactly what ParseContext should be for..it should be a 
> vehicle for Param passing. We can delineate by property name (FQ) and/or by 
> class.

I view {{ParseContext}} as somewhere you configure things on a per-document 
basis, not a per-parser basis. 

So, need to set where Tesseract lives on your system? That applies to 
everything, so it goes on the parser. Need to tell Tesseract to use a German 
rather than an English dictionary on this particular JPEG? That applies to just 
the one document being parsed, so it goes on the {{ParseContext}}.
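
To make that split concrete, here is a minimal, self-contained sketch. It deliberately does not use Tika's own classes: {{SketchContext}}, {{OcrSettings}} and {{SketchOcrParser}} are hypothetical stand-ins, and the install path and language codes are assumed values. The install location is fixed when the parser is built, while the language is looked up per document from a class-keyed context, falling back to a default when nothing is set.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a class-keyed, per-document context. */
final class SketchContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key, T defaultValue) {
        Object value = entries.get(key);
        return value == null ? defaultValue : key.cast(value);
    }
}

/** Hypothetical per-document OCR settings. */
final class OcrSettings {
    final String language;

    OcrSettings(String language) {
        this.language = language;
    }
}

/** Hypothetical parser: install path is per-parser, language is per-document. */
final class SketchOcrParser {
    private static final OcrSettings DEFAULTS = new OcrSettings("eng");
    private final String tesseractPath;

    SketchOcrParser(String tesseractPath) {
        this.tesseractPath = tesseractPath;
    }

    /** Returns a description of the run; no OCR is actually performed. */
    String describeRun(String document, SketchContext context) {
        OcrSettings settings = context.get(OcrSettings.class, DEFAULTS);
        return "path=" + tesseractPath
                + " lang=" + settings.language
                + " doc=" + document;
    }
}

final class PerDocumentDemo {
    public static void main(String[] args) {
        // Configured once: applies to every document this parser sees.
        SketchOcrParser parser = new SketchOcrParser("/usr/bin/tesseract");

        // Configured for one document only.
        SketchContext german = new SketchContext();
        german.set(OcrSettings.class, new OcrSettings("deu"));

        System.out.println(parser.describeRun("scan-de.jpg", german));
        System.out.println(parser.describeRun("scan-en.jpg", new SketchContext()));
    }
}
```

The same parser instance serves both documents; only the context differs between the two calls.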

> Add uniformity to parser parameter configuration
> ------------------------------------------------
>
>                 Key: TIKA-1508
>                 URL: https://issues.apache.org/jira/browse/TIKA-1508
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>             Fix For: 1.13
>
>
> We can currently configure parsers by the following means:
> 1) programmatically by direct calls to the parsers or their config objects
> 2) sending in a config object through the ParseContext
> 3) modifying .properties files for specific parsers (e.g. PDFParser)
> Rather than scattering the landscape with .properties files for each parser, 
> it would be great if we could specify parser parameters in the main config 
> file, something along the lines of this:
> {noformat}
>     <parser class="org.apache.tika.parser.audio.AudioParser">
>       <params>
>         <int name="someparam1">2</int>
>         <str name="someOtherParam2">something or other</str>
>       </params>
>       <mime>audio/basic</mime>
>       <mime>audio/x-aiff</mime>
>       <mime>audio/x-wav</mime>
>     </parser>
> {noformat}
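
For illustration only, a {{<params>}} block like the one proposed above could be read into typed values roughly as follows. This is a sketch using the JDK's built-in DOM parser, not Tika's actual config loader: the {{ParamsSketch}} class and {{readParams}} method are hypothetical, and only the {{int}} and {{str}} element types from the example are handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ParamsSketch {

    /** Reads the typed children of the first <params> element into a map. */
    static Map<String, Object> readParams(String parserXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        parserXml.getBytes(StandardCharsets.UTF_8)));

        Map<String, Object> params = new LinkedHashMap<>();
        NodeList paramsNodes =
                doc.getDocumentElement().getElementsByTagName("params");
        if (paramsNodes.getLength() == 0) {
            return params; // no <params> block: nothing to configure
        }

        NodeList children = paramsNodes.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes
            }
            Element param = (Element) child;
            String name = param.getAttribute("name");
            String text = param.getTextContent().trim();
            switch (param.getTagName()) {
                case "int":
                    params.put(name, Integer.valueOf(text));
                    break;
                case "str":
                    params.put(name, text);
                    break;
                default:
                    throw new IllegalArgumentException(
                            "Unsupported param type: " + param.getTagName());
            }
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        String xml =
                "<parser class=\"org.apache.tika.parser.audio.AudioParser\">"
                + "<params>"
                + "<int name=\"someparam1\">2</int>"
                + "<str name=\"someOtherParam2\">something or other</str>"
                + "</params>"
                + "<mime>audio/basic</mime>"
                + "</parser>";
        System.out.println(readParams(xml));
    }
}
```

Using the element name as the type tag keeps the config file readable, at the cost of needing a new case for each supported type.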



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
