[ 
https://issues.apache.org/jira/browse/TIKA-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187683#comment-15187683
 ] 

Thamme Gowda N commented on TIKA-1508:
--------------------------------------

1. Please let me know the final verdict once all of you agree on one 
approach, and I will make the changes as per the recommendation.

2. +1. Agreed. I will update the code.

3. I really like the suggestion. It would allow us to validate parameters 
and fail early when they are wrong. However, I think it requires a lot of 
rework on the parser side as well: parsers would have to declare which params 
they expect from the configuration file, and only then would we be able to 
validate them. A simpler (lazier) approach is to assume all params are valid, 
pass them all through, and let the parser raise an exception when there are 
errors. The current PR takes the latter approach. Let me know what you think.
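
To make the trade-off in point 3 concrete, here is a rough sketch of both 
approaches. It is not code from the PR: the class names, the "mode" parameter, 
and the DECLARED set are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ParamValidationSketch {

    /** Lazy approach: the parser checks its own params and throws on bad input. */
    static class LazyParser {
        String mode = "fast"; // hypothetical parameter with a default

        void configure(Map<String, String> params) {
            String raw = params.get("mode");
            if (raw == null) {
                return; // nothing supplied, keep the default
            }
            if (!raw.equals("fast") && !raw.equals("accurate")) {
                throw new IllegalArgumentException("Unsupported mode: " + raw);
            }
            mode = raw;
        }
    }

    /** Fail-early approach: only possible once a parser declares its param names. */
    static final Set<String> DECLARED = new HashSet<>(Arrays.asList("mode"));

    static void validate(Set<String> declared, Map<String, String> params) {
        for (String name : params.keySet()) {
            if (!declared.contains(name)) {
                throw new IllegalArgumentException("Unknown parameter: " + name);
            }
        }
    }
}
```

In this sketch, a misspelled name such as "mdoe" is silently ignored by the 
lazy parser, whereas validate() rejects it. Catching that case is the main 
thing the extra declaration work would buy.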

4. +1 Agreed. Will update the code.

5. Anything that extends AbstractParser is now an instance of Configurable. 
Anything that is an instance of Configurable is detected and invoked with its 
params at instantiation time. So ParserDecorator, DelegatingParser, and 
ParserPostProcessor are all covered, yay! If no params are found in the config 
file, the call is made with an empty Map<String, String>. It is then up to 
each parser implementation to make use of the params by overriding the 
configure() method.
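
The flow in point 5 could look roughly like the sketch below. This is an 
illustration only, not the code in the PR: the nested types are stand-ins 
(AbstractParserSketch and AudioParserSketch are hypothetical names), and the 
exact signature of configure() is an assumption based on the 
Map<String, String> mentioned above.

```java
import java.util.Collections;
import java.util.Map;

final class ConfigurableLoadingSketch {

    /** Stand-in for the Configurable contract described above (assumed shape). */
    interface Configurable {
        void configure(Map<String, String> params);
    }

    /** Stand-in base class: every subclass is Configurable, no-op by default. */
    abstract static class AbstractParserSketch implements Configurable {
        @Override
        public void configure(Map<String, String> params) {
            // Parsers that do not care about params inherit this no-op.
        }
    }

    /** Hypothetical parser that opts in by overriding configure(). */
    static class AudioParserSketch extends AbstractParserSketch {
        boolean configured = false;
        String someOtherParam2 = "default";

        @Override
        public void configure(Map<String, String> params) {
            configured = true;
            someOtherParam2 = params.getOrDefault("someOtherParam2", someOtherParam2);
        }
    }

    /** Instantiates a class and, if it is Configurable, hands it the params. */
    static <T> T instantiate(Class<T> type, Map<String, String> params) {
        try {
            T instance = type.getDeclaredConstructor().newInstance();
            if (instance instanceof Configurable) {
                // No params in the config file means an empty map, never null.
                Map<String, String> effective =
                        (params == null) ? Collections.<String, String>emptyMap() : params;
                ((Configurable) instance).configure(effective);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + type.getName(), e);
        }
    }
}
```

Because the check is a plain instanceof, decorators and delegating parsers get 
the same treatment as ordinary parsers without any special casing.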

A & B) I think the Solr way is complex to implement, and we don't gain much 
for the effort (as of now we can just call Integer.parseInt() or similar). 
It also introduces ambiguity between the type expected by the parser and the 
type of the value supplied in the configuration.
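
As an illustration of the string-only alternative, a parser could convert 
values itself with a small helper along these lines. The helper name getInt 
and the class StringParamsSketch are hypothetical, not part of the PR.

```java
import java.util.Map;

final class StringParamsSketch {

    /** Reads an int from string params, falling back to a default if absent. */
    static int getInt(Map<String, String> params, String name, int defaultValue) {
        String raw = params.get(name);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Re-throw with the parameter name so the config error is easy to find.
            throw new IllegalArgumentException(
                    "Parameter '" + name + "' must be an integer, got: " + raw, e);
        }
    }
}
```

With this approach the parser alone decides the type of each value, so the 
config file cannot disagree with it about what type a param should have.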


That being said, I am open to all suggestions.




> Add uniformity to parser parameter configuration
> ------------------------------------------------
>
>                 Key: TIKA-1508
>                 URL: https://issues.apache.org/jira/browse/TIKA-1508
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>             Fix For: 1.13
>
>
> We can currently configure parsers by the following means:
> 1) programmatically by direct calls to the parsers or their config objects
> 2) sending in a config object through the ParseContext
> 3) modifying .properties files for specific parsers (e.g. PDFParser)
> Rather than scattering the landscape with .properties files for each parser, 
> it would be great if we could specify parser parameters in the main config 
> file, something along the lines of this:
> {noformat}
>     <parser class="org.apache.tika.parser.audio.AudioParser">
>       <params>
>         <int name="someparam1">2</int>
>         <str name="someOtherParam2">something or other</str>
>       </params>
>       <mime>audio/basic</mime>
>       <mime>audio/x-aiff</mime>
>       <mime>audio/x-wav</mime>
>     </parser>
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
