Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "CompositeParserDiscussion" page has been changed by NickBurch:
https://wiki.apache.org/tika/CompositeParserDiscussion?action=diff&rev1=4&rev2=5

Comment:
Hierarchy question

  
  == Fastest ==
  If there are two parsers, use the faster one even if that might mean lower 
quality (e.g. avoid OCR)
+ 
+ = Mime type hierarchies =
+ Consider a case like:
+ 
+  * application/vnd.ms-excel
+   * application/x-tika-msoffice
+ 
+ Or
+ 
+  * application/dita+xml;format=concept
+   * application/dita+xml;format=topic
+    * application/dita+xml
+ 
+ If there were two parsers available for application/vnd.ms-excel, and another 
for application/x-tika-msoffice, should it be possible to specify in a strategy 
that a parser for the parent type also be used? Should it be possible to set a 
strategy like "use the dita concept, then the general dita, then the dita 
topic", hopping around up and down the hierarchy? 
+ 
+ Or do we keep the current behaviour where, once a point in the hierarchy with 
a parser is found, the file is parsed at that point?
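
To make the two options concrete, here is a minimal sketch of the lookup in plain Java. It is not Tika code: the class name, the method names, the parser names (ExcelParserA, ExcelParserB, GenericOfficeParser) and the hard-coded maps are all hypothetical, and only the mime types come from the examples above. It assumes each type has at most one parent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these names are Tika APIs.
class MimeHierarchySketch {
    // Child type -> parent type, mirroring the two example hierarchies.
    static final Map<String, String> PARENT = new LinkedHashMap<>();
    // Type -> names of the (made-up) parsers registered directly for it.
    static final Map<String, List<String>> PARSERS = new LinkedHashMap<>();

    static {
        PARENT.put("application/vnd.ms-excel", "application/x-tika-msoffice");
        PARENT.put("application/dita+xml;format=concept", "application/dita+xml;format=topic");
        PARENT.put("application/dita+xml;format=topic", "application/dita+xml");

        PARSERS.put("application/vnd.ms-excel", List.of("ExcelParserA", "ExcelParserB"));
        PARSERS.put("application/x-tika-msoffice", List.of("GenericOfficeParser"));
    }

    // Option 1: stop at the first level of the hierarchy that has any parser.
    static List<String> firstLevelWithParser(String type) {
        for (String t = type; t != null; t = PARENT.get(t)) {
            List<String> found = PARSERS.get(t);
            if (found != null && !found.isEmpty()) {
                return found;
            }
        }
        return List.of();
    }

    // Option 2: gather candidates from every level, most specific type first,
    // so that a strategy could also choose a parser for a parent type.
    static List<String> candidatesFromAllLevels(String type) {
        List<String> candidates = new ArrayList<>();
        for (String t = type; t != null; t = PARENT.get(t)) {
            candidates.addAll(PARSERS.getOrDefault(t, List.of()));
        }
        return candidates;
    }
}
```

With option 1, a strategy only ever sees the two Excel parsers for application/vnd.ms-excel. With option 2 it also sees the parser registered for the parent type, which is what the question above would require.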
  
  = Allowing the User to select a strategy =
  The right strategy for one user may not be the right one for another, and the 
right strategy for one file may not be the right one for another. We therefore 
need to allow users to pick their strategy, both on an overall basis and on a 
per-file basis.
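
One possible shape for this, sketched in plain Java under stated assumptions: the class, the enum and its HIGHEST_QUALITY value are hypothetical (only "Fastest" is named on this page), and keying the per-file choice by file name is purely for illustration. In Tika a per-file choice would plausibly be passed along with each parse call instead, but that is an assumption, not something this page settles.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not an existing Tika class.
class StrategySelectionSketch {
    // FASTEST is named on this page; HIGHEST_QUALITY is a made-up second option.
    enum Strategy { FASTEST, HIGHEST_QUALITY }

    private final Strategy overall;                       // the user's default
    private final Map<String, Strategy> perFile = new HashMap<>();

    StrategySelectionSketch(Strategy overall) {
        this.overall = overall;
    }

    // Record a per-file choice that takes precedence over the overall one.
    void overrideFor(String fileName, Strategy strategy) {
        perFile.put(fileName, strategy);
    }

    // Per-file choice if one was set, otherwise the overall strategy.
    Strategy strategyFor(String fileName) {
        return perFile.getOrDefault(fileName, overall);
    }
}
```

A user who normally wants speed could construct this with FASTEST and then call overrideFor("scan.pdf", Strategy.HIGHEST_QUALITY) for the one file where OCR is worth the wait.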
