Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "Tika2_0RoadMap" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/Tika2_0RoadMap

New page:
= Background =
This page is intended for a discussion of changes anticipated in Tika 2.0.

This is only a first draft from one voice.  Please contribute!

= Major Planned Changes =

 * Move from service loading to a config file for parser specification and 
loading.  [[https://issues.apache.org/jira/browse/TIKA-1445|TIKA-1445]] raised 
this as an important area for improvement within Tika.  The current strategy in 
the AutoDetectParser is to load all parsers and then pick the first parser that 
matches a given MIME type.  Tika determines which parser is "first" by sorting 
on whether or not the class name begins with org.apache.tika and then 
(effectively) by reverse alphabetical order of the class name.  The goal is to 
let the user specify the order of parser selection in the config file.  We 
will work towards this gradually through Tika 1.8 and 1.9, and we will remove 
service loading entirely in Tika 2.0.

 * Allow users to build composite parsers with configurable strategies via the 
config file ([[https://issues.apache.org/jira/browse/TIKA-1509|TIKA-1509]] and 
CompositeParserDiscussion).  We will be working towards this gradually through 
Tika 1.8 and 1.9.  By Tika 2.0, however, this will be the default.

 * Move to Java 1.7 (???)
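
To make the current selection order from the first item above concrete, here is a minimal sketch of the two-level sort it describes.  It is not Tika's actual code: the class {{{ParserOrderSketch}}}, the method {{{inSelectionOrder}}} and the parser {{{com.example.MyPdfParser}}} are hypothetical names used only for illustration.  The page says Tika sorts on whether the class name begins with org.apache.tika but not which group wins, so the sketch assumes that non-Tika parsers sort ahead of Tika's own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of the ordering described above; not Tika code.
final class ParserOrderSketch {

    // Prefix that separates Tika's own parsers from third-party parsers.
    static final String TIKA_PREFIX = "org.apache.tika";

    // ASSUMPTION: third-party parsers sort ahead of Tika's own parsers.
    // Within each group, class names sort in reverse alphabetical order.
    static final Comparator<String> CURRENT_ORDER = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            boolean aTika = a.startsWith(TIKA_PREFIX);
            boolean bTika = b.startsWith(TIKA_PREFIX);
            if (aTika != bTika) {
                return aTika ? 1 : -1;   // the non-Tika class comes first
            }
            return b.compareTo(a);       // reverse alphabetical
        }
    };

    // Returns a sorted copy; the parser that would be picked is at index 0.
    static List<String> inSelectionOrder(List<String> classNames) {
        List<String> sorted = new ArrayList<String>(classNames);
        Collections.sort(sorted, CURRENT_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        // Candidates that all claim to handle the same MIME type.
        List<String> candidates = Arrays.asList(
                "org.apache.tika.parser.pdf.PDFParser",
                "org.apache.tika.parser.txt.TXTParser",
                "com.example.MyPdfParser");
        System.out.println(inSelectionOrder(candidates));
    }
}
```

Under these assumptions the winner depends only on class names, which is why an explicit, user-specified order in the config file is attractive: a replacement ordering could be expressed as a different {{{Comparator}}} built from the configured list.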

= Minor Planned Changes =

= Wishes =
 * Allow for easily configurable parser sub-packages.  The tika-app, 
tika-server and tika-bundle jars are now approaching or exceeding 30MB.  It 
would be great if users could easily specify a subset of parsers they care 
about, either a la carte or by category (e.g. images; common office files such 
as MSOffice and PDF; environmental data), and get only the dependencies 
required for that subset of parsers.
