On Sun, 11 Sep 2016, Bob Paulin wrote:
I'd like to propose a new Tika App for the 2.0 branch. One of the reasons we broke apart the Tika parsers into modules was the complexity of dealing with all the parser dependencies and transitive dependencies. Now developers can use just the modules they want without pulling in the kitchen sink with them. Unfortunately, this approach doesn't simplify the problem in the tika-parser or tika-app project, where the whole kitchen sink comes together again.
One of the nice things about the tika app (and server) is you do get everything, so it's very easy to test and get started with!
Another nice thing is that you can test small changes (e.g. a new parser or a new mime type) quite quickly, just by putting the tika app jar on your classpath along with your customisation. That makes it very easy to try out new things if you're a new developer, and I usually find it easier than firing up Eclipse if I just want to try a new mime type change for someone.
More modular versions of the Tika server I could certainly get behind, if we haven't already done so!
For the app, are there that many use cases where you might only want some of Tika? (Most people calling Tika from another language would likely be better off with the server, to avoid the JVM start/stop overhead.)
Would the new OSGi version make it harder for people to test new bits with Tika? For example, whenever we've done a hackathon and helped people with a new parser, getting their new parser used with just the app has been just about doable. I fear that if we made them also learn OSGi and build a bundle at the stage when they're trying to do a "hello world", we'd lose them :/
The GitHub project does look interesting though! I'd hate for us to get a few shiny new bits but lose some key bits that are important for newbies and quick-win developers in the process...
Nick