Thanks Chris! It's only gonna get harder with Stanford this weekend ;).
- Bob
On 9/13/2016 11:06 PM, Mattmann, Chris A (3980) wrote:
I’ll try to comment on this tomorrow. Sorry, it’s been a tough few weeks; I’ve been really
busy.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect, Instrument Software and Science Data Systems Section (398)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
On 9/13/16, 8:35 PM, "Bob Paulin" <b...@bobpaulin.com> wrote:
Hey Nick,
Thanks for the thoughts. Just to clear a few things up: the version of
the app on my GitHub already includes all the parsers, just as the current
app does. If you build it and run --list-parsers, you'll see them
there. As for the desire to quickly test new bits, I think much of the
OSGi stuff has been abstracted away. For an example, see the examples
folder [1]. The only additions are the Activator class (which is
identical for all the current bundles) and the maven-bundle-plugin in
the pom.xml. But don't take my word for it; why not give it a spin?
As for use cases, consider what happens whenever we upgrade or add
parsers/detectors/encodingdetectors/languagedetectors: we may introduce
new dependencies or new versions of existing ones. For example, the pom for
tika-app currently pulls in 3 different versions of commons-io, 2 versions of
commons-codec, and 2 versions of Guava. Maven resolves each of these to a single
version in the final build, but the effect is that every part of the code must
work with the selected version. In the OSGi version of tika-app, modules can
use different versions of the same dependency within one app. Also, in
TIKA-1285 [2] it could have been possible to support 2 different versions of
PDFBox in different OSGi bundles. So I see it as more of a gain, but I'd be
interested in hearing if there is any degradation in the development
experience.
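To make the "Maven resolves to just one version" point concrete, here is a minimal Java sketch. It is not code from tika-app, Maven or any OSGi framework. It models Maven's documented "nearest definition wins" mediation in a simplified way (breadth-first walk, first version seen per artifact wins, the losing node's subtree is dropped) and contrasts it with a greatly simplified per-bundle view in which each bundle keeps the version it declared, as OSGi allows when both versions are installed side by side. The class name MediationSketch, the modules parser-a/parser-b and the commons-io versions 2.4/2.5 are hypothetical and are not taken from the actual tika-app pom.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; not Maven's or OSGi's real resolver. Needs Java 16+ (records).
class MediationSketch {

    // One node in a dependency tree: artifact name, version, declared dependencies.
    record Dep(String artifact, String version, List<Dep> deps) {
        static Dep of(String artifact, String version, Dep... deps) {
            return new Dep(artifact, version, List.of(deps));
        }
    }

    // Flat classpath, Maven-style "nearest definition wins": walk breadth-first,
    // keep the first version seen per artifact (shallowest, then earliest declared),
    // and skip the subtree of any node that lost or was a duplicate.
    static Map<String, String> flatClasspath(Dep root) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<Dep> queue = new ArrayDeque<>(root.deps());
        while (!queue.isEmpty()) {
            Dep d = queue.poll();
            if (chosen.putIfAbsent(d.artifact(), d.version()) == null) {
                queue.addAll(d.deps());
            }
        }
        return chosen;
    }

    // Per-bundle view: each direct child of the root is treated as a bundle that
    // keeps exactly the versions it declared (assumes every version is installed
    // and each bundle is wired to the one it asked for).
    static Map<String, Map<String, String>> perBundle(Dep root) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (Dep bundle : root.deps()) {
            Map<String, String> own = new LinkedHashMap<>();
            for (Dep d : bundle.deps()) {
                own.put(d.artifact(), d.version());
            }
            result.put(bundle.artifact(), own);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical modules and versions, for illustration only.
        Dep app = Dep.of("app", "1.0",
                Dep.of("parser-a", "1.0", Dep.of("commons-io", "2.4")),
                Dep.of("parser-b", "1.0", Dep.of("commons-io", "2.5")));

        System.out.println(flatClasspath(app));
        // prints {parser-a=1.0, parser-b=1.0, commons-io=2.4}
        System.out.println(perBundle(app));
        // prints {parser-a={commons-io=2.4}, parser-b={commons-io=2.5}}
    }
}
```

In the flat model parser-b ends up running against commons-io 2.4 even though it declared 2.5, which is the "every part of the code must work with the selected version" problem; in the per-bundle view each keeps its own. Real Maven also honours dependencyManagement and exclusions, and real OSGi wiring works on imported package version ranges; neither is modelled here.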
- Bob
[1] https://github.com/bobpaulin/tika-app-osgi/tree/master/examples/dummy-parser-bundle
[2] https://issues.apache.org/jira/browse/TIKA-1285
On 9/13/2016 3:38 PM, Nick Burch wrote:
> On Sun, 11 Sep 2016, Bob Paulin wrote:
>> I'd like to propose a new Tika App for the 2.0 branch. One of the
>> reasons we broke apart the Tika parsers into modules was due to the
>> complexity of having to deal with all the parser dependencies and
>> transitive dependencies. Now developers can use just the modules
>> they want without pulling in the kitchen sink with them. Unfortunately
>> this approach doesn't simplify the problem in the tika-parser or
>> tika-app project where the whole kitchen sink comes together again.
>
> One of the nice things about the tika app (and server) is you do get
> everything, so it's very easy to test and get started with!
>
> Another nice thing is that you can test small changes (e.g. a new parser
> or a new mime type) quite quickly, just by using the tika app jar on
> your classpath along with your customisation. It makes it very easy to
> try out new things if you're a new developer, and I usually find it
> easier than firing up Eclipse if I just want to try a new mime type
> change for someone.
>
>
> More modular versions of the Tika server I could certainly get behind,
> if we haven't already done so!
>
> For the app, are there that many use cases for it where you might only
> want some of Tika? (Most people calling Tika from another language
> would likely be better off with the server, to avoid the JVM
> start/stop overhead).
>
> Would the new OSGi version make it harder for people to test new bits
> with Tika? For one example, whenever we've done a hackathon and are
> helping people with a new parser, getting their new parser used with
> just the app is just about doable. I fear that if we also made them
> learn OSGi + build a bundle at the stage when they're trying to do a
> "hello world", we'd lose them :/
>
> The GitHub project does look interesting though! I'd hate for us to
> get a few shiny new bits but lose some key bits important for
> newbies / quick-win developers in the process...
>
> Nick
>