Hi Tika members,

Thanks for this great initiative. I think there are several possible use cases for such a service:

1. Tika usage
We could set up a freely accessible Tika Server that parses documents submitted through public requests: a kind of demo or free-trial document parser that lets people check how well Tika handles their own documents. That makes sense because a novice user would not have to download and install the latest snapshot build. We should add some checks on incoming requests to refuse abusive or spam requests. This case provides a service similar to the any23.org site.
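The request check mentioned above could be as simple as a per-client quota. Here is a minimal sketch in plain Java, assuming a fixed time window per client. The class name RequestThrottle, the client key (for example the caller's IP address), and the limits are all hypothetical and not part of Tika Server.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-client request throttle (fixed window); not part of Tika Server. */
class RequestThrottle {
    private final int maxRequests;   // requests allowed per client per window
    private final long windowMillis; // window length in milliseconds

    // client key -> {window start time, requests counted in that window}
    // Note: entries are never evicted in this sketch.
    private final Map<String, long[]> windows = new ConcurrentHashMap<>();

    RequestThrottle(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request may proceed, false if the client is over quota. */
    boolean allow(String clientKey, long nowMillis) {
        // compute() runs atomically per key, so concurrent requests are counted safely.
        long[] state = windows.compute(clientKey, (key, old) -> {
            if (old == null || nowMillis - old[0] >= windowMillis) {
                // First request from this client, or the previous window has expired.
                return new long[] {nowMillis, 1};
            }
            // Same window: keep the start time and count one more request.
            return new long[] {old[0], old[1] + 1};
        });
        return state[1] <= maxRequests;
    }
}
```

A server-side filter could call allow(clientIp, System.currentTimeMillis()) before handing the document to the parser, and answer with HTTP 429 when it returns false.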

2. Tika parser development
"Tika users can do adhoc parsing" is a great idea. I think we could have an "online IDE" for developing Tika parsers. For this case, there are two sub-scenarios:

2.1: Using existing parsers and adding new features (such as adding missing parsed metadata or fixing bugs in the XHTML handler). This case does not need any new library: the user can extend the Parser of interest and try it on test documents. Using Groovy is one idea, because it is a simple, Java-like language.

2.2: Creating a new parser. From parser development experience, however, creating a new parser usually requires third-party libraries, so to build and run it with this online service we would need to extend the classloader dynamically. If we really want to support this use case, we could eventually wrap the client's jars and classes as an OSGi plugin, then load and execute it on the server side. I am not sure this scenario makes much sense, since users can always check out, build, and develop a new parser locally.

3. Tika parser library store
For some reasons (library incompatibilities, license constraints, ...), official Tika cannot integrate some contributed parsers. This kind of service would store these parsers so that anyone can download them and apply them in their own context.

Anyway, this service requires resources and human effort to create and maintain.

Hong-Thai

-----Original Message-----
From: Nick Burch [mailto:apa...@gagravarr.org]
Sent: Wednesday, 9 April 2014 06:32
To: dev@tika.apache.org
Subject: Re: Tika VM Service

On Tue, 8 Apr 2014, Lewis John Mcgibbney wrote:
> I would like to propose that we get a Tika service up and running on a VM.
> Tika users can do adhoc parsing, etc and can do this based on possibly
> stable nightly SNAPSHOT's or alternatively based on the most recent
> stable release.
> Preferably, the service should provide a list of parsers and also
> MediaType's supported.

My vision of how this would work would be to use the Tika Server, with some extensions so that it self hosted some basic documentation.

We're thinking of trying to start that tomorrow in the hackathon, any help / ideas / projects to crib off gratefully received!

Nick