Hi Tika members,

Thanks for this great initiative. I can see several possible use cases for 
such a service:
1. Tika usage
We could set up a freely accessible Tika Server that parses documents submitted 
by the public: a kind of demo or free-trial document parser that lets users 
check how well Tika handles their own documents. This makes sense because a 
new user would not have to download and install the latest snapshot build. We 
should add some checks on incoming requests to reject abusive or spam requests. 
This would provide a service similar to the any23.org site.
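
To make the "checks on incoming requests" a bit more concrete, here is a minimal 
sketch of what such a check could look like: an upload size cap plus a 
per-client request quota over a fixed time window. The class name RequestGate, 
its parameters and the idea of keying clients by an id string (for example 
the remote address) are all my assumptions; nothing like this exists in Tika 
Server as far as I know.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical gatekeeper for a public parsing endpoint; not part of Tika.
final class RequestGate {
    private final long maxBytes;      // largest upload accepted, in bytes
    private final int maxPerWindow;   // requests allowed per client per window
    private final long windowMillis;  // length of the counting window

    // clientId -> {window start time, requests counted in that window}
    private final Map<String, long[]> counters = new HashMap<>();

    RequestGate(long maxBytes, int maxPerWindow, long windowMillis) {
        this.maxBytes = maxBytes;
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
    }

    // Returns true if the request may be handed to the parser.
    synchronized boolean allow(String clientId, long contentLength, long nowMillis) {
        if (contentLength < 0 || contentLength > maxBytes) {
            return false; // unknown or oversized body; not counted against the quota
        }
        long[] counter = counters.get(clientId);
        if (counter == null || nowMillis - counter[0] >= windowMillis) {
            counter = new long[] {nowMillis, 0}; // start a fresh window
            counters.put(clientId, counter);
        }
        if (counter[1] >= maxPerWindow) {
            return false; // quota used up for this window
        }
        counter[1]++;
        return true;
    }
}
```

A servlet filter or JAX-RS request filter in front of the parse endpoint could 
call allow(...) with the client address, the Content-Length and the current 
time, and answer with an error status when it returns false. A real deployment 
would also need to evict old entries from the map.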

2. Tika parser development
"Tika users can do adhoc parsing" is a great idea. I think we could offer an 
"online IDE" for Tika parser development. Here I see two sub-scenarios:
2.1: Using existing parsers and adding new features (such as adding missing 
metadata or fixing bugs in the XHTML handler). This case doesn't need any new 
library: the user can extend the relevant Parser and try it against test 
documents. Groovy is one option, because it's a simple, Java-like language.
2.2: Creating a new parser. From my parser development experience, a new 
parser usually requires 3rd party libraries, so to build and run it with this 
online service we would need to extend the classloader dynamically. If we 
really want to support this use case, we could wrap the client's jars & classes 
as an OSGi plugin and then load and execute it on the server side. I'm not sure 
this scenario makes much sense, since users can always check out, build and 
develop a new parser locally.
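
As a lighter-weight alternative to OSGi for scenario 2.2, extending the 
classloader dynamically can be sketched with plain JDK classes: a 
URLClassLoader over the uploaded jars, plus ServiceLoader to find 
implementations declared under META-INF/services. This is only a sketch under 
my own assumptions: the class name PluginJars and its methods are 
hypothetical, and for Tika the service type passed in would presumably be 
org.apache.tika.parser.Parser. It gives no sandboxing at all, so running 
untrusted code this way on a public server would need much more care.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical helper for loading user-supplied jars at runtime; not part of Tika.
final class PluginJars {

    // Builds a classloader that sees the given jars in addition to the parent's classes.
    static URLClassLoader loaderFor(List<Path> jars, ClassLoader parent)
            throws MalformedURLException {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < urls.length; i++) {
            urls[i] = jars.get(i).toUri().toURL();
        }
        return new URLClassLoader(urls, parent);
    }

    // Instantiates every implementation of serviceType that is declared in a
    // META-INF/services file visible to the given loader.
    static <T> List<T> discover(Class<T> serviceType, ClassLoader loader) {
        List<T> found = new ArrayList<>();
        for (T impl : ServiceLoader.load(serviceType, loader)) {
            found.add(impl);
        }
        return found;
    }
}
```

The server would call loaderFor(...) with the uploaded jar paths, then 
discover(...) to obtain the new parser instances, and close the URLClassLoader 
once the session is finished so the jars can be released.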

3. Tika parser library store
For various reasons (library incompatibilities, license constraints ...), 
official Tika cannot integrate some contributed parsers. This kind of service 
could host those parsers so that anyone can download them and use them in 
their own context.

Anyway, this service would require resources and human effort to create and 
maintain.

Hong-Thai

-----Original Message-----
From: Nick Burch [mailto:apa...@gagravarr.org] 
Sent: Wednesday, 9 April 2014 06:32
To: dev@tika.apache.org
Subject: Re: Tika VM Service

On Tue, 8 Apr 2014, Lewis John Mcgibbney wrote:
> I would like to propose that we get a Tika service up and running on a VM.
> Tika users can do adhoc parsing, etc and can do this based on possibly 
> stable nightly SNAPSHOT's or alternatively based on the most recent 
> stable release.
> Preferably, the service should provide a list of parsers and also 
> MediaType's supported.

My vision of how this would work would be to use the Tika Server, with some 
extensions so that it self-hosts some basic documentation. We're thinking of 
trying to start that tomorrow in the hackathon, any help / ideas / projects to 
crib off gratefully received!

Nick
