On Feb 9, 2010, at 10:24 AM, Robin Anil wrote:

> Yeah! Tika looks great! I bet Drew's patch to create a structured document
> format via Avro should essentially go into Tika. Then we could really use
> the Tika library to the full.

Solr has code here that would be pretty simple to grab, but it's also really 
straightforward to do standalone.  The key is making sure that people can 
provide their own DocumentHandler if they want, while still providing good 
default options.

> 
> I should really spend time to explore Apache projects. I think we could
> reuse a whole lot.

+1.  Cross-fertilization is a good thing.  Many people in the Lucene 
communities are working on these types of things.

We're getting to the point where UIMA integration makes sense, too, I think, 
but I'm not a UIMA expert, so...

-Grant
