On Feb 9, 2010, at 10:24 AM, Robin Anil wrote:
> Yeah!. Tika looks great!. I bet Drew's patch to create a structured document
> format via Avro should essentially go into Tika. Then we could really use
> the Tika library to the full.

Solr has code here that would be pretty simple to grab, but it's also really straightforward to do standalone. The key is making sure that people can provide their own DocumentHandler if they want, while still providing good default options.

> I should really spend time to explore Apache projects. I think we could
> reuse a whole lot.

+1. Cross-fertilization is a good thing. Many people in the Lucene communities are working on these types of things. We're getting to the point where UIMA integration makes sense, too, I think, but I'm not a UIMA expert, so...

-Grant