Hey Guys,

I've been toying with the idea of writing a simple Tika Parser Decorator that extends the Text Parser but also generates TF-IDF metadata: perhaps a summarized top-word count plus a term/frequency map. I was also thinking of writing a similar ContentHandler so that it could be piped together with the other handlers (e.g., LinkContentHandler).
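
To make the ContentHandler half of the idea a bit more concrete, here is a rough sketch of what I have in mind. It only uses the SAX DefaultHandler from the JDK, so nothing here is existing Tika API: the class name TermFrequencyHandler and its methods (getTermCounts, topTerms, tfIdf) are hypothetical, the tokenizer is a naive letters-and-digits split, and the document-frequency numbers needed for the IDF part are assumed to be supplied by the caller from some corpus index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not part of Tika): counts terms seen in SAX character events.
class TermFrequencyHandler extends DefaultHandler {
    private final Map<String, Integer> counts = new HashMap<>();
    private final StringBuilder token = new StringBuilder();
    private int totalTerms = 0;

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may split text across calls, so a partial token is carried over.
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (Character.isLetterOrDigit(c)) {
                token.append(Character.toLowerCase(c));
            } else {
                flush();
            }
        }
    }

    @Override
    public void endDocument() {
        flush();
    }

    // Record the token being built, if any, and reset the buffer.
    private void flush() {
        if (token.length() > 0) {
            counts.merge(token.toString(), 1, Integer::sum);
            totalTerms++;
            token.setLength(0);
        }
    }

    // Raw term -> count map for the document seen so far.
    public Map<String, Integer> getTermCounts() {
        flush();
        return counts;
    }

    // The n most frequent terms; ties are broken alphabetically.
    public List<Map.Entry<String, Integer>> topTerms(int n) {
        flush();
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    // tf * idf, with tf = count / total terms and idf = ln(totalDocs / docsWithTerm).
    // Assumption: docsWithTerm and totalDocs come from an external corpus index.
    public double tfIdf(String term, int docsWithTerm, int totalDocs) {
        flush();
        if (totalTerms == 0 || docsWithTerm <= 0) {
            return 0.0;
        }
        double tf = counts.getOrDefault(term, 0) / (double) totalTerms;
        double idf = Math.log((double) totalDocs / docsWithTerm);
        return tf * idf;
    }
}
```

Since Tika parsers write their output to an org.xml.sax.ContentHandler, something along these lines could in principle be passed to a parser or combined with other handlers; the term map and top terms could then be copied into the Metadata object by the decorator.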

Would others find this useful?

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++