Hey Guys,

I've been toying with the idea of writing a simple Tika Parser Decorator
that extends the Text Parser but also generates TF-IDF metadata, perhaps a
summarized top-word count plus a term/frequency map. I was also thinking of
writing a similar ContentHandler so that it could be piped together with the
other handlers (e.g., LinkContentHandler).
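
To make the ContentHandler half of the idea a bit more concrete, here is a
rough sketch of what I have in mind. It uses only the JDK's SAX classes, so
nothing here is existing Tika API: the class name TermFrequencyHandler and
its termCounts()/topTerms() methods are made up for illustration. It also
only covers the term-frequency part. The IDF weighting would need document
frequencies from some corpus, which I am assuming would be supplied
separately.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not part of Tika): counts terms seen in SAX character events.
class TermFrequencyHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void startDocument() {
        // Reset state so the handler can be reused for another document.
        text.setLength(0);
        counts.clear();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Buffer the text: SAX may split a single word across several calls.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Crude assumption: every element boundary is also a word boundary.
        text.append(' ');
    }

    @Override
    public void endDocument() {
        // Lowercase, then split on anything that is not a letter or digit.
        String[] tokens = text.toString().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
    }

    // Full term -> raw count map; valid after endDocument().
    Map<String, Integer> termCounts() {
        return counts;
    }

    // The n most frequent terms, ties broken alphabetically.
    List<Map.Entry<String, Integer>> topTerms(int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }
}
```

Since it is a plain SAX handler, it could presumably be handed to a parser
alongside the other handlers, and the decorator version would copy the
results into the Metadata object after parsing. Stop words, stemming, and a
memory cap for large documents are all left out of this sketch.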

Would others find this useful?

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
