Re: tika-eval

Tilman Hausherr Mon, 22 May 2017 01:56:40 -0700

On 21.05.2017 at 18:20, Andreas Lehmkuehler wrote:

On 17.02.2017 at 17:58, Allison, Timothy B. wrote:
All,
I finally got around to adding tika-eval[1] to Apache Tika. If you have any interest in comparing the output of different tools/versions/parameters on text extraction, give it a try. You don't need to use Tika or format the output in a specific format; plain UTF-8 text will work.
Tilman, I generalized your common word count methodology. The code now runs language id on the text and then counts the common words for that language.
Lots more work remains. Thank you, all, for contributing to the methodologies!
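
For readers unfamiliar with the common word count idea mentioned above, here is a minimal sketch of the general technique: guess the language of the extracted text, then count how many tokens appear in that language's list of common words. This is not tika-eval's code. The word lists, the function names, and the overlap-based language guess are all assumptions made for illustration; a real setup would use a proper language identifier and much larger word lists.

```python
import re

# Hypothetical, deliberately tiny common-word lists (illustration only).
COMMON_WORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that", "it"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def guess_language(tokens):
    """Toy stand-in for language id: pick the list with the most hits."""
    return max(
        sorted(COMMON_WORDS),
        key=lambda lang: sum(t in COMMON_WORDS[lang] for t in tokens),
    )


def common_word_count(text):
    """Return (language, number of common-word tokens, total tokens)."""
    tokens = tokenize(text)
    lang = guess_language(tokens)
    hits = sum(t in COMMON_WORDS[lang] for t in tokens)
    return lang, hits, len(tokens)
```

Comparing the count for the same document across two extraction tools or versions gives a rough signal: clean extracted text tends to contain many common words, while garbled output contains few or none.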
And here is the talk Tim gave about it at ApacheCon:

https://youtu.be/vRPTPMwI53k?list=PLbzoR-pLrL6pLDCyPxByWQwYTL-JrF5Rp
I enjoyed it (the video).


So did I!

Tilman


