Re: tika-eval

Andreas Lehmkuehler Sun, 21 May 2017 09:21:06 -0700

On 17.02.2017 at 17:58, Allison, Timothy B. wrote:

All,


   I finally got around to adding tika-eval[1] to Apache Tika.  If you have any 
interest in comparing the text extraction output of different tools, versions, 
or parameters, give it a try.  You don't need to use Tika or produce output in 
any particular format; plain UTF-8 text will work.

   Tilman, I generalized your common word count methodology.  The code now runs 
language id on the text and then counts the common words for that language.
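
[Editor's illustration, not part of the original message.] The two steps described above (identify the language, then count that language's common words) can be sketched roughly as follows. This is not tika-eval's code: the word lists, the overlap-based language guess, and the function names `tokenize`, `detect_language`, and `common_word_count` are all hypothetical stand-ins chosen for the example.

```python
import re

# Hypothetical, deliberately tiny common-word lists (illustration only;
# a real tool would use much larger per-language lists).
COMMON_WORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that", "it"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "zu", "es"},
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def detect_language(tokens):
    """Toy language id: pick the language whose list matches the most tokens.

    Assumption: this stands in for a real language identifier.
    Ties (including empty input) resolve to the first language in COMMON_WORDS.
    """
    return max(
        COMMON_WORDS,
        key=lambda lang: sum(t in COMMON_WORDS[lang] for t in tokens),
    )


def common_word_count(text):
    """Return (language, common-word count, total token count) for the text."""
    tokens = tokenize(text)
    lang = detect_language(tokens)
    count = sum(t in COMMON_WORDS[lang] for t in tokens)
    return lang, count, len(tokens)


if __name__ == "__main__":
    print(common_word_count("The cat is in the house and it is warm"))
```

A low common-word count relative to the total token count would hint that an extractor produced garbled text, which is the kind of comparison the methodology is meant to support.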

   Lots more work remains.  Thank you, all, for contributing to the 
methodologies!

And here is the talk Tim gave about it at ApacheCon:

https://youtu.be/vRPTPMwI53k?list=PLbzoR-pLrL6pLDCyPxByWQwYTL-JrF5Rp

I've enjoyed it (the video).

Andreas


          Cheers,

                       Tim


[1] https://wiki.apache.org/tika/TikaEval


