Tim Allison created TIKA-3145:
---------------------------------
Summary: Add a content digester to tika-eval text stats
Key: TIKA-3145
URL: https://issues.apache.org/jira/browse/TIKA-3145
Project: Tika
Issue Type: Task
Reporter: Tim Allison
When comparing files, it can be useful to digest the extracted text content so
that users can identify files that have duplicate content but different
whole-file digests (for example, the same document saved as both PDF and DOCX).
Let's add a content digester to tika-eval's text stats calculator.
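Here is a minimal sketch of the idea. It assumes the text has already been extracted by a Tika parser and hashes that text instead of the raw file bytes. The class and method names (`ContentDigestSketch`, `contentDigest`, `sha256Hex`), the choice of SHA-256, and the whitespace normalization are all illustrative assumptions, not tika-eval's actual API or design.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContentDigestSketch {

    // Hypothetical helper: digests the *extracted text* of a file, not its raw bytes.
    // Collapsing whitespace is an assumption; the issue does not specify normalization.
    static String contentDigest(String extractedText) {
        String normalized = extractedText.trim().replaceAll("\\s+", " ");
        return sha256Hex(normalized.getBytes(StandardCharsets.UTF_8));
    }

    // Lowercase hex SHA-256 of the given bytes.
    static String sha256Hex(byte[] bytes) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(bytes);
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for the raw bytes of two different container formats.
        byte[] rawA = "%PDF-1.7 Hello world".getBytes(StandardCharsets.UTF_8);
        byte[] rawB = "PK Hello world".getBytes(StandardCharsets.UTF_8);
        // Text that a parser might extract from each file.
        String textA = "Hello   world\n";
        String textB = "Hello world";

        System.out.println("file digests equal:    " + sha256Hex(rawA).equals(sha256Hex(rawB)));
        System.out.println("content digests equal: " + contentDigest(textA).equals(contentDigest(textB)));
    }
}
```

Running `main` reports that the whole-file digests differ while the content digests match. This is exactly the case the digester is meant to surface.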
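For a similar approach, see Nutch's TextMD5Signature, which computes an MD5
hash of a page's parsed text:
https://builds.apache.org/job/nutch-trunk/javadoc/org/apache/nutch/crawl/TextMD5Signature.html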
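Nutch's TextMD5Signature takes an MD5 of the parsed text and falls back to the fetched content when there is no text. The sketch below imitates that behavior in plain Java and groups files whose text signatures match. It is a hypothetical illustration (`TextSignatureSketch`, `Extracted`, `textMd5` and `groupBySignature` are invented names), not Nutch's or Tika's code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TextSignatureSketch {

    // Hypothetical holder for a file's raw bytes plus the text a parser extracted from it.
    static final class Extracted {
        final byte[] raw;
        final String text;

        Extracted(byte[] raw, String text) {
            this.raw = raw;
            this.text = text;
        }
    }

    // MD5 over the extracted text; if there is no text, fall back to the raw bytes.
    static String textMd5(String text, byte[] rawBytes) {
        byte[] input = (text == null || text.isEmpty())
                ? rawBytes
                : text.getBytes(StandardCharsets.UTF_8);
        try {
            byte[] hash = MessageDigest.getInstance("MD5").digest(input);
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    // Groups file names by text signature; groups with more than one name are candidate duplicates.
    static Map<String, List<String>> groupBySignature(Map<String, Extracted> files) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Extracted> e : files.entrySet()) {
            String sig = textMd5(e.getValue().text, e.getValue().raw);
            groups.computeIfAbsent(sig, k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Extracted> files = new LinkedHashMap<>();
        files.put("report.pdf", new Extracted("pdf-bytes".getBytes(StandardCharsets.UTF_8), "Quarterly report"));
        files.put("report.docx", new Extracted("docx-bytes".getBytes(StandardCharsets.UTF_8), "Quarterly report"));
        files.put("image.png", new Extracted("png-bytes".getBytes(StandardCharsets.UTF_8), ""));
        groupBySignature(files).forEach((sig, names) -> System.out.println(sig + " " + names));
    }
}
```

With this input the sketch produces two groups: `[report.pdf, report.docx]` share a text signature, and `image.png`, which has no text, is signed from its raw bytes.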
--
This message was sent by Atlassian Jira
(v8.3.4#803005)