Tim Allison created TIKA-3145:
---------------------------------

             Summary: Add a content digester to tika-eval text stats
                 Key: TIKA-3145
                 URL: https://issues.apache.org/jira/browse/TIKA-3145
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


When comparing files, it can be useful to digest the extracted text so that 
users can identify files that have duplicate content even though their 
whole-file digests differ.  Let's add a content digester to tika-eval's text 
stats calculator.

See: 
https://builds.apache.org/job/nutch-trunk/javadoc/org/apache/nutch/crawl/TextMD5Signature.html
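
A minimal sketch of the idea, not the tika-eval implementation: hash the
extracted text instead of the raw bytes, so that two files with the same text
but different containers or metadata get the same content digest. The class
name ContentDigestSketch, the normalization (lowercasing and collapsing
whitespace), and the choice of SHA-256 are all assumptions made for
illustration; the issue does not specify any of them.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of tika-eval: digests text content, not file bytes.
class ContentDigestSketch {

    // Assumed normalization: lowercase, collapse whitespace runs to one space, trim.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }

    // Returns the hex-encoded SHA-256 digest of the normalized text.
    static String digest(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(normalize(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, digest("Hello   World\n") and digest("hello world") return
the same value, while the whole-file digests of the two source files would
differ. How aggressive the normalization should be is a design choice: looser
normalization finds more near-duplicates but also risks more false matches.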



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
