[
https://issues.apache.org/jira/browse/TIKA-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-3145.
-------------------------------
Fix Version/s: 1.25
Assignee: Tim Allison
Resolution: Fixed
> Add a content digester to tika-eval text stats
> ----------------------------------------------
>
> Key: TIKA-3145
> URL: https://issues.apache.org/jira/browse/TIKA-3145
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Major
> Fix For: 1.25
>
>
> When comparing files, it can be useful to digest the extracted text so
> that users can identify files that may have duplicate content but
> different file-level digests. Let's add a content digester to
> tika-eval's text stats calculator.
> See:
> https://builds.apache.org/job/nutch-trunk/javadoc/org/apache/nutch/crawl/TextMD5Signature.html
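
The idea in the description can be illustrated with a minimal sketch. This is not the code committed for TIKA-3145, and it does not reproduce Nutch's TextMD5Signature. The class name TextContentDigest, the normalization rules (lowercasing and whitespace collapsing), and the choice of SHA-256 are all assumptions made for illustration. It uses only the JDK's java.security.MessageDigest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of tika-eval: digests extracted text
// rather than raw file bytes.
class TextContentDigest {

    // Assumed normalization: lowercase, collapse whitespace runs, trim.
    // This makes the digest insensitive to case and layout differences.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }

    // Returns the lowercase hex SHA-256 digest of the normalized text.
    static String digest(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(
                    normalize(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

Under these assumptions, the text extracted from two files whose bytes differ (say, a PDF and a DOCX of the same document) would yield the same value from TextContentDigest.digest(...) whenever the extracted text differs only in case or whitespace, while their file-level digests would not match.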