Tim Allison created TIKA-3146:
---------------------------------
Summary: Add Nutch's TextProfileSignature digest to tika-eval's
text stats
Key: TIKA-3146
URL: https://issues.apache.org/jira/browse/TIKA-3146
Project: Tika
Issue Type: Task
Reporter: Tim Allison
https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/crawl/TextProfileSignature
The Nutch class linked above will require only trivial modifications to work
within the tika-eval context. As with TIKA-3145, this will give users the
ability to calculate a fuzzier digest to identify near-duplicates.
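
For illustration only, here is a minimal sketch of how a text-profile style
fuzzy digest can work. It is not the Nutch code and not a proposed tika-eval
API. The class name TextProfileDigestSketch, the method digest, the constants
MIN_TOKEN_LEN and QUANT_RATE with their values, and the alphabetical tie-break
when sorting are all assumptions made for this example. The idea is to
lowercase the text, count alphanumeric tokens, round the counts down to
multiples of a quantum so that rare tokens drop out, and hash the resulting
profile.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a text-profile digest; names and defaults are assumptions.
public class TextProfileDigestSketch {

    static final int MIN_TOKEN_LEN = 2;     // assumed: ignore tokens shorter than this
    static final float QUANT_RATE = 0.01f;  // assumed: quantum as a fraction of the max count

    public static String digest(String text) {
        // 1. Count lowercased runs of letters/digits; everything else is a separator.
        Map<String, Integer> counts = new HashMap<>();
        int maxFreq = 0;
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            char c = i < text.length() ? text.charAt(i) : ' '; // trailing separator flushes the last token
            if (Character.isLetterOrDigit(c)) {
                cur.append(Character.toLowerCase(c));
            } else {
                if (cur.length() >= MIN_TOKEN_LEN) {
                    int f = counts.merge(cur.toString(), 1, Integer::sum);
                    maxFreq = Math.max(maxFreq, f);
                }
                cur.setLength(0);
            }
        }

        // 2. Choose the quantum; keep it at 2 or more whenever any token repeats.
        int quant = Math.round(maxFreq * QUANT_RATE);
        if (quant < 2) {
            quant = (maxFreq > 1) ? 2 : 1;
        }

        // 3. Round counts down to a multiple of the quantum and drop tokens below it.
        List<Map.Entry<String, Integer>> profile = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int q = (e.getValue() / quant) * quant;
            if (q >= quant) {
                profile.add(new AbstractMap.SimpleEntry<>(e.getKey(), q));
            }
        }

        // 4. Sort by descending count, then by token so the output is deterministic.
        profile.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // 5. Serialize the profile and hash it with MD5, returning lowercase hex.
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : profile) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(sb.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("MD5 not available", ex);
        }
    }
}
```

Under these assumptions, two texts that differ only in punctuation, case, or
tokens that occur once produce the same digest, which is the near-duplicate
behavior the issue is after. A strict digest over the raw bytes would treat
them as different documents.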