Tim Allison created TIKA-3146:
---------------------------------

             Summary: Add Nutch's TextProfileSignature digest to tika-eval's text stats
                 Key: TIKA-3146
                 URL: https://issues.apache.org/jira/browse/TIKA-3146
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/crawl/TextProfileSignature

Adapting this class to the tika-eval context should require only trivial 
modifications.  As with TIKA-3145, this will give users the ability to 
calculate a fuzzier digest to identify near duplicates.
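
To illustrate what a "fuzzier digest" means here, the following is a minimal 
sketch in the spirit of a text-profile signature. It is not Nutch's code and 
not a proposed tika-eval API. The class name FuzzyTextDigest, the constants 
MIN_TOKEN_LEN and QUANT_RATE and their values, the alphabetical tie-break when 
sorting, and the choice of MD5 are all assumptions made for this example. The 
idea: lowercase the text, keep only letter/digit tokens, count them, round the 
counts down to a multiple of a quantum, drop rare tokens, and hash the 
resulting profile, so that small differences in the text do not change the 
digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; the name and constants are assumptions.
class FuzzyTextDigest {

    static final int MIN_TOKEN_LEN = 2;     // assumed: ignore one-character tokens
    static final float QUANT_RATE = 0.01f;  // assumed: quantum as a fraction of the max count

    // Builds the quantized token profile that gets hashed.
    static String profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        int maxFreq = 0;
        StringBuilder token = new StringBuilder();
        // Iterate one position past the end so the last token is flushed.
        for (int i = 0; i <= text.length(); i++) {
            char c = i < text.length() ? text.charAt(i) : ' ';
            if (Character.isLetterOrDigit(c)) {
                token.append(Character.toLowerCase(c));
            } else if (token.length() > 0) {
                if (token.length() >= MIN_TOKEN_LEN) {
                    int n = counts.merge(token.toString(), 1, Integer::sum);
                    maxFreq = Math.max(maxFreq, n);
                }
                token.setLength(0);
            }
        }

        // Quantum for rounding counts; never below 1, and at least 2 when any token repeats.
        int quant = Math.round(maxFreq * QUANT_RATE);
        if (quant < 2) {
            quant = maxFreq > 1 ? 2 : 1;
        }

        // Round counts down to a multiple of the quantum and drop tokens that fall below it.
        List<Map.Entry<String, Integer>> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int q = (e.getValue() / quant) * quant;
            if (q >= quant) {
                kept.add(new AbstractMap.SimpleEntry<>(e.getKey(), q));
            }
        }

        // Sort by count descending; ties broken alphabetically (assumption) for determinism.
        kept.sort((a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : kept) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Hex-encoded MD5 of the profile; MD5 is used only as a fingerprint here.
    static String digest(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(profile(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Differ in case, punctuation, and one rare word, yet share a digest.
        System.out.println(digest("Hello hello world"));
        System.out.println(digest("hello, HELLO! earth"));
    }
}
```

In this sketch the two sample strings share a digest because "world" and 
"earth" each occur only once and fall below the quantum, leaving "hello 2" as 
the whole profile. An exact digest of the raw text would treat them as 
different documents.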



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
