Tien Nguyen Manh created NUTCH-1693:
---------------------------------------
Summary: TextMD5Signature computed on textual content
Key: NUTCH-1693
URL: https://issues.apache.org/jira/browse/NUTCH-1693
Project: Nutch
Issue Type: Bug
Reporter: Tien Nguyen Manh
Priority: Minor

I created a new MD5Signature that is based on textual content. In our case we use boilerpipe to extract the main text from the content, so this signature is more effective for deduplication.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
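
For illustration only, here is a minimal sketch of the idea described in this issue: hashing the extracted text rather than the raw page bytes. It is not the attached patch. The class name `TextSignatureSketch`, its methods, and the whitespace normalization step are all assumptions made for this example; the sketch uses only the JDK and takes text that has already been extracted (for instance by boilerpipe), so it does not touch the Nutch or boilerpipe APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Nutch: MD5 over already-extracted text.
class TextSignatureSketch {

    /** Returns the MD5 digest of the given extracted text. */
    static byte[] calculate(String extractedText) {
        String text = (extractedText == null) ? "" : extractedText;
        // Assumption: trim and collapse whitespace runs so that layout-only
        // differences between two pages do not change the signature.
        String normalized = text.trim().replaceAll("\\s+", " ");
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return md5.digest(normalized.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Renders a digest as lowercase hex for logging or comparison. */
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two pages whose main text differs only in whitespace.
        String pageA = "Main article text";
        String pageB = "  Main   article\ntext ";
        System.out.println(toHex(calculate(pageA)));
        System.out.println(toHex(calculate(pageB)));
    }
}
```

With this sketch, two pages whose extracted main text differs only in whitespace produce the same signature, while any change to the words themselves produces a different one. Near-duplicates with small textual differences would still hash differently, since MD5 is an exact-match digest.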