[ https://issues.apache.org/jira/browse/DATAFU-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mohammad S Amin updated DATAFU-67:
----------------------------------
    Description:
Adding Simple SimHash for near duplicate detection. The UDF computes a SimHash for each document, which can then be compared across multiple documents.

RB: https://reviews.apache.org/r/25049/

  was:Adding Simple SimHash for near duplicate detection. The UDF computes a SimHash for each document, which can then be compared across multiple documents.

> Adding Simple SimHash for near duplicate detection
> --------------------------------------------------
>
>                 Key: DATAFU-67
>                 URL: https://issues.apache.org/jira/browse/DATAFU-67
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Mohammad S Amin
>         Attachments: DATAFU-67
>
>
> Adding Simple SimHash for near duplicate detection. The UDF computes a SimHash
> for each document, which can then be compared across multiple documents.
> RB: https://reviews.apache.org/r/25049/