Taewoo Kim created ASTERIXDB-1813:
-------------------------------------

             Summary: similarity-jaccard-prefix() issue
                 Key: ASTERIXDB-1813
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1813
             Project: Apache AsterixDB
          Issue Type: Bug
            Reporter: Taewoo Kim
For the following two records, similarity-jaccard-prefix() does not produce the correct result. To see the difference, run the query twice, switching which of the two where lines is commented out (the one with the /* +skip-index */ hint or the one with the /* +indexnl */ hint). Reproducing the problem requires enabling the fuzzy join rule. That rule is not yet enabled in master, so the bug does not show up there yet, but it needs to be fixed before the rule is enabled.

{code}
drop dataverse test if exists;
create dataverse test;
use dataverse test;

create type DBLPType as open {
  id: uuid
};

create dataset AmazonReviewNoDup(DBLPType) primary key id;

create index AmazonReviewNoDup_summary_b_idx on AmazonReviewNoDup(summary:string?) type btree enforced;
create index AmazonReviewNoDup_summary_kw_idx on AmazonReviewNoDup(summary:string?) type keyword enforced;

insert into dataset AmazonReviewNoDup(
  { "id": uuid("83208a78-7007-8d77-935b-d9127e4cc9dc"), "summary": "Clear, Concise, and fun!" }
);
insert into dataset AmazonReviewNoDup(
  { "id": uuid("83208a78-7007-8d77-935b-d9127e4cc9dd"), "summary": "Clear, Concise, and Charitable" }
);

for $o in dataset AmazonReviewNoDup
for $i in dataset AmazonReviewNoDup
//where /* +indexnl */ similarity-jaccard(word-tokens($o.summary), word-tokens($i.summary)) >= 0.6
where /* +skip-index */ similarity-jaccard(word-tokens($o.summary), word-tokens($i.summary)) >= 0.6
  and $o.id < $i.id
return {"oid": $o.id, "iid": $i.id};
{code}
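For reference, the correct answer for this pair can be checked by hand. The sketch below assumes word-tokens() lowercases the string and splits it on runs of non-alphanumeric characters, and it treats each token list as a set (neither summary contains duplicate tokens). Under those assumptions the shared tokens are clear, concise and and (3), the union has 5 tokens, so the similarity is 3/5 = 0.6. That sits exactly at the >= 0.6 threshold, so the pair should be returned once (the $o.id < $i.id predicate keeps only one ordering). The helper names are illustrative only, not AsterixDB APIs.

```python
import re
from fractions import Fraction


def word_tokens(s):
    # Hypothetical approximation of AsterixDB word-tokens():
    # lowercase, split on non-alphanumeric runs, drop empty pieces.
    return [t for t in re.split(r"[^0-9a-z]+", s.lower()) if t]


def jaccard(a, b):
    # Set-based Jaccard similarity, kept as an exact fraction.
    sa, sb = set(a), set(b)
    return Fraction(len(sa & sb), len(sa | sb))


outer_summary = "Clear, Concise, and fun!"
inner_summary = "Clear, Concise, and Charitable"

sim = jaccard(word_tokens(outer_summary), word_tokens(inner_summary))
print(sim, sim >= Fraction(3, 5))  # prints: 3/5 True
```

Using Fraction keeps the comparison at the boundary exact, which matters here because the pair is neither clearly above nor clearly below the threshold.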