[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147844#comment-15147844
 ] 

Heri Ramampiaro commented on ASTERIXDB-1179:
--------------------------------------------

Yes, I checked this morning. I think one reason is that 
JaccardSimilarityEvaluator is designed to use a binary hash table to check the 
two lists against each other. This hash table uses a hash function that 
generates hash values based on the item types, and the binary comparator used 
by the hash table is type-dependent as well. As a result, the current 
implementation cannot correctly compare 
[int32('1'),int32('7'),int32('9')] against [1,5,9].
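
To illustrate the effect, here is a minimal Python sketch, not the actual 
evaluator code. It assumes each list item is hashed and compared together with 
its type, modelled here as a (type, value) pair, and that the plain literals 
in [1,5,9] have a different integer type than int32 (int64 is my assumption). 
The names jaccard, coerce_to_widest and WIDTH are made up for the example.

```python
# Hypothetical type widths; the tags are illustrative only and are not
# AsterixDB's internal type tags.
WIDTH = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def jaccard(a, b):
    """Jaccard similarity of two lists, treated as sets of hashable items."""
    sa, sb = set(a), set(b)
    union = sa | sb
    # Returning 0.0 for two empty lists is an arbitrary choice for this sketch.
    return len(sa & sb) / len(union) if union else 0.0


def coerce_to_widest(a, b):
    """Retag every (type, value) pair with the widest type found in either list."""
    widest = max((tag for tag, _ in a + b), key=WIDTH.__getitem__, default="int64")
    return [(widest, v) for _, v in a], [(widest, v) for _, v in b]


# Items carry their type, so (int32, 1) and (int64, 1) hash and compare as different.
left = [("int32", 1), ("int32", 7), ("int32", 9)]
right = [("int64", 1), ("int64", 5), ("int64", 9)]  # assumed type of plain literals

print(jaccard(left, right))                     # type-sensitive comparison
print(jaccard(*coerce_to_widest(left, right)))  # comparison after coercion
```

With type-sensitive items the two lists share no element, which matches the 
0.0f in the report. After coercing both lists to the widest type, the items 1 
and 9 match and the similarity becomes 2/4 = 0.5, the same as for 
similarity-jaccard([1,7,9], [1,5,9]).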

> Similarity functions should coerce numeric types
> ------------------------------------------------
>
>                 Key: ASTERIXDB-1179
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1179
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: AsterixDB, Similarity
>         Environment: master (I50442edc3187d003987bc4119559eda676c9b2eb)
>            Reporter: Cameron Samak
>            Assignee: Taewoo Kim
>
> Similarity functions should coerce to the largest numeric type compared. 
> Currently, comparing lists with different numeric types always results in 
> similarity 0 or max edit distance.
> If on the other hand this is intended behavior, a note in the documentation 
> would be helpful. Or, more consistently, a type error should be thrown (as is 
> currently implemented for edit-distance(string, OrderedList)).
> Example query:
> {code}
> similarity-jaccard([1,7,9], [1,5,9])
> similarity-jaccard([int32('1'),int32('7'),int32('9')], [1,5,9])
> edit-distance([1,5,9], [1,6,9])
> edit-distance([int32('1'),int32('5'),int32('9')], [1,6,9])
> {code}
> Result:
> {code}
> 0.5f
> 0.0f
> 1
> 3
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
