[
https://issues.apache.org/jira/browse/ASTERIXDB-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148208#comment-15148208
]
Heri Ramampiaro commented on ASTERIXDB-1179:
--------------------------------------------
I remember modifying the ATypeHierarchy class to get the item domain. If the
domain is numeric, each number is "cast" to double, and so on. This allows us
to check [1.0, 2, 3.0] against [1, 2, 3]. Another thing: even though the
deep-equality check also uses a binary hash table, it ignores the type when
generating the hash values and in the comparator, since we are mainly
interested in the content (and the types are checked before the content is
compared).
E.g.:
{code}
deep-equal([1.0, 2, 3], [1, 2, 3.0])
{code}
returns "true", while
{code}
similarity-jaccard([1.0, 2, 3], [1, 2, 3.0])
{code}
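returns 0.2f: its hash includes the type, so 1.0 vs. 1 and 3 vs. 3.0 count as
distinct items, leaving only 2 in common out of five distinct items (1/5).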
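To make the contrast concrete, here is a minimal Python sketch (not AsterixDB code) of the two behaviors. Items are keyed either by a (type name, value) pair, mimicking a hash that includes the type tag, or by their value coerced to float, mimicking the cast-to-double approach described above for deep-equal. The helper names `tagged`, `coerced` and `jaccard` are hypothetical, and treating each list as a set is an assumption (the examples contain no duplicates).

```python
def tagged(items):
    """Key each item by (type name, value), so 1 and 1.0 are distinct keys.
    This mimics a hash/comparator that includes the type tag."""
    return {(type(x).__name__, x) for x in items}


def coerced(items):
    """Cast every numeric item to float (double) before hashing,
    so 1 and 1.0 become the same key."""
    return {float(x) for x in items}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two item sets."""
    union = a | b
    if not union:
        return 1.0  # convention chosen for this sketch only
    return len(a & b) / len(union)


left, right = [1.0, 2, 3], [1, 2, 3.0]
print(jaccard(tagged(left), tagged(right)))    # 0.2, like similarity-jaccard
print(jaccard(coerced(left), coerced(right)))  # 1.0, consistent with deep-equal
```

With type-tagged keys only ("int", 2) is shared among five distinct keys, which reproduces the 0.2f above; coercing first makes both sides {1.0, 2.0, 3.0}.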
> Similarity functions should coerce numeric types
> ------------------------------------------------
>
> Key: ASTERIXDB-1179
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1179
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: AsterixDB, Similarity
> Environment: master (I50442edc3187d003987bc4119559eda676c9b2eb)
> Reporter: Cameron Samak
> Assignee: Taewoo Kim
>
> Similarity functions should coerce to the largest numeric type compared.
> Currently, comparing lists with different numeric types always results in
> similarity 0 or max edit distance.
> If, on the other hand, this is intended behavior, a note in the documentation
> would be helpful. Or, more consistently, a type error should be thrown (as is
> currently implemented for edit-distance(string, OrderedList)).
> Example query:
> {code}
> similarity-jaccard([1,7,9], [1,5,9])
> similarity-jaccard([int32('1'),int32('7'),int32('9')], [1,5,9])
> edit-distance([1,5,9], [1,6,9])
> edit-distance([int32('1'),int32('5'),int32('9')], [1,6,9])
> {code}
> Result:
> {code}
> 0.5f
> 0.0f
> 1
> 3
> {code}
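The edit-distance results in the issue can be reproduced with a plain Levenshtein distance over lists whose item comparison either requires the type tag to match or coerces numerics. The Python sketch below is illustrative only: it models items as hypothetical ("int32", v) / ("int64", v) tuples, assumes that the unannotated literals in the query are int64 (which the reported 0.0f and 3 suggest), and the helper names are invented for this example.

```python
def edit_distance(a, b, eq):
    """Levenshtein distance between two lists, comparing items with eq(x, y)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if eq(x, y) else 1
            cur.append(min(prev[j] + 1,          # delete x
                           cur[j - 1] + 1,       # insert y
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def strict_eq(x, y):
    """Match only if both type tag and value are equal (reported behavior)."""
    return x == y


def coercing_eq(x, y):
    """Match on the numeric value alone (the requested coercion)."""
    return x[1] == y[1]


def int32(v):
    return ("int32", v)


def int64(v):
    return ("int64", v)


plain = [int64(v) for v in (1, 6, 9)]
narrow = [int32(v) for v in (1, 5, 9)]
print(edit_distance([int64(v) for v in (1, 5, 9)], plain, strict_eq))  # 1
print(edit_distance(narrow, plain, strict_eq))    # 3: no item ever matches
print(edit_distance(narrow, plain, coercing_eq))  # 1: only 5 vs. 6 differs
```

Under strict comparison every int32 item mismatches every int64 item, so the distance degenerates to the maximum (the list length), which matches the reported 3.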