[jira] [Commented] (ASTERIXDB-1179) Similarity functions should coerce numeric types

Heri Ramampiaro (JIRA) Mon, 15 Feb 2016 23:40:35 -0800

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148208#comment-15148208 ]


Heri Ramampiaro commented on ASTERIXDB-1179:
--------------------------------------------

I remember modifying ATypeHierarchy to get the item domain. If the domain is
numeric, the number is cast to double, and so on. This allows us to compare
[1.0, 2, 3.0] against [1, 2, 3]. Another point: although deep-equal also uses
a binary hash table, it ignores the type when generating the hash values and
in the comparator, since we are mainly interested in the content (and the
types are checked before the content is compared)...

E.g.:
{code}
deep-equal([1.0,2, 3], [1,2, 3.0])
{code}

returns "true", while

{code}
similarity-jaccard([1.0,2, 3], [1,2, 3.0])
{code}

returns 0.2f.
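The following minimal Python sketch shows why the two calls above disagree. It assumes a hypothetical model in which every list item is a `(type_tag, value)` pair, so a type-aware hash treats `("double", 1.0)` and `("int64", 1)` as different keys. The names `type_sensitive_key`, `coerced_key` and `jaccard` are illustrative only and are not AsterixDB's code. With type-sensitive keys only `2` is shared among 5 distinct keys, which gives 0.2. Casting numeric items to double first, as described for deep-equal, gives 1.0.

```python
from collections.abc import Hashable
from typing import Callable, Iterable, Tuple, Union

# Hypothetical model of a typed value: (type_tag, value). The tags mirror
# AsterixDB numeric type names but are only labels in this sketch.
Number = Union[int, float]
TypedItem = Tuple[str, Number]

NUMERIC_TAGS = {"int8", "int16", "int32", "int64", "float", "double"}


def type_sensitive_key(item: TypedItem) -> Hashable:
    """Hash/compare on (tag, value), so ("double", 1.0) != ("int64", 1)."""
    return item


def coerced_key(item: TypedItem) -> Hashable:
    """Cast any numeric item to double before hashing/comparing."""
    tag, value = item
    if tag in NUMERIC_TAGS:
        return ("double", float(value))
    return item


def jaccard(a: Iterable[TypedItem], b: Iterable[TypedItem],
            key: Callable[[TypedItem], Hashable]) -> float:
    """Set-based Jaccard similarity |A ∩ B| / |A ∪ B| over key(item)."""
    sa = {key(x) for x in a}
    sb = {key(x) for x in b}
    union = sa | sb
    # Empty-vs-empty returns 0.0 here; that is an assumption of this sketch.
    return len(sa & sb) / len(union) if union else 0.0


left = [("double", 1.0), ("int64", 2), ("int64", 3)]   # [1.0, 2, 3]
right = [("int64", 1), ("int64", 2), ("double", 3.0)]  # [1, 2, 3.0]
print(jaccard(left, right, type_sensitive_key))  # 0.2: only ("int64", 2) is shared
print(jaccard(left, right, coerced_key))         # 1.0
```

Casting everything to double is simple. However, a double has a 53-bit mantissa, so two distinct int64 values above 2^53 can collapse into the same key.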





> Similarity functions should coerce numeric types
> ------------------------------------------------
>
>                 Key: ASTERIXDB-1179
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1179
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: AsterixDB, Similarity
>         Environment: master (I50442edc3187d003987bc4119559eda676c9b2eb)
>            Reporter: Cameron Samak
>            Assignee: Taewoo Kim
>
> Similarity functions should coerce to the largest numeric type compared. 
> Currently, comparing lists with different numeric types always results in 
> similarity 0 or max edit distance.
> If on the other hand this is intended behavior, a note in the documentation 
> would be helpful. Or, more consistently, a type error should be thrown (as is 
> currently implemented for edit-distance(string, OrderedList)).
> Example query:
> {code}
> similarity-jaccard([1,7,9], [1,5,9])
> similarity-jaccard([int32('1'),int32('7'),int32('9')], [1,5,9])
> edit-distance([1,5,9], [1,6,9])
> edit-distance([int32('1'),int32('5'),int32('9')], [1,6,9])
> {code}
> Result:
> {code}
> 0.5f
> 0.0f
> 1
> 3
> {code}
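The issue instead proposes coercing to the largest numeric type compared. The sketch below illustrates that alternative for edit-distance. It uses the same hypothetical `(type_tag, value)` model and an assumed promotion order int8 < int16 < int32 < int64 < float < double. It also assumes untyped integer literals such as `1` are int64, which the results above suggest. The helper picks the widest tag across both lists, promotes every numeric item to it, and then runs an ordinary Levenshtein distance over the items. All function names are hypothetical. Without coercion every position differs by type, giving the reported 3. With coercion the distance is 1.

```python
from typing import Optional, Sequence, Tuple, Union

Number = Union[int, float]
TypedItem = Tuple[str, Number]  # hypothetical (type_tag, value) model

# Assumed promotion order, narrowest to widest.
PROMOTION_ORDER = ["int8", "int16", "int32", "int64", "float", "double"]
RANK = {tag: i for i, tag in enumerate(PROMOTION_ORDER)}


def widest_numeric_tag(*lists: Sequence[TypedItem]) -> Optional[str]:
    """Largest numeric type among all items of all lists (None if none)."""
    tags = [tag for lst in lists for tag, _ in lst if tag in RANK]
    return max(tags, key=RANK.__getitem__) if tags else None


def coerce(item: TypedItem, target: Optional[str]) -> TypedItem:
    """Promote a numeric item to `target`; non-numeric items are untouched."""
    tag, value = item
    if target is None or tag not in RANK:
        return item
    # Python floats stand in for both ADM float and double in this sketch.
    if target in ("float", "double"):
        return (target, float(value))
    return (target, int(value))


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance over list items (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def coerced_edit_distance(a: Sequence[TypedItem], b: Sequence[TypedItem]) -> int:
    """Promote both lists to their widest numeric type, then compare."""
    target = widest_numeric_tag(a, b)
    return edit_distance([coerce(x, target) for x in a],
                         [coerce(y, target) for y in b])


left = [("int32", 1), ("int32", 5), ("int32", 9)]   # [int32('1'), int32('5'), int32('9')]
right = [("int64", 1), ("int64", 6), ("int64", 9)]  # [1, 6, 9], assumed int64
print(edit_distance(left, right))          # 3: every pair differs by type tag
print(coerced_edit_distance(left, right))  # 1
```

Promoting only to the widest type actually present keeps all-integer lists exact. Casting to double, by contrast, can merge large int64 values.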



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
