[jira] [Commented] (CALCITE-3951) Support different string comparison based on SqlCollation

2020-06-02 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123525#comment-17123525
 ] 

Ruben Q L commented on CALCITE-3951:


No problem [~zabetak], it can wait. In the meantime, maybe somebody else will 
take a look.

> Support different string comparison based on SqlCollation
> -
>
> Key: CALCITE-3951
> URL: https://issues.apache.org/jira/browse/CALCITE-3951
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Ruben Q L
> Assignee: Ruben Q L
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.24.0
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> Currently SqlCollation defines concepts like Coercibility, Charset, Locale, 
> etc. However, we cannot specify on a certain collation that e.g. a string 
> field should use case insensitive comparison. The goal of this ticket is to 
> evolve SqlCollation to support that, and adapt the corresponding classes to 
> use that (optional) "non-standard" comparison.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-3951) Support different string comparison based on SqlCollation

2020-06-01 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17121362#comment-17121362
 ] 

Stamatis Zampetakis commented on CALCITE-3951:
--

Hey [~rubenql], I am a bit underwater at the moment. I will try my best to 
review it soon, but I don't think it is going to be this week.



[jira] [Commented] (CALCITE-3951) Support different string comparison based on SqlCollation

2020-05-28 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118402#comment-17118402
 ] 

Ruben Q L commented on CALCITE-3951:


[~zabetak], 1.23 has been released and master is open. Do you think we could 
move on with [PR#1937|https://github.com/apache/calcite/pull/1937]? Thanks.



[jira] [Commented] (CALCITE-3951) Support different string comparison based on SqlCollation

2020-05-06 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100573#comment-17100573
 ] 

Stamatis Zampetakis commented on CALCITE-3951:
--

Hey [~rubenql], I am quite busy at the moment, will try to look over the 
weekend. 



[jira] [Commented] (CALCITE-3951) Support different string comparison based on SqlCollation

2020-05-05 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099634#comment-17099634
 ] 

Ruben Q L commented on CALCITE-3951:


Thanks for your feedback [~zabetak].

bq. I am not sure if SqlCollation is the place to keep the comparison logic.

Then, what could be the alternative?

As I see it, SqlCollation already represents a (simplified) notion of what is 
described in "4.2.2 Comparison of character strings":
- It stores a character set, and we do not allow the comparison of strings 
whose SqlCollation charsets differ (see 
SqlTypeUtil#isCharTypeComparable).
- It stores the "Coercibility" value (EXPLICIT, IMPLICIT, COERCIBLE, NONE), 
which reflects the "collation derivation" rules in the standard (see the 
SqlCollation#getCoercibilityDyadic* methods).

One thing that is missing is the actual string comparison process (the goal of 
the current ticket). I think that the easiest way to implement it, with 
(hopefully) minimal impact on the rest of the Calcite classes and full 
backwards compatibility, is to add this information as a Collator in 
SqlCollation (as I mentioned in the PR, I would agree to annotate the feature 
as "Experimental" if we feel that this design might be revisited in future 
versions). As you know ;) this is quite an urgent requirement for us, so it 
would be nice to have a functional implementation of this (even if 
Experimental) in the next release.
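
For illustration only, the kind of comparison a Collator would bring can be 
sketched with plain java.text.Collator. This is a minimal sketch, not the code 
of the PR: the class and method names are hypothetical, and the choice of 
Locale.ENGLISH with SECONDARY strength is an assumption about what "case 
insensitive" would mean here.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper class, not part of Calcite.
final class CaseInsensitiveCollatorSketch {

  /** Builds a collator that treats letters differing only in case as equal. */
  static Collator caseInsensitive(Locale locale) {
    Collator collator = Collator.getInstance(locale);
    // SECONDARY strength ignores tertiary differences such as letter case,
    // while still distinguishing base letters and accents.
    collator.setStrength(Collator.SECONDARY);
    return collator;
  }

  /** Returns true if the two strings are equal under the case-insensitive collator. */
  static boolean equalIgnoringCase(String a, String b) {
    return caseInsensitive(Locale.ENGLISH).compare(a, b) == 0;
  }

  public static void main(String[] args) {
    // Equal under the collator, although String#equals would say otherwise.
    System.out.println(equalIgnoringCase("calcite", "CALCITE"));
    // Different base letters still compare as different.
    System.out.println(equalIgnoringCase("calcite", "calcine"));
  }
}
```

A Collator is also a Comparator, so the same object could in principle drive 
sorting as well as equality checks.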




[jira] [Commented] (CALCITE-3951) Support different string comparison based on SqlCollation

2020-04-27 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093270#comment-17093270
 ] 

Stamatis Zampetakis commented on CALCITE-3951:
--

Thanks for pushing this forward [~rubenql]. 

I am not sure if SqlCollation is the place to keep the comparison logic. 

Here is what the SQL standard says about comparisons of character strings.

*4.2.2 Comparison of character strings*

Two character strings are comparable if and only if either they have the same 
character set or there exists at least one collation that is applicable to 
both their respective character sets (which is possible only if the character 
sets share the same repertoire).

A collation is defined by [ISO14651] as “a process by which two strings are 
determined to be in exactly one of the relationships of less than, greater 
than, or equal to one another”. Each collation known in an SQL-environment is 
applicable to one or more character sets, and for each character set, one or 
more collations are applicable to it, one of which is associated with it as 
its character set collation.

Anything that has a declared type can, if that type is a character string 
type, be associated with a collation applicable to its character set; this is 
known as a declared type collation. Every declared type that is a character 
string type has a collation derivation, this being either none, implicit, or 
explicit. The collation derivation of a declared type with a declared type 
collation that is explicitly or implicitly specified by a <data type> is 
implicit. If the collation derivation of a declared type that has a declared 
type collation is not implicit, then it is explicit. The collation derivation 
of an expression of character string type that has no declared type collation 
is none.

An operation that explicitly or implicitly involves character string 
comparison is a character comparison operation. At least one of the operands 
of a character comparison operation shall have a declared type collation.

There may be an SQL-session collation for some or all of the character sets 
known to the SQL-implementation (see Subclause 4.38, “SQL-sessions”).

The collation used for a particular character comparison is specified by 
Subclause 9.15, “Collation determination”.

The comparison of two character string expressions depends on the collation 
used for the comparison (see Subclause 9.15, “Collation determination”). When 
values of unequal length are compared, if the collation for the comparison has 
the NO PAD characteristic and the shorter value is equal to some prefix of the 
longer value, then the shorter value is considered less than the longer value. 
If the collation for the comparison has the PAD SPACE characteristic, for the 
purposes of the comparison, the shorter value is effectively extended to the 
length of the longer by concatenation of <space>s on the right.

For every character set, there is at least one collation
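
As an illustration of the NO PAD and PAD SPACE rules quoted above, here is a 
minimal sketch in Java. It assumes plain String.compareTo (UTF-16 code unit 
order) as a stand-in for the collation, and the class and method names are 
hypothetical; it is not how Calcite implements comparison.

```java
// Hypothetical helper class, not part of Calcite.
final class PadComparisonSketch {

  /** NO PAD: a value that is a proper prefix of the other is the lesser one. */
  static int compareNoPad(String a, String b) {
    // String#compareTo already orders a proper prefix before the longer string.
    return Integer.signum(a.compareTo(b));
  }

  /** PAD SPACE: the shorter value is extended with spaces on the right first. */
  static int comparePadSpace(String a, String b) {
    int length = Math.max(a.length(), b.length());
    return Integer.signum(padRight(a, length).compareTo(padRight(b, length)));
  }

  /** Appends spaces until the string reaches the requested length. */
  static String padRight(String s, int length) {
    StringBuilder sb = new StringBuilder(s);
    while (sb.length() < length) {
      sb.append(' ');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(compareNoPad("ab", "ab  "));    // shorter value is less
    System.out.println(comparePadSpace("ab", "ab  ")); // values compare as equal
  }
}
```

The only difference between the two modes is whether trailing spaces are 
significant; the ordering of the remaining characters is left to the collation 
in both cases.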

> Support different string comparison based on SqlCollation
> -
>
> Key: CALCITE-3951
> URL: https://issues.apache.org/jira/browse/CALCITE-3951
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Ruben Q L
>Assignee: Ruben Q L
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently SqlCollation defines concepts like Coercibility, Charset, Locale, 
> etc. However, we cannot specify on a certain collation that e.g. a string 
> field should use case insensitive comparison. The goal of this ticket is to 
> evolve SqlCollation to support that, and adapt the corresponding classes to 
> use that (optional) "non-standard" comparison.


