[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported

2021-05-17 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346144#comment-17346144
 ] 

Aleksey Plekhanov commented on IGNITE-14545:


[~tledkov-gridgain], thanks for the review! Merged to sql-calcite branch.

> Calcite engine. Unicode literal not supported
> -
>
> Key: IGNITE-14545
> URL: https://issues.apache.org/jira/browse/IGNITE-14545
> Project: Ignite
>  Issue Type: Bug
>Reporter: Taras Ledkov
>Assignee: Aleksey Plekhanov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Unicode literal not supported.
>  e.g. {{SELECT }}
> Tests:
>  {{aggregate/aggregates/test_aggr_string.test}}
> {{types/string/test_unicode.test_ignored}}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported

2021-05-17 Thread Taras Ledkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346118#comment-17346118
 ] 

Taras Ledkov commented on IGNITE-14545:
---

[~alex_pl], the patch is OK with me. Please merge.



[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported

2021-05-04 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338855#comment-17338855
 ] 

Aleksey Plekhanov commented on IGNITE-14545:


[~tledkov-gridgain], I've addressed your comments, please have another look.



[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported

2021-04-30 Thread Taras Ledkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337428#comment-17337428
 ] 

Taras Ledkov commented on IGNITE-14545:
---

[~alex_pl], the patch is OK with me.
Please fix these minor issues:
1. {{test_unicode.test_ignored}} also contains a reference to this ticket. Please 
fix the test.
2. I modified some scripts (including {{test_aggr_string.test}}) before the team 
settled on a style for {{_ignored}} tests.
We didn't have time to write a short document, 
but the style rule is short and easy:
it is more convenient when a passing test script is *a subset* of the 
full script that is marked as {{_ignored}}.
In this case a simple diff between {{

[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported

2021-04-22 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329176#comment-17329176
 ] 

Aleksey Plekhanov commented on IGNITE-14545:


{quote}It's not true. We need to override only those functions that are sensitive 
to the size of the character, like length, substring, strpos.
{quote}
All the functions mentioned above are sensitive to the size of the character.

Currently, with H2 (which uses {{v.getString().length()}}) we have the same 
issue, so I think it's OK to keep the default Calcite string function 
implementations and to provide new functions for 32-bit characters in the 
future (under a separate low-priority ticket).



[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported

2021-04-22 Thread Konstantin Orlov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329126#comment-17329126
 ] 

Konstantin Orlov commented on IGNITE-14545:
---

# It's true, so maybe it would be better to make this support toggleable 
somehow, or even to add it under a different name.
# It's not true. We need to override only those functions that are sensitive 
to the size of the character, like length, substring, strpos.

In fact, I don't mind leaving the current solution. But then we need to create 
a separate ticket with lower priority to fix the remaining issues related to 
strings containing 32-bit characters.
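
To illustrate why a function such as strpos is sensitive to character size, here is a minimal sketch of a position lookup that counts code points instead of 16-bit chars. The class name {{CodePointStrpos}} and the convention of returning a 1-based position (0 when the substring is absent) are assumptions made for this example; this is not the Calcite or Ignite implementation.

```java
final class CodePointStrpos {
    // U+1F986 lies outside the Basic Multilingual Plane, so it takes two chars.
    // It is built from its code point to avoid source-encoding issues.
    static final String DUCK = new String(Character.toChars(0x1F986));

    // Hypothetical helper: 1-based position of sub in str, counted in code points.
    // Returns 0 when sub does not occur (an assumed SQL-like convention).
    static int strpos(String str, String sub) {
        int charIdx = str.indexOf(sub); // index in 16-bit chars
        if (charIdx < 0)
            return 0;
        // Convert the char index into a code point index.
        return str.codePointCount(0, charIdx) + 1;
    }

    public static void main(String[] args) {
        String s = DUCK + "ab";
        System.out.println(s.indexOf("ab") + 1); // char-based position: 3
        System.out.println(strpos(s, "ab"));     // code-point-based position: 2
    }
}
```

The lookup itself still relies on {{String.indexOf}}; only the returned index is converted, which assumes both arguments are well-formed UTF-16.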



[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported

2021-04-22 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17328580#comment-17328580
 ] 

Aleksey Plekhanov commented on IGNITE-14545:


# It will work much slower than the standard functions in all cases (including 
cases where there are no 32-bit chars in the string).
# In this case, we would have to write our own implementation of every string 
function in Calcite (including those from the SqlLibrary dialects in use). 
There are plenty of such functions and their aliases (char_length, 
character_length, substring, lpad, rpad, locate, strpos, left, right, etc.), 
so there is a chance of missing something, and a chance that a new string 
function will be added to Calcite later.



[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported

2021-04-22 Thread Konstantin Orlov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327264#comment-17327264
 ] 

Konstantin Orlov commented on IGNITE-14545:
---

There is a relatively easy way to support 32-bit characters too. We could pass 
our custom operator table to {{SqlOperatorTables.chain}} first. And for 
string-related functions we have to deal with code points, not chars. Here is 
how we could derive a proper string length with regard to 32-bit chars: 
{{str.codePointCount(0, str.length())}}. And here is a snippet for substring: 
{{str.substring(str.offsetByCodePoints(0, from), str.offsetByCodePoints(0, from + len))}}. 
But I'm not 100% sure about this approach.
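
The two snippets above can be put together as a small, self-contained sketch. The class name {{CodePointStrings}} and the zero-based {{from}} offset are assumptions made for this example (SQL SUBSTRING is usually 1-based), and the code only shows the idea, not how it would be wired into a Calcite operator table.

```java
final class CodePointStrings {
    // U+1F986 needs a surrogate pair, i.e. two 16-bit chars in a Java String.
    static final String DUCK = new String(Character.toChars(0x1F986));

    // Length in code points rather than in 16-bit chars.
    static int length(String str) {
        return str.codePointCount(0, str.length());
    }

    // Substring of len code points starting at the zero-based code point offset from.
    // offsetByCodePoints throws IndexOutOfBoundsException if the range is invalid.
    static String substring(String str, int from, int len) {
        int begin = str.offsetByCodePoints(0, from);
        int end = str.offsetByCodePoints(begin, len);
        return str.substring(begin, end);
    }

    public static void main(String[] args) {
        String s = "a" + DUCK + "b";
        System.out.println(s.length());      // 4 chars
        System.out.println(length(s));       // 3 code points
        System.out.println(substring(s, 1, 1).equals(DUCK)); // true
    }
}
```

Both methods scan the string to locate code point boundaries, so they cost more than the plain {{String}} calls even when the input contains no 32-bit characters.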



[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported

2021-04-21 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326584#comment-17326584
 ] 

Aleksey Plekhanov commented on IGNITE-14545:


It's relatively easy to support 16-bit Unicode characters (I've raised the pull 
request). But there is limited support for 32-bit characters, since Java 
strings are sequences of 16-bit chars. Length, substring, and some other 
methods treat each 32-bit character as two 16-bit characters. For example, the 
result of {{"🦆".length()}} will be 2. String functions in Calcite reuse Java 
{{String}} methods and have the same problem. The result of 
{{SELECT CHAR_LENGTH('🦆')}} will be 2. And we cannot change this behavior 
without rewriting Calcite string functions. I'm not sure we need it right now.



[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported

2021-04-15 Thread Konstantin Orlov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322172#comment-17322172
 ] 

Konstantin Orlov commented on IGNITE-14545:
---

It's worth noting that Unicode strings should be supported not only in terms of 
storing/retrieving: every string function (like {{substring}} or 
{{character_length}}) should also be able to properly deal with strings with 
multi-codepoint characters.

> Calcite engine. Unicode literal not supported
> -
>
> Key: IGNITE-14545
> URL: https://issues.apache.org/jira/browse/IGNITE-14545
> Project: Ignite
>  Issue Type: Bug
>Reporter: Taras Ledkov
>Priority: Major
>
> Unicode literal not supported.
> e.g. {{SELECT }}
> Tests:
> {{aggregate/aggregates/test_aggr_string.test}}


