[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported
[ https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346144#comment-17346144 ] Aleksey Plekhanov commented on IGNITE-14545:
[~tledkov-gridgain], thanks for the review! Merged to the sql-calcite branch.

> Calcite engine. Unicode literal not supported
> -
>
> Key: IGNITE-14545
> URL: https://issues.apache.org/jira/browse/IGNITE-14545
> Project: Ignite
> Issue Type: Bug
> Reporter: Taras Ledkov
> Assignee: Aleksey Plekhanov
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Unicode literal not supported.
> e.g. {{SELECT }}
> Tests:
> {{aggregate/aggregates/test_aggr_string.test}}
> {{types/string/test_unicode.test_ignored}}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported
[ https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346118#comment-17346118 ] Taras Ledkov commented on IGNITE-14545: --- [~alex_pl], the patch is OK with me. Please merge.
[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported
[ https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338855#comment-17338855 ] Aleksey Plekhanov commented on IGNITE-14545: [~tledkov-gridgain], I've addressed your comments, please take another look.
[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported
[ https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337428#comment-17337428 ] Taras Ledkov commented on IGNITE-14545: --- [~alex_pl], the patch is OK with me. Please fix the minor issues: 1. {{test_unicode.test_ignored}} also contains a reference to this ticket. Please fix the test. 2. I modified some scripts (including {{test_aggr_string.test}}) before the team settled on a style for {{_ignored}} tests. We didn't have time to write a short document, but the style rule is short and simple: it seems more convenient when a passing test script is *a subset* of the full script marked as {{_ignored}}. In this case a simple diff between {{
[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported
[ https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329176#comment-17329176 ] Aleksey Plekhanov commented on IGNITE-14545: {quote}it's not true. We need to override only those functions sensible to the size of the character, like length, substring, strpos {quote} All of the functions mentioned above are sensitive to the size of the character. H2 (which uses {{v.getString().length()}}) currently has the same issue, so I think it's OK to keep Calcite's default string function implementations and, in the future, provide some new functions that deal with 32-bit characters (in a separate low-priority ticket).
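As an illustration of the kind of "new functions" such a follow-up ticket might add, here is a minimal sketch of code-point-aware LEFT and RIGHT. The class and method names are hypothetical, not Ignite or Calcite API, and the out-of-range behavior (empty string for n <= 0, whole string when n exceeds the length) is an assumption chosen for the example.

```java
/**
 * Hypothetical code-point-aware LEFT/RIGHT (not Ignite/Calcite API).
 * A surrogate pair is treated as a single character.
 */
final class CodePointLeftRight {
    /** First n code points of s; "" if n <= 0, whole string if n >= its code point length. */
    static String left(String s, int n) {
        if (n <= 0)
            return "";
        if (n >= s.codePointCount(0, s.length()))
            return s;
        return s.substring(0, s.offsetByCodePoints(0, n));
    }

    /** Last n code points of s; "" if n <= 0, whole string if n >= its code point length. */
    static String right(String s, int n) {
        if (n <= 0)
            return "";
        if (n >= s.codePointCount(0, s.length()))
            return s;
        // walk backwards n code points from the end of the string
        return s.substring(s.offsetByCodePoints(s.length(), -n));
    }

    public static void main(String[] args) {
        String s = "\uD83E\uDD86ab"; // U+1F986 DUCK followed by "ab"
        System.out.println(left(s, 1).equals("\uD83E\uDD86")); // true: the whole pair is kept
        System.out.println(right(s, 2));                        // ab
    }
}
```

A plain {{s.substring(0, 1)}} on the same input would return a lone high surrogate, which is exactly the behavior these helpers avoid.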
[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported
[ https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329126#comment-17329126 ] Konstantin Orlov commented on IGNITE-14545: ---
# It's true, so maybe it would be better to make this support toggleable somehow, or even to add it under a different name.
# It's not true. We need to override only those functions that are sensitive to the size of the character, like length, substring, strpos.

In fact, I don't mind leaving the current solution. But then we need to create a separate, lower-priority ticket to fix the remaining issues related to strings containing 32-bit characters.
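The "toggle" idea from point 1 could look roughly like the sketch below: the default keeps the O(1) {{String.length()}} result, and only an explicit opt-in pays for the O(n) code point scan. {{LengthMode}} and {{charLength}} are hypothetical names invented for this example; how such a switch would actually be exposed in Ignite (a config property, or a separately named SQL function) is left open in the discussion.

```java
import java.util.function.ToIntFunction;

/** Hypothetical length "toggle"; not Ignite/Calcite API. */
final class LengthToggle {
    enum LengthMode { UTF16_UNITS, CODE_POINTS }

    static ToIntFunction<String> charLength(LengthMode mode) {
        switch (mode) {
            case CODE_POINTS:
                // O(n): walks the string so that a surrogate pair counts once
                return s -> s.codePointCount(0, s.length());
            default:
                // O(1): plain String.length(), i.e. UTF-16 code units
                return String::length;
        }
    }

    public static void main(String[] args) {
        String duck = "\uD83E\uDD86"; // U+1F986: one code point, two chars
        System.out.println(charLength(LengthMode.UTF16_UNITS).applyAsInt(duck)); // 2
        System.out.println(charLength(LengthMode.CODE_POINTS).applyAsInt(duck)); // 1
    }
}
```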
[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported
[ https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17328580#comment-17328580 ] Aleksey Plekhanov commented on IGNITE-14545:
# It will be much slower than the standard functions in all cases (including cases where the string contains no 32-bit chars).
# In this case, we would have to write our own implementation of every string function in Calcite (including those of the SqlLibrary dialects in use). There are plenty of such functions and aliases (char_length, character_length, substring, lpad, rpad, locate, strpos, left, right, etc.), and there is a chance of missing something, or of a new string function being added to Calcite later.
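To show what "our own implementation" of one of the listed functions would involve, here is a minimal sketch of a code-point-aware STRPOS/POSITION. It is hypothetical (not Ignite or Calcite API), assumes 1-based SQL positions with 0 meaning "not found", and assumes well-formed UTF-16 input without lone surrogates.

```java
/** Hypothetical code-point-aware STRPOS; not Ignite/Calcite API. */
final class CodePointStrpos {
    /** 1-based code point position of needle in haystack, or 0 if absent. */
    static int strpos(String haystack, String needle) {
        int unitIdx = haystack.indexOf(needle); // position in UTF-16 code units
        if (unitIdx < 0)
            return 0;
        // count the code points before the match, then make the result 1-based
        return haystack.codePointCount(0, unitIdx) + 1;
    }

    public static void main(String[] args) {
        String s = "\uD83E\uDD86b"; // U+1F986 DUCK followed by "b"
        System.out.println(s.indexOf("b") + 1); // 3: naive char-based position
        System.out.println(strpos(s, "b"));     // 2: code point position
        System.out.println(strpos(s, "x"));     // 0: not found
    }
}
```

Note the extra {{codePointCount}} scan on every call, which is the kind of overhead point 1 refers to.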
[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported
[ https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327264#comment-17327264 ] Konstantin Orlov commented on IGNITE-14545: --- There is a relatively easy way to support 32-bit characters too. We could pass our custom operator table to {{SqlOperatorTables.chain}} first, so that it takes precedence. For string-related functions we then have to work with code points, not chars. Here is how we could derive the proper string length with regard to 32-bit characters: {{str.codePointCount(0, str.length())}}. And here is a snippet for substring: {{str.substring(str.offsetByCodePoints(0, from), str.offsetByCodePoints(0, from + len))}}. But I'm not 100% sure about this approach.
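The two snippets above can be wrapped into small helpers, sketched below. {{CodePointStrings}} is a hypothetical class name; {{from}} is a 0-based code point index and {{len}} a code point count, as in the snippet. The end offset is computed from {{begin}} rather than from 0, which gives the same result without rescanning the prefix. Out-of-range arguments throw {{IndexOutOfBoundsException}} (as {{String.substring}} does), whereas SQL SUBSTRING would typically clamp; that part is not handled here.

```java
/** Hypothetical code-point-aware helpers; not Ignite/Calcite API. */
final class CodePointStrings {
    /** Number of Unicode code points, so a surrogate pair counts as 1. */
    static int length(String str) {
        return str.codePointCount(0, str.length());
    }

    /** Substring by code points: 0-based start index, length in code points. */
    static String substring(String str, int from, int len) {
        int begin = str.offsetByCodePoints(0, from);
        int end = str.offsetByCodePoints(begin, len); // same as offsetByCodePoints(0, from + len)
        return str.substring(begin, end);
    }

    public static void main(String[] args) {
        String s = "a\uD83E\uDD86b"; // "a", U+1F986 DUCK, "b"
        System.out.println(s.length());                                 // 4 UTF-16 code units
        System.out.println(length(s));                                  // 3 code points
        System.out.println(substring(s, 1, 1).equals("\uD83E\uDD86"));  // true
        System.out.println(substring(s, 2, 1));                         // b
    }
}
```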
[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported
[ https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326584#comment-17326584 ] Aleksey Plekhanov commented on IGNITE-14545: It's relatively easy to support 16-bit Unicode characters (I've raised a pull request). But support for 32-bit characters is limited, since Java uses a 16-bit {{char}} array for strings. {{length}}, {{substring}}, and some other methods treat each 32-bit character as two 16-bit characters. For example, the result of {{"🦆".length()}} is 2. String functions in Calcite reuse Java {{String}} methods and have the same problem: the result of {{SELECT CHAR_LENGTH('🦆')}} is 2. We cannot change this behavior without rewriting Calcite's string functions, and I'm not sure we need that right now.
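The behavior described above can be reproduced with plain {{java.lang.String}}. In this small demo, 🦆 (U+1F986) is written as its surrogate pair escape so the source file's encoding does not matter; the class name is just for the example.

```java
/** Demonstrates that java.lang.String counts UTF-16 code units, not code points. */
final class SurrogateDemo {
    // U+1F986 DUCK, written as its UTF-16 surrogate pair
    static final String DUCK = "\uD83E\uDD86";

    public static void main(String[] args) {
        System.out.println(DUCK.length());                              // 2 (UTF-16 code units)
        System.out.println(DUCK.codePointCount(0, DUCK.length()));      // 1 (code points)
        System.out.println(Character.isHighSurrogate(DUCK.charAt(0)));  // true
        // substring(0, 1) splits the pair and leaves a lone high surrogate
        System.out.println(DUCK.substring(0, 1).equals("\uD83E"));      // true
    }
}
```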
[jira] [Commented] (IGNITE-14545) Calcite engine. Unicode literal not supported
[ https://issues.apache.org/jira/browse/IGNITE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322172#comment-17322172 ] Konstantin Orlov commented on IGNITE-14545: --- It is worth noting that Unicode strings should be supported not only in terms of storing/deriving: every string function (like {{substring}} or {{character_length}}) should also deal properly with strings containing characters that take more than one UTF-16 code unit (surrogate pairs).