[jira] [Commented] (CALCITE-3933) Incorrect SQL Emitted for Unicode for Several Dialects
[ https://issues.apache.org/jira/browse/CALCITE-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776007#comment-17776007 ]

LakeShen commented on CALCITE-3933:
-----------------------------------

I have looked at the source code, and I think I could try to fix this problem.

Hi [~aryeh], do you want to fix this problem? If not, maybe I could try to fix it.

> Incorrect SQL Emitted for Unicode for Several Dialects
> ------------------------------------------------------
>
>                 Key: CALCITE-3933
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3933
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.22.0
>         Environment: master with latest commit on April 15 (dfb842e55e1fa7037c8a731341010ed1c0cfb6f7)
>            Reporter: Aryeh Hillman
>            Priority: Major
>
> A string literal like "schön" should emit "schön" in SQL for many dialects,
> but instead emits
> {code:java}
> u&'sch\\00f6n' {code}
> which is (ISO-8859-1 ASCII).
> It's possible that some of the above dialects may support ISO-8859, but in my
> tests with *BigQuery Standard SQL*, *MySQL*, and *Redshift* engines, the
> following fails:
> {code:java}
> select u&'sch\\00f6n';{code}
> But this succeeds:
> {code:java}
> select 'schön'; {code}
> Test that demonstrates (add to
> `org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java` and run from
> there):
> {code:java}
> @Test void testBigQueryUnicode() {
>   final Function<RelBuilder, RelNode> relFn = b ->
>       b.scan("EMP")
>           .filter(
>               b.call(SqlStdOperatorTable.IN, b.field("ENAME"),
>                   b.literal("schön")))
>           .build();
>   final String expectedSql = "SELECT *\n" +
>       "FROM scott.EMP\n" +
>       "WHERE ENAME IN ('schön')";
>   relFn(relFn).withBigQuery().ok(expectedSql);
> }
> {code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
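To make the two quoting styles in CALCITE-3933 concrete, here is a minimal, standalone Java sketch. The class and method names (`UnicodeLiteralSketch`, `quoteUnicodeEscaped`, `quotePlain`) are hypothetical and are not Calcite's API; the sketch assumes characters in the Basic Multilingual Plane only and is not a statement about how any particular dialect should behave.

```java
public class UnicodeLiteralSketch {
  /** Quotes a string as u&'...' and writes each non-printable-ASCII char as \XXXX. */
  public static String quoteUnicodeEscaped(String s) {
    StringBuilder sb = new StringBuilder("u&'");
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x20 || c > 0x7E) {
        // Four hex digits, as in the literal quoted in the report.
        sb.append('\\').append(String.format("%04x", (int) c));
      } else if (c == '\'') {
        sb.append("''");      // double embedded single quotes
      } else if (c == '\\') {
        sb.append("\\\\");    // double the escape character itself
      } else {
        sb.append(c);
      }
    }
    return sb.append('\'').toString();
  }

  /** Quotes a string as a plain '...' literal, leaving non-ASCII chars untouched. */
  public static String quotePlain(String s) {
    return "'" + s.replace("'", "''") + "'";
  }

  public static void main(String[] args) {
    String value = "sch\u00f6n"; // "schön", written with an escape to avoid source-encoding issues
    System.out.println(quoteUnicodeEscaped(value));
    System.out.println(quotePlain(value));
  }
}
```

A dialect-specific fix could, under these assumptions, choose between the two methods per target database.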
[jira] [Assigned] (CALCITE-6041) Map query failed with NullPointerException in runtime phase
[ https://issues.apache.org/jira/browse/CALCITE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ran Tao reassigned CALCITE-6041:
--------------------------------

    Assignee: Ran Tao

> Map query failed with NullPointerException in runtime phase
> -----------------------------------------------------------
>
>                 Key: CALCITE-6041
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6041
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.35.0
>            Reporter: Ran Tao
>            Assignee: Ran Tao
>            Priority: Major
>
> Calcite supports the array/map/multiset query constructors, but if we run a
> map query such as:
> {code:java}
> select map(select 1, 2)
> select map(select empno, deptno from emps); {code}
> it will cause an exception:
> {noformat}
> java.sql.SQLException: Error while executing SQL "select map(select 1, 2)": Unable to implement EnumerableNestedLoopJoin(condition=[true], joinType=[semi]): rowcount = 1.0, cumulative cost = \{13.0 rows, 3.0 cpu, 0.0 io}, id = 72
>   EnumerableCollect(field=[x]): rowcount = 1.0, cumulative cost = \{2.0 rows, 2.0 cpu, 0.0 io}, id = 69
>     EnumerableValues(tuples=[[\\{ 1, 2 }]]): rowcount = 1.0, cumulative cost = \{1.0 rows, 1.0 cpu, 0.0 io}, id = 38
>   EnumerableValues(tuples=[[\\{ 0 }]]): rowcount = 1.0, cumulative cost = \{1.0 rows, 1.0 cpu, 0.0 io}, id = 35
>     at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>     at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:164)
>     at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:228)
>     at org.apache.calcite.test.SqlOperatorTest$TesterImpl.check(SqlOperatorTest.java:13107)
>     at org.apache.calcite.sql.test.SqlOperatorFixture.check(SqlOperatorFixture.java:439)
>     at org.apache.calcite.sql.test.SqlOperatorFixture.check(SqlOperatorFixture.java:415)
>     at org.apache.calcite.sql.test.SqlOperatorFixture.check(SqlOperatorFixture.java:420)
>     at org.apache.calcite.test.SqlOperatorTest.testMapQueryConstructor(SqlOperatorTest.java:10235)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
>     at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>     at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>     at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
>     at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
>     at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
>     at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
>     at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
>     at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>     at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>     at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>     at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
>     at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
>     at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:217)
>     at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>     at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:213)
>     at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:138)
>     at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:68)
>     at
[jira] [Commented] (CALCITE-6052) SqlImplementor writes FLOATING POINT literals as DECIMAL literals
[ https://issues.apache.org/jira/browse/CALCITE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775982#comment-17775982 ] Julian Hyde commented on CALCITE-6052: -- Good idea. I like small changes. I suggest changing 'FLOATING POINT' to 'REAL, FLOAT and DOUBLE' in the summary, since those are the specific SQL type names. One test in the Pig module isn't sufficient. Can you add a test, with one column for each type, in SqlToRelConverterTest. Do a quick experiment to see whether IEEE special values work (+inf, -inf, nan, -0), and if so, add them to the test. I don't recall whether SQL supports those values, but the test could at least document our current behavior. > SqlImplementor writes FLOATING POINT literals as DECIMAL literals > - > > Key: CALCITE-6052 > URL: https://issues.apache.org/jira/browse/CALCITE-6052 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Mihai Budiu >Assignee: Mihai Budiu >Priority: Minor > Labels: pull-request-available > Fix For: 1.36.0 > > > This bug is already fixed in https://github.com/apache/calcite/pull/3411, but > I plan to submit a smaller point fix for it, which doesn't require reworking > the type families. -- This message was sent by Atlassian Jira (v8.20.10#820010)
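As background for the experiment with IEEE special values suggested in the comment above, the following standalone Java sketch (a hypothetical class, not Calcite code) illustrates two facts about the JDK that make DECIMAL an imperfect carrier for REAL, FLOAT and DOUBLE values: `java.math.BigDecimal` cannot represent NaN or the infinities, and a decimal text such as `0.1` is generally not the exact value of the nearest double. How Calcite should unparse such values is not decided here.

```java
import java.math.BigDecimal;

public class FloatLiteralSketch {
  /** Returns true if the double can be converted to an exact BigDecimal. */
  public static boolean hasDecimalForm(double d) {
    try {
      new BigDecimal(d);
      return true;
    } catch (NumberFormatException e) {
      return false; // BigDecimal(double) rejects NaN and infinities
    }
  }

  /** Returns true if the decimal text is exactly equal to the double nearest to it. */
  public static boolean isExactInBinary(String decimalText) {
    BigDecimal written = new BigDecimal(decimalText);
    BigDecimal stored = new BigDecimal(Double.parseDouble(decimalText));
    return written.compareTo(stored) == 0;
  }

  public static void main(String[] args) {
    System.out.println(hasDecimalForm(Double.NaN));
    System.out.println(hasDecimalForm(Double.POSITIVE_INFINITY));
    System.out.println(hasDecimalForm(-0.0));
    // The sign of negative zero is lost once the value becomes a BigDecimal.
    System.out.println(new BigDecimal(-0.0).signum());
    System.out.println(isExactInBinary("0.1"));
    System.out.println(isExactInBinary("0.5"));
  }
}
```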
[jira] [Commented] (CALCITE-6045) CURRENT_TIMESTAMP has incorrect return type
[ https://issues.apache.org/jira/browse/CALCITE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775977#comment-17775977 ] Julian Hyde commented on CALCITE-6045: -- I think that Calcite's {{CURRENT_TIMESTAMP}} function should have type {{TIMESTAMP WITH LOCAL TIME ZONE}}, which means that (after type alias translation) it will have the requisite type for BigQuery (what BigQuery calls {{TIMESTAMP}}). (That is a change to current behavior, and a departure from the ISO standard, but still the best type, in my opinion.) If people want a {{TIMESTAMP}} (what BigQuery calls {{DATETIME}}) they can call {{LOCALTIMESTAMP}}, whose behavior will be unchanged. > CURRENT_TIMESTAMP has incorrect return type > --- > > Key: CALCITE-6045 > URL: https://issues.apache.org/jira/browse/CALCITE-6045 > Project: Calcite > Issue Type: Bug >Reporter: Tanner Clary >Priority: Major > > When trying to work on CALCITE-6021, I noticed that {{CURRENT_TIMESTAMP}} > currently returns type {{TIMESTAMP}} when it should be > {{TIMESTAMP_WITH_LOCAL_TIME_ZONE}}. > After modifying it, I noticed function was returning the time from (UTC - > System TZ) hours ago. For example, I am in {{America/Los_Angeles}} and if I > called the function at {{2023-10-10 13:28:00 America/Los_Angeles}}, it would > return {{2023-10-10 06:28:00 America/Los_Angeles}}. > I think this is because the DataContext {{CURRENT_TIMESTAMP}} variable, which > is meant to represent milliseconds since epoch UTC, actually has the timezone > offset applied in {{CalciteConnectionImpl#DataContextImpl}} > [here|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/jdbc/CalciteConnectionImpl.java#L442]. > To be clear: it is meant to represent millis since epoch UTC, but instead it > is millis since epoch [system tz], as I understand it. 
> Additionally, I believe the {{getString()}} method for timestamps in > AvaticaResultSet should behave similarly to > [{{SqlFunctions#timestampWithLocalTimezoneToString()}}|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L4021] > when dealing with a {{TIMESTAMP WITH LOCAL TIME ZONE}}. Right now, it does > not take the timezone into consideration so although it represents the > accurate instant in time, it displays differently than > {{CAST(CURRENT_TIMESTAMP AS VARCHAR)}}. > For example, {{SELECT CURRENT_TIMESTAMP, CAST(CURRENT_TIMESTAMP AS > VARCHAR)}}, with the correct return type, returns something like: > {{2023-10-10 13:28:00 | 2023-10-10 06:28:00.000 America/Los_Angeles}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
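The seven-hour discrepancy described in CALCITE-6045 can be reproduced outside Calcite with plain `java.time`. The sketch below is only an illustration of the suspected arithmetic, under the assumption that the zone offset is added to the UTC epoch milliseconds before the value is later interpreted as UTC; the class and method names are hypothetical and do not mirror `CalciteConnectionImpl`.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class TimestampOffsetSketch {
  static final ZoneId ZONE = ZoneId.of("America/Los_Angeles");

  /** Milliseconds since the epoch in UTC: the intended meaning of the variable. */
  public static long utcMillis(ZonedDateTime now) {
    return now.toInstant().toEpochMilli();
  }

  /** UTC millis with the zone offset folded in: the behavior the report suspects. */
  public static long shiftedMillis(ZonedDateTime now) {
    return utcMillis(now) + now.getOffset().getTotalSeconds() * 1000L;
  }

  /** Interprets the millis as a UTC instant and shows the wall-clock time in ZONE. */
  public static LocalDateTime renderInZone(long millis) {
    return Instant.ofEpochMilli(millis).atZone(ZONE).toLocalDateTime();
  }

  public static void main(String[] args) {
    ZonedDateTime now = ZonedDateTime.of(2023, 10, 10, 13, 28, 0, 0, ZONE);
    System.out.println(renderInZone(utcMillis(now)));     // wall clock 13:28
    System.out.println(renderInZone(shiftedMillis(now))); // wall clock 06:28, seven hours early
  }
}
```

On that date the offset for `America/Los_Angeles` is -07:00, so the shifted value renders as 06:28, matching the time reported in the issue.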
[jira] [Commented] (CALCITE-3933) Incorrect SQL Emitted for Unicode for Several Dialects
[ https://issues.apache.org/jira/browse/CALCITE-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775957#comment-17775957 ]

LakeShen commented on CALCITE-3933:
-----------------------------------

Maybe we could implement different behavior in the `SqlDialect#quoteStringLiteralUnicode` method, depending on the type of `SqlDialect#databaseProduct`.
[jira] [Commented] (CALCITE-6045) CURRENT_TIMESTAMP has incorrect return type
[ https://issues.apache.org/jira/browse/CALCITE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775934#comment-17775934 ]

Tanner Clary commented on CALCITE-6045:
---------------------------------------

For FLOOR and CEIL, we added some logic to the parser that checks the current conformance and if it is BQ, then use the BQ-specific operator. I think Jerin is dealing with something similar and a lot of the suggestions (that I have had, at least) involve checking the conformance. Then maybe you could have an operator like CURRENT_TIMESTAMP_BQ or something similar. See FLOOR/CEIL like I mentioned or SUBSTR. If there's another difference between the operators, like operand count, which I don't think is applicable here, you could also use that instead.
[jira] [Commented] (CALCITE-6045) CURRENT_TIMESTAMP has incorrect return type
[ https://issues.apache.org/jira/browse/CALCITE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775930#comment-17775930 ]

Will Noble commented on CALCITE-6045:
-------------------------------------

This would seem to be a case where the standard SQL function {{CURRENT_TIMESTAMP}} (meant to return a {{TIMESTAMP WITH TIME ZONE}}) has a name collision with the BigQuery-specific function {{CURRENT_TIMESTAMP}} (meant to return a {{TIMESTAMP WITH LOCAL TIME ZONE}}, a.k.a. the BigQuery-specific {{TIMESTAMP}}). Is there a standard procedure for handling function name collisions between standard SQL and particular dialects?
[jira] [Updated] (CALCITE-6052) SqlImplementor writes FLOATING POINT literals as DECIMAL literals
[ https://issues.apache.org/jira/browse/CALCITE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihai Budiu updated CALCITE-6052: - Summary: SqlImplementor writes FLOATING POINT literals as DECIMAL literals (was: reltosql writes FLOATING POINT literals as DECIMAL literals) > SqlImplementor writes FLOATING POINT literals as DECIMAL literals > - > > Key: CALCITE-6052 > URL: https://issues.apache.org/jira/browse/CALCITE-6052 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Mihai Budiu >Assignee: Mihai Budiu >Priority: Minor > Labels: pull-request-available > Fix For: 1.36.0 > > > This bug is already fixed in https://github.com/apache/calcite/pull/3411, but > I plan to submit a smaller point fix for it, which doesn't require reworking > the type families. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-6052) reltosql writes FLOATING POINT literals as DECIMAL literals
[ https://issues.apache.org/jira/browse/CALCITE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihai Budiu updated CALCITE-6052: - Fix Version/s: 1.36.0 > reltosql writes FLOATING POINT literals as DECIMAL literals > --- > > Key: CALCITE-6052 > URL: https://issues.apache.org/jira/browse/CALCITE-6052 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Mihai Budiu >Assignee: Mihai Budiu >Priority: Minor > Labels: pull-request-available > Fix For: 1.36.0 > > > This bug is already fixed in https://github.com/apache/calcite/pull/3411, but > I plan to submit a smaller point fix for it, which doesn't require reworking > the type families. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-6052) reltosql writes FLOATING POINT literals as DECIMAL literals
[ https://issues.apache.org/jira/browse/CALCITE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CALCITE-6052: Labels: pull-request-available (was: ) > reltosql writes FLOATING POINT literals as DECIMAL literals > --- > > Key: CALCITE-6052 > URL: https://issues.apache.org/jira/browse/CALCITE-6052 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Mihai Budiu >Assignee: Mihai Budiu >Priority: Minor > Labels: pull-request-available > > This bug is already fixed in https://github.com/apache/calcite/pull/3411, but > I plan to submit a smaller point fix for it, which doesn't require reworking > the type families. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (CALCITE-5987) SqlImplementor loses type information for literals
[ https://issues.apache.org/jira/browse/CALCITE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mihai Budiu reassigned CALCITE-5987:
------------------------------------

    Assignee: Mihai Budiu

> SqlImplementor loses type information for literals
> --------------------------------------------------
>
>                 Key: CALCITE-5987
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5987
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.35.0
>            Reporter: Mihai Budiu
>            Assignee: Mihai Budiu
>            Priority: Minor
>
> When converting a SqlNode to a String query, the conversion can produce SQL
> that computes different results. This happens because literals do not carry
> type information in the result string. For example, this plan:
> {code}
> rel#7:LogicalValues.(type=RecordType(VARCHAR(3) EXPR$0),tuples=[{ 'A' }])
> {code}
> will generate a SQL query:
> {code}
> SELECT 'A'
> {code}
> While the type of the former result is VARCHAR(3), the latter query produces
> a CHAR(1) result.
> It would be nice if SqlImplementor had an option to produce a query that
> preserves the output type, e.g.:
> {code}
> SELECT CAST('A' AS VARCHAR(3))
> {code}
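The option requested in CALCITE-5987 amounts to wrapping each literal in a CAST to its declared type. A minimal string-level sketch follows; `TypedLiteralSketch` and `withCast` are hypothetical names, and a real implementation would work on SqlNode trees rather than on strings.

```java
public class TypedLiteralSketch {
  /** Wraps an already-quoted SQL literal in a CAST to the given SQL type name. */
  public static String withCast(String literalSql, String typeSql) {
    return "CAST(" + literalSql + " AS " + typeSql + ")";
  }

  public static void main(String[] args) {
    // Without the CAST, 'A' would be typed CHAR(1) rather than VARCHAR(3).
    System.out.println("SELECT " + withCast("'A'", "VARCHAR(3)"));
  }
}
```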
[jira] [Assigned] (CALCITE-5891) Create a test fixture that would apply PROJECT_REDUCE_EXPRESSIONS to all tests in SqlOperatorTest
[ https://issues.apache.org/jira/browse/CALCITE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mihai Budiu reassigned CALCITE-5891:
------------------------------------

    Assignee: Mihai Budiu

> Create a test fixture that would apply PROJECT_REDUCE_EXPRESSIONS to all
> tests in SqlOperatorTest
> ------------------------------------------------------------------------
>
>                 Key: CALCITE-5891
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5891
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.35.0
>            Reporter: Mihai Budiu
>            Assignee: Mihai Budiu
>            Priority: Major
>
> SqlOperatorTest has many tests, including end-to-end tests.
> However, none of these tests exercise the PROJECT_REDUCE_EXPRESSIONS rule,
> which often produces different results than these tests do for constant
> expressions.
> Ideally we should be able to subclass SqlOperatorTest and use a fixture that
> also applies this optimization prior to evaluation.
> I have marked this as a *major priority*, because I suspect it would
> catch many bugs with minimal effort. (I have found at least 10 so far.)
[jira] [Assigned] (CALCITE-5884) Description of ARRAY_TO_STRING function is incomplete
[ https://issues.apache.org/jira/browse/CALCITE-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mihai Budiu reassigned CALCITE-5884:
------------------------------------

    Assignee: Mihai Budiu

> Description of ARRAY_TO_STRING function is incomplete
> -----------------------------------------------------
>
>                 Key: CALCITE-5884
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5884
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.35.0
>            Reporter: Mihai Budiu
>            Assignee: Mihai Budiu
>            Priority: Trivial
>              Labels: pull-request-available
>             Fix For: 1.36.0
>
>
> This is the current definition of the function ARRAY_TO_STRING in
> SqlLibraryOperators:
> {code:java}
> /** The "ARRAY_TO_STRING(array, delimiter [, nullText ])" function. */
> @LibraryOperator(libraries = {BIG_QUERY})
> public static final SqlFunction ARRAY_TO_STRING =
>     SqlBasicFunction.create(SqlKind.ARRAY_TO_STRING,
>         ReturnTypes.VARCHAR_NULLABLE,
>         OperandTypes.STRING_ARRAY_CHARACTER_OPTIONAL_CHARACTER);
> {code}
> So the result is nullable if any of the arguments is nullable. However, the
> nullability of the last argument does not influence the result nullability:
> a NULL value for the third optional argument will not cause a NULL value to
> be output.
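The nullability point in CALCITE-5884 can be illustrated with a small Java model of the function. This is a sketch under stated assumptions, not Calcite's runtime implementation: it assumes that a NULL array or NULL delimiter yields NULL, and that NULL elements are skipped when no null text is supplied. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class ArrayToStringSketch {
  /**
   * Joins the elements with the delimiter. A null nullText never makes the
   * result null; it only means that null elements are dropped.
   */
  public static String arrayToString(List<String> array, String delimiter, String nullText) {
    if (array == null || delimiter == null) {
      return null; // assumed: only these two arguments propagate NULL
    }
    return array.stream()
        .map(e -> e == null ? nullText : e) // substitute nullText for null elements
        .filter(Objects::nonNull)           // drop them if nullText is itself null
        .collect(Collectors.joining(delimiter));
  }

  public static void main(String[] args) {
    List<String> values = Arrays.asList("a", null, "b");
    System.out.println(arrayToString(values, ",", null));
    System.out.println(arrayToString(values, ",", "?"));
  }
}
```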
[jira] [Assigned] (CALCITE-5998) The SAFE_OFFSET operator can cause an index out of bounds exception
[ https://issues.apache.org/jira/browse/CALCITE-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihai Budiu reassigned CALCITE-5998: Assignee: Tanner Clary > The SAFE_OFFSET operator can cause an index out of bounds exception > --- > > Key: CALCITE-5998 > URL: https://issues.apache.org/jira/browse/CALCITE-5998 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Mihai Budiu >Assignee: Tanner Clary >Priority: Minor > > The following query, when added as a SqlOperatorTest: > {code:sql} > select ARRAY[p3,p2,p1][SAFE_OFFSET(p0)] from (values (-1, 6, 4, 2)) as t(p0, > p1, p2, p3) > {code} > causes an exception. Here is the top of the stack trace: > {code:java} > Array index -1 is out of bounds > org.apache.calcite.runtime.CalciteException: Array index -1 is out of bounds > at > java.base@11.0.18/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > java.base@11.0.18/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > java.base@11.0.18/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > java.base@11.0.18/java.lang.reflect.Constructor.newInstance(Constructor.java:490) > at > app//org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:507) > at > app//org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:601) > at > app//org.apache.calcite.runtime.SqlFunctions.arrayItem(SqlFunctions.java:4742) > at > app//org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(SqlFunctions.java:4780) > at Baz$1$1.current(Unknown Source) > at > app//org.apache.calcite.linq4j.Linq4j$EnumeratorIterator.next(Linq4j.java:687) > at > app//org.apache.calcite.avatica.util.IteratorCursor.next(IteratorCursor.java:46) > at > app//org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:219) > at > 
app//org.apache.calcite.sql.test.ResultCheckers.compareResultSet(ResultCheckers.java:128) > at > app//org.apache.calcite.sql.test.ResultCheckers$RefSetResultChecker.checkResult(ResultCheckers.java:336) > at > app//org.apache.calcite.test.SqlOperatorTest$TesterImpl.check(SqlOperatorTest.java:12987) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
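For comparison with the stack trace above, this is a minimal sketch of the behavior one would expect from a "safe" zero-based array lookup: any out-of-range index, including a negative one, yields NULL instead of an exception. `SafeOffsetSketch.safeOffset` is a hypothetical helper, not the `SqlFunctions.arrayItemOptional` method named in the trace.

```java
import java.util.Arrays;
import java.util.List;

public class SafeOffsetSketch {
  /** Zero-based lookup that returns null for a null array or any out-of-range index. */
  public static <T> T safeOffset(List<T> array, int index) {
    if (array == null || index < 0 || index >= array.size()) {
      return null; // the negative case is the one that fails in the report
    }
    return array.get(index);
  }

  public static void main(String[] args) {
    // Mirrors ARRAY[p3, p2, p1] with (p0, p1, p2, p3) = (-1, 6, 4, 2).
    List<Integer> array = Arrays.asList(2, 4, 6);
    System.out.println(safeOffset(array, -1));
    System.out.println(safeOffset(array, 0));
  }
}
```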
[jira] [Assigned] (CALCITE-5986) The typeFamily property of SqlTypeName is used inconsistently
[ https://issues.apache.org/jira/browse/CALCITE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihai Budiu reassigned CALCITE-5986: Assignee: Mihai Budiu > The typeFamily property of SqlTypeName is used inconsistently > - > > Key: CALCITE-5986 > URL: https://issues.apache.org/jira/browse/CALCITE-5986 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Mihai Budiu >Assignee: Mihai Budiu >Priority: Minor > Labels: pull-request-available > > In SqlTypeFamily we have this code: > {code:java} > private static final Map JDBC_TYPE_TO_FAMILY = > ... > .put(Types.FLOAT, NUMERIC) > .put(Types.REAL, NUMERIC) > .put(Types.DOUBLE, NUMERIC) > {code} > But it looks to me like the type family should be APPROXIMATE_NUMERIC. > This impacts the way RelToSqlConverter works, for instance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (CALCITE-6014) Create a SqlOperatorFixture that parses, unparses, and then parses again before executing
[ https://issues.apache.org/jira/browse/CALCITE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihai Budiu reassigned CALCITE-6014: Assignee: Mihai Budiu > Create a SqlOperatorFixture that parses, unparses, and then parses again > before executing > - > > Key: CALCITE-6014 > URL: https://issues.apache.org/jira/browse/CALCITE-6014 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.35.0 >Reporter: Mihai Budiu >Assignee: Mihai Budiu >Priority: Minor > Labels: pull-request-available > Fix For: 1.36.0 > > > Such a fixture will help catch bugs in the unparsing code. > Several bugs were found using this technique, e.g., CALCITE-5997. > This is related to CALCITE-5891, CALCITE-6000. > The SqlParserFixture UnparsingTesterImpl provides a similar service, but > since it does not validate the code after unparsing, it will catch fewer bugs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (CALCITE-6029) SqlOperatorTest cannot test operators that require the Babel parser
[ https://issues.apache.org/jira/browse/CALCITE-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mihai Budiu reassigned CALCITE-6029:
------------------------------------

    Assignee: Mihai Budiu

> SqlOperatorTest cannot test operators that require the Babel parser
> -------------------------------------------------------------------
>
>                 Key: CALCITE-6029
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6029
>             Project: Calcite
>          Issue Type: Bug
>          Components: babel, core
>    Affects Versions: 1.35.0
>            Reporter: Mihai Budiu
>            Assignee: Mihai Budiu
>            Priority: Minor
>              Labels: pull-request-available
>
> In SqlOperatorTest one can write code like this:
> {code:java}
> @Test void testDatePart() {
>   final SqlOperatorFixture f = fixture().withLibrary(SqlLibrary.POSTGRESQL)
>       .withParserConfig(p ->
>           p.withParserFactory(SqlBabelParserImpl.FACTORY));
> {code}
> This almost works, but the SqlOperatorTest.check function makes a connection
> that ignores the parserFactory, so parsing will fail:
> {code:java}
> @Override public void check(SqlTestFactory factory, String query,
>     SqlTester.TypeChecker typeChecker,
>     SqlTester.ParameterChecker parameterChecker,
>     SqlTester.ResultChecker resultChecker) {
>   super.check(factory, query, typeChecker, parameterChecker,
>       resultChecker);
>   final RelDataTypeSystem typeSystem =
>       factory.typeSystemTransform.apply(RelDataTypeSystem.DEFAULT);
>   final ConnectionFactory connectionFactory =
>       factory.connectionFactory
>           .with(CalciteConnectionProperty.TYPE_SYSTEM, uri(FIELD)); /// NO PARSER_FACTORY HERE
> {code}
> I am trying to fix this by adding a PARSER_FACTORY argument to the
> connection, but then I get a class loader error from
> AvaticaUtils.instantiatePlugin, which, in this case, cannot find the
> SqlBabelParserImpl#FACTORY in the classpath.
> I would appreciate some help solving this last bit.
[jira] [Assigned] (CALCITE-6030) DATE_PART is not handled by the RexToLixTranslator
[ https://issues.apache.org/jira/browse/CALCITE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihai Budiu reassigned CALCITE-6030: Assignee: Mihai Budiu > DATE_PART is not handled by the RexToLixTranslator > -- > > Key: CALCITE-6030 > URL: https://issues.apache.org/jira/browse/CALCITE-6030 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Mihai Budiu >Assignee: Mihai Budiu >Priority: Minor > Labels: pull-request-available > Fix For: 1.36.0 > > > The following test, when added to SqlOperatorTest, causes a RuntimeException: > {code:java} > @Test void testDatePart() { > final SqlOperatorFixture f = fixture().withLibrary(SqlLibrary.POSTGRESQL) > .withParserConfig(p -> > p.withParserFactory(SqlBabelParserImpl.FACTORY)); > f.checkScalar("DATE_PART(second, TIME '10:10:10')", > "10", "BIGINT NOT NULL"); > } > {code} > Note that this needs https://github.com/apache/calcite/pull/3445 to execute > correctly. > The stack trace is: > {code:java} > Suppressed: java.lang.RuntimeException: cannot translate call DATE_PART($t1, > $t2) > at > org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitCall(RexToLixTranslator.java:1160) > at > org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitCall(RexToLixTranslator.java:101) > at org.apache.calcite.rex.RexCall.accept(RexCall.java:189) > {code} > According to the documentation DATE_PART is just an alias for EXTRACT, which > is (mostly) implemented, so this should work. -- This message was sent by Atlassian Jira (v8.20.10#820010)
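Since the report argues that DATE_PART should behave as an alias for EXTRACT, the expected result of the failing test can be sketched with plain `java.time`. The class below is purely illustrative: `extract` and `datePart` are hypothetical helpers covering only three time units, and they say nothing about how `RexToLixTranslator` should register the operator.

```java
import java.time.LocalTime;
import java.time.temporal.ChronoField;
import java.util.Locale;

public class DatePartSketch {
  /** EXTRACT(unit FROM time) for a few time units. */
  public static long extract(String unit, LocalTime time) {
    switch (unit.toUpperCase(Locale.ROOT)) {
      case "HOUR":
        return time.get(ChronoField.HOUR_OF_DAY);
      case "MINUTE":
        return time.get(ChronoField.MINUTE_OF_HOUR);
      case "SECOND":
        return time.get(ChronoField.SECOND_OF_MINUTE);
      default:
        throw new IllegalArgumentException("unsupported unit: " + unit);
    }
  }

  /** DATE_PART(unit, time), modeled as a plain alias that delegates to extract. */
  public static long datePart(String unit, LocalTime time) {
    return extract(unit, time);
  }

  public static void main(String[] args) {
    // Mirrors DATE_PART(second, TIME '10:10:10') from the test in the report.
    System.out.println(datePart("second", LocalTime.of(10, 10, 10)));
  }
}
```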
[jira] [Created] (CALCITE-6052) reltosql writes FLOATING POINT literals as DECIMAL literals
Mihai Budiu created CALCITE-6052: Summary: reltosql writes FLOATING POINT literals as DECIMAL literals Key: CALCITE-6052 URL: https://issues.apache.org/jira/browse/CALCITE-6052 Project: Calcite Issue Type: Bug Components: core Affects Versions: 1.35.0 Reporter: Mihai Budiu This bug is already fixed in https://github.com/apache/calcite/pull/3411, but I plan to submit a smaller point fix for it, which doesn't require reworking the type families. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (CALCITE-6052) reltosql writes FLOATING POINT literals as DECIMAL literals
[ https://issues.apache.org/jira/browse/CALCITE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihai Budiu reassigned CALCITE-6052: Assignee: Mihai Budiu > reltosql writes FLOATING POINT literals as DECIMAL literals > --- > > Key: CALCITE-6052 > URL: https://issues.apache.org/jira/browse/CALCITE-6052 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Mihai Budiu >Assignee: Mihai Budiu >Priority: Minor > > This bug is already fixed in https://github.com/apache/calcite/pull/3411, but > I plan to submit a smaller point fix for it, which doesn't require reworking > the type families. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-6030) DATE_PART is not handled by the RexToLixTranslator
[ https://issues.apache.org/jira/browse/CALCITE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihai Budiu updated CALCITE-6030: - Fix Version/s: 1.36.0 > DATE_PART is not handled by the RexToLixTranslator > -- > > Key: CALCITE-6030 > URL: https://issues.apache.org/jira/browse/CALCITE-6030 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Mihai Budiu >Priority: Minor > Labels: pull-request-available > Fix For: 1.36.0 > > > The following test, when added to SqlOperatorTest, causes a RuntimeException: > {code:java} > @Test void testDatePart() { > final SqlOperatorFixture f = fixture().withLibrary(SqlLibrary.POSTGRESQL) > .withParserConfig(p -> > p.withParserFactory(SqlBabelParserImpl.FACTORY)); > f.checkScalar("DATE_PART(second, TIME '10:10:10')", > "10", "BIGINT NOT NULL"); > } > {code} > Note that this needs https://github.com/apache/calcite/pull/3445 to execute > correctly. > The stack trace is: > {code:java} > Suppressed: java.lang.RuntimeException: cannot translate call DATE_PART($t1, > $t2) > at > org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitCall(RexToLixTranslator.java:1160) > at > org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitCall(RexToLixTranslator.java:101) > at org.apache.calcite.rex.RexCall.accept(RexCall.java:189) > {code} > According to the documentation DATE_PART is just an alias for EXTRACT, which > is (mostly) implemented, so this should work. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5990) Explicit cast to numeric type doesn't check overflow
[ https://issues.apache.org/jira/browse/CALCITE-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775914#comment-17775914 ] Mihai Budiu commented on CALCITE-5990: -- I already have a working fix, which I will submit once we have solved CALCITE-5921. > Explicit cast to numeric type doesn't check overflow > > > Key: CALCITE-5990 > URL: https://issues.apache.org/jira/browse/CALCITE-5990 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.35.0 >Reporter: Runkang He >Assignee: Runkang He >Priority: Blocker > Fix For: 1.36.0 > > > Explicit cast to numeric type doesn't check overflow, and this issue can be > reproduced by sqlline: > {code:sql} > select cast(empno as tinyint), cast(130 as tinyint) from emps where > name='Alice'; -- empno is 130 > {code} > The empno column has type INT. The result is wrong: > {code:sql} > -126, -126{code} > I think it should throw an exception on overflow, instead of returning a wrong > result to the user. > This issue was found when turning on runtime checks for > CalciteSqlOperatorTest in CALCITE-5921. -- This message was sent by Atlassian Jira (v8.20.10#820010)
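The -126 result reported above is what plain two's-complement narrowing produces: 130 does not fit in a signed 8-bit TINYINT, so an unchecked cast wraps around. A minimal Java sketch of the difference between unchecked and range-checked narrowing follows. The class and method names are hypothetical and only illustrate the idea; this is not the fix mentioned in the comment.

```java
final class CheckedTinyintCast {
  /** Unchecked narrowing: silently wraps around, as in the reported result. */
  static byte uncheckedCast(int value) {
    return (byte) value;
  }

  /** Checked narrowing: rejects values outside the TINYINT range [-128, 127]. */
  static byte checkedCast(int value) {
    if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE) {
      throw new ArithmeticException("Value " + value + " out of range for TINYINT");
    }
    return (byte) value;
  }

  public static void main(String[] args) {
    System.out.println(uncheckedCast(130)); // prints -126
    try {
      checkedCast(130);
    } catch (ArithmeticException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a check like this, the query in the description would fail with an error instead of returning -126.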
[jira] [Commented] (CALCITE-6001) Add useUtf8AsDefaultCharset flag to SqlConformanceEnum to allow encoding of non-ISO-8859-1 characters
[ https://issues.apache.org/jira/browse/CALCITE-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775894#comment-17775894 ] Julian Hyde commented on CALCITE-6001: -- Can you take a look at CALCITE-3933? I think it's related, albeit a bigger issue. If solving this issue (6001) goes some way to solving 3933 I think we should do it. Your PR looks good (it just needs a little cleanup). > Add useUtf8AsDefaultCharset flag to SqlConformanceEnum to allow encoding of > non-ISO-8859-1 characters > - > > Key: CALCITE-6001 > URL: https://issues.apache.org/jira/browse/CALCITE-6001 > Project: Calcite > Issue Type: New Feature >Reporter: Tanner Clary >Assignee: Tanner Clary >Priority: Major > Labels: pull-request-available > > Many dialects supported by Calcite encode their strings using a default > charset (most commonly UTF-8 or ISO-8859-1). For example, BigQuery uses > [UTF-8|https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type]. > I am proposing to add a dialect property to be referenced when converting > string literals so that the current dialect's default is used unless > otherwise specified. > Presently, if no charset is specified when converting to RexLiterals > [here|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rex/RexBuilder.java#L1618], > the CalciteSystemProperty {{DEFAULT_CHARSET}} is used > ([docs|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/config/CalciteSystemProperty.java#L300]) > which is set to ISO-8859-1. > This means that when converting a query like: > {{select 'ק' as result;}} > you will get the following error: {{Failed to encode 'ק' in character > set 'ISO-8859-1'}}. > This failure is unexpected if you are using BigQuery conformance (or any > dialect whose default is UTF-8). 
> Of course an alternative solution would be to just change the Calcite default > to UTF-8 which supports encoding any UNICODE character while ISO-8859-1 can > only encode the first 256, but I imagine there are reasons against this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
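The encoding failure described above can be reproduced with the JDK alone. The following is a minimal sketch, independent of Calcite, that checks whether a string is encodable in a given charset. The class and method names are made up for illustration, and the strings use Unicode escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetEncodeCheck {
  /** Returns whether every character of {@code s} can be encoded in {@code charset}. */
  static boolean canEncode(String s, Charset charset) {
    // A fresh encoder per call keeps the check free of shared state.
    return charset.newEncoder().canEncode(s);
  }

  public static void main(String[] args) {
    String latin = "sch\u00f6n"; // contains U+00F6, which ISO-8859-1 covers
    String hebrew = "\u05e7";    // U+05E7, outside ISO-8859-1
    System.out.println(canEncode(latin, StandardCharsets.ISO_8859_1));
    System.out.println(canEncode(hebrew, StandardCharsets.ISO_8859_1));
    System.out.println(canEncode(hebrew, StandardCharsets.UTF_8));
  }
}
```

A dialect-level default charset, as proposed in the issue, would in effect choose which of these charsets the literal is validated against.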
[jira] [Commented] (CALCITE-3933) Incorrect SQL Emitted for Unicode for Several Dialects
[ https://issues.apache.org/jira/browse/CALCITE-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775893#comment-17775893 ] Julian Hyde commented on CALCITE-3933: -- A related issue is CALCITE-6001. If Calcite knows that a DB can handle a larger character set, it can generate literals in that character set, and won't need to use Unicode encoding. > Incorrect SQL Emitted for Unicode for Several Dialects > -- > > Key: CALCITE-3933 > URL: https://issues.apache.org/jira/browse/CALCITE-3933 > Project: Calcite > Issue Type: Bug >Affects Versions: 1.22.0 > Environment: master with latest commit on April 15 ( > dfb842e55e1fa7037c8a731341010ed1c0cfb6f7) >Reporter: Aryeh Hillman >Priority: Major > > A string literal like "schön" should emit "schön" in SQL for many dialects, > but instead emits > {code:java} > u&'sch\\00f6n' {code} > which is (ISO-8859-1 ASCII). > It's possible that some of the above dialects may support ISO-8859, but in my > tests with *BigQuery Standard SQL*, *MySQL*, and *Redshift* engines, the > following fails: > {code:java} > select u&'sch\\00f6n';{code} > But this succeeds: > {code:java} > select 'schön'; {code} > Test that demonstrates (add to > `org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java` and run from > there): > {code:java} > @Test void testBigQueryUnicode() { > final Function relFn = b -> > b.scan("EMP") > .filter( > b.call(SqlStdOperatorTable.IN, b.field("ENAME"), > b.literal("schön"))) > .build(); > final String expectedSql = "SELECT *\n" + > "FROM scott.EMP\n" + > "WHERE ENAME IN ('schön')"; > relFn(relFn).withBigQuery().ok(expectedSql); > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (CALCITE-6046) SQL parser failed when parsing a comment string start with ''
[ https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde resolved CALCITE-6046. -- Fix Version/s: (was: 1.36.0) Resolution: Duplicate > SQL parser failed when parsing a comment string start with '' > --- > > Key: CALCITE-6046 > URL: https://issues.apache.org/jira/browse/CALCITE-6046 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.35.0 >Reporter: xiaogang zhou >Priority: Major > > quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will > cause the SqlLiteral > > For example, with the SQL > > {code:java} > // code placeholder > CREATE TABLE source ( > a BIGINT > ) comment '测试test' > WITH ( > 'connector' = 'test' > ); {code} > With a parsed SqlNode, toString creates SQL like the following, which > cannot be parsed again. > > {code:java} > // code placeholder > CREATE TABLE `source` ( > `a` BIGINT > ) > COMMENT u&'\5218\51eftest' WITH ( > 'connector' = 'test' > ) {code} > I think this is caused by > {code:java} > // code placeholder > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); {code} > Not sure if I misconfigured something. Is it possible to remove the > buf.append("u&'"); ? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-6046) SQL parser failed when parsing a comment string start with ''
[ https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775891#comment-17775891 ] Julian Hyde commented on CALCITE-6046: -- Also a duplicate of CALCITE-3933. > SQL parser failed when parsing a comment string start with '' > --- > > Key: CALCITE-6046 > URL: https://issues.apache.org/jira/browse/CALCITE-6046 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.35.0 >Reporter: xiaogang zhou >Priority: Major > Fix For: 1.36.0 > > > quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will > cause the SqlLiteral > > For example, with the SQL > > {code:java} > // code placeholder > CREATE TABLE source ( > a BIGINT > ) comment '测试test' > WITH ( > 'connector' = 'test' > ); {code} > With a parsed SqlNode, toString creates SQL like the following, which > cannot be parsed again. > > {code:java} > // code placeholder > CREATE TABLE `source` ( > `a` BIGINT > ) > COMMENT u&'\5218\51eftest' WITH ( > 'connector' = 'test' > ) {code} > I think this is caused by > {code:java} > // code placeholder > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); {code} > Not sure if I misconfigured something. Is it possible to remove the > buf.append("u&'"); ? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
[ https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775890#comment-17775890 ] Julian Hyde commented on CALCITE-6051: -- I believe this is a duplicate of CALCITE-3933. > Incorrect translation for unicode strings in SqlDialect's > quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect > - > > Key: CALCITE-6051 > URL: https://issues.apache.org/jira/browse/CALCITE-6051 > Project: Calcite > Issue Type: Bug >Reporter: Shivangi >Priority: Major > Attachments: image-2023-10-16-18-54-53-483.png > > > Hi, > The unicodes returned by calcite have broken formats. For example, the string > `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is > coming from > calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java > file, `quoteStringLiteralUnicode` method: > {code:java} > /** >* Converts a string into a unicode string literal. For example, >* can't{tab}run\ becomes u'can''t\0009run\\'. >*/ > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); > for (int i = 0; i < val.length(); i++) { > char c = val.charAt(i); > if (c < 32 || c >= 128) { > buf.append('\\'); > buf.append(HEXITS[(c >> 12) & 0xf]); > buf.append(HEXITS[(c >> 8) & 0xf]); > buf.append(HEXITS[(c >> 4) & 0xf]); > buf.append(HEXITS[c & 0xf]); > } else if (c == '\'' || c == '\\') { > buf.append(c); > buf.append(c); > } else { > buf.append(c); > } > } > buf.append("'"); > } > {code} > The queries fail when we pass a query containing this encoding. 
> Also tested the same query you've shared on hive and spark: > Hive: > {code:java} > select u&'hello world'; > Error: Error while compiling statement: FAILED: SemanticException [Error > 10004]: Line 1:7 Invalid table alias or column reference 'u': (possible > column names are: ) (state=42000,code=10004) > {code} > Spark: > {code:java} > select u&'hello world'; > User class threw exception: org.apache.spark.sql.AnalysisException: cannot > resolve 'u' given input columns: []; line 1 pos 7; > {code} > This is HiveSqlDialect: > https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java > There is no overriding function in HiveSql dialect corresponding to > `quoteStringLiteralUnicode` method in SqlDialect. > Corresponding SparkSqlDialect: > https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java > > *Ask:* > Why is `buf.append("u&'")` added in this method? I couldn't find relatable > unicode conversion that contains `u&`, as a result, it breaks when read by > the client. I wanted to understand the reason why `u&` is being used and what > can break if we remove `&`. > Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010)
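To make the reported behavior concrete, here is a self-contained Java sketch that reproduces the escaping logic quoted in the description and contrasts it with a plain quoted literal. The class name, the local HEXITS table, and the quotePlain method are assumptions made for illustration. In particular, quotePlain is a hypothetical alternative that only doubles single quotes; whether that is the right escaping for Hive or Spark is not established in this thread.

```java
final class UnicodeLiteralSketch {
  // Lower-case hex digits, matching the \00ea form shown in the report.
  private static final char[] HEXITS = "0123456789abcdef".toCharArray();

  /** Mirrors the quoted method: u&'...' with \XXXX escapes for non-ASCII. */
  static String quoteUnicode(String val) {
    StringBuilder buf = new StringBuilder("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        // Control and non-ASCII characters become a backslash plus four hex digits.
        buf.append('\\');
        buf.append(HEXITS[(c >> 12) & 0xf]);
        buf.append(HEXITS[(c >> 8) & 0xf]);
        buf.append(HEXITS[(c >> 4) & 0xf]);
        buf.append(HEXITS[c & 0xf]);
      } else if (c == '\'' || c == '\\') {
        // Quotes and backslashes are doubled.
        buf.append(c).append(c);
      } else {
        buf.append(c);
      }
    }
    return buf.append('\'').toString();
  }

  /** Hypothetical alternative: keep the characters, double single quotes only. */
  static String quotePlain(String val) {
    return "'" + val.replace("'", "''") + "'";
  }

  public static void main(String[] args) {
    String s = "Conveni\u00eancia";
    System.out.println(quoteUnicode(s));
    System.out.println(quotePlain(s));
  }
}
```

A dialect that cannot parse the u&'...' form would need to emit something like the second variant, presumably by overriding the method in its own SqlDialect subclass.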
[jira] [Resolved] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
[ https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde resolved CALCITE-6051. -- Resolution: Duplicate > Incorrect translation for unicode strings in SqlDialect's > quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect > - > > Key: CALCITE-6051 > URL: https://issues.apache.org/jira/browse/CALCITE-6051 > Project: Calcite > Issue Type: Bug >Reporter: Shivangi >Priority: Major > Attachments: image-2023-10-16-18-54-53-483.png > > > Hi, > The unicodes returned by calcite have broken formats. For example, the string > `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is > coming from > calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java > file, `quoteStringLiteralUnicode` method: > {code:java} > /** >* Converts a string into a unicode string literal. For example, >* can't{tab}run\ becomes u'can''t\0009run\\'. >*/ > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); > for (int i = 0; i < val.length(); i++) { > char c = val.charAt(i); > if (c < 32 || c >= 128) { > buf.append('\\'); > buf.append(HEXITS[(c >> 12) & 0xf]); > buf.append(HEXITS[(c >> 8) & 0xf]); > buf.append(HEXITS[(c >> 4) & 0xf]); > buf.append(HEXITS[c & 0xf]); > } else if (c == '\'' || c == '\\') { > buf.append(c); > buf.append(c); > } else { > buf.append(c); > } > } > buf.append("'"); > } > {code} > The queries fail when we pass a query containing this encoding. 
> Also tested the same query you've shared on hive and spark: > Hive: > {code:java} > select u&'hello world'; > Error: Error while compiling statement: FAILED: SemanticException [Error > 10004]: Line 1:7 Invalid table alias or column reference 'u': (possible > column names are: ) (state=42000,code=10004) > {code} > Spark: > {code:java} > select u&'hello world'; > User class threw exception: org.apache.spark.sql.AnalysisException: cannot > resolve 'u' given input columns: []; line 1 pos 7; > {code} > This is HiveSqlDialect: > https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java > There is no overriding function in HiveSql dialect corresponding to > `quoteStringLiteralUnicode` method in SqlDialect. > Corresponding SparkSqlDialect: > https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java > > *Ask:* > Why is `buf.append("u&'")` added in this method? I couldn't find relatable > unicode conversion that contains `u&`, as a result, it breaks when read by > the client. I wanted to understand the reason why `u&` is being used and what > can break if we remove `&`. > Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5763) Discontinue support for Guava < 20.0
[ https://issues.apache.org/jira/browse/CALCITE-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775821#comment-17775821 ] Gian Merlino commented on CALCITE-5763: --- [~julianhyde] yes, please go ahead. Druid is on Calcite 1.35 now, and recently decided to drop support for Hadoop 2 and update to Guava 31.1-jre. That means we will be able to update to future Calcite releases that do not support older Guavas. > Discontinue support for Guava < 20.0 > > > Key: CALCITE-5763 > URL: https://issues.apache.org/jira/browse/CALCITE-5763 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Assignee: Julian Hyde >Priority: Major > Labels: pull-request-available > Fix For: 1.36.0 > > > Discontinue support for Guava versions before 20.0, and resume building on > the latest Guava. This reverses CALCITE-5477, which changes the build from > Guava 31.1-jre to 19.0, and CALCITE-5428, which moves the minimum supported > Guava version from 19.0 to 16.0.1. > This change will happen no earlier than "the first release after August", > therefore can be merged to main no earlier than 2023-09-01. I recommend that > it is merged very soon after that date. I have set fixVersion = 1.36 assuming > that 1.36 is the first release after August. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-6046) SQL parser failed when parsing a comment string start with ''
[ https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775779#comment-17775779 ] LakeShen commented on CALCITE-6046: --- Hi [~zhoujira86], is this problem a duplicate of CALCITE-6051? The behavior looks the same as in CALCITE-6051. > SQL parser failed when parsing a comment string start with '' > --- > > Key: CALCITE-6046 > URL: https://issues.apache.org/jira/browse/CALCITE-6046 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.35.0 >Reporter: xiaogang zhou >Priority: Major > Fix For: 1.36.0 > > > quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will > cause the SqlLiteral > > For example, with the SQL > > {code:java} > // code placeholder > CREATE TABLE source ( > a BIGINT > ) comment '测试test' > WITH ( > 'connector' = 'test' > ); {code} > With a parsed SqlNode, toString creates SQL like the following, which > cannot be parsed again. > > {code:java} > // code placeholder > CREATE TABLE `source` ( > `a` BIGINT > ) > COMMENT u&'\5218\51eftest' WITH ( > 'connector' = 'test' > ) {code} > I think this is caused by > {code:java} > // code placeholder > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); {code} > Not sure if I misconfigured something. Is it possible to remove the > buf.append("u&'"); ? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-6046) SQL parser failed when parsing a comment string start with ''
[ https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaogang zhou updated CALCITE-6046: --- Summary: SQL parser failed when parsing a comment string start with '' (was: SQL parser failed when parsing a literal start with '') > SQL parser failed when parsing a comment string start with '' > --- > > Key: CALCITE-6046 > URL: https://issues.apache.org/jira/browse/CALCITE-6046 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.35.0 >Reporter: xiaogang zhou >Priority: Major > Fix For: 1.36.0 > > > quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will > cause the SqlLiteral > > For example, with the SQL > > {code:java} > // code placeholder > CREATE TABLE source ( > a BIGINT > ) comment '测试test' > WITH ( > 'connector' = 'test' > ); {code} > With a parsed SqlNode, toString creates SQL like the following, which > cannot be parsed again. > > {code:java} > // code placeholder > CREATE TABLE `source` ( > `a` BIGINT > ) > COMMENT u&'\5218\51eftest' WITH ( > 'connector' = 'test' > ) {code} > I think this is caused by > {code:java} > // code placeholder > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); {code} > Not sure if I misconfigured something. Is it possible to remove the > buf.append("u&'"); ? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-6046) SQL parser failed when parsing a literal start with ''
[ https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaogang zhou updated CALCITE-6046: --- Summary: SQL parser failed when parsing a literal start with '' (was: QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will cause the SqlLiteral) > SQL parser failed when parsing a literal start with '' > > > Key: CALCITE-6046 > URL: https://issues.apache.org/jira/browse/CALCITE-6046 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.35.0 >Reporter: xiaogang zhou >Priority: Major > Fix For: 1.36.0 > > > quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will > cause the SqlLiteral > > For example, with the SQL > > {code:java} > // code placeholder > CREATE TABLE source ( > a BIGINT > ) comment '测试test' > WITH ( > 'connector' = 'test' > ); {code} > With a parsed SqlNode, toString creates SQL like the following, which > cannot be parsed again. > > {code:java} > // code placeholder > CREATE TABLE `source` ( > `a` BIGINT > ) > COMMENT u&'\5218\51eftest' WITH ( > 'connector' = 'test' > ) {code} > I think this is caused by > {code:java} > // code placeholder > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); {code} > Not sure if I misconfigured something. Is it possible to remove the > buf.append("u&'"); ? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (CALCITE-6046) QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will cause the SqlLiteral
[ https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775755#comment-17775755 ] xiaogang zhou edited comment on CALCITE-6046 at 10/16/23 1:16 PM: -- Hi [~julianhyde] , The behavior I thought was wrong is when I use below code {code:java} // code placeholder SqlParser.Config parserConfig = getCurrentSqlParserConfig(sqlDialect); SqlParser sqlParser = SqlParser.create(sqlContent, parserConfig); SqlNodeList sqlNodeList = sqlParser.parseStmtList(); sqlParser.parse(sqlNodeList.get(0)); {code} to parse {code:java} // code placeholder CREATE TABLE source ( a BIGINT ) comment '测试test' WITH ( 'connector' = 'test' ); {code} then unparse it , I get {code:java} // code placeholder CREATE TABLE `source` ( `a` BIGINT ) COMMENT u&'\5218\51eftest' WITH ( 'connector' = 'test' ) {code} which is not parsable by FLINK sql template {code:java} // code placeholder [ { String p = SqlParserUtil.parseString(token.image); comment = SqlLiteral.createCharString(p, getPos()); }] {code} Since you mentioned '' is Standard SQL DIALECT, I think there is nothing wrong in CALCITE. If the statement above makes sense to you, we can just close this CALCITE issue, and I will follow it in FLINK issue with FLINK TEAM. 
was (Author: zhoujira86): Hi [~julianhyde] , The behavior I thought was wrong is when I use below code {code:java} // code placeholder SqlParser.Config parserConfig = getCurrentSqlParserConfig(sqlDialect); SqlParser sqlParser = SqlParser.create(sqlContent, parserConfig); SqlNodeList sqlNodeList = sqlParser.parseStmtList(); sqlParser.parse(sqlNodeList.get(0)); {code} to parse {code:java} // code placeholder CREATE TABLE source ( a BIGINT ) comment '测试test' WITH ( 'connector' = 'test' ); {code} then unparse it , I will get {code:java} // code placeholder CREATE TABLE `source` ( `a` BIGINT ) COMMENT u&'\5218\51eftest' WITH ( 'connector' = 'test' ) {code} which is not parsable by FLINK sql template {code:java} // code placeholder [ { String p = SqlParserUtil.parseString(token.image); comment = SqlLiteral.createCharString(p, getPos()); }] {code} Since you mentioned '' is Standard SQL DIALECT, I think there is nothing wrong in CALCITE. If the statement above makes sense to you, we can just close this CALCITE issue, and I will follow it in FLINK issue with FLINK TEAM. > QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will > cause the SqlLiteral > -- > > Key: CALCITE-6046 > URL: https://issues.apache.org/jira/browse/CALCITE-6046 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.35.0 >Reporter: xiaogang zhou >Priority: Major > Fix For: 1.36.0 > > > quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will > cause the SqlLiteral > > for example with a SQL > > {code:java} > // code placeholder > CREATE TABLE source ( > a BIGINT > ) comment '测试test' > WITH ( > 'connector' = 'test' > ); {code} > with a parsed Sqlnode, the toString will create a SQL like below, which is > not parsable again. 
> > {code:java} > // code placeholder > CREATE TABLE `source` ( > `a` BIGINT > ) > COMMENT u&'\5218\51eftest' WITH ( > 'connector' = 'test' > ) {code} > I think this is caused by > {code:java} > // code placeholder > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); {code} > Not sure if I misconfigured something. Is it possible to remove the > buf.append("u&'"); ? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-6046) QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will cause the SqlLiteral
[ https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775755#comment-17775755 ] xiaogang zhou commented on CALCITE-6046: Hi [~julianhyde] , The behavior I thought was wrong is when I use below code {code:java} // code placeholder SqlParser.Config parserConfig = getCurrentSqlParserConfig(sqlDialect); SqlParser sqlParser = SqlParser.create(sqlContent, parserConfig); SqlNodeList sqlNodeList = sqlParser.parseStmtList(); sqlParser.parse(sqlNodeList.get(0)); {code} to parse {code:java} // code placeholder CREATE TABLE source ( a BIGINT ) comment '测试test' WITH ( 'connector' = 'test' ); {code} then unparse it , I will get {code:java} // code placeholder CREATE TABLE `source` ( `a` BIGINT ) COMMENT u&'\5218\51eftest' WITH ( 'connector' = 'test' ) {code} which is not parsable by FLINK sql template {code:java} // code placeholder [ { String p = SqlParserUtil.parseString(token.image); comment = SqlLiteral.createCharString(p, getPos()); }] {code} Since you mentioned '' is Standard SQL DIALECT, I think there is nothing wrong in CALCITE. If the statement above makes sense to you, we can just close this CALCITE issue, and I will follow it in FLINK issue with FLINK TEAM. > QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will > cause the SqlLiteral > -- > > Key: CALCITE-6046 > URL: https://issues.apache.org/jira/browse/CALCITE-6046 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.35.0 >Reporter: xiaogang zhou >Priority: Major > Fix For: 1.36.0 > > > quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will > cause the SqlLiteral > > for example with a SQL > > {code:java} > // code placeholder > CREATE TABLE source ( > a BIGINT > ) comment '测试test' > WITH ( > 'connector' = 'test' > ); {code} > with a parsed Sqlnode, the toString will create a SQL like below, which is > not parsable again. 
> > {code:java} > // code placeholder > CREATE TABLE `source` ( > `a` BIGINT > ) > COMMENT u&'\5218\51eftest' WITH ( > 'connector' = 'test' > ) {code} > I think this is caused by > {code:java} > // code placeholder > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); {code} > Not sure if I misconfigured something. Is it possible to remove the > buf.append("u&'"); ? -- This message was sent by Atlassian Jira (v8.20.10#820010)
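For a downstream parser that receives the u&'...' form, one option is to decode the literal back to a plain string before using it. The following Java sketch shows such a decoder under stated assumptions: backslash is the escape character, only the four-hex-digit escape is handled, and doubled quotes and doubled backslashes are collapsed. The six-digit \+XXXXXX form and the UESCAPE clause are not supported. The class and method names are hypothetical; this is neither Calcite's nor Flink's API.

```java
final class UnicodeLiteralDecoder {
  /** Decodes a literal of the form u&'...' that uses backslash as escape character. */
  static String decode(String literal) {
    if (literal.length() < 4
        || !literal.regionMatches(true, 0, "u&'", 0, 3)
        || !literal.endsWith("'")) {
      throw new IllegalArgumentException("Not a u&'...' literal: " + literal);
    }
    String body = literal.substring(3, literal.length() - 1);
    StringBuilder out = new StringBuilder();
    int i = 0;
    while (i < body.length()) {
      char c = body.charAt(i);
      if (c == '\\') {
        if (i + 1 < body.length() && body.charAt(i + 1) == '\\') {
          out.append('\\'); // doubled backslash stands for one backslash
          i += 2;
        } else {
          if (i + 5 > body.length()) {
            throw new IllegalArgumentException("Truncated escape at index " + i);
          }
          int code = 0;
          for (int k = 1; k <= 4; k++) {
            int digit = Character.digit(body.charAt(i + k), 16);
            if (digit < 0) {
              throw new IllegalArgumentException("Bad hex digit at index " + (i + k));
            }
            code = code * 16 + digit;
          }
          out.append((char) code); // one UTF-16 code unit per escape
          i += 5;
        }
      } else if (c == '\'' && i + 1 < body.length() && body.charAt(i + 1) == '\'') {
        out.append('\''); // doubled quote stands for one quote
        i += 2;
      } else {
        out.append(c);
        i++;
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Two escaped code units followed by ASCII text.
    System.out.println(decode("u&'\\6d4b\\8bd5test'"));
  }
}
```

Decoding on the consumer side leaves the generated SQL unchanged, which fits the conclusion in the comment that the remaining work belongs to the consuming project rather than to Calcite.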
[jira] [Commented] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
[ https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775717#comment-17775717 ] Shivangi commented on CALCITE-6051: --- Makes sense [~shenlang]. I've updated the jira summary and description. > Incorrect translation for unicode strings in SqlDialect's > quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect > - > > Key: CALCITE-6051 > URL: https://issues.apache.org/jira/browse/CALCITE-6051 > Project: Calcite > Issue Type: Bug >Reporter: Shivangi >Priority: Major > Attachments: image-2023-10-16-18-54-53-483.png > > > Hi, > The unicodes returned by calcite have broken formats. For example, the string > `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is > coming from > calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java > file, `quoteStringLiteralUnicode` method: > {code:java} > /** >* Converts a string into a unicode string literal. For example, >* can't{tab}run\ becomes u'can''t\0009run\\'. >*/ > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); > for (int i = 0; i < val.length(); i++) { > char c = val.charAt(i); > if (c < 32 || c >= 128) { > buf.append('\\'); > buf.append(HEXITS[(c >> 12) & 0xf]); > buf.append(HEXITS[(c >> 8) & 0xf]); > buf.append(HEXITS[(c >> 4) & 0xf]); > buf.append(HEXITS[c & 0xf]); > } else if (c == '\'' || c == '\\') { > buf.append(c); > buf.append(c); > } else { > buf.append(c); > } > } > buf.append("'"); > } > {code} > The queries fail when we pass a query containing this encoding. 
> Also tested the same query you've shared on hive and spark: > Hive: > {code:java} > select u&'hello world'; > Error: Error while compiling statement: FAILED: SemanticException [Error > 10004]: Line 1:7 Invalid table alias or column reference 'u': (possible > column names are: ) (state=42000,code=10004) > {code} > Spark: > {code:java} > select u&'hello world'; > User class threw exception: org.apache.spark.sql.AnalysisException: cannot > resolve 'u' given input columns: []; line 1 pos 7; > {code} > This is HiveSqlDialect: > https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java > There is no overriding function in HiveSql dialect corresponding to > `quoteStringLiteralUnicode` method in SqlDialect. > Corresponding SparkSqlDialect: > https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java > > *Ask:* > Why is `buf.append("u&'")` added in this method? I couldn't find relatable > unicode conversion that contains `u&`, as a result, it breaks when read by > the client. I wanted to understand the reason why `u&` is being used and what > can break if we remove `&`. > Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
[ https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivangi updated CALCITE-6051: -- Description: Hi, The unicodes returned by calcite have broken formats. For example, the string `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is coming from calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java file, `quoteStringLiteralUnicode` method: {code:java} /** * Converts a string into a unicode string literal. For example, * can't{tab}run\ becomes u'can''t\0009run\\'. */ public void quoteStringLiteralUnicode(StringBuilder buf, String val) { buf.append("u&'"); for (int i = 0; i < val.length(); i++) { char c = val.charAt(i); if (c < 32 || c >= 128) { buf.append('\\'); buf.append(HEXITS[(c >> 12) & 0xf]); buf.append(HEXITS[(c >> 8) & 0xf]); buf.append(HEXITS[(c >> 4) & 0xf]); buf.append(HEXITS[c & 0xf]); } else if (c == '\'' || c == '\\') { buf.append(c); buf.append(c); } else { buf.append(c); } } buf.append("'"); } {code} The queries fail when we pass a query containing this encoding. Also tested the same query you've shared on hive and spark: Hive: {code:java} select u&'hello world'; Error: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or column reference 'u': (possible column names are: ) (state=42000,code=10004) {code} Spark: {code:java} select u&'hello world'; User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve 'u' given input columns: []; line 1 pos 7; {code} This is HiveSqlDialect: https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java There is no overriding function in HiveSql dialect corresponding to `quoteStringLiteralUnicode` method in SqlDialect. 
Corresponding SparkSqlDialect: https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java *Ask:* Why is `buf.append("u&'")` added in this method? I couldn't find relatable unicode conversion that contains `u&`, as a result, it breaks when read by the client. I wanted to understand the reason why `u&` is being used and what can break if we remove `&`. Thanks! was: Hi, The unicodes returned by calcite have broken formats. For example, the string `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is coming from calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java file, `quoteStringLiteralUnicode` method: {code:java} /** * Converts a string into a unicode string literal. For example, * can't{tab}run\ becomes u'can''t\0009run\\'. */ public void quoteStringLiteralUnicode(StringBuilder buf, String val) { buf.append("u&'"); for (int i = 0; i < val.length(); i++) { char c = val.charAt(i); if (c < 32 || c >= 128) { buf.append('\\'); buf.append(HEXITS[(c >> 12) & 0xf]); buf.append(HEXITS[(c >> 8) & 0xf]); buf.append(HEXITS[(c >> 4) & 0xf]); buf.append(HEXITS[c & 0xf]); } else if (c == '\'' || c == '\\') { buf.append(c); buf.append(c); } else { buf.append(c); } } buf.append("'"); } {code} The queries fail when we pass a query containing this encoding. For example in hive: {code:java} select * from somedb.some_table where city_id = u&'Conveni\00eancia'; {code} Response: {code:java} FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or column reference 'u': ( {code} This is HiveSqlDialect: https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java There is no overriding function in HiveSql dialect corresponding to `quoteStringLiteralUnicode` method in SqlDialect. 
Corresponding SparkSqlDialect: https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java *Ask:* Why is `buf.append("u&'")` added in this method? I couldn't find relatable unicode conversion that contains `u&`, as a result, it breaks when read by the client. I wanted to understand the reason why `u&` is being used and what can break if we remove `&`. Thanks! > Incorrect translation for unicode strings in SqlDialect's > quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect > - > > Key: CALCITE-6051 > URL: https://issues.apache.org/jira/browse/CALCITE-6051 > Project: Calcite > Issue Type: Bug >Reporter: Shivangi >Priority: Major > Attachments:
[jira] [Updated] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
[ https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivangi updated CALCITE-6051: -- Description: Hi, The unicodes returned by calcite have broken formats. For example, the string `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is coming from calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java file, `quoteStringLiteralUnicode` method: {code:java} /** * Converts a string into a unicode string literal. For example, * can't{tab}run\ becomes u'can''t\0009run\\'. */ public void quoteStringLiteralUnicode(StringBuilder buf, String val) { buf.append("u&'"); for (int i = 0; i < val.length(); i++) { char c = val.charAt(i); if (c < 32 || c >= 128) { buf.append('\\'); buf.append(HEXITS[(c >> 12) & 0xf]); buf.append(HEXITS[(c >> 8) & 0xf]); buf.append(HEXITS[(c >> 4) & 0xf]); buf.append(HEXITS[c & 0xf]); } else if (c == '\'' || c == '\\') { buf.append(c); buf.append(c); } else { buf.append(c); } } buf.append("'"); } {code} The queries fail when we pass a query containing this encoding. For example in hive: {code:java} select * from somedb.some_table where city_id = u&'Conveni\00eancia'; {code} Response: {code:java} FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or column reference 'u': ( {code} This is HiveSqlDialect: https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java There is no overriding function in HiveSql dialect corresponding to `quoteStringLiteralUnicode` method in SqlDialect. Corresponding SparkSqlDialect: https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java *Ask:* Why is `buf.append("u&'")` added in this method? I couldn't find relatable unicode conversion that contains `u&`, as a result, it breaks when read by the client. 
I wanted to understand the reason why `u&` is being used and what can break if we remove `&`. Thanks! was: Hi, The unicodes returned by calcite have broken formats. For example, the string `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is coming from calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java file, `quoteStringLiteralUnicode` method: {code:java} /** * Converts a string into a unicode string literal. For example, * can't{tab}run\ becomes u'can''t\0009run\\'. */ public void quoteStringLiteralUnicode(StringBuilder buf, String val) { buf.append("u&'"); for (int i = 0; i < val.length(); i++) { char c = val.charAt(i); if (c < 32 || c >= 128) { buf.append('\\'); buf.append(HEXITS[(c >> 12) & 0xf]); buf.append(HEXITS[(c >> 8) & 0xf]); buf.append(HEXITS[(c >> 4) & 0xf]); buf.append(HEXITS[c & 0xf]); } else if (c == '\'' || c == '\\') { buf.append(c); buf.append(c); } else { buf.append(c); } } buf.append("'"); } {code} Why is `buf.append("u&'")` added in this method? I couldn't find relatable unicode conversion that contains `u&`, as a result, it breaks when read by the client. I wanted to understand the reason why `u&` is being used and what can break if we remove `&`. Thanks! > Incorrect translation for unicode strings in SqlDialect's > quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect > - > > Key: CALCITE-6051 > URL: https://issues.apache.org/jira/browse/CALCITE-6051 > Project: Calcite > Issue Type: Bug >Reporter: Shivangi >Priority: Major > Attachments: image-2023-10-16-18-54-53-483.png > > > Hi, > The unicodes returned by calcite have broken formats. For example, the string > `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is > coming from > calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java > file, `quoteStringLiteralUnicode` method: > {code:java} > /** >* Converts a string into a unicode string literal. 
For example, >* can't{tab}run\ becomes u'can''t\0009run\\'. >*/ > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); > for (int i = 0; i < val.length(); i++) { > char c = val.charAt(i); > if (c < 32 || c >= 128) { > buf.append('\\'); > buf.append(HEXITS[(c >> 12) & 0xf]); > buf.append(HEXITS[(c >> 8) & 0xf]); > buf.append(HEXITS[(c >> 4) & 0xf]); > buf.append(HEXITS[c & 0xf]); > } else if (c == '\'' || c == '\\') { >
[jira] [Updated] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
[ https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivangi updated CALCITE-6051: -- Summary: Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect (was: Incorrect format for unicode strings ) > Incorrect translation for unicode strings in SqlDialect's > quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect > - > > Key: CALCITE-6051 > URL: https://issues.apache.org/jira/browse/CALCITE-6051 > Project: Calcite > Issue Type: Bug >Reporter: Shivangi >Priority: Major > Attachments: image-2023-10-16-18-54-53-483.png > > > Hi, > The unicodes returned by calcite have broken formats. For example, the string > `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is > coming from > calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java > file, `quoteStringLiteralUnicode` method: > {code:java} > /** >* Converts a string into a unicode string literal. For example, >* can't{tab}run\ becomes u'can''t\0009run\\'. >*/ > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); > for (int i = 0; i < val.length(); i++) { > char c = val.charAt(i); > if (c < 32 || c >= 128) { > buf.append('\\'); > buf.append(HEXITS[(c >> 12) & 0xf]); > buf.append(HEXITS[(c >> 8) & 0xf]); > buf.append(HEXITS[(c >> 4) & 0xf]); > buf.append(HEXITS[c & 0xf]); > } else if (c == '\'' || c == '\\') { > buf.append(c); > buf.append(c); > } else { > buf.append(c); > } > } > buf.append("'"); > } > {code} > Why is `buf.append("u&'")` added in this method? I couldn't find relatable > unicode conversion that contains `u&`, as a result, it breaks when read by > the client. I wanted to understand the reason why `u&` is being used and what > can break if we remove `&`. > Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (CALCITE-6051) Incorrect format for unicode strings
[ https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775696#comment-17775696 ]

LakeShen edited comment on CALCITE-6051 at 10/16/23 11:13 AM:
--------------------------------------------------------------

I'm sure that PG accepts 'u&', for example:

!image-2023-10-16-18-54-53-483.png|width=436,height=182!

So the problem is that different engines and databases support 'u&' to different degrees; Hive and Spark do not support it.

I think the Jira title could describe this problem more clearly. How about `Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect`? You should also make the Jira description clearer about your problem.

Maybe we could implement different behavior in the `quoteStringLiteralUnicode` method according to the type of SqlDialect#databaseProduct.


was (Author: shenlang):
I'm sure that PG accepts 'u&', for example:

!image-2023-10-16-18-54-53-483.png|width=436,height=182!

So the problem is that different engines and databases support 'u&' to different degrees; Hive and Spark do not support it.

I think the Jira title could describe this problem more clearly. How about `Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect`? You should also make the Jira description clearer about your problem.

> Incorrect format for unicode strings
> ------------------------------------
>
>                 Key: CALCITE-6051
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6051
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Shivangi
>            Priority: Major
>         Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The Unicode string literals emitted by Calcite have a broken format. For example, the string
> `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` comes from the
> `quoteStringLiteralUnicode` method in
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:
> {code:java}
> /**
>  * Converts a string into a unicode string literal. For example,
>  * can't{tab}run\ becomes u'can''t\0009run\\'.
>  */
> public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
>   buf.append("u&'");
>   for (int i = 0; i < val.length(); i++) {
>     char c = val.charAt(i);
>     if (c < 32 || c >= 128) {
>       buf.append('\\');
>       buf.append(HEXITS[(c >> 12) & 0xf]);
>       buf.append(HEXITS[(c >> 8) & 0xf]);
>       buf.append(HEXITS[(c >> 4) & 0xf]);
>       buf.append(HEXITS[c & 0xf]);
>     } else if (c == '\'' || c == '\\') {
>       buf.append(c);
>       buf.append(c);
>     } else {
>       buf.append(c);
>     }
>   }
>   buf.append("'");
> }
> {code}
> Why is `buf.append("u&'")` added in this method? I couldn't find any related Unicode
> conversion that uses `u&`; as a result, the query breaks when read by the client.
> I would like to understand why `u&` is used and what could break if we remove `&`.
> Thanks!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
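The suggestion in this comment, choosing the literal format according to the database product, could look roughly like the sketch below. It is a hypothetical illustration only: the `Product` enum, the class `DialectLiteralSketch`, and both helper methods are invented for this example and are not Calcite's API or the fix that was eventually applied. Escaping quotes and backslashes with a backslash in the Hive/Spark branch is likewise an assumption that should be checked against each engine.

```java
public final class DialectLiteralSketch {
  /** Hypothetical stand-in for a dialect's database product; not Calcite's enum. */
  public enum Product { POSTGRESQL, HIVE, SPARK }

  // Assumed hex digit table, lowercase to match the output quoted in the issue.
  private static final char[] HEXITS = "0123456789abcdef".toCharArray();

  /** Picks a literal format based on the (hypothetical) product type. */
  public static String quoteUnicodeLiteral(Product product, String val) {
    switch (product) {
      case HIVE:
      case SPARK:
        // These engines rejected the u& prefix in the thread, so emit a plain literal.
        return quotePlain(val);
      default:
        return quoteEscaped(val);
    }
  }

  /** Plain single-quoted literal; backslash escaping is an assumption. */
  private static String quotePlain(String val) {
    StringBuilder buf = new StringBuilder("'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c == '\'' || c == '\\') {
        buf.append('\\');
      }
      buf.append(c);
    }
    return buf.append('\'').toString();
  }

  /** Same escaping as the method quoted in the issue description. */
  private static String quoteEscaped(String val) {
    StringBuilder buf = new StringBuilder("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        buf.append('\\');
        buf.append(HEXITS[(c >> 12) & 0xf]);
        buf.append(HEXITS[(c >> 8) & 0xf]);
        buf.append(HEXITS[(c >> 4) & 0xf]);
        buf.append(HEXITS[c & 0xf]);
      } else if (c == '\'' || c == '\\') {
        buf.append(c).append(c);
      } else {
        buf.append(c);
      }
    }
    return buf.append('\'').toString();
  }

  public static void main(String[] args) {
    for (Product p : Product.values()) {
      System.out.println(p + ": " + quoteUnicodeLiteral(p, "Conveni\u00eancia"));
    }
  }
}
```

In Calcite itself the more natural place for such a change would presumably be an override in the individual dialect classes rather than a switch in one method; the switch is used here only to keep the sketch in a single file.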
[jira] [Comment Edited] (CALCITE-6051) Incorrect format for unicode strings
[ https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775690#comment-17775690 ] Shivangi edited comment on CALCITE-6051 at 10/16/23 11:00 AM: -- Also tested the same query you've shared on hive and spark: Hive: {code:java} select u&'hello world'; Error: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or column reference 'u': (possible column names are: ) (state=42000,code=10004) {code} Spark: {code:java} select u&'hello world'; User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve 'u' given input columns: []; line 1 pos 7; {code} was (Author: shivincible): Also tested the same query you've shared on hive and spark: Hive: {code:java} select u&'hello world'; Error: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or column reference 'u': (possible column names are: ) (state=42000,code=10004) {code} Spark: {code:java} select u&'hello world'; User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve 'u' given input columns: []; line 1 pos 7; {code} > Incorrect format for unicode strings > - > > Key: CALCITE-6051 > URL: https://issues.apache.org/jira/browse/CALCITE-6051 > Project: Calcite > Issue Type: Bug >Reporter: Shivangi >Priority: Major > Attachments: image-2023-10-16-18-54-53-483.png > > > Hi, > The unicodes returned by calcite have broken formats. For example, the string > `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is > coming from > calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java > file, `quoteStringLiteralUnicode` method: > {code:java} > /** >* Converts a string into a unicode string literal. For example, >* can't{tab}run\ becomes u'can''t\0009run\\'. 
>*/ > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); > for (int i = 0; i < val.length(); i++) { > char c = val.charAt(i); > if (c < 32 || c >= 128) { > buf.append('\\'); > buf.append(HEXITS[(c >> 12) & 0xf]); > buf.append(HEXITS[(c >> 8) & 0xf]); > buf.append(HEXITS[(c >> 4) & 0xf]); > buf.append(HEXITS[c & 0xf]); > } else if (c == '\'' || c == '\\') { > buf.append(c); > buf.append(c); > } else { > buf.append(c); > } > } > buf.append("'"); > } > {code} > Why is `buf.append("u&'")` added in this method? I couldn't find relatable > unicode conversion that contains `u&`, as a result, it breaks when read by > the client. I wanted to understand the reason why `u&` is being used and what > can break if we remove `&`. > Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-6051) Incorrect format for unicode strings
[ https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LakeShen updated CALCITE-6051: -- Attachment: image-2023-10-16-18-54-53-483.png > Incorrect format for unicode strings > - > > Key: CALCITE-6051 > URL: https://issues.apache.org/jira/browse/CALCITE-6051 > Project: Calcite > Issue Type: Bug >Reporter: Shivangi >Priority: Major > Attachments: image-2023-10-16-18-54-53-483.png > > > Hi, > The unicodes returned by calcite have broken formats. For example, the string > `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is > coming from > calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java > file, `quoteStringLiteralUnicode` method: > {code:java} > /** >* Converts a string into a unicode string literal. For example, >* can't{tab}run\ becomes u'can''t\0009run\\'. >*/ > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); > for (int i = 0; i < val.length(); i++) { > char c = val.charAt(i); > if (c < 32 || c >= 128) { > buf.append('\\'); > buf.append(HEXITS[(c >> 12) & 0xf]); > buf.append(HEXITS[(c >> 8) & 0xf]); > buf.append(HEXITS[(c >> 4) & 0xf]); > buf.append(HEXITS[c & 0xf]); > } else if (c == '\'' || c == '\\') { > buf.append(c); > buf.append(c); > } else { > buf.append(c); > } > } > buf.append("'"); > } > {code} > Why is `buf.append("u&'")` added in this method? I couldn't find relatable > unicode conversion that contains `u&`, as a result, it breaks when read by > the client. I wanted to understand the reason why `u&` is being used and what > can break if we remove `&`. > Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (CALCITE-6051) Incorrect format for unicode strings
[ https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775678#comment-17775678 ]

Shivangi edited comment on CALCITE-6051 at 10/16/23 10:40 AM:
--------------------------------------------------------------

Thanks for the quick response [~shenlang]!
We are using SqlDialect for Hive and Spark. In both cases, queries that contain this encoding fail. For example, in Hive:
{code:java}
select * from somedb.some_table where city_id = u&'Conveni\00eancia';
{code}
Response:
{code:java}
FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or column reference 'u': (
{code}
This is HiveSqlDialect:
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
HiveSqlDialect does not override the `quoteStringLiteralUnicode` method of SqlDialect.

So, is the output returned by SqlDialect containing `u&'` valid with respect to Postgres? Am I missing something here?


was (Author: shivincible):
Thanks for the quick response [~shenlang]!
We are using SqlDialect for Hive and Spark. In both cases, queries that contain this encoding fail. For example, in Hive:
{code:java}
select * from somedb.some_table where city_id = u&'Conveni\00eancia';
{code}
Response:
{code:java}
FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or column reference 'u': (
{code}
This is HiveSqlDialect:
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java

So, is the output returned by SqlDialect containing `u&'` valid with respect to Presto? Am I missing something here?

> Incorrect format for unicode strings
> ------------------------------------
>
>                 Key: CALCITE-6051
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6051
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Shivangi
>            Priority: Major
>
> Hi,
> The Unicode string literals emitted by Calcite have a broken format. For example, the string
> `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` comes from the
> `quoteStringLiteralUnicode` method in
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:
> {code:java}
> /**
>  * Converts a string into a unicode string literal. For example,
>  * can't{tab}run\ becomes u'can''t\0009run\\'.
>  */
> public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
>   buf.append("u&'");
>   for (int i = 0; i < val.length(); i++) {
>     char c = val.charAt(i);
>     if (c < 32 || c >= 128) {
>       buf.append('\\');
>       buf.append(HEXITS[(c >> 12) & 0xf]);
>       buf.append(HEXITS[(c >> 8) & 0xf]);
>       buf.append(HEXITS[(c >> 4) & 0xf]);
>       buf.append(HEXITS[c & 0xf]);
>     } else if (c == '\'' || c == '\\') {
>       buf.append(c);
>       buf.append(c);
>     } else {
>       buf.append(c);
>     }
>   }
>   buf.append("'");
> }
> {code}
> Why is `buf.append("u&'")` added in this method? I couldn't find any related Unicode
> conversion that uses `u&`; as a result, the query breaks when read by the client.
> I would like to understand why `u&` is used and what could break if we remove `&`.
> Thanks!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-6051) Incorrect format for unicode strings
[ https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775678#comment-17775678 ] Shivangi commented on CALCITE-6051: --- Thanks for the quick response [~shenlang]! We are using SQLDialect for Hive and Spark. For both the cases, the queries fail when we pass a query containing this encoding. For example in hive: {code:java} select * from somedb.some_table where city_id = u&'Conveni\00eancia'; {code} Response: {code:java} FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or column reference 'u': ( {code} This is HiveSqlDialect: https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java So, is the output returned by SqlDialect containing `u&'` valid wrt to Presto? Am I missing something here? > Incorrect format for unicode strings > - > > Key: CALCITE-6051 > URL: https://issues.apache.org/jira/browse/CALCITE-6051 > Project: Calcite > Issue Type: Bug >Reporter: Shivangi >Priority: Major > > Hi, > The unicodes returned by calcite have broken formats. For example, the string > `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is > coming from > calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java > file, `quoteStringLiteralUnicode` method: > {code:java} > /** >* Converts a string into a unicode string literal. For example, >* can't{tab}run\ becomes u'can''t\0009run\\'. 
>*/ > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); > for (int i = 0; i < val.length(); i++) { > char c = val.charAt(i); > if (c < 32 || c >= 128) { > buf.append('\\'); > buf.append(HEXITS[(c >> 12) & 0xf]); > buf.append(HEXITS[(c >> 8) & 0xf]); > buf.append(HEXITS[(c >> 4) & 0xf]); > buf.append(HEXITS[c & 0xf]); > } else if (c == '\'' || c == '\\') { > buf.append(c); > buf.append(c); > } else { > buf.append(c); > } > } > buf.append("'"); > } > {code} > Why is `buf.append("u&'")` added in this method? I couldn't find relatable > unicode conversion that contains `u&`, as a result, it breaks when read by > the client. I wanted to understand the reason why `u&` is being used and what > can break if we remove `&`. > Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010)
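The difference discussed in this thread can be reproduced outside Calcite. The following is a minimal sketch, not Calcite's implementation: `quoteUnicode` mirrors the quoted `quoteStringLiteralUnicode` loop, while `quotePlain` and the class name `UnicodeLiteralSketch` are hypothetical and only illustrate what a dialect that cannot parse `u&'...'` might emit instead. How a given engine treats backslashes inside a plain literal varies and is deliberately not handled here.

```java
final class UnicodeLiteralSketch {
  private static final char[] HEXITS = "0123456789abcdef".toCharArray();

  /** Mirrors the quoted method: emits a u&'...' literal with 4-hex-digit escapes. */
  public static String quoteUnicode(String val) {
    StringBuilder buf = new StringBuilder("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        // Control and non-ASCII characters become a backslash plus four hex digits.
        buf.append('\\');
        buf.append(HEXITS[(c >> 12) & 0xf]);
        buf.append(HEXITS[(c >> 8) & 0xf]);
        buf.append(HEXITS[(c >> 4) & 0xf]);
        buf.append(HEXITS[c & 0xf]);
      } else if (c == '\'' || c == '\\') {
        // Quote and backslash are doubled.
        buf.append(c).append(c);
      } else {
        buf.append(c);
      }
    }
    return buf.append('\'').toString();
  }

  /** Hypothetical alternative: a plain literal that only doubles single quotes. */
  public static String quotePlain(String val) {
    return "'" + val.replace("'", "''") + "'";
  }

  public static void main(String[] args) {
    String city = "Conveni\u00eancia";
    System.out.println(quoteUnicode(city)); // escaped form, as reported in this issue
    System.out.println(quotePlain(city));   // character kept as-is
  }
}
```

Whether the plain form is acceptable depends on the target engine and on the character set of the connection, so this is a starting point for discussion rather than a proposed fix.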
[jira] [Commented] (CALCITE-6040) The operand type inference of SqlMapValueConstructor is incorrect
[ https://issues.apache.org/jira/browse/CALCITE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775663#comment-17775663 ] Ran Tao commented on CALCITE-6040: -- I have set operandTypeInference to 'null' to fix this case, because MAP allows null and there is no need to deduce a type for null. SqlMapQueryConstructor sets it to 'null' as well. > The operand type inference of SqlMapValueConstructor is incorrect > - > > Key: CALCITE-6040 > URL: https://issues.apache.org/jira/browse/CALCITE-6040 > Project: Calcite > Issue Type: Bug > Components: tests >Affects Versions: 1.35.0 >Reporter: Ran Tao >Assignee: Ran Tao >Priority: Major > Labels: pull-request-available > Fix For: 1.36.0 > > > We have a simple test case: > {code:java} > f.checkScalar("map[1, null]", "{1=null}", > "(INTEGER NOT NULL, NULL) MAP NOT NULL"); {code} > The result is: > {noformat} > java.lang.AssertionError: Query: values (map[1, null]) > Expected: is "(INTEGER NOT NULL, NULL) MAP NOT NULL" > but: was "(INTEGER NOT NULL, INTEGER) MAP NOT NULL" > {noformat} > However, the actual result reported by the assertion, "(INTEGER NOT NULL, INTEGER) MAP NOT > NULL", is wrong for this case. If we switch the expected type to this actual result, it > throws another exception: > {noformat} > java.lang.AssertionError: Query: select map[p0, null] from (values (1)) as > t(p0) > Expected: is "(INTEGER NOT NULL, INTEGER) MAP NOT NULL" > but: was "(INTEGER NOT NULL, NULL) MAP NOT NULL" > {noformat} > No matter how you write the result type in this test case, it is wrong. > Checking the plan, it seems the deduced value type of NULL has been converted > to INTEGER. > In a more serious scenario, such as `map[1, 'x', 2, null]`, an exception is > thrown directly and the query fails, > because the null is converted to the FIRST_KNOWN type INTEGER (it should stay > NULL, so that the leastRestrictive type would be char). > A form such as `map[1, null, 2,'x']` has the same problem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
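The `map[1, 'x', 2, null]` failure described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: types are plain strings, `leastRestrictive` unifies NULL with anything and rejects two different known types, and `applyFirstKnown` imitates a FIRST_KNOWN-style inference. None of these names (including the class `MapValueTypeSketch`) are Calcite APIs.

```java
import java.util.ArrayList;
import java.util.List;

final class MapValueTypeSketch {
  /** Toy rule: NULL unifies with anything; two different known types do not unify. */
  public static String leastRestrictive(List<String> types) {
    String result = "NULL";
    for (String t : types) {
      if (t.equals("NULL") || t.equals(result)) {
        continue;
      }
      if (!result.equals("NULL")) {
        throw new IllegalArgumentException("cannot unify " + result + " and " + t);
      }
      result = t;
    }
    return result;
  }

  /** Toy FIRST_KNOWN-style inference: each NULL operand takes the first known operand type. */
  public static List<String> applyFirstKnown(List<String> operandTypes) {
    String firstKnown = null;
    for (String t : operandTypes) {
      if (!t.equals("NULL")) {
        firstKnown = t;
        break;
      }
    }
    List<String> out = new ArrayList<>();
    for (String t : operandTypes) {
      out.add(t.equals("NULL") && firstKnown != null ? firstKnown : t);
    }
    return out;
  }

  /** Value operands sit at the odd positions of map[k0, v0, k1, v1, ...]. */
  public static List<String> valueTypes(List<String> operandTypes) {
    List<String> out = new ArrayList<>();
    for (int i = 1; i < operandTypes.size(); i += 2) {
      out.add(operandTypes.get(i));
    }
    return out;
  }

  public static void main(String[] args) {
    // Operand types of map[1, 'x', 2, null].
    List<String> operands = List.of("INTEGER", "CHAR", "INTEGER", "NULL");
    // Without inference the NULL stays NULL and the values unify.
    System.out.println(leastRestrictive(valueTypes(operands)));
    try {
      // With inference the NULL becomes INTEGER (the first key type) and unification fails.
      leastRestrictive(valueTypes(applyFirstKnown(operands)));
    } catch (IllegalArgumentException e) {
      System.out.println("failed: " + e.getMessage());
    }
  }
}
```

In this model, skipping the inference step (the analogue of setting operandTypeInference to 'null') leaves the NULL value untyped, so the value types unify to CHAR.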
[jira] [Commented] (CALCITE-6051) Incorrect format for unicode strings
[ https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775659#comment-17775659 ] LakeShen commented on CALCITE-6051: --- The 'u&' prefix tells the database that the string constant contains Unicode escapes; it is commonly used in SQL statements. For more details, see `PG String Constants With Unicode Escapes`: [https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE] Because 'u&' is emitted by SqlDialect, which transforms the SqlNode to SQL, I think that it is correct. > Incorrect format for unicode strings > - > > Key: CALCITE-6051 > URL: https://issues.apache.org/jira/browse/CALCITE-6051 > Project: Calcite > Issue Type: Bug >Reporter: Shivangi >Priority: Major > > Hi, > The unicodes returned by calcite have broken formats. For example, the string > `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is > coming from > calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java > file, `quoteStringLiteralUnicode` method: > {code:java} > /** >* Converts a string into a unicode string literal. For example, >* can't{tab}run\ becomes u'can''t\0009run\\'. >*/ > public void quoteStringLiteralUnicode(StringBuilder buf, String val) { > buf.append("u&'"); > for (int i = 0; i < val.length(); i++) { > char c = val.charAt(i); > if (c < 32 || c >= 128) { > buf.append('\\'); > buf.append(HEXITS[(c >> 12) & 0xf]); > buf.append(HEXITS[(c >> 8) & 0xf]); > buf.append(HEXITS[(c >> 4) & 0xf]); > buf.append(HEXITS[c & 0xf]); > } else if (c == '\'' || c == '\\') { > buf.append(c); > buf.append(c); > } else { > buf.append(c); > } > } > buf.append("'"); > } > {code} > Why is `buf.append("u&'")` added in this method? I couldn't find relatable > unicode conversion that contains `u&`, as a result, it breaks when read by > the client. I wanted to understand the reason why `u&` is being used and what > can break if we remove `&`. > Thanks! 
[jira] [Resolved] (CALCITE-6014) Create a SqlOperatorFixture that parses, unparses, and then parses again before executing
[ https://issues.apache.org/jira/browse/CALCITE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruben Q L resolved CALCITE-6014. Resolution: Fixed Fixed via [{{5151168}}|https://github.com/apache/calcite/commit/5151168e9a9035595939c2ae0f21a06984229209] Thanks [~mbudiu] for your contribution! > Create a SqlOperatorFixture that parses, unparses, and then parses again > before executing > - > > Key: CALCITE-6014 > URL: https://issues.apache.org/jira/browse/CALCITE-6014 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.35.0 >Reporter: Mihai Budiu >Priority: Minor > Labels: pull-request-available > Fix For: 1.36.0 > > > Such a fixture will help catch bugs in the unparsing code. > Several bugs were found using this technique, e.g., CALCITE-5997. > This is related to CALCITE-5891, CALCITE-6000. > The SqlParserFixture UnparsingTesterImpl provides a similar service, but > since it does not validate the code after unparsing, it will catch fewer bugs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-6051) Incorrect format for unicode strings
Shivangi created CALCITE-6051: - Summary: Incorrect format for unicode strings Key: CALCITE-6051 URL: https://issues.apache.org/jira/browse/CALCITE-6051 Project: Calcite Issue Type: Bug Reporter: Shivangi Hi, The unicodes returned by calcite have broken formats. For example, the string `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is coming from calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java file, `quoteStringLiteralUnicode` method: {code:java} /** * Converts a string into a unicode string literal. For example, * can't{tab}run\ becomes u'can''t\0009run\\'. */ public void quoteStringLiteralUnicode(StringBuilder buf, String val) { buf.append("u&'"); for (int i = 0; i < val.length(); i++) { char c = val.charAt(i); if (c < 32 || c >= 128) { buf.append('\\'); buf.append(HEXITS[(c >> 12) & 0xf]); buf.append(HEXITS[(c >> 8) & 0xf]); buf.append(HEXITS[(c >> 4) & 0xf]); buf.append(HEXITS[c & 0xf]); } else if (c == '\'' || c == '\\') { buf.append(c); buf.append(c); } else { buf.append(c); } } buf.append("'"); } {code} Why is `buf.append("u&'")` added in this method? I couldn't find relatable unicode conversion that contains `u&`, as a result, it breaks when read by the client. I wanted to understand the reason why `u&` is being used and what can break if we remove `&`. Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-5607) Serialize return type during RelJson.toJson(RexNode node) serialization
[ https://issues.apache.org/jira/browse/CALCITE-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775551#comment-17775551 ] Oliver Lee commented on CALCITE-5607: - Hey [~julianhyde] , I finally got around to following up on this. I see that in my change that CAST was actually [handled separately|https://github.com/apache/calcite/pull/3129/files#diff-673904825afdbc42629c1eeb5abc0d713687722fcdead867cdbe460ebddc1e9cL563] by adding in the "type" to the JSON serialization. My change took that part out of the switch-statement and made it happen for all {{{}RexCall{}}}s. Now that I think about it, I should keep the switch-statement for the CAST and add in another clause specifically for {{SqlKind.MINUS}} , to not abandon deriving the type from the arguments for the rest of RexCalls. If you could give me confirmation, I can go ahead and update the PR. > Serialize return type during RelJson.toJson(RexNode node) serialization > > > Key: CALCITE-5607 > URL: https://issues.apache.org/jira/browse/CALCITE-5607 > Project: Calcite > Issue Type: Improvement >Reporter: Oliver Lee >Assignee: Oliver Lee >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > We found a bug in {{RelJson#toRex}} for the {{TIMESTAMP_DIFF}} call for Big > Query dialect. > {{TIMESTAMP_DIFF}} is translated to the {{MINUS_DATE}} > [operator|https://github.com/apache/calcite/blob/c28d1dcbc34e748b7bea9712ef6bcf43793a91e8/core/src/main/java/org/apache/calcite/sql2rel/StandardConvertletTable.java#L2113-L2116] > with a return type explicitly declared as the interval. > {{MINUS_DATE}} uses an > {{[ARG2_NULLABLE|https://github.com/apache/calcite/blob/c28d1dcbc34e748b7bea9712ef6bcf43793a91e8/core/src/main/java/org/apache/calcite/sql/type/ReturnTypes.java#L241]}} > return type inference which requires 3 operands. This is fine in most cases > where the RexCall is then used to generate SQL or for native implementations. 
> However, in {{{}RelJson#toRex{}}}, when it tries to reconstruct the entire > call to a RexNode, it attempts to derive the return type of the > {{MINUS_DATE}} operator using the {{ARG2_NULLABLE}} inference. This throws an > error as there are only 2 operands given to the {{MINUS_DATE}} operator. > We'd like to now add in the "type" when serializing the JSON so that > {{[jsonType|https://github.com/apache/calcite/blob/c28d1dcbc34e748b7bea9712ef6bcf43793a91e8/core/src/main/java/org/apache/calcite/rel/externalize/RelJson.java#L712]}} > will be defined in {{{}toRex{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
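The proposal above can be pictured with a small stand-alone sketch. It is not the RelJson code: `inferFromThirdOperand` is a hypothetical stand-in for an ARG2-style inference that reads the third operand, and `resolveType` shows the idea of preferring a serialized "type" entry when one is present. Types are modelled as plain strings purely for illustration.

```java
import java.util.List;
import java.util.Map;

final class ReturnTypeSketch {
  /** Stand-in for an ARG2-style inference: the return type is the third operand's type. */
  public static String inferFromThirdOperand(List<String> operandTypes) {
    if (operandTypes.size() < 3) {
      throw new IllegalStateException(
          "needs 3 operands, got " + operandTypes.size());
    }
    return operandTypes.get(2);
  }

  /** Prefers an explicit "type" entry in the serialized call; otherwise falls back to inference. */
  public static String resolveType(Map<String, Object> jsonCall, List<String> operandTypes) {
    Object jsonType = jsonCall.get("type");
    if (jsonType != null) {
      return jsonType.toString();
    }
    return inferFromThirdOperand(operandTypes);
  }

  public static void main(String[] args) {
    List<String> twoOperands = List.of("TIMESTAMP", "TIMESTAMP");
    // With the type serialized, the two-operand call can be rebuilt.
    System.out.println(
        resolveType(Map.of("op", "MINUS_DATE", "type", "INTERVAL DAY"), twoOperands));
    try {
      // Without it, the positional inference has no third operand to read.
      resolveType(Map.of("op", "MINUS_DATE"), twoOperands);
    } catch (IllegalStateException e) {
      System.out.println("failed: " + e.getMessage());
    }
  }
}
```

The fallback keeps type derivation from the arguments for calls that do not carry a serialized type, which matches the concern raised in the comment about not abandoning derivation for the rest of the RexCalls.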
[jira] [Comment Edited] (CALCITE-5607) Serialize return type during RelJson.toJson(RexNode node) serialization
[ https://issues.apache.org/jira/browse/CALCITE-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775551#comment-17775551 ] Oliver Lee edited comment on CALCITE-5607 at 10/16/23 6:35 AM: --- Hey [~julianhyde] , I finally got around to following up on this. I see that prior to my change that CAST was actually [handled separately|https://github.com/apache/calcite/pull/3129/files#diff-673904825afdbc42629c1eeb5abc0d713687722fcdead867cdbe460ebddc1e9cL563] by adding in the "type" to the JSON serialization. My change took that part out of the switch-statement and made it happen for all {{{}RexCall{}}}s. Now that I think about it, I should keep the switch-statement for the CAST and add in another clause specifically for {{SqlKind.MINUS}} , to not abandon deriving the type from the arguments for the rest of RexCalls. If you could give me confirmation, I can go ahead and update the PR. was (Author: JIRAUSER297744): Hey [~julianhyde] , I finally got around to following up on this. I see that in my change that CAST was actually [handled separately|https://github.com/apache/calcite/pull/3129/files#diff-673904825afdbc42629c1eeb5abc0d713687722fcdead867cdbe460ebddc1e9cL563] by adding in the "type" to the JSON serialization. My change took that part out of the switch-statement and made it happen for all {{{}RexCall{}}}s. Now that I think about it, I should keep the switch-statement for the CAST and add in another clause specifically for {{SqlKind.MINUS}} , to not abandon deriving the type from the arguments for the rest of RexCalls. If you could give me confirmation, I can go ahead and update the PR. 
> Serialize return type during RelJson.toJson(RexNode node) serialization > > > Key: CALCITE-5607 > URL: https://issues.apache.org/jira/browse/CALCITE-5607 > Project: Calcite > Issue Type: Improvement >Reporter: Oliver Lee >Assignee: Oliver Lee >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > We found a bug in {{RelJson#toRex}} for the {{TIMESTAMP_DIFF}} call for Big > Query dialect. > {{TIMESTAMP_DIFF}} is translated to the {{MINUS_DATE}} > [operator|https://github.com/apache/calcite/blob/c28d1dcbc34e748b7bea9712ef6bcf43793a91e8/core/src/main/java/org/apache/calcite/sql2rel/StandardConvertletTable.java#L2113-L2116] > with a return type explicitly declared as the interval. > {{MINUS_DATE}} uses an > {{[ARG2_NULLABLE|https://github.com/apache/calcite/blob/c28d1dcbc34e748b7bea9712ef6bcf43793a91e8/core/src/main/java/org/apache/calcite/sql/type/ReturnTypes.java#L241]}} > return type inference which requires 3 operands. This is fine in most cases > where the RexCall is then used to generate SQL or for native implementations. > However, in {{{}RelJson#toRex{}}}, when it tries to reconstruct the entire > call to a RexNode, it attempts to derive the return type of the > {{MINUS_DATE}} operator using the {{ARG2_NULLABLE}} inference. This throws an > error as there are only 2 operands given to the {{MINUS_DATE}} operator. > We'd like to now add in the "type" when serializing the JSON so that > {{[jsonType|https://github.com/apache/calcite/blob/c28d1dcbc34e748b7bea9712ef6bcf43793a91e8/core/src/main/java/org/apache/calcite/rel/externalize/RelJson.java#L712]}} > will be defined in {{{}toRex{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)