[jira] [Commented] (CALCITE-3933) Incorrect SQL Emitted for Unicode for Several Dialects

2023-10-16 Thread LakeShen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776007#comment-17776007
 ] 

LakeShen commented on CALCITE-3933:
---

I have looked at the source code, and I think I could try to fix this problem.

Hi [~aryeh], do you want to fix this problem? If not, maybe I could try to fix 
it.

> Incorrect SQL Emitted for Unicode for Several Dialects
> --
>
> Key: CALCITE-3933
> URL: https://issues.apache.org/jira/browse/CALCITE-3933
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.22.0
> Environment: master with latest commit on April 15 (
> dfb842e55e1fa7037c8a731341010ed1c0cfb6f7)
>Reporter: Aryeh Hillman
>Priority: Major
>
> A string literal like "schön" should emit "schön" in SQL for many dialects, 
> but instead emits
> {code:java}
> u&'sch\\00f6n' {code}
> which uses only ISO-8859-1 (ASCII) characters. 
> It's possible that some of the above dialects may support ISO-8859, but in my 
> tests with *BigQuery Standard SQL*, *MySQL*, and *Redshift* engines, the 
> following fails:
> {code:java}
> select u&'sch\\00f6n';{code}
> But this succeeds:
> {code:java}
> select 'schön'; {code}
> Test that demonstrates (add to 
> `org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java` and run from 
> there):
> {code:java}
> @Test void testBigQueryUnicode() {
>   final Function<RelBuilder, RelNode> relFn = b ->
>   b.scan("EMP")
>   .filter(
>   b.call(SqlStdOperatorTable.IN, b.field("ENAME"),
>   b.literal("schön")))
>   .build();
>   final String expectedSql = "SELECT *\n" +
>   "FROM scott.EMP\n" +
>   "WHERE ENAME IN ('schön')";
>   relFn(relFn).withBigQuery().ok(expectedSql);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-6041) Map query failed with NullPointerException in runtime phase

2023-10-16 Thread Ran Tao (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ran Tao reassigned CALCITE-6041:


Assignee: Ran Tao

> Map query failed with NullPointerException in runtime phase
> ---
>
> Key: CALCITE-6041
> URL: https://issues.apache.org/jira/browse/CALCITE-6041
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Ran Tao
>Assignee: Ran Tao
>Priority: Major
>
> Calcite supports the array/map/multiset query constructors, but if we run a map 
> query such as:
> {code:java}
> select map(select 1, 2)
> select map(select empno, deptno from emps); {code}
> It will cause an exception:
> {noformat}
> java.sql.SQLException: Error while executing SQL "select map(select 1, 2)": 
> Unable to implement EnumerableNestedLoopJoin(condition=[true], 
> joinType=[semi]): rowcount = 1.0, cumulative cost = \{13.0 rows, 3.0 cpu, 0.0 
> io}, id = 72
>   EnumerableCollect(field=[x]): rowcount = 1.0, cumulative cost = \{2.0 rows, 
> 2.0 cpu, 0.0 io}, id = 69
>     EnumerableValues(tuples=[[\\{ 1, 2 }]]): rowcount = 1.0, cumulative cost 
> = \{1.0 rows, 1.0 cpu, 0.0 io}, id = 38
>   EnumerableValues(tuples=[[\\{ 0 }]]): rowcount = 1.0, cumulative cost = 
> \{1.0 rows, 1.0 cpu, 0.0 io}, id = 35
>     at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>     at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>     at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:164)
>     at 
> org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:228)
>     at 
> org.apache.calcite.test.SqlOperatorTest$TesterImpl.check(SqlOperatorTest.java:13107)
>     at 
> org.apache.calcite.sql.test.SqlOperatorFixture.check(SqlOperatorFixture.java:439)
>     at 
> org.apache.calcite.sql.test.SqlOperatorFixture.check(SqlOperatorFixture.java:415)
>     at 
> org.apache.calcite.sql.test.SqlOperatorFixture.check(SqlOperatorFixture.java:420)
>     at 
> org.apache.calcite.test.SqlOperatorTest.testMapQueryConstructor(SqlOperatorTest.java:10235)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
>     at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:217)
>     at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:213)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:138)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:68)
>     at 
> 

[jira] [Commented] (CALCITE-6052) SqlImplementor writes FLOATING POINT literals as DECIMAL literals

2023-10-16 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775982#comment-17775982
 ] 

Julian Hyde commented on CALCITE-6052:
--

Good idea. I like small changes. I suggest changing 'FLOATING POINT' to 'REAL, 
FLOAT and DOUBLE' in the summary, since those are the specific SQL type names.

One test in the Pig module isn't sufficient. Can you add a test, with one 
column for each type, in SqlToRelConverterTest?

Do a quick experiment to see whether IEEE special values work (+inf, -inf, nan, 
-0), and if so, add them to the test. I don't recall whether SQL supports those 
values, but the test could at least document our current behavior.

> SqlImplementor writes FLOATING POINT literals as DECIMAL literals
> -
>
> Key: CALCITE-6052
> URL: https://issues.apache.org/jira/browse/CALCITE-6052
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> This bug is already fixed in https://github.com/apache/calcite/pull/3411, but 
> I plan to submit a smaller point fix for it, which doesn't require reworking 
> the type families.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6045) CURRENT_TIMESTAMP has incorrect return type

2023-10-16 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775977#comment-17775977
 ] 

Julian Hyde commented on CALCITE-6045:
--

I think that Calcite's {{CURRENT_TIMESTAMP}} function should have type 
{{TIMESTAMP WITH LOCAL TIME ZONE}}, which means that (after type alias 
translation) it will have the requisite type for BigQuery (what BigQuery calls 
{{TIMESTAMP}}). (That is a change to current behavior, and a departure from the 
ISO standard, but still the best type, in my opinion.)

If people want a {{TIMESTAMP}} (what BigQuery calls {{DATETIME}}) they can call 
{{LOCALTIMESTAMP}}, whose behavior will be unchanged.

> CURRENT_TIMESTAMP has incorrect return type
> ---
>
> Key: CALCITE-6045
> URL: https://issues.apache.org/jira/browse/CALCITE-6045
> Project: Calcite
>  Issue Type: Bug
>Reporter: Tanner Clary
>Priority: Major
>
> When trying to work on CALCITE-6021, I noticed that {{CURRENT_TIMESTAMP}} 
> currently returns type {{TIMESTAMP}} when it should be 
> {{TIMESTAMP_WITH_LOCAL_TIME_ZONE}}.
> After modifying it, I noticed the function was returning the time from (UTC - 
> System TZ) hours ago. For example, I am in {{America/Los_Angeles}} and if I 
> called the function at {{2023-10-10 13:28:00 America/Los_Angeles}}, it would 
> return {{2023-10-10 06:28:00 America/Los_Angeles}}. 
> I think this is because the DataContext {{CURRENT_TIMESTAMP}} variable, which 
> is meant to represent milliseconds since epoch UTC, actually has the timezone 
> offset applied in {{CalciteConnectionImpl#DataContextImpl}} 
> [here|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/jdbc/CalciteConnectionImpl.java#L442].
>  To be clear: it is meant to represent millis since epoch UTC, but instead it 
> is millis since epoch [system tz], as I understand it. 
> Additionally, I believe the {{getString()}} method for timestamps in 
> AvaticaResultSet should behave similarly to 
> [{{SqlFunctions#timestampWithLocalTimezoneToString()}}|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L4021]
>  when dealing with a {{TIMESTAMP WITH LOCAL TIME ZONE}}. Right now, it does 
> not take the timezone into consideration so although it represents the 
> accurate instant in time, it displays differently than 
> {{CAST(CURRENT_TIMESTAMP AS VARCHAR)}}.
> For example, {{SELECT CURRENT_TIMESTAMP, CAST(CURRENT_TIMESTAMP AS 
> VARCHAR)}}, with the correct return type, returns something like:
> {{2023-10-10 13:28:00 |  2023-10-10 06:28:00.000 America/Los_Angeles}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-3933) Incorrect SQL Emitted for Unicode for Several Dialects

2023-10-16 Thread LakeShen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775957#comment-17775957
 ] 

LakeShen commented on CALCITE-3933:
---

Maybe we could branch on SqlDialect#databaseProduct's type and implement 
different behavior in the `SqlDialect#quoteStringLiteralUnicode` method.
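
A minimal sketch of that idea, assuming a hypothetical dialect subclass; the 
class name and the escaping logic below are illustrative only, not the actual 
fix:

{code:java}
import org.apache.calcite.sql.SqlDialect;

/**
 * Sketch only: a dialect whose default charset can represent the value could
 * override quoteStringLiteralUnicode to emit a plain quoted literal ('schön')
 * instead of the u&'sch\00f6n' form. The overridden method exists on
 * SqlDialect; everything else here is hypothetical.
 */
class HypotheticalUtf8Dialect extends SqlDialect {
  HypotheticalUtf8Dialect(SqlDialect.Context context) {
    super(context);
  }

  @Override public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
    buf.append('\'');
    for (int i = 0; i < val.length(); i++) {
      final char c = val.charAt(i);
      if (c == '\'') {
        buf.append("''");   // double the quote character
      } else {
        buf.append(c);      // pass other characters through unescaped
      }
    }
    buf.append('\'');
  }
}
{code}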

> Incorrect SQL Emitted for Unicode for Several Dialects
> --
>
> Key: CALCITE-3933
> URL: https://issues.apache.org/jira/browse/CALCITE-3933
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.22.0
> Environment: master with latest commit on April 15 (
> dfb842e55e1fa7037c8a731341010ed1c0cfb6f7)
>Reporter: Aryeh Hillman
>Priority: Major
>
> A string literal like "schön" should emit "schön" in SQL for many dialects, 
> but instead emits
> {code:java}
> u&'sch\\00f6n' {code}
> which uses only ISO-8859-1 (ASCII) characters. 
> It's possible that some of the above dialects may support ISO-8859, but in my 
> tests with *BigQuery Standard SQL*, *MySQL*, and *Redshift* engines, the 
> following fails:
> {code:java}
> select u&'sch\\00f6n';{code}
> But this succeeds:
> {code:java}
> select 'schön'; {code}
> Test that demonstrates (add to 
> `org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java` and run from 
> there):
> {code:java}
> @Test void testBigQueryUnicode() {
>   final Function<RelBuilder, RelNode> relFn = b ->
>   b.scan("EMP")
>   .filter(
>   b.call(SqlStdOperatorTable.IN, b.field("ENAME"),
>   b.literal("schön")))
>   .build();
>   final String expectedSql = "SELECT *\n" +
>   "FROM scott.EMP\n" +
>   "WHERE ENAME IN ('schön')";
>   relFn(relFn).withBigQuery().ok(expectedSql);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6045) CURRENT_TIMESTAMP has incorrect return type

2023-10-16 Thread Tanner Clary (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775934#comment-17775934
 ] 

Tanner Clary commented on CALCITE-6045:
---

For FLOOR and CEIL, we added some logic to the parser that checks the current 
conformance and, if it is BigQuery, uses the BigQuery-specific operator. I think 
Jerin is dealing with something similar, and a lot of the suggestions (that I 
have had, at least) involve checking the conformance. Then maybe you could have 
an operator like CURRENT_TIMESTAMP_BQ or something similar. See FLOOR/CEIL, as I 
mentioned, or SUBSTR. If there is another difference between the operators, such 
as operand count (which I don't think is applicable here), you could also use 
that instead.
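
A rough sketch of the conformance-based selection described above. The 
BigQuery-specific operator does not exist yet, so LOCALTIMESTAMP is used as a 
stand-in purely so the example compiles:

{code:java}
import org.apache.calcite.sql.SqlOperator;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.validate.SqlConformance;
import org.apache.calcite.sql.validate.SqlConformanceEnum;

/**
 * Sketch: choose the operator for CURRENT_TIMESTAMP based on the active
 * conformance, as is done for FLOOR/CEIL and SUBSTR. LOCALTIMESTAMP is only a
 * placeholder for a hypothetical CURRENT_TIMESTAMP_BQ operator.
 */
class CurrentTimestampOperatorChooser {
  static SqlOperator choose(SqlConformance conformance) {
    return conformance == SqlConformanceEnum.BIG_QUERY
        ? SqlStdOperatorTable.LOCALTIMESTAMP
        : SqlStdOperatorTable.CURRENT_TIMESTAMP;
  }
}
{code}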

> CURRENT_TIMESTAMP has incorrect return type
> ---
>
> Key: CALCITE-6045
> URL: https://issues.apache.org/jira/browse/CALCITE-6045
> Project: Calcite
>  Issue Type: Bug
>Reporter: Tanner Clary
>Priority: Major
>
> When trying to work on CALCITE-6021, I noticed that {{CURRENT_TIMESTAMP}} 
> currently returns type {{TIMESTAMP}} when it should be 
> {{TIMESTAMP_WITH_LOCAL_TIME_ZONE}}.
> After modifying it, I noticed the function was returning the time from (UTC - 
> System TZ) hours ago. For example, I am in {{America/Los_Angeles}} and if I 
> called the function at {{2023-10-10 13:28:00 America/Los_Angeles}}, it would 
> return {{2023-10-10 06:28:00 America/Los_Angeles}}. 
> I think this is because the DataContext {{CURRENT_TIMESTAMP}} variable, which 
> is meant to represent milliseconds since epoch UTC, actually has the timezone 
> offset applied in {{CalciteConnectionImpl#DataContextImpl}} 
> [here|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/jdbc/CalciteConnectionImpl.java#L442].
>  To be clear: it is meant to represent millis since epoch UTC, but instead it 
> is millis since epoch [system tz], as I understand it. 
> Additionally, I believe the {{getString()}} method for timestamps in 
> AvaticaResultSet should behave similarly to 
> [{{SqlFunctions#timestampWithLocalTimezoneToString()}}|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L4021]
>  when dealing with a {{TIMESTAMP WITH LOCAL TIME ZONE}}. Right now, it does 
> not take the timezone into consideration so although it represents the 
> accurate instant in time, it displays differently than 
> {{CAST(CURRENT_TIMESTAMP AS VARCHAR)}}.
> For example, {{SELECT CURRENT_TIMESTAMP, CAST(CURRENT_TIMESTAMP AS 
> VARCHAR)}}, with the correct return type, returns something like:
> {{2023-10-10 13:28:00 |  2023-10-10 06:28:00.000 America/Los_Angeles}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6045) CURRENT_TIMESTAMP has incorrect return type

2023-10-16 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775930#comment-17775930
 ] 

Will Noble commented on CALCITE-6045:
-

This would seem to be a case where the standard SQL function 
{{CURRENT_TIMESTAMP}} (meant to return a {{TIMESTAMP WITH TIME ZONE}}) has a 
name collision with the BigQuery-specific function {{CURRENT_TIMESTAMP}} (meant 
to return a {{TIMESTAMP WITH LOCAL TIME ZONE}}, a.k.a. the BigQuery-specific 
{{TIMESTAMP}}). Is there a standard procedure for handling function name 
collisions between standard SQL and particular dialects?

> CURRENT_TIMESTAMP has incorrect return type
> ---
>
> Key: CALCITE-6045
> URL: https://issues.apache.org/jira/browse/CALCITE-6045
> Project: Calcite
>  Issue Type: Bug
>Reporter: Tanner Clary
>Priority: Major
>
> When trying to work on CALCITE-6021, I noticed that {{CURRENT_TIMESTAMP}} 
> currently returns type {{TIMESTAMP}} when it should be 
> {{TIMESTAMP_WITH_LOCAL_TIME_ZONE}}.
> After modifying it, I noticed the function was returning the time from (UTC - 
> System TZ) hours ago. For example, I am in {{America/Los_Angeles}} and if I 
> called the function at {{2023-10-10 13:28:00 America/Los_Angeles}}, it would 
> return {{2023-10-10 06:28:00 America/Los_Angeles}}. 
> I think this is because the DataContext {{CURRENT_TIMESTAMP}} variable, which 
> is meant to represent milliseconds since epoch UTC, actually has the timezone 
> offset applied in {{CalciteConnectionImpl#DataContextImpl}} 
> [here|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/jdbc/CalciteConnectionImpl.java#L442].
>  To be clear: it is meant to represent millis since epoch UTC, but instead it 
> is millis since epoch [system tz], as I understand it. 
> Additionally, I believe the {{getString()}} method for timestamps in 
> AvaticaResultSet should behave similarly to 
> [{{SqlFunctions#timestampWithLocalTimezoneToString()}}|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L4021]
>  when dealing with a {{TIMESTAMP WITH LOCAL TIME ZONE}}. Right now, it does 
> not take the timezone into consideration so although it represents the 
> accurate instant in time, it displays differently than 
> {{CAST(CURRENT_TIMESTAMP AS VARCHAR)}}.
> For example, {{SELECT CURRENT_TIMESTAMP, CAST(CURRENT_TIMESTAMP AS 
> VARCHAR)}}, with the correct return type, returns something like:
> {{2023-10-10 13:28:00 |  2023-10-10 06:28:00.000 America/Los_Angeles}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6052) SqlImplementor writes FLOATING POINT literals as DECIMAL literals

2023-10-16 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu updated CALCITE-6052:
-
Summary: SqlImplementor writes FLOATING POINT literals as DECIMAL literals  
(was: reltosql writes FLOATING POINT literals as DECIMAL literals)

> SqlImplementor writes FLOATING POINT literals as DECIMAL literals
> -
>
> Key: CALCITE-6052
> URL: https://issues.apache.org/jira/browse/CALCITE-6052
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> This bug is already fixed in https://github.com/apache/calcite/pull/3411, but 
> I plan to submit a smaller point fix for it, which doesn't require reworking 
> the type families.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6052) reltosql writes FLOATING POINT literals as DECIMAL literals

2023-10-16 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu updated CALCITE-6052:
-
Fix Version/s: 1.36.0

> reltosql writes FLOATING POINT literals as DECIMAL literals
> ---
>
> Key: CALCITE-6052
> URL: https://issues.apache.org/jira/browse/CALCITE-6052
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> This bug is already fixed in https://github.com/apache/calcite/pull/3411, but 
> I plan to submit a smaller point fix for it, which doesn't require reworking 
> the type families.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6052) reltosql writes FLOATING POINT literals as DECIMAL literals

2023-10-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-6052:

Labels: pull-request-available  (was: )

> reltosql writes FLOATING POINT literals as DECIMAL literals
> ---
>
> Key: CALCITE-6052
> URL: https://issues.apache.org/jira/browse/CALCITE-6052
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Minor
>  Labels: pull-request-available
>
> This bug is already fixed in https://github.com/apache/calcite/pull/3411, but 
> I plan to submit a smaller point fix for it, which doesn't require reworking 
> the type families.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-5987) SqlImplementor loses type information for literals

2023-10-16 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu reassigned CALCITE-5987:


Assignee: Mihai Budiu

> SqlImplementor loses type information for literals
> --
>
> Key: CALCITE-5987
> URL: https://issues.apache.org/jira/browse/CALCITE-5987
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Minor
>
> When converting a SqlNode to a String query, the conversion can produce SQL 
> that computes different results. This happens because literals do not carry 
> type information in the result string. For example, this plan:
> {code}
> rel#7:LogicalValues.(type=RecordType(VARCHAR(3) EXPR$0),tuples=[{ 'A' }])
> {code}
> will generate a SQL query:
> {code}
> SELECT 'A'
> {code}
> While the type of the former result is VARCHAR(3), the latter query produces 
> a CHAR(1) result.
> It would be nice if SqlImplementor had an option to produce a query that 
> preserves the output type, e.g.:
> {code}
> SELECT CAST('A' AS VARCHAR(3))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-5891) Create a test fixture that would apply PROJECT_REDUCE_EXPRESSIONS to all tests in SqlOperatorTest

2023-10-16 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu reassigned CALCITE-5891:


Assignee: Mihai Budiu

> Create a test fixture that would apply PROJECT_REDUCE_EXPRESSIONS to all 
> tests in SqlOperatorTest 
> --
>
> Key: CALCITE-5891
> URL: https://issues.apache.org/jira/browse/CALCITE-5891
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Major
>
> SqlOperatorTest has many tests, including end-to-end tests.
> However, none of these tests exercise the PROJECT_REDUCE_EXPRESSIONS rule, 
> which often produces different results for constant expressions than these 
> tests do.
> Ideally we should be able to subclass SqlOperatorTest and use a fixture that 
> also applies this optimization prior to evaluation.
> I have marked this as a {*}major priority{*}, because I suspect it would 
> catch many bugs with minimal effort. (I have found at least 10 so far.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-5884) Description of ARRAY_TO_STRING function is incomplete

2023-10-16 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu reassigned CALCITE-5884:


Assignee: Mihai Budiu

> Description of ARRAY_TO_STRING function is incomplete
> -
>
> Key: CALCITE-5884
> URL: https://issues.apache.org/jira/browse/CALCITE-5884
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> This is the current definition of the function ARRAY_TO_STRING in 
> SqlLibraryOperators:
> {code:java}
>   /** The "ARRAY_TO_STRING(array, delimiter [, nullText ])" function. */
>   @LibraryOperator(libraries = {BIG_QUERY})
>   public static final SqlFunction ARRAY_TO_STRING =
>   SqlBasicFunction.create(SqlKind.ARRAY_TO_STRING,
>   ReturnTypes.VARCHAR_NULLABLE,
>   OperandTypes.STRING_ARRAY_CHARACTER_OPTIONAL_CHARACTER);
> {code}
> So the result is nullable if any of the arguments is nullable. However, the 
> nullability of the last argument does not influence the result nullability: 
> a NULL value for the third optional argument will not cause a NULL value to 
> be output.
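
A minimal sketch of one possible return-type inference along those lines, 
computing nullability from only the first two operands; the class and field 
names are made up, and this is not the actual Calcite change:

{code:java}
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.sql.type.SqlReturnTypeInference;
import org.apache.calcite.sql.type.SqlTypeName;

/**
 * Sketch: the VARCHAR result is nullable only if the array or the delimiter
 * operand is nullable; the optional nullText operand is ignored.
 */
class ArrayToStringReturnType {
  static final SqlReturnTypeInference VARCHAR_NULLABLE_IF_ARG0_OR_ARG1_NULLABLE =
      opBinding -> {
        final RelDataTypeFactory typeFactory = opBinding.getTypeFactory();
        final boolean nullable = opBinding.getOperandType(0).isNullable()
            || opBinding.getOperandType(1).isNullable();
        final RelDataType varchar =
            typeFactory.createSqlType(SqlTypeName.VARCHAR);
        return typeFactory.createTypeWithNullability(varchar, nullable);
      };
}
{code}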



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-5998) The SAFE_OFFSET operator can cause an index out of bounds exception

2023-10-16 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu reassigned CALCITE-5998:


Assignee: Tanner Clary

> The SAFE_OFFSET operator can cause an index out of bounds exception
> ---
>
> Key: CALCITE-5998
> URL: https://issues.apache.org/jira/browse/CALCITE-5998
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Assignee: Tanner Clary
>Priority: Minor
>
> The following query, when added as a SqlOperatorTest:
> {code:sql}
> select ARRAY[p3,p2,p1][SAFE_OFFSET(p0)] from (values (-1, 6, 4, 2)) as t(p0, 
> p1, p2, p3)
> {code}
> causes an exception. Here is the top of the stack trace:
> {code:java}
> Array index -1 is out of bounds
> org.apache.calcite.runtime.CalciteException: Array index -1 is out of bounds
>   at 
> java.base@11.0.18/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>  Method)
>   at 
> java.base@11.0.18/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> java.base@11.0.18/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at 
> java.base@11.0.18/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
>   at 
> app//org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:507)
>   at 
> app//org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:601)
>   at 
> app//org.apache.calcite.runtime.SqlFunctions.arrayItem(SqlFunctions.java:4742)
>   at 
> app//org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(SqlFunctions.java:4780)
>   at Baz$1$1.current(Unknown Source)
>   at 
> app//org.apache.calcite.linq4j.Linq4j$EnumeratorIterator.next(Linq4j.java:687)
>   at 
> app//org.apache.calcite.avatica.util.IteratorCursor.next(IteratorCursor.java:46)
>   at 
> app//org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:219)
>   at 
> app//org.apache.calcite.sql.test.ResultCheckers.compareResultSet(ResultCheckers.java:128)
>   at 
> app//org.apache.calcite.sql.test.ResultCheckers$RefSetResultChecker.checkResult(ResultCheckers.java:336)
>   at 
> app//org.apache.calcite.test.SqlOperatorTest$TesterImpl.check(SqlOperatorTest.java:12987)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-5986) The typeFamily property of SqlTypeName is used inconsistently

2023-10-16 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu reassigned CALCITE-5986:


Assignee: Mihai Budiu

> The typeFamily property of SqlTypeName is used inconsistently
> -
>
> Key: CALCITE-5986
> URL: https://issues.apache.org/jira/browse/CALCITE-5986
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Minor
>  Labels: pull-request-available
>
> In SqlTypeFamily we have this code:
> {code:java}
> private static final Map<Integer, SqlTypeFamily> JDBC_TYPE_TO_FAMILY =
> ...
>   .put(Types.FLOAT, NUMERIC)
>   .put(Types.REAL, NUMERIC)
>   .put(Types.DOUBLE, NUMERIC)
> {code}
> But it looks to me like the type family should be APPROXIMATE_NUMERIC.
> This impacts the way RelToSqlConverter works, for instance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-6014) Create a SqlOperatorFixture that parses, unparses, and then parses again before executing

2023-10-16 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu reassigned CALCITE-6014:


Assignee: Mihai Budiu

> Create a SqlOperatorFixture that parses, unparses, and then parses again 
> before executing
> -
>
> Key: CALCITE-6014
> URL: https://issues.apache.org/jira/browse/CALCITE-6014
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> Such a fixture will help catch bugs in the unparsing code.
> Several bugs were found using this technique, e.g., CALCITE-5997.
> This is related to CALCITE-5891, CALCITE-6000.
> The SqlParserFixture UnparsingTesterImpl provides a similar service, but 
> since it does not validate the code after unparsing, it will catch fewer bugs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-6029) SqlOperatorTest cannot test operators that require the Babel parser

2023-10-16 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu reassigned CALCITE-6029:


Assignee: Mihai Budiu

> SqlOperatorTest cannot test operators that require the Babel parser
> ---
>
> Key: CALCITE-6029
> URL: https://issues.apache.org/jira/browse/CALCITE-6029
> Project: Calcite
>  Issue Type: Bug
>  Components: babel, core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Minor
>  Labels: pull-request-available
>
> In SqlOperatorTest one can write code like this:
> {code:java}
> @Test void testDatePart() {
> final SqlOperatorFixture f = fixture().withLibrary(SqlLibrary.POSTGRESQL)
> .withParserConfig(p -> 
> p.withParserFactory(SqlBabelParserImpl.FACTORY));
> {code}
> This almost works, but the SqlOperatorTest.check function makes a connection 
> that ignores the parserFactory, so parsing will fail:
> {code:java}
> @Override public void check(SqlTestFactory factory, String query,
> SqlTester.TypeChecker typeChecker,
> SqlTester.ParameterChecker parameterChecker,
> SqlTester.ResultChecker resultChecker) {
>   super.check(factory, query, typeChecker, parameterChecker, 
> resultChecker);
>   final RelDataTypeSystem typeSystem =
>   factory.typeSystemTransform.apply(RelDataTypeSystem.DEFAULT);
>   final ConnectionFactory connectionFactory =
>   factory.connectionFactory
>   .with(CalciteConnectionProperty.TYPE_SYSTEM, uri(FIELD));  /// 
> NO PARSER_FACTORY HERE
> {code}
> I am trying to fix this by adding a PARSER_FACTORY argument to the 
> connection, but then I get a class loader error from 
> AvaticaUtils.instantiatePlugin, which, in this case, cannot find the 
> SqlBabelParserImpl#FACTORY in the classpath.
> I would appreciate some help solving this last bit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-6030) DATE_PART is not handled by the RexToLixTranslator

2023-10-16 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu reassigned CALCITE-6030:


Assignee: Mihai Budiu

> DATE_PART is not handled by the RexToLixTranslator
> --
>
> Key: CALCITE-6030
> URL: https://issues.apache.org/jira/browse/CALCITE-6030
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> The following test, when added to SqlOperatorTest, causes a RuntimeException:
> {code:java}
> @Test void testDatePart() {
> final SqlOperatorFixture f = fixture().withLibrary(SqlLibrary.POSTGRESQL)
> .withParserConfig(p -> 
> p.withParserFactory(SqlBabelParserImpl.FACTORY));
> f.checkScalar("DATE_PART(second, TIME '10:10:10')",
> "10", "BIGINT NOT NULL");
>   }
> {code}
> Note that this needs https://github.com/apache/calcite/pull/3445 to execute 
> correctly.
> The stack trace is:
> {code:java}
> Suppressed: java.lang.RuntimeException: cannot translate call DATE_PART($t1, 
> $t2)
>   at 
> org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitCall(RexToLixTranslator.java:1160)
>   at 
> org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitCall(RexToLixTranslator.java:101)
>   at org.apache.calcite.rex.RexCall.accept(RexCall.java:189)
> {code}
> According to the documentation DATE_PART is just an alias for EXTRACT, which 
> is (mostly) implemented, so this should work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6052) reltosql writes FLOATING POINT literals as DECIMAL literals

2023-10-16 Thread Mihai Budiu (Jira)
Mihai Budiu created CALCITE-6052:


 Summary: reltosql writes FLOATING POINT literals as DECIMAL 
literals
 Key: CALCITE-6052
 URL: https://issues.apache.org/jira/browse/CALCITE-6052
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.35.0
Reporter: Mihai Budiu


This bug is already fixed in https://github.com/apache/calcite/pull/3411, but I 
plan to submit a smaller point fix for it, which doesn't require reworking the 
type families.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-6052) reltosql writes FLOATING POINT literals as DECIMAL literals

2023-10-16 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu reassigned CALCITE-6052:


Assignee: Mihai Budiu

> reltosql writes FLOATING POINT literals as DECIMAL literals
> ---
>
> Key: CALCITE-6052
> URL: https://issues.apache.org/jira/browse/CALCITE-6052
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Minor
>
> This bug is already fixed in https://github.com/apache/calcite/pull/3411, but 
> I plan to submit a smaller point fix for it, which doesn't require reworking 
> the type families.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6030) DATE_PART is not handled by the RexToLixTranslator

2023-10-16 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu updated CALCITE-6030:
-
Fix Version/s: 1.36.0

> DATE_PART is not handled by the RexToLixTranslator
> --
>
> Key: CALCITE-6030
> URL: https://issues.apache.org/jira/browse/CALCITE-6030
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> The following test, when added to SqlOperatorTest, causes a RuntimeException:
> {code:java}
> @Test void testDatePart() {
> final SqlOperatorFixture f = fixture().withLibrary(SqlLibrary.POSTGRESQL)
> .withParserConfig(p -> 
> p.withParserFactory(SqlBabelParserImpl.FACTORY));
> f.checkScalar("DATE_PART(second, TIME '10:10:10')",
> "10", "BIGINT NOT NULL");
>   }
> {code}
> Note that this needs https://github.com/apache/calcite/pull/3445 to execute 
> correctly.
> The stack trace is:
> {code:java}
> Suppressed: java.lang.RuntimeException: cannot translate call DATE_PART($t1, 
> $t2)
>   at 
> org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitCall(RexToLixTranslator.java:1160)
>   at 
> org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitCall(RexToLixTranslator.java:101)
>   at org.apache.calcite.rex.RexCall.accept(RexCall.java:189)
> {code}
> According to the documentation DATE_PART is just an alias for EXTRACT, which 
> is (mostly) implemented, so this should work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5990) Explicit cast to numeric type doesn't check overflow

2023-10-16 Thread Mihai Budiu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775914#comment-17775914
 ] 

Mihai Budiu commented on CALCITE-5990:
--

I already have a working fix, which I will submit once we have solved 
CALCITE-5921. 

> Explicit cast to numeric type doesn't check overflow
> 
>
> Key: CALCITE-5990
> URL: https://issues.apache.org/jira/browse/CALCITE-5990
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Runkang He
>Assignee: Runkang He
>Priority: Blocker
> Fix For: 1.36.0
>
>
> Explicit cast to numeric type doesn't check overflow, and this issue can be 
> reproduced by sqlline:
> {code:sql}
> select cast(empno as tinyint), cast(130 as tinyint) from emps where 
> name='Alice'; -- empno is 130
> {code}
> The empno column is of type INT. The result is wrong:
> {code:sql}
> -126, -126{code}
> I think it should throw an exception on overflow, instead of returning a 
> wrong result to the user.
> Lastly, this issue was found when turning on the runtime check for 
> CalciteSqlOperatorTest in CALCITE-5921.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6001) Add useUtf8AsDefaultCharset flag to SqlConformanceEnum to allow encoding of non-ISO-8859-1 characters

2023-10-16 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775894#comment-17775894
 ] 

Julian Hyde commented on CALCITE-6001:
--

Can you take a look at CALCITE-3933? I think it's related, albeit a bigger 
issue. If solving this issue (6001) goes some way to solving 3933 I think we 
should do it. Your PR looks good (it just needs a little cleanup).

> Add useUtf8AsDefaultCharset flag to SqlConformanceEnum to allow encoding of 
> non-ISO-8859-1 characters
> -
>
> Key: CALCITE-6001
> URL: https://issues.apache.org/jira/browse/CALCITE-6001
> Project: Calcite
>  Issue Type: New Feature
>Reporter: Tanner Clary
>Assignee: Tanner Clary
>Priority: Major
>  Labels: pull-request-available
>
> Many dialects supported by Calcite encode their strings using a default 
> charset (most commonly UTF-8 or ISO-8859-1). For example, BigQuery uses 
> [UTF-8|https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type].
>  I am proposing to add a dialect property to be referenced when converting 
> string literals so that the current dialect's default is used unless 
> otherwise specified.
> Presently, if no charset is specified when converting to RexLiterals 
> [here|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rex/RexBuilder.java#L1618],
>  the CalciteSystemProperty {{DEFAULT_CHARSET}} is used 
> ([docs|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/config/CalciteSystemProperty.java#L300])
>  which is set as ISO-8859-1.
> This means that when converting a query like:
> {{select 'ק' as result;}}
> you will get the following error: {{Failed to encode 'ק' in character 
> set 'ISO-8859-1'}}.
> This failure is unexpected if you are using BigQuery conformance (or any 
> dialect whose default is UTF-8).
> Of course, an alternative solution would be to just change the Calcite default 
> to UTF-8, which can encode any Unicode character while ISO-8859-1 can only 
> encode the first 256 code points, but I imagine there are reasons against this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-3933) Incorrect SQL Emitted for Unicode for Several Dialects

2023-10-16 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775893#comment-17775893
 ] 

Julian Hyde commented on CALCITE-3933:
--

A related issue is CALCITE-6001. If Calcite knows that a DB can handle a larger 
character set, it can generate literals in that character set, and won't need 
to use Unicode encoding.

> Incorrect SQL Emitted for Unicode for Several Dialects
> --
>
> Key: CALCITE-3933
> URL: https://issues.apache.org/jira/browse/CALCITE-3933
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.22.0
> Environment: master with latest commit on April 15 (
> dfb842e55e1fa7037c8a731341010ed1c0cfb6f7)
>Reporter: Aryeh Hillman
>Priority: Major
>
> A string literal like "schön" should emit "schön" in SQL for many dialects, 
> but instead emits
> {code:java}
> u&'sch\\00f6n' {code}
> which uses only ISO-8859-1 (ASCII) characters. 
> It's possible that some of the above dialects may support ISO-8859, but in my 
> tests with *BigQuery Standard SQL*, *MySQL*, and *Redshift* engines, the 
> following fails:
> {code:java}
> select u&'sch\\00f6n';{code}
> But this succeeds:
> {code:java}
> select 'schön'; {code}
> Test that demonstrates (add to 
> `org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java` and run from 
> there):
> {code:java}
> @Test void testBigQueryUnicode() {
>   final Function<RelBuilder, RelNode> relFn = b ->
>   b.scan("EMP")
>   .filter(
>   b.call(SqlStdOperatorTable.IN, b.field("ENAME"),
>   b.literal("schön")))
>   .build();
>   final String expectedSql = "SELECT *\n" +
>   "FROM scott.EMP\n" +
>   "WHERE ENAME IN ('schön')";
>   relFn(relFn).withBigQuery().ok(expectedSql);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (CALCITE-6046) SQL parser failed when parsing a comment string start with ''

2023-10-16 Thread Julian Hyde (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde resolved CALCITE-6046.
--
Fix Version/s: (was: 1.36.0)
   Resolution: Duplicate

> SQL parser failed when parsing a comment string start with ''
> ---
>
> Key: CALCITE-6046
> URL: https://issues.apache.org/jira/browse/CALCITE-6046
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: xiaogang zhou
>Priority: Major
>
> quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will 
> cause the SqlLiteral 
>  
> for example with a SQL
>  
> {code:java}
> // code placeholder
> CREATE TABLE source (
>     a BIGINT
> ) comment '测试test'
> WITH (
>   'connector' = 'test'
> ); {code}
> with a parsed Sqlnode, the toString will create a SQL like below, which is 
> not parsable again.
>  
> {code:java}
> // code placeholder
> CREATE TABLE `source` (
>   `a` BIGINT
> )
> COMMENT u&'\5218\51eftest' WITH (
>   'connector' = 'test'
> ) {code}
> I think this is caused by 
> {code:java}
> // code placeholder
> public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
>   buf.append("u&'"); {code}
> Not sure if I misconfigured something. Is it possible to remove the 
> buf.append("u&'"); ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6046) SQL parser failed when parsing a comment string start with ''

2023-10-16 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775891#comment-17775891
 ] 

Julian Hyde commented on CALCITE-6046:
--

Also a duplicate of CALCITE-3933.

> SQL parser failed when parsing a comment string start with ''
> ---
>
> Key: CALCITE-6046
> URL: https://issues.apache.org/jira/browse/CALCITE-6046
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: xiaogang zhou
>Priority: Major
> Fix For: 1.36.0
>
>
> quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will 
> cause the SqlLiteral 
>  
> for example with a SQL
>  
> {code:java}
> // code placeholder
> CREATE TABLE source (
>     a BIGINT
> ) comment '测试test'
> WITH (
>   'connector' = 'test'
> ); {code}
> with a parsed Sqlnode, the toString will create a SQL like below, which is 
> not parsable again.
>  
> {code:java}
> // code placeholder
> CREATE TABLE `source` (
>   `a` BIGINT
> )
> COMMENT u&'\5218\51eftest' WITH (
>   'connector' = 'test'
> ) {code}
> I think this is caused by 
> {code:java}
> // code placeholder
> public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
>   buf.append("u&'"); {code}
> Not sure if I misconfigured something. Is it possible to remove the 
> buf.append("u&'"); ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect

2023-10-16 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775890#comment-17775890
 ] 

Julian Hyde commented on CALCITE-6051:
--

I believe this is a duplicate of CALCITE-3933. 

> Incorrect translation for unicode strings in SqlDialect's 
> quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
> Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The Unicode strings returned by Calcite have a broken format. For example, the 
> string `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> buf.append(c);
> buf.append(c);
>   } else {
> buf.append(c);
>   }
> }
> buf.append("'");
>   }
> {code}
> The queries fail when we pass a query containing this encoding. 
> Also tested the same query you've shared on hive and spark:
> Hive:
> {code:java}
> select u&'hello world';
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10004]: Line 1:7 Invalid table alias or column reference 'u': (possible 
> column names are: ) (state=42000,code=10004)
> {code}
> Spark:
> {code:java}
> select u&'hello world';
> User class threw exception: org.apache.spark.sql.AnalysisException: cannot 
> resolve 'u' given input columns: []; line 1 pos 7;
> {code}
> This is HiveSqlDialect: 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
> There is no overriding function in HiveSql dialect corresponding to 
> `quoteStringLiteralUnicode` method in SqlDialect.
> Corresponding SparkSqlDialect: 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java
>  
> *Ask:*
> Why is `buf.append("u&'")` added in this method? I couldn't find a related 
> Unicode conversion that contains `u&`; as a result, it breaks when read by 
> the client. I wanted to understand the reason why `u&` is being used and what 
> can break if we remove `&`.
> Thanks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect

2023-10-16 Thread Julian Hyde (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde resolved CALCITE-6051.
--
Resolution: Duplicate

> Incorrect translation for unicode strings in SqlDialect's 
> quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
> Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The Unicode strings returned by Calcite have a broken format. For example, the 
> string `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> buf.append(c);
> buf.append(c);
>   } else {
> buf.append(c);
>   }
> }
> buf.append("'");
>   }
> {code}
> The queries fail when we pass a query containing this encoding. 
> Also tested the same query you've shared on hive and spark:
> Hive:
> {code:java}
> select u&'hello world';
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10004]: Line 1:7 Invalid table alias or column reference 'u': (possible 
> column names are: ) (state=42000,code=10004)
> {code}
> Spark:
> {code:java}
> select u&'hello world';
> User class threw exception: org.apache.spark.sql.AnalysisException: cannot 
> resolve 'u' given input columns: []; line 1 pos 7;
> {code}
> This is HiveSqlDialect: 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
> There is no overriding function in HiveSql dialect corresponding to 
> `quoteStringLiteralUnicode` method in SqlDialect.
> Corresponding SparkSqlDialect: 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java
>  
> *Ask:*
> Why is `buf.append("u&'")` added in this method? I couldn't find a related 
> Unicode conversion that contains `u&`; as a result, it breaks when read by 
> the client. I wanted to understand the reason why `u&` is being used and what 
> can break if we remove `&`.
> Thanks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5763) Discontinue support for Guava < 20.0

2023-10-16 Thread Gian Merlino (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775821#comment-17775821
 ] 

Gian Merlino commented on CALCITE-5763:
---

[~julianhyde] yes, please go ahead. Druid is on Calcite 1.35 now, and recently 
decided to drop support for Hadoop 2 and update to Guava 31.1-jre. That means 
we will be able to update to future Calcite releases that do not support older 
Guavas.

> Discontinue support for Guava < 20.0
> 
>
> Key: CALCITE-5763
> URL: https://issues.apache.org/jira/browse/CALCITE-5763
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Julian Hyde
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> Discontinue support for Guava versions before 20.0, and resume building on 
> the latest Guava. This reverses CALCITE-5477, which changes the build from 
> Guava 31.1-jre to 19.0, and CALCITE-5428, which moves the minimum supported 
> Guava version from 19.0 to 16.0.1.
> This change will happen no earlier than "the first release after August", 
> therefore can be merged to main no earlier than 2023-09-01. I recommend that 
> it is merged very soon after that date. I have set fixVersion = 1.36 assuming 
> that 1.36 is the first release after August.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6046) SQL parser failed when parsing a comment string start with ''

2023-10-16 Thread LakeShen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775779#comment-17775779
 ] 

LakeShen commented on CALCITE-6046:
---

Hi [~zhoujira86], is this problem a duplicate of CALCITE-6051?


I find the problem's behavior is the same as in CALCITE-6051.

> SQL parser failed when parsing a comment string start with ''
> ---
>
> Key: CALCITE-6046
> URL: https://issues.apache.org/jira/browse/CALCITE-6046
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: xiaogang zhou
>Priority: Major
> Fix For: 1.36.0
>
>
> quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will 
> cause the SqlLiteral 
>  
> for example with a SQL
>  
> {code:java}
> // code placeholder
> CREATE TABLE source (
>     a BIGINT
> ) comment '测试test'
> WITH (
>   'connector' = 'test'
> ); {code}
> with a parsed Sqlnode, the toString will create a SQL like below, which is 
> not parsable again.
>  
> {code:java}
> // code placeholder
> CREATE TABLE `source` (
>   `a` BIGINT
> )
> COMMENT u&'\5218\51eftest' WITH (
>   'connector' = 'test'
> ) {code}
> I think this is caused by 
> {code:java}
> // code placeholder
> public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
>   buf.append("u&'"); {code}
> Not sure if I misconfigured something. Is it possible to remove the 
> buf.append("u&'"); ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6046) SQL parser failed when parsing a comment string start with ''

2023-10-16 Thread xiaogang zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaogang zhou updated CALCITE-6046:
---
Summary: SQL parser failed when parsing a comment string start with ''  
(was: SQL parser failed when parsing a literal start with '')

> SQL parser failed when parsing a comment string start with ''
> ---
>
> Key: CALCITE-6046
> URL: https://issues.apache.org/jira/browse/CALCITE-6046
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: xiaogang zhou
>Priority: Major
> Fix For: 1.36.0
>
>
> quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will 
> cause the SqlLiteral 
>  
> for example with a SQL
>  
> {code:java}
> // code placeholder
> CREATE TABLE source (
>     a BIGINT
> ) comment '测试test'
> WITH (
>   'connector' = 'test'
> ); {code}
> with a parsed Sqlnode, the toString will create a SQL like below, which is 
> not parsable again.
>  
> {code:java}
> // code placeholder
> CREATE TABLE `source` (
>   `a` BIGINT
> )
> COMMENT u&'\5218\51eftest' WITH (
>   'connector' = 'test'
> ) {code}
> I think this is caused by 
> {code:java}
> // code placeholder
> public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
>   buf.append("u&'"); {code}
> Not sure if I misconfigured something. Is it possible to remove the 
> buf.append("u&'"); ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6046) SQL parser failed when parsing a literal start with ''

2023-10-16 Thread xiaogang zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaogang zhou updated CALCITE-6046:
---
Summary: SQL parser failed when parsing a literal start with ''  (was: 
QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will 
cause the SqlLiteral)

> SQL parser failed when parsing a literal start with ''
> 
>
> Key: CALCITE-6046
> URL: https://issues.apache.org/jira/browse/CALCITE-6046
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: xiaogang zhou
>Priority: Major
> Fix For: 1.36.0
>
>
> quoteStringLiteralUnicode returns the unparsed string with a u&' prefix, which 
> makes the resulting SqlLiteral unparsable.
>  
> for example with a SQL
>  
> {code:java}
> // code placeholder
> CREATE TABLE source (
>     a BIGINT
> ) comment '测试test'
> WITH (
>   'connector' = 'test'
> ); {code}
> with the parsed SqlNode, toString creates SQL like the one below, which is 
> not parsable again.
>  
> {code:java}
> // code placeholder
> CREATE TABLE `source` (
>   `a` BIGINT
> )
> COMMENT u&'\5218\51eftest' WITH (
>   'connector' = 'test'
> ) {code}
> I think this is caused by 
> {code:java}
> // code placeholder
> public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
>   buf.append("u&'"); {code}
> not sure if I misconfigured something. Is it possible to remove the 
> buf.append("u&'"); ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-6046) QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will cause the SqlLiteral

2023-10-16 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775755#comment-17775755
 ] 

xiaogang zhou edited comment on CALCITE-6046 at 10/16/23 1:16 PM:
--

Hi [~julianhyde] ,

The behavior I thought was wrong appears when I use the code below

 
{code:java}
// code placeholder

SqlParser.Config parserConfig = getCurrentSqlParserConfig(sqlDialect);
SqlParser sqlParser = SqlParser.create(sqlContent, parserConfig);
SqlNodeList sqlNodeList = sqlParser.parseStmtList(); 

sqlParser.parse(sqlNodeList.get(0)); {code}
to parse 

 
{code:java}
// code placeholder
CREATE TABLE source (
    a BIGINT
) comment '测试test'
WITH (
  'connector' = 'test'
);   {code}
then unparse it, I get

 

 
{code:java}
// code placeholder
CREATE TABLE `source` (
  `a` BIGINT
)
COMMENT u&'\5218\51eftest' WITH (
  'connector' = 'test'
)  {code}
which is not parsable by the Flink SQL parser template
{code:java}
// code placeholder
[   {
String p = SqlParserUtil.parseString(token.image);
comment = SqlLiteral.createCharString(p, getPos());
}] {code}
 

 

Since you mentioned that '' is standard SQL dialect syntax, I think there is
nothing wrong in Calcite. If the statement above makes sense to you, we can just
close this Calcite issue, and I will follow it up in a Flink issue with the
Flink team.
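
For reference, a minimal sketch of the round trip described above, using the
Calcite parser API directly (the statement and parser config are placeholders,
not the Flink setup):

{code:java}
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.CalciteSqlDialect;
import org.apache.calcite.sql.parser.SqlParser;

public class RoundTripSketch {
  public static void main(String[] args) throws Exception {
    String sql = "SELECT '测试test'";              // placeholder statement
    SqlParser.Config config = SqlParser.config();  // placeholder parser config
    SqlNode node = SqlParser.create(sql, config).parseStmt();

    // Unparse; with the default dialect the non-ASCII literal comes back
    // in the u&'...' form.
    String regenerated = node.toSqlString(CalciteSqlDialect.DEFAULT).getSql();

    // Re-parse the regenerated text. Calcite's own parser accepts u&'...',
    // but a parser whose grammar does not (such as the Flink template above)
    // fails at this step.
    SqlParser.create(regenerated, config).parseStmt();
  }
}
{code}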

 


was (Author: zhoujira86):
Hi [~julianhyde] ,

The behavior I thought was wrong is when I use below code 

 
{code:java}
// code placeholder

SqlParser.Config parserConfig = getCurrentSqlParserConfig(sqlDialect);
SqlParser sqlParser = SqlParser.create(sqlContent, parserConfig);
SqlNodeList sqlNodeList = sqlParser.parseStmtList(); 

sqlParser.parse(sqlNodeList.get(0)); {code}
to parse 

 
{code:java}
// code placeholder
CREATE TABLE source (
    a BIGINT
) comment '测试test'
WITH (
  'connector' = 'test'
);   {code}
then unparse it , I will get 

 

 
{code:java}
// code placeholder
CREATE TABLE `source` (
  `a` BIGINT
)
COMMENT u&'\5218\51eftest' WITH (
  'connector' = 'test'
)  {code}
which is not parsable by FLINK sql template 
{code:java}
// code placeholder
[   {
String p = SqlParserUtil.parseString(token.image);
comment = SqlLiteral.createCharString(p, getPos());
}] {code}
 

 

Since you mentioned '' is Standard SQL DIALECT, I think there is nothing 
wrong in CALCITE. If the statement above makes sense to you,  we can just close 
this CALCITE issue, and I will follow it in FLINK issue with FLINK TEAM.

 

> QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will 
> cause the SqlLiteral
> --
>
> Key: CALCITE-6046
> URL: https://issues.apache.org/jira/browse/CALCITE-6046
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: xiaogang zhou
>Priority: Major
> Fix For: 1.36.0
>
>
> quoteStringLiteralUnicode returns the unparsed string with a u&' prefix, which 
> makes the resulting SqlLiteral unparsable.
>  
> for example with a SQL
>  
> {code:java}
> // code placeholder
> CREATE TABLE source (
>     a BIGINT
> ) comment '测试test'
> WITH (
>   'connector' = 'test'
> ); {code}
> with the parsed SqlNode, toString creates SQL like the one below, which is 
> not parsable again.
>  
> {code:java}
> // code placeholder
> CREATE TABLE `source` (
>   `a` BIGINT
> )
> COMMENT u&'\5218\51eftest' WITH (
>   'connector' = 'test'
> ) {code}
> I think this is caused by 
> {code:java}
> // code placeholder
> public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
>   buf.append("u&'"); {code}
> not sure if I misconfigured something. Is it possible to remove the 
> buf.append("u&'"); ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6046) QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will cause the SqlLiteral

2023-10-16 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775755#comment-17775755
 ] 

xiaogang zhou commented on CALCITE-6046:


Hi [~julianhyde] ,

The behavior I thought was wrong appears when I use the code below

 
{code:java}
// code placeholder

SqlParser.Config parserConfig = getCurrentSqlParserConfig(sqlDialect);
SqlParser sqlParser = SqlParser.create(sqlContent, parserConfig);
SqlNodeList sqlNodeList = sqlParser.parseStmtList(); 

sqlParser.parse(sqlNodeList.get(0)); {code}
to parse 

 
{code:java}
// code placeholder
CREATE TABLE source (
    a BIGINT
) comment '测试test'
WITH (
  'connector' = 'test'
);   {code}
then unparse it, I will get

 

 
{code:java}
// code placeholder
CREATE TABLE `source` (
  `a` BIGINT
)
COMMENT u&'\5218\51eftest' WITH (
  'connector' = 'test'
)  {code}
which is not parsable by the Flink SQL parser template
{code:java}
// code placeholder
[   {
String p = SqlParserUtil.parseString(token.image);
comment = SqlLiteral.createCharString(p, getPos());
}] {code}
 

 

Since you mentioned that '' is standard SQL dialect syntax, I think there is
nothing wrong in Calcite. If the statement above makes sense to you, we can just
close this Calcite issue, and I will follow it up in a Flink issue with the
Flink team.

 

> QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will 
> cause the SqlLiteral
> --
>
> Key: CALCITE-6046
> URL: https://issues.apache.org/jira/browse/CALCITE-6046
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: xiaogang zhou
>Priority: Major
> Fix For: 1.36.0
>
>
> quoteStringLiteralUnicode returns the unparsed string with a u&' prefix, which 
> makes the resulting SqlLiteral unparsable.
>  
> for example with a SQL
>  
> {code:java}
> // code placeholder
> CREATE TABLE source (
>     a BIGINT
> ) comment '测试test'
> WITH (
>   'connector' = 'test'
> ); {code}
> with the parsed SqlNode, toString creates SQL like the one below, which is 
> not parsable again.
>  
> {code:java}
> // code placeholder
> CREATE TABLE `source` (
>   `a` BIGINT
> )
> COMMENT u&'\5218\51eftest' WITH (
>   'connector' = 'test'
> ) {code}
> I think this is caused by 
> {code:java}
> // code placeholder
> public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
>   buf.append("u&'"); {code}
> not sure if I misconfigured something. Is it possible to remove the 
> buf.append("u&'"); ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect

2023-10-16 Thread Shivangi (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775717#comment-17775717
 ] 

Shivangi commented on CALCITE-6051:
---

Makes sense [~shenlang]. I've updated the jira summary and description. 

> Incorrect translation for unicode strings in SqlDialect's 
> quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
> Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The unicodes returned by calcite have broken formats. For example, the string 
> `Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> buf.append(c);
> buf.append(c);
>   } else {
> buf.append(c);
>   }
> }
> buf.append("'");
>   }
> {code}
> The queries fail when we pass a query containing this encoding. 
> Also tested the same query you've shared on hive and spark:
> Hive:
> {code:java}
> select u&'hello world';
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10004]: Line 1:7 Invalid table alias or column reference 'u': (possible 
> column names are: ) (state=42000,code=10004)
> {code}
> Spark:
> {code:java}
> select u&'hello world';
> User class threw exception: org.apache.spark.sql.AnalysisException: cannot 
> resolve 'u' given input columns: []; line 1 pos 7;
> {code}
> This is HiveSqlDialect: 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
> There is no overriding function in HiveSql dialect corresponding to 
> `quoteStringLiteralUnicode` method in SqlDialect.
> Corresponding SparkSqlDialect: 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java
>  
> *Ask:*
> Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
> unicode conversion that contains  `u&`, as a result, it breaks when read by 
> the client. I wanted to understand the reason why `u&` is being used and what 
> can break if we remove `&`.
> Thanks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect

2023-10-16 Thread Shivangi (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivangi updated CALCITE-6051:
--
Description: 
Hi,
The unicode strings returned by Calcite are in a broken format. For example,
the string `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&`
comes from the `quoteStringLiteralUnicode` method in
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
buf.append("u&'");
for (int i = 0; i < val.length(); i++) {
  char c = val.charAt(i);
  if (c < 32 || c >= 128) {
buf.append('\\');
buf.append(HEXITS[(c >> 12) & 0xf]);
buf.append(HEXITS[(c >> 8) & 0xf]);
buf.append(HEXITS[(c >> 4) & 0xf]);
buf.append(HEXITS[c & 0xf]);
  } else if (c == '\'' || c == '\\') {
buf.append(c);
buf.append(c);
  } else {
buf.append(c);
  }
}
buf.append("'");
  }
{code}

The queries fail when we pass a query containing this encoding.
I also tested the same query you've shared on Hive and Spark:
Hive:
{code:java}
select u&'hello world';
Error: Error while compiling statement: FAILED: SemanticException [Error 
10004]: Line 1:7 Invalid table alias or column reference 'u': (possible column 
names are: ) (state=42000,code=10004)
{code}

Spark:

{code:java}
select u&'hello world';
User class threw exception: org.apache.spark.sql.AnalysisException: cannot 
resolve 'u' given input columns: []; line 1 pos 7;
{code}

This is HiveSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
There is no overriding function in HiveSql dialect corresponding to 
`quoteStringLiteralUnicode` method in SqlDialect.

Corresponding SparkSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java
 

*Ask:*

Why is `buf.append("u&'")` added in this method? I couldn't find a related
unicode conversion that contains `u&`; as a result, it breaks when read by the
client. I wanted to understand why `u&` is being used and what can
break if we remove `&`.

Thanks! 


  was:
Hi,
The unicodes returned by calcite have broken formats. For example, the string 
`Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is coming 
from 
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
 file, `quoteStringLiteralUnicode` method:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
buf.append("u&'");
for (int i = 0; i < val.length(); i++) {
  char c = val.charAt(i);
  if (c < 32 || c >= 128) {
buf.append('\\');
buf.append(HEXITS[(c >> 12) & 0xf]);
buf.append(HEXITS[(c >> 8) & 0xf]);
buf.append(HEXITS[(c >> 4) & 0xf]);
buf.append(HEXITS[c & 0xf]);
  } else if (c == '\'' || c == '\\') {
buf.append(c);
buf.append(c);
  } else {
buf.append(c);
  }
}
buf.append("'");
  }
{code}

The queries fail when we pass a query containing this encoding. For example in 
hive:


{code:java}
select * from somedb.some_table where city_id = u&'Conveni\00eancia';
{code}

Response:
{code:java}
FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or 
column reference 'u': (
{code}

This is HiveSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
There is no overriding function in HiveSql dialect corresponding to 
`quoteStringLiteralUnicode` method in SqlDialect.

Corresponding SparkSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java
 

*Ask:*

Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
unicode conversion that contains  `u&`, as a result, it breaks when read by the 
client. I wanted to understand the reason why `u&` is being used and what can 
break if we remove `&`.

Thanks! 



> Incorrect translation for unicode strings in SqlDialect's 
> quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
> Attachments: 

[jira] [Updated] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect

2023-10-16 Thread Shivangi (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivangi updated CALCITE-6051:
--
Description: 
Hi,
The unicodes returned by calcite have broken formats. For example, the string 
`Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is coming 
from 
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
 file, `quoteStringLiteralUnicode` method:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
buf.append("u&'");
for (int i = 0; i < val.length(); i++) {
  char c = val.charAt(i);
  if (c < 32 || c >= 128) {
buf.append('\\');
buf.append(HEXITS[(c >> 12) & 0xf]);
buf.append(HEXITS[(c >> 8) & 0xf]);
buf.append(HEXITS[(c >> 4) & 0xf]);
buf.append(HEXITS[c & 0xf]);
  } else if (c == '\'' || c == '\\') {
buf.append(c);
buf.append(c);
  } else {
buf.append(c);
  }
}
buf.append("'");
  }
{code}

The queries fail when we pass a query containing this encoding. For example in 
hive:


{code:java}
select * from somedb.some_table where city_id = u&'Conveni\00eancia';
{code}

Response:
{code:java}
FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or 
column reference 'u': (
{code}

This is HiveSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
There is no overriding function in HiveSql dialect corresponding to 
`quoteStringLiteralUnicode` method in SqlDialect.

Corresponding SparkSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java
 

*Ask:*

Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
unicode conversion that contains  `u&`, as a result, it breaks when read by the 
client. I wanted to understand the reason why `u&` is being used and what can 
break if we remove `&`.

Thanks! 


  was:
Hi,
The unicodes returned by calcite have broken formats. For example, the string 
`Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is coming 
from 
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
 file, `quoteStringLiteralUnicode` method:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
buf.append("u&'");
for (int i = 0; i < val.length(); i++) {
  char c = val.charAt(i);
  if (c < 32 || c >= 128) {
buf.append('\\');
buf.append(HEXITS[(c >> 12) & 0xf]);
buf.append(HEXITS[(c >> 8) & 0xf]);
buf.append(HEXITS[(c >> 4) & 0xf]);
buf.append(HEXITS[c & 0xf]);
  } else if (c == '\'' || c == '\\') {
buf.append(c);
buf.append(c);
  } else {
buf.append(c);
  }
}
buf.append("'");
  }
{code}



Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
unicode conversion that contains  `u&`, as a result, it breaks when read by the 
client. I wanted to understand the reason why `u&` is being used and what can 
break if we remove `&`.

Thanks! 



> Incorrect translation for unicode strings in SqlDialect's 
> quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
> Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The unicodes returned by calcite have broken formats. For example, the string 
> `Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> 

[jira] [Updated] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect

2023-10-16 Thread Shivangi (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivangi updated CALCITE-6051:
--
Description: 
Hi,
The unicodes returned by calcite have broken formats. For example, the string 
`Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is coming 
from 
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
 file, `quoteStringLiteralUnicode` method:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
buf.append("u&'");
for (int i = 0; i < val.length(); i++) {
  char c = val.charAt(i);
  if (c < 32 || c >= 128) {
buf.append('\\');
buf.append(HEXITS[(c >> 12) & 0xf]);
buf.append(HEXITS[(c >> 8) & 0xf]);
buf.append(HEXITS[(c >> 4) & 0xf]);
buf.append(HEXITS[c & 0xf]);
  } else if (c == '\'' || c == '\\') {
buf.append(c);
buf.append(c);
  } else {
buf.append(c);
  }
}
buf.append("'");
  }
{code}



Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
unicode conversion that contains  `u&`, as a result, it breaks when read by the 
client. I wanted to understand the reason why `u&` is being used and what can 
break if we remove `&`.

Thanks! 


  was:
Hi,
The unicodes returned by calcite have broken formats. For example, the string 
`Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is coming 
from 
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
 file, `quoteStringLiteralUnicode` method:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
buf.append("u&'");
for (int i = 0; i < val.length(); i++) {
  char c = val.charAt(i);
  if (c < 32 || c >= 128) {
buf.append('\\');
buf.append(HEXITS[(c >> 12) & 0xf]);
buf.append(HEXITS[(c >> 8) & 0xf]);
buf.append(HEXITS[(c >> 4) & 0xf]);
buf.append(HEXITS[c & 0xf]);
  } else if (c == '\'' || c == '\\') {
buf.append(c);
buf.append(c);
  } else {
buf.append(c);
  }
}
buf.append("'");
  }
{code}

Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
unicode conversion that contains  `u&`, as a result, it breaks when read by the 
client. I wanted to understand the reason why `u&` is being used and what can 
break if we remove `&`.

Thanks! 



> Incorrect translation for unicode strings in SqlDialect's 
> quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
> Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The unicodes returned by calcite have broken formats. For example, the string 
> `Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> buf.append(c);
> buf.append(c);
>   } else {
> buf.append(c);
>   }
> }
> buf.append("'");
>   }
> {code}
> Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
> unicode conversion that contains  `u&`, as a result, it breaks when read by 
> the client. I wanted to understand the reason why `u&` is being used and what 
> can break if we remove `&`.
> Thanks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect

2023-10-16 Thread Shivangi (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivangi updated CALCITE-6051:
--
Summary: Incorrect translation for unicode strings in SqlDialect's 
quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect  (was: 
Incorrect format for unicode strings )

> Incorrect translation for unicode strings in SqlDialect's 
> quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
> Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The unicodes returned by calcite have broken formats. For example, the string 
> `Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> buf.append(c);
> buf.append(c);
>   } else {
> buf.append(c);
>   }
> }
> buf.append("'");
>   }
> {code}
> Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
> unicode conversion that contains  `u&`, as a result, it breaks when read by 
> the client. I wanted to understand the reason why `u&` is being used and what 
> can break if we remove `&`.
> Thanks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-6051) Incorrect format for unicode strings

2023-10-16 Thread LakeShen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775696#comment-17775696
 ] 

LakeShen edited comment on CALCITE-6051 at 10/16/23 11:13 AM:
--

I'm sure that PG is fine with 'u&', for example:

!image-2023-10-16-18-54-53-483.png|width=436,height=182!

So the problem is that different engines or databases have different levels of
support for 'u&'; Hive and Spark do not support it.

I think the Jira title could describe this problem more clearly. How about
`Incorrect translation for unicode strings in SqlDialect's
quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect`?

At the same time, you should make the Jira description clearer about your
problem.

Maybe we could use SqlDialect#databaseProduct's type to implement different
behavior in the `quoteStringLiteralUnicode` method.
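
For illustration, a rough sketch of what such a dialect-specific override could
look like (the class name and method body here are hypothetical, not the actual
Calcite change):

{code:java}
import org.apache.calcite.sql.SqlDialect;

/** Sketch only: a dialect that writes non-ASCII literals as plain quoted strings. */
public class NoUnicodeEscapeDialect extends SqlDialect {
  public NoUnicodeEscapeDialect(SqlDialect.Context context) {
    super(context);
  }

  @Override public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
    // Emit 'Conveniência' instead of u&'Conveni\00eancia'; only the quote
    // character needs doubling. A real fix would also have to respect the
    // target dialect's own escaping rules (e.g. backslashes in Hive).
    buf.append('\'');
    buf.append(val.replace("'", "''"));
    buf.append('\'');
  }
}
{code}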


was (Author: shenlang):
I'm sure that PG is ok for 'u&',for example:

!image-2023-10-16-18-54-53-483.png|width=436,height=182!

So the problem is that different engines or databases have different levels of 
support  for 'u&',in hive or spark,they don't support the 'u&'.

I think that jira's title could be more clearly about this problem.How about  
`Incorrect translation for unicode strings in SqlDialect's 
quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect`?

At the same time,you should make this JIRA description more clear about your 
problem.

> Incorrect format for unicode strings 
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
> Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The unicodes returned by calcite have broken formats. For example, the string 
> `Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> buf.append(c);
> buf.append(c);
>   } else {
> buf.append(c);
>   }
> }
> buf.append("'");
>   }
> {code}
> Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
> unicode conversion that contains  `u&`, as a result, it breaks when read by 
> the client. I wanted to understand the reason why `u&` is being used and what 
> can break if we remove `&`.
> Thanks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6051) Incorrect format for unicode strings

2023-10-16 Thread LakeShen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775696#comment-17775696
 ] 

LakeShen commented on CALCITE-6051:
---

I'm sure that PG is fine with 'u&', for example:

!image-2023-10-16-18-54-53-483.png|width=436,height=182!

So the problem is that different engines or databases have different levels of
support for 'u&'; Hive and Spark do not support it.

I think the Jira title could describe this problem more clearly. How about
`Incorrect translation for unicode strings in SqlDialect's
quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect`?

At the same time, you should make the Jira description clearer about your
problem.

> Incorrect format for unicode strings 
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
> Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The unicodes returned by calcite have broken formats. For example, the string 
> `Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> buf.append(c);
> buf.append(c);
>   } else {
> buf.append(c);
>   }
> }
> buf.append("'");
>   }
> {code}
> Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
> unicode conversion that contains  `u&`, as a result, it breaks when read by 
> the client. I wanted to understand the reason why `u&` is being used and what 
> can break if we remove `&`.
> Thanks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-6051) Incorrect format for unicode strings

2023-10-16 Thread Shivangi (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775690#comment-17775690
 ] 

Shivangi edited comment on CALCITE-6051 at 10/16/23 11:00 AM:
--

I also tested the same query you've shared on Hive and Spark:
Hive:

{code:java}
select u&'hello world';
Error: Error while compiling statement: FAILED: SemanticException [Error 
10004]: Line 1:7 Invalid table alias or column reference 'u': (possible column 
names are: ) (state=42000,code=10004)
{code}

Spark: 

{code:java}
select u&'hello world';
User class threw exception: org.apache.spark.sql.AnalysisException: cannot 
resolve 'u' given input columns: []; line 1 pos 7;
{code}



was (Author: shivincible):
Also tested the same query you've shared on hive and spark:
Hive:

{code:java}
select u&'hello world';
Error: Error while compiling statement: FAILED: SemanticException [Error 
10004]: Line 1:7 Invalid table alias or column reference 'u': (possible column 
names are: ) (state=42000,code=10004)
{code}

Spark: 

{code:java}
 select u&'hello world';
User class threw exception: org.apache.spark.sql.AnalysisException: cannot 
resolve 'u' given input columns: []; line 1 pos 7;
{code}


> Incorrect format for unicode strings 
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
> Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The unicodes returned by calcite have broken formats. For example, the string 
> `Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> buf.append(c);
> buf.append(c);
>   } else {
> buf.append(c);
>   }
> }
> buf.append("'");
>   }
> {code}
> Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
> unicode conversion that contains  `u&`, as a result, it breaks when read by 
> the client. I wanted to understand the reason why `u&` is being used and what 
> can break if we remove `&`.
> Thanks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6051) Incorrect format for unicode strings

2023-10-16 Thread Shivangi (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775690#comment-17775690
 ] 

Shivangi commented on CALCITE-6051:
---

I also tested the same query you've shared on Hive and Spark:
Hive:

{code:java}
select u&'hello world';
Error: Error while compiling statement: FAILED: SemanticException [Error 
10004]: Line 1:7 Invalid table alias or column reference 'u': (possible column 
names are: ) (state=42000,code=10004)
{code}

Spark: 

{code:java}
 select u&'hello world';
User class threw exception: org.apache.spark.sql.AnalysisException: cannot 
resolve 'u' given input columns: []; line 1 pos 7;
{code}


> Incorrect format for unicode strings 
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
> Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The unicodes returned by calcite have broken formats. For example, the string 
> `Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> buf.append(c);
> buf.append(c);
>   } else {
> buf.append(c);
>   }
> }
> buf.append("'");
>   }
> {code}
> Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
> unicode conversion that contains  `u&`, as a result, it breaks when read by 
> the client. I wanted to understand the reason why `u&` is being used and what 
> can break if we remove `&`.
> Thanks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6051) Incorrect format for unicode strings

2023-10-16 Thread LakeShen (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LakeShen updated CALCITE-6051:
--
Attachment: image-2023-10-16-18-54-53-483.png

> Incorrect format for unicode strings 
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
> Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The unicodes returned by calcite have broken formats. For example, the string 
> `Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> buf.append(c);
> buf.append(c);
>   } else {
> buf.append(c);
>   }
> }
> buf.append("'");
>   }
> {code}
> Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
> unicode conversion that contains  `u&`, as a result, it breaks when read by 
> the client. I wanted to understand the reason why `u&` is being used and what 
> can break if we remove `&`.
> Thanks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-6051) Incorrect format for unicode strings

2023-10-16 Thread Shivangi (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775678#comment-17775678
 ] 

Shivangi edited comment on CALCITE-6051 at 10/16/23 10:40 AM:
--

Thanks for the quick response [~shenlang]! 
We are using SqlDialect for Hive and Spark. In both cases, the queries
fail when we pass a query containing this encoding. For example, in Hive:

{code:java}
select * from somedb.some_table where city_id = u&'Conveni\00eancia';
{code}

Response:
{code:java}
FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or 
column reference 'u': (
{code}

This is HiveSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
 
There is no overriding function in HiveSql dialect corresponding to 
`quoteStringLiteralUnicode` method in SqlDialect.  

So, is the output returned by SqlDialect containing `u&'` valid with respect to
Postgres? Am I missing something here?


was (Author: shivincible):
Thanks for the quick response [~shenlang]! 
We are using SQLDialect for Hive and Spark. For both the cases, the queries 
fail when we pass a query containing this encoding. For example in hive:

{code:java}
select * from somedb.some_table where city_id = u&'Conveni\00eancia';
{code}

Response:
{code:java}
FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or 
column reference 'u': (
{code}

This is HiveSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
 

So, is the output returned by SqlDialect containing `u&'` valid wrt to Presto? 
Am I missing something here? 

> Incorrect format for unicode strings 
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
>
> Hi,
> The unicodes returned by calcite have broken formats. For example, the string 
> `Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> buf.append(c);
> buf.append(c);
>   } else {
> buf.append(c);
>   }
> }
> buf.append("'");
>   }
> {code}
> Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
> unicode conversion that contains  `u&`, as a result, it breaks when read by 
> the client. I wanted to understand the reason why `u&` is being used and what 
> can break if we remove `&`.
> Thanks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6051) Incorrect format for unicode strings

2023-10-16 Thread Shivangi (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775678#comment-17775678
 ] 

Shivangi commented on CALCITE-6051:
---

Thanks for the quick response [~shenlang]! 
We are using SqlDialect for Hive and Spark. In both cases, the queries
fail when we pass a query containing this encoding. For example, in Hive:

{code:java}
select * from somedb.some_table where city_id = u&'Conveni\00eancia';
{code}

Response:
{code:java}
FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or 
column reference 'u': (
{code}

This is HiveSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
 

So, is the output returned by SqlDialect containing `u&'` valid with respect to
Presto? Am I missing something here?

> Incorrect format for unicode strings 
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
>
> Hi,
> The unicodes returned by calcite have broken formats. For example, the string 
> `Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> buf.append(c);
> buf.append(c);
>   } else {
> buf.append(c);
>   }
> }
> buf.append("'");
>   }
> {code}
> Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
> unicode conversion that contains  `u&`, as a result, it breaks when read by 
> the client. I wanted to understand the reason why `u&` is being used and what 
> can break if we remove `&`.
> Thanks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6040) The operand type inference of SqlMapValueConstructor is incorrect

2023-10-16 Thread Ran Tao (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775663#comment-17775663
 ] 

Ran Tao commented on CALCITE-6040:
--

I have set operandTypeInference to 'null' to fix this case, because MAP allows
null values and there is no need to infer a type for the NULL operand.
SqlMapQueryConstructor is set to 'null' as well.
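
Roughly, the shape of the change (the operator declaration below is illustrative
only, not the real SqlMapValueConstructor code; the point is the null passed for
the operand-type inference argument):

{code:java}
import org.apache.calcite.sql.SqlKind;
import org.apache.calcite.sql.SqlSpecialOperator;
import org.apache.calcite.sql.type.OperandTypes;
import org.apache.calcite.sql.type.ReturnTypes;

public class MapConstructorSketch {
  /** Illustrative declaration; with a null SqlOperandTypeInference the untyped
   * NULL operand keeps its NULL type instead of being coerced to the first
   * known operand type (InferTypes.FIRST_KNOWN), so map[1, null] and
   * map[1, 'x', 2, null] can derive their value type via leastRestrictive. */
  static final SqlSpecialOperator MAP_VALUE_CTOR =
      new SqlSpecialOperator("MAP", SqlKind.MAP_VALUE_CONSTRUCTOR, 200, false,
          ReturnTypes.ARG0,
          /* operandTypeInference */ null,
          OperandTypes.VARIADIC);
}
{code}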

> The operand type inference of SqlMapValueConstructor is incorrect
> -
>
> Key: CALCITE-6040
> URL: https://issues.apache.org/jira/browse/CALCITE-6040
> Project: Calcite
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 1.35.0
>Reporter: Ran Tao
>Assignee: Ran Tao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> we have a simple test case:
> {code:java}
> f.checkScalar("map[1, null]", "{1=null}",
> "(INTEGER NOT NULL, NULL) MAP NOT NULL"); {code}
> The result is:
> {noformat}
> java.lang.AssertionError: Query: values (map[1, null])
> Expected: is "(INTEGER NOT NULL, NULL) MAP NOT NULL"
>      but: was "(INTEGER NOT NULL, INTEGER) MAP NOT NULL"
> {noformat}
> however, the asserted actual result "(INTEGER NOT NULL, INTEGER) MAP NOT 
> NULL" for this case is wrong. If we switch to  this asserted actual result it 
> throws another exception:
> {noformat}
> java.lang.AssertionError: Query: select map[p0, null] from (values (1)) as 
> t(p0)
> Expected: is "(INTEGER NOT NULL, INTEGER) MAP NOT NULL"
>      but: was "(INTEGER NOT NULL, NULL) MAP NOT NULL"
> {noformat}
> No matter how you write this result type in this test case, it is wrong.
> Checking the plan, it seems the deduced value type of NULL has been converted
> to INTEGER.
> In a more serious scenario, such as `map[1, 'x', 2, null]`, an exception is
> thrown directly and the query fails,
> because the NULL is converted to INTEGER by FIRST_KNOWN (however, it should
> stay NULL, and then the leastRestrictive type would be CHAR).
> The form `map[1, null, 2, 'x']` has the same problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6051) Incorrect format for unicode strings

2023-10-16 Thread LakeShen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775659#comment-17775659
 ] 

LakeShen commented on CALCITE-6051:
---

The 'u&' prefix tells the database that the string constant contains unicode
escapes; it is usually used in SQL statements.

For more details, see `PG String Constants With Unicode Escapes`:
[https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE]

Because 'u&' appears in the SqlDialect, which transforms the SqlNode to SQL,
I think that it is correct.
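
To make this concrete, a small sketch (not from the issue) of what the default
dialect produces for a non-ASCII string; engines that follow the standard, such
as PostgreSQL, accept the resulting U&'...' literal, while Hive and Spark reject
the leading u&:

{code:java}
import org.apache.calcite.sql.SqlDialect;
import org.apache.calcite.sql.dialect.CalciteSqlDialect;

public class UnicodeLiteralSketch {
  public static void main(String[] args) {
    StringBuilder buf = new StringBuilder();
    SqlDialect dialect = CalciteSqlDialect.DEFAULT;
    dialect.quoteStringLiteralUnicode(buf, "Conveniência");
    System.out.println(buf);  // prints: u&'Conveni\00eancia'
  }
}
{code}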

> Incorrect format for unicode strings 
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
>  Issue Type: Bug
>Reporter: Shivangi
>Priority: Major
>
> Hi,
> The unicodes returned by calcite have broken formats. For example, the string 
> `Conveniência` is converted into   `u&'Conveni\00eancia'`. Here `u&` is 
> coming from 
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java
>  file, `quoteStringLiteralUnicode` method:
> {code:java}
>   /**
>* Converts a string into a unicode string literal. For example,
>* can't{tab}run\ becomes u'can''t\0009run\\'.
>*/
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
> buf.append("u&'");
> for (int i = 0; i < val.length(); i++) {
>   char c = val.charAt(i);
>   if (c < 32 || c >= 128) {
> buf.append('\\');
> buf.append(HEXITS[(c >> 12) & 0xf]);
> buf.append(HEXITS[(c >> 8) & 0xf]);
> buf.append(HEXITS[(c >> 4) & 0xf]);
> buf.append(HEXITS[c & 0xf]);
>   } else if (c == '\'' || c == '\\') {
> buf.append(c);
> buf.append(c);
>   } else {
> buf.append(c);
>   }
> }
> buf.append("'");
>   }
> {code}
> Why is `buf.append("u&'")` added in this method? I couldn't find relatable 
> unicode conversion that contains  `u&`, as a result, it breaks when read by 
> the client. I wanted to understand the reason why `u&` is being used and what 
> can break if we remove `&`.
> Thanks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (CALCITE-6014) Create a SqlOperatorFixture that parses, unparses, and then parses again before executing

2023-10-16 Thread Ruben Q L (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruben Q L resolved CALCITE-6014.

Resolution: Fixed

Fixed via 
[{{5151168}}|https://github.com/apache/calcite/commit/5151168e9a9035595939c2ae0f21a06984229209]
 

Thanks [~mbudiu] for your contribution!

> Create a SqlOperatorFixture that parses, unparses, and then parses again 
> before executing
> -
>
> Key: CALCITE-6014
> URL: https://issues.apache.org/jira/browse/CALCITE-6014
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: Mihai Budiu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> Such a fixture will help catch bugs in the unparsing code.
> Several bugs were found using this technique, e.g., CALCITE-5997.
> This is related to CALCITE-5891, CALCITE-6000.
> The SqlParserFixture UnparsingTesterImpl provides a similar service, but 
> since it does not validate the code after unparsing, it will catch fewer bugs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6051) Incorrect format for unicode strings

2023-10-16 Thread Shivangi (Jira)
Shivangi created CALCITE-6051:
-

 Summary: Incorrect format for unicode strings 
 Key: CALCITE-6051
 URL: https://issues.apache.org/jira/browse/CALCITE-6051
 Project: Calcite
  Issue Type: Bug
Reporter: Shivangi


Hi,
The unicode strings returned by Calcite are in a broken format. For example,
the string `Conveniência` is converted into `u&'Conveni\00eancia'`. Here `u&`
comes from the `quoteStringLiteralUnicode` method in
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
buf.append("u&'");
for (int i = 0; i < val.length(); i++) {
  char c = val.charAt(i);
  if (c < 32 || c >= 128) {
buf.append('\\');
buf.append(HEXITS[(c >> 12) & 0xf]);
buf.append(HEXITS[(c >> 8) & 0xf]);
buf.append(HEXITS[(c >> 4) & 0xf]);
buf.append(HEXITS[c & 0xf]);
  } else if (c == '\'' || c == '\\') {
buf.append(c);
buf.append(c);
  } else {
buf.append(c);
  }
}
buf.append("'");
  }
{code}

Why is `buf.append("u&'")` added in this method? I couldn't find a related
unicode conversion that contains `u&`; as a result, it breaks when read by the
client. I wanted to understand why `u&` is being used and what can
break if we remove `&`.

Thanks! 




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5607) Serialize return type during RelJson.toJson(RexNode node) serialization

2023-10-16 Thread Oliver Lee (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775551#comment-17775551
 ] 

Oliver Lee commented on CALCITE-5607:
-

Hey [~julianhyde] ,

I finally got around to following up on this.

I see that in my change that CAST was actually [handled 
separately|https://github.com/apache/calcite/pull/3129/files#diff-673904825afdbc42629c1eeb5abc0d713687722fcdead867cdbe460ebddc1e9cL563]
 by adding in the "type" to the JSON serialization. 

My change took that part out of the switch-statement and made it happen for all 
{{{}RexCall{}}}s.

 

Now that I think about it, I should keep the switch-statement for CAST and
add another clause specifically for {{SqlKind.MINUS}}, so as not to abandon
deriving the type from the arguments for the rest of the RexCalls.

If you could give me confirmation, I can go ahead and update the PR. 
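
Roughly, the shape of the rule being proposed (a sketch, not the actual RelJson
code; the helper interface below stands in for the RelJson instance that owns
toJson(RelDataType)):

{code:java}
import java.util.Map;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rex.RexCall;

final class RexTypeSerializationSketch {
  /** Serialize the explicit return type only for kinds whose type cannot be
   * re-derived from the serialized operands, instead of doing it for every
   * RexCall. */
  static void maybePutType(RexCall call, Map<String, Object> map, RelJsonLike json) {
    switch (call.getKind()) {
    case CAST:
    case MINUS:  // covers MINUS_DATE, whose ARG2_NULLABLE inference needs 3 operands
      map.put("type", json.toJson(call.getType()));
      break;
    default:
      break;     // other calls keep deriving the type from their arguments
    }
  }

  /** Placeholder for the RelJson instance. */
  interface RelJsonLike {
    Object toJson(RelDataType type);
  }
}
{code}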

> Serialize return type during RelJson.toJson(RexNode node) serialization 
> 
>
> Key: CALCITE-5607
> URL: https://issues.apache.org/jira/browse/CALCITE-5607
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Oliver Lee
>Assignee: Oliver Lee
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We found a bug in {{RelJson#toRex}} for the {{TIMESTAMP_DIFF}} call for Big 
> Query dialect.
> {{TIMESTAMP_DIFF}} is translated to the {{MINUS_DATE}} 
> [operator|https://github.com/apache/calcite/blob/c28d1dcbc34e748b7bea9712ef6bcf43793a91e8/core/src/main/java/org/apache/calcite/sql2rel/StandardConvertletTable.java#L2113-L2116]
>  with a return type explicitly declared as the interval.
> {{MINUS_DATE}} uses an 
> {{[ARG2_NULLABLE|https://github.com/apache/calcite/blob/c28d1dcbc34e748b7bea9712ef6bcf43793a91e8/core/src/main/java/org/apache/calcite/sql/type/ReturnTypes.java#L241]}}
>  return type inference which requires 3 operands. This is fine in most cases 
> where the RexCall is then used to generate SQL or for native implementations.
> However, in {{{}RelJson#toRex{}}}, when it tries to reconstruct the entire 
> call to a RexNode, it attempts to derive the return type of the 
> {{MINUS_DATE}} operator using the {{ARG2_NULLABLE}} inference. This throws an 
> error as there are only 2 operands given to the {{MINUS_DATE}} operator.
> We'd like to now add in the "type" when serializing the JSON so that 
> {{[jsonType|https://github.com/apache/calcite/blob/c28d1dcbc34e748b7bea9712ef6bcf43793a91e8/core/src/main/java/org/apache/calcite/rel/externalize/RelJson.java#L712]}}
>  will be defined in {{{}toRex{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-5607) Serialize return type during RelJson.toJson(RexNode node) serialization

2023-10-16 Thread Oliver Lee (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775551#comment-17775551
 ] 

Oliver Lee edited comment on CALCITE-5607 at 10/16/23 6:35 AM:
---

Hey [~julianhyde] ,

I finally got around to following up on this.

I see that, prior to my change, CAST was actually [handled 
separately|https://github.com/apache/calcite/pull/3129/files#diff-673904825afdbc42629c1eeb5abc0d713687722fcdead867cdbe460ebddc1e9cL563]
 by adding in the "type" to the JSON serialization. 

My change took that part out of the switch-statement and made it happen for all 
{{{}RexCall{}}}s.

 

Now that I think about it, I should keep the switch-statement for CAST and
add another clause specifically for {{SqlKind.MINUS}}, so as not to abandon
deriving the type from the arguments for the rest of the RexCalls.

If you could give me confirmation, I can go ahead and update the PR. 


was (Author: JIRAUSER297744):
Hey [~julianhyde] ,

I finally got around to following up on this.

I see that in my change that CAST was actually [handled 
separately|https://github.com/apache/calcite/pull/3129/files#diff-673904825afdbc42629c1eeb5abc0d713687722fcdead867cdbe460ebddc1e9cL563]
 by adding in the "type" to the JSON serialization. 

My change took that part out of the switch-statement and made it happen for all 
{{{}RexCall{}}}s.

 

Now that I think about it, I should keep the switch-statement for the CAST and 
add in another clause specifically for {{SqlKind.MINUS}} , to not abandon 
deriving the type from the arguments for the rest of RexCalls.

If you could give me confirmation, I can go ahead and update the PR. 

> Serialize return type during RelJson.toJson(RexNode node) serialization 
> 
>
> Key: CALCITE-5607
> URL: https://issues.apache.org/jira/browse/CALCITE-5607
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Oliver Lee
>Assignee: Oliver Lee
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We found a bug in {{RelJson#toRex}} for the {{TIMESTAMP_DIFF}} call for Big 
> Query dialect.
> {{TIMESTAMP_DIFF}} is translated to the {{MINUS_DATE}} 
> [operator|https://github.com/apache/calcite/blob/c28d1dcbc34e748b7bea9712ef6bcf43793a91e8/core/src/main/java/org/apache/calcite/sql2rel/StandardConvertletTable.java#L2113-L2116]
>  with a return type explicitly declared as the interval.
> {{MINUS_DATE}} uses an 
> {{[ARG2_NULLABLE|https://github.com/apache/calcite/blob/c28d1dcbc34e748b7bea9712ef6bcf43793a91e8/core/src/main/java/org/apache/calcite/sql/type/ReturnTypes.java#L241]}}
>  return type inference which requires 3 operands. This is fine in most cases 
> where the RexCall is then used to generate SQL or for native implementations.
> However, in {{{}RelJson#toRex{}}}, when it tries to reconstruct the entire 
> call to a RexNode, it attempts to derive the return type of the 
> {{MINUS_DATE}} operator using the {{ARG2_NULLABLE}} inference. This throws an 
> error as there are only 2 operands given to the {{MINUS_DATE}} operator.
> We'd like to now add in the "type" when serializing the JSON so that 
> {{[jsonType|https://github.com/apache/calcite/blob/c28d1dcbc34e748b7bea9712ef6bcf43793a91e8/core/src/main/java/org/apache/calcite/rel/externalize/RelJson.java#L712]}}
>  will be defined in {{{}toRex{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)