[jira] [Commented] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect

2023-10-16 Thread Shivangi (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775717#comment-17775717
 ] 

Shivangi commented on CALCITE-6051:
---

Makes sense, [~shenlang]. I've updated the Jira summary and description.

> Incorrect translation for unicode strings in SqlDialect's 
> quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
> -
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
> Issue Type: Bug
> Reporter: Shivangi
> Priority: Major
> Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The Unicode string literals returned by Calcite have a broken format. For
> example, the string `Conveniência` is converted into `u&'Conveni\00eancia'`.
> Here `u&` comes from the `quoteStringLiteralUnicode` method in
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:
> {code:java}
>   /**
>    * Converts a string into a unicode string literal. For example,
>    * can't{tab}run\ becomes u'can''t\0009run\\'.
>    */
>   public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
>     buf.append("u&'");
>     for (int i = 0; i < val.length(); i++) {
>       char c = val.charAt(i);
>       if (c < 32 || c >= 128) {
>         buf.append('\\');
>         buf.append(HEXITS[(c >> 12) & 0xf]);
>         buf.append(HEXITS[(c >> 8) & 0xf]);
>         buf.append(HEXITS[(c >> 4) & 0xf]);
>         buf.append(HEXITS[c & 0xf]);
>       } else if (c == '\'' || c == '\\') {
>         buf.append(c);
>         buf.append(c);
>       } else {
>         buf.append(c);
>       }
>     }
>     buf.append("'");
>   }
> {code}
> Queries that contain this encoding fail.
> I also tested the same query you shared on Hive and Spark:
> Hive:
> {code:java}
> select u&'hello world';
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10004]: Line 1:7 Invalid table alias or column reference 'u': (possible 
> column names are: ) (state=42000,code=10004)
> {code}
> Spark:
> {code:java}
> select u&'hello world';
> User class threw exception: org.apache.spark.sql.AnalysisException: cannot 
> resolve 'u' given input columns: []; line 1 pos 7;
> {code}
> This is HiveSqlDialect: 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
> HiveSqlDialect does not override the `quoteStringLiteralUnicode` method of
> SqlDialect.
> Corresponding SparkSqlDialect: 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java
>  
> *Ask:*
> Why is `buf.append("u&'")` added in this method? I couldn't find any related
> Unicode syntax that uses `u&`, and the literal breaks when read by the
> client. I would like to understand why `u&` is used and what could break if
> we remove the `&`.
> Thanks! 
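
The escaping performed by the quoted method can be reproduced outside Calcite with a minimal standalone sketch. The class name `UnicodeLiteralSketch` and the lowercase `HEXITS` table are assumptions made for illustration (the table is inferred from the `\00ea` output reported above); this is not Calcite code:

```java
// Standalone sketch of the escaping loop quoted above.
// Assumptions: the class name is hypothetical, and HEXITS is taken to be
// the lowercase hex digits, inferred from the reported output.
final class UnicodeLiteralSketch {
  private static final char[] HEXITS = "0123456789abcdef".toCharArray();

  /** Returns the literal instead of appending to a caller-supplied buffer. */
  static String quote(String val) {
    StringBuilder buf = new StringBuilder("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        // Control and non-ASCII characters become a backslash plus 4 hex digits.
        buf.append('\\');
        buf.append(HEXITS[(c >> 12) & 0xf]);
        buf.append(HEXITS[(c >> 8) & 0xf]);
        buf.append(HEXITS[(c >> 4) & 0xf]);
        buf.append(HEXITS[c & 0xf]);
      } else if (c == '\'' || c == '\\') {
        // Single quotes and backslashes are doubled.
        buf.append(c);
        buf.append(c);
      } else {
        buf.append(c);
      }
    }
    return buf.append('\'').toString();
  }

  public static void main(String[] args) {
    // The e-circumflex is written as a Java escape to avoid source-encoding issues.
    System.out.println(quote("Conveni\u00eancia")); // prints u&'Conveni\00eancia'
    System.out.println(quote("can't\trun\\"));      // prints u&'can''t\0009run\\'
  }
}
```

Running `main` shows the same `u&'Conveni\00eancia'` literal that Hive and Spark reject in the tests above.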



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect

2023-10-16 Thread Shivangi (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivangi updated CALCITE-6051:
--
Description: 
Hi,
The Unicode string literals returned by Calcite have a broken format. For
example, the string `Conveniência` is converted into `u&'Conveni\00eancia'`.
Here `u&` comes from the `quoteStringLiteralUnicode` method in
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
    buf.append("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        buf.append('\\');
        buf.append(HEXITS[(c >> 12) & 0xf]);
        buf.append(HEXITS[(c >> 8) & 0xf]);
        buf.append(HEXITS[(c >> 4) & 0xf]);
        buf.append(HEXITS[c & 0xf]);
      } else if (c == '\'' || c == '\\') {
        buf.append(c);
        buf.append(c);
      } else {
        buf.append(c);
      }
    }
    buf.append("'");
  }
{code}

Queries that contain this encoding fail.
I also tested the same query you shared on Hive and Spark:
Hive:
{code:java}
select u&'hello world';
Error: Error while compiling statement: FAILED: SemanticException [Error 
10004]: Line 1:7 Invalid table alias or column reference 'u': (possible column 
names are: ) (state=42000,code=10004)
{code}

Spark:

{code:java}
select u&'hello world';
User class threw exception: org.apache.spark.sql.AnalysisException: cannot 
resolve 'u' given input columns: []; line 1 pos 7;
{code}

This is HiveSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
HiveSqlDialect does not override the `quoteStringLiteralUnicode` method of
SqlDialect.

Corresponding SparkSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java
 

*Ask:*

Why is `buf.append("u&'")` added in this method? I couldn't find any related
Unicode syntax that uses `u&`, and the literal breaks when read by the client.
I would like to understand why `u&` is used and what could break if we remove
the `&`.

Thanks! 


  was:
Hi,
The Unicode string literals returned by Calcite have a broken format. For
example, the string `Conveniência` is converted into `u&'Conveni\00eancia'`.
Here `u&` comes from the `quoteStringLiteralUnicode` method in
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
    buf.append("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        buf.append('\\');
        buf.append(HEXITS[(c >> 12) & 0xf]);
        buf.append(HEXITS[(c >> 8) & 0xf]);
        buf.append(HEXITS[(c >> 4) & 0xf]);
        buf.append(HEXITS[c & 0xf]);
      } else if (c == '\'' || c == '\\') {
        buf.append(c);
        buf.append(c);
      } else {
        buf.append(c);
      }
    }
    buf.append("'");
  }
{code}

Queries that contain this encoding fail. For example, in Hive:


{code:java}
select * from somedb.some_table where city_id = u&'Conveni\00eancia';
{code}

Response:
{code:java}
FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or 
column reference 'u': (
{code}

This is HiveSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
HiveSqlDialect does not override the `quoteStringLiteralUnicode` method of
SqlDialect.

Corresponding SparkSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java
 

*Ask:*

Why is `buf.append("u&'")` added in this method? I couldn't find any related
Unicode syntax that uses `u&`, and the literal breaks when read by the client.
I would like to understand why `u&` is used and what could break if we remove
the `&`.

Thanks! 




[jira] [Updated] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect

2023-10-16 Thread Shivangi (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivangi updated CALCITE-6051:
--
Description: 
Hi,
The Unicode string literals returned by Calcite have a broken format. For
example, the string `Conveniência` is converted into `u&'Conveni\00eancia'`.
Here `u&` comes from the `quoteStringLiteralUnicode` method in
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
    buf.append("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        buf.append('\\');
        buf.append(HEXITS[(c >> 12) & 0xf]);
        buf.append(HEXITS[(c >> 8) & 0xf]);
        buf.append(HEXITS[(c >> 4) & 0xf]);
        buf.append(HEXITS[c & 0xf]);
      } else if (c == '\'' || c == '\\') {
        buf.append(c);
        buf.append(c);
      } else {
        buf.append(c);
      }
    }
    buf.append("'");
  }
{code}

Queries that contain this encoding fail. For example, in Hive:


{code:java}
select * from somedb.some_table where city_id = u&'Conveni\00eancia';
{code}

Response:
{code:java}
FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or 
column reference 'u': (
{code}

This is HiveSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
HiveSqlDialect does not override the `quoteStringLiteralUnicode` method of
SqlDialect.

Corresponding SparkSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java
 

*Ask:*

Why is `buf.append("u&'")` added in this method? I couldn't find any related
Unicode syntax that uses `u&`, and the literal breaks when read by the client.
I would like to understand why `u&` is used and what could break if we remove
the `&`.

Thanks! 


  was:
Hi,
The Unicode string literals returned by Calcite have a broken format. For
example, the string `Conveniência` is converted into `u&'Conveni\00eancia'`.
Here `u&` comes from the `quoteStringLiteralUnicode` method in
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
    buf.append("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        buf.append('\\');
        buf.append(HEXITS[(c >> 12) & 0xf]);
        buf.append(HEXITS[(c >> 8) & 0xf]);
        buf.append(HEXITS[(c >> 4) & 0xf]);
        buf.append(HEXITS[c & 0xf]);
      } else if (c == '\'' || c == '\\') {
        buf.append(c);
        buf.append(c);
      } else {
        buf.append(c);
      }
    }
    buf.append("'");
  }
{code}



Why is `buf.append("u&'")` added in this method? I couldn't find any related
Unicode syntax that uses `u&`, and the literal breaks when read by the client.
I would like to understand why `u&` is used and what could break if we remove
the `&`.

Thanks! 




[jira] [Updated] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect

2023-10-16 Thread Shivangi (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivangi updated CALCITE-6051:
--
Description: 
Hi,
The Unicode string literals returned by Calcite have a broken format. For
example, the string `Conveniência` is converted into `u&'Conveni\00eancia'`.
Here `u&` comes from the `quoteStringLiteralUnicode` method in
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
    buf.append("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        buf.append('\\');
        buf.append(HEXITS[(c >> 12) & 0xf]);
        buf.append(HEXITS[(c >> 8) & 0xf]);
        buf.append(HEXITS[(c >> 4) & 0xf]);
        buf.append(HEXITS[c & 0xf]);
      } else if (c == '\'' || c == '\\') {
        buf.append(c);
        buf.append(c);
      } else {
        buf.append(c);
      }
    }
    buf.append("'");
  }
{code}



Why is `buf.append("u&'")` added in this method? I couldn't find any related
Unicode syntax that uses `u&`, and the literal breaks when read by the client.
I would like to understand why `u&` is used and what could break if we remove
the `&`.

Thanks! 


  was:
Hi,
The Unicode string literals returned by Calcite have a broken format. For
example, the string `Conveniência` is converted into `u&'Conveni\00eancia'`.
Here `u&` comes from the `quoteStringLiteralUnicode` method in
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
    buf.append("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        buf.append('\\');
        buf.append(HEXITS[(c >> 12) & 0xf]);
        buf.append(HEXITS[(c >> 8) & 0xf]);
        buf.append(HEXITS[(c >> 4) & 0xf]);
        buf.append(HEXITS[c & 0xf]);
      } else if (c == '\'' || c == '\\') {
        buf.append(c);
        buf.append(c);
      } else {
        buf.append(c);
      }
    }
    buf.append("'");
  }
{code}

Why is `buf.append("u&'")` added in this method? I couldn't find any related
Unicode syntax that uses `u&`, and the literal breaks when read by the client.
I would like to understand why `u&` is used and what could break if we remove
the `&`.

Thanks! 








[jira] [Updated] (CALCITE-6051) Incorrect translation for unicode strings in SqlDialect's quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect

2023-10-16 Thread Shivangi (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivangi updated CALCITE-6051:
--
Summary: Incorrect translation for unicode strings in SqlDialect's 
quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect  (was: 
Incorrect format for unicode strings )






[jira] [Comment Edited] (CALCITE-6051) Incorrect format for unicode strings

2023-10-16 Thread Shivangi (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775690#comment-17775690
 ] 

Shivangi edited comment on CALCITE-6051 at 10/16/23 11:00 AM:
--

I also tested the same query you shared on Hive and Spark:
Hive:

{code:java}
select u&'hello world';
Error: Error while compiling statement: FAILED: SemanticException [Error 
10004]: Line 1:7 Invalid table alias or column reference 'u': (possible column 
names are: ) (state=42000,code=10004)
{code}

Spark: 

{code:java}
select u&'hello world';
User class threw exception: org.apache.spark.sql.AnalysisException: cannot 
resolve 'u' given input columns: []; line 1 pos 7;
{code}



was (Author: shivincible):
I also tested the same query you shared on Hive and Spark:
Hive:

{code:java}
select u&'hello world';
Error: Error while compiling statement: FAILED: SemanticException [Error 
10004]: Line 1:7 Invalid table alias or column reference 'u': (possible column 
names are: ) (state=42000,code=10004)
{code}

Spark: 

{code:java}
 select u&'hello world';
User class threw exception: org.apache.spark.sql.AnalysisException: cannot 
resolve 'u' given input columns: []; line 1 pos 7;
{code}







[jira] [Commented] (CALCITE-6051) Incorrect format for unicode strings

2023-10-16 Thread Shivangi (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775690#comment-17775690
 ] 

Shivangi commented on CALCITE-6051:
---

I also tested the same query you shared on Hive and Spark:
Hive:

{code:java}
select u&'hello world';
Error: Error while compiling statement: FAILED: SemanticException [Error 
10004]: Line 1:7 Invalid table alias or column reference 'u': (possible column 
names are: ) (state=42000,code=10004)
{code}

Spark: 

{code:java}
 select u&'hello world';
User class threw exception: org.apache.spark.sql.AnalysisException: cannot 
resolve 'u' given input columns: []; line 1 pos 7;
{code}







[jira] [Comment Edited] (CALCITE-6051) Incorrect format for unicode strings

2023-10-16 Thread Shivangi (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775678#comment-17775678
 ] 

Shivangi edited comment on CALCITE-6051 at 10/16/23 10:40 AM:
--

Thanks for the quick response [~shenlang]! 
We are using SqlDialect for Hive and Spark. In both cases, queries that
contain this encoding fail. For example, in Hive:

{code:java}
select * from somedb.some_table where city_id = u&'Conveni\00eancia';
{code}

Response:
{code:java}
FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or 
column reference 'u': (
{code}

This is HiveSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
 
HiveSqlDialect does not override the `quoteStringLiteralUnicode` method of
SqlDialect.

So, is the output returned by SqlDialect containing `u&'` valid with respect
to Postgres? Am I missing something here?
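
On what `u&'` means: the prefix resembles the Unicode-escape string literal form that PostgreSQL documents as `U&'...'`, in which a backslash followed by four hex digits denotes a code point. Whether that is the reason Calcite emits it is not confirmed in this thread. Under that assumption, here is a minimal sketch of how a consumer would decode such a literal (the class and method names are hypothetical, and only the 4-digit escape form is handled):

```java
// Hypothetical decoder for literals of the form u&'...'.
// Assumptions: a backslash followed by 4 hex digits is a UTF-16 code unit,
// a doubled backslash is a literal backslash, and a doubled single quote
// is a literal single quote. Other escape forms are not handled.
final class UnicodeLiteralDecoderSketch {
  static String decode(String literal) {
    if (literal.length() < 4
        || !literal.regionMatches(true, 0, "u&'", 0, 3)
        || !literal.endsWith("'")) {
      throw new IllegalArgumentException("not a u&'...' literal: " + literal);
    }
    String body = literal.substring(3, literal.length() - 1);
    StringBuilder out = new StringBuilder();
    int i = 0;
    while (i < body.length()) {
      char c = body.charAt(i);
      boolean hasNext = i + 1 < body.length();
      if (c == '\\' && hasNext && body.charAt(i + 1) == '\\') {
        out.append('\\');
        i += 2;
      } else if (c == '\\') {
        if (i + 5 > body.length()) {
          throw new IllegalArgumentException("truncated escape at index " + i);
        }
        // Parse the four hex digits that follow the backslash.
        out.append((char) Integer.parseInt(body.substring(i + 1, i + 5), 16));
        i += 5;
      } else if (c == '\'' && hasNext && body.charAt(i + 1) == '\'') {
        out.append('\'');
        i += 2;
      } else {
        out.append(c);
        i++;
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Decodes the literal from the failing Hive query back to the original text.
    System.out.println(decode("u&'Conveni\\00eancia'"));
  }
}
```

An engine that does not recognize this prefix instead parses `u` as an identifier, which is consistent with the "Invalid table alias or column reference 'u'" error shown above.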


was (Author: shivincible):
Thanks for the quick response [~shenlang]! 
We are using SqlDialect for Hive and Spark. In both cases, queries that
contain this encoding fail. For example, in Hive:

{code:java}
select * from somedb.some_table where city_id = u&'Conveni\00eancia';
{code}

Response:
{code:java}
FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or 
column reference 'u': (
{code}

This is HiveSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
 

So, is the output returned by SqlDialect containing `u&'` valid with respect
to Presto? Am I missing something here?






[jira] [Commented] (CALCITE-6051) Incorrect format for unicode strings

2023-10-16 Thread Shivangi (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775678#comment-17775678
 ] 

Shivangi commented on CALCITE-6051:
---

Thanks for the quick response [~shenlang]! 
We are using SqlDialect for Hive and Spark. In both cases, queries that
contain this encoding fail. For example, in Hive:

{code:java}
select * from somedb.some_table where city_id = u&'Conveni\00eancia';
{code}

Response:
{code:java}
FAILED: SemanticException [Error 10004]: Line 1:43 Invalid table alias or 
column reference 'u': (
{code}

This is HiveSqlDialect: 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
 

So, is the output returned by SqlDialect containing `u&'` valid with respect
to Presto? Am I missing something here?






[jira] [Created] (CALCITE-6051) Incorrect format for unicode strings

2023-10-15 Thread Shivangi (Jira)
Shivangi created CALCITE-6051:
-

 Summary: Incorrect format for unicode strings 
 Key: CALCITE-6051
 URL: https://issues.apache.org/jira/browse/CALCITE-6051
 Project: Calcite
  Issue Type: Bug
Reporter: Shivangi


Hi,
The Unicode string literals returned by Calcite have a broken format. For
example, the string `Conveniência` is converted into `u&'Conveni\00eancia'`.
Here `u&` comes from the `quoteStringLiteralUnicode` method in
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:

{code:java}
  /**
   * Converts a string into a unicode string literal. For example,
   * can't{tab}run\ becomes u'can''t\0009run\\'.
   */
  public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
    buf.append("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        buf.append('\\');
        buf.append(HEXITS[(c >> 12) & 0xf]);
        buf.append(HEXITS[(c >> 8) & 0xf]);
        buf.append(HEXITS[(c >> 4) & 0xf]);
        buf.append(HEXITS[c & 0xf]);
      } else if (c == '\'' || c == '\\') {
        buf.append(c);
        buf.append(c);
      } else {
        buf.append(c);
      }
    }
    buf.append("'");
  }
{code}

Why is `buf.append("u&'")` added in this method? I couldn't find any related
Unicode syntax that uses `u&`, and the literal breaks when read by the client.
I would like to understand why `u&` is used and what could break if we remove
the `&`.

Thanks! 



