[jira] [Comment Edited] (CALCITE-3933) Incorrect SQL Emitted for Unicode for Several Dialects

2023-10-17 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776153#comment-17776153
 ] 

xiaogang zhou edited comment on CALCITE-3933 at 10/17/23 11:18 AM:
---

[~shenlang] [~julianhyde] Hi guys, I went through CALCITE-6001 and found that the 
charset added by that PR can be used to decide how to convert non-ASCII 
literals. I overrode the quoteStringLiteral method in the BigQuery dialect. 
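
The idea in this comment, letting the target charset decide whether a literal needs escaping, can be sketched in plain Java. This is only an illustration under assumptions: the class CharsetAwareQuoting and its quote method are hypothetical names, nothing here uses Calcite's SqlDialect API, and the escaped form simply mirrors the u&'...' output quoted in the issue description.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; this is not Calcite's implementation.
final class CharsetAwareQuoting {
  private CharsetAwareQuoting() {}

  /**
   * Returns a plain quoted literal when the charset can represent every
   * character of val, and a u&'...' literal with 4-digit hex escapes otherwise.
   * Simplification: the plain branch only doubles single quotes.
   */
  static String quote(String val, Charset charset) {
    if (charset.newEncoder().canEncode(val)) {
      return "'" + val.replace("'", "''") + "'";
    }
    StringBuilder buf = new StringBuilder("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 0x20 || c > 0x7e) {
        // Non-printable or non-ASCII: backslash plus four hex digits.
        buf.append('\\').append(String.format("%04x", (int) c));
      } else if (c == '\'' || c == '\\') {
        // Quote and backslash are doubled inside the escaped form.
        buf.append(c).append(c);
      } else {
        buf.append(c);
      }
    }
    return buf.append('\'').toString();
  }

  public static void main(String[] args) {
    String val = "sch\u00f6n";
    System.out.println(quote(val, StandardCharsets.UTF_8));
    System.out.println(quote(val, StandardCharsets.US_ASCII));
  }
}
```

With a charset that can encode the text (UTF-8 here) the literal is left readable, and the escaped form is used only as a fallback. Whether a given engine accepts the fallback is a separate question, as the issue description shows.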

 



> Incorrect SQL Emitted for Unicode for Several Dialects
> --
>
> Key: CALCITE-3933
> URL: https://issues.apache.org/jira/browse/CALCITE-3933
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.22.0
> Environment: master with latest commit on April 15 (
> dfb842e55e1fa7037c8a731341010ed1c0cfb6f7)
>Reporter: Aryeh Hillman
>Priority: Major
>  Labels: pull-request-available
>
> A string literal like "schön" should emit "schön" in SQL for many dialects, 
> but instead emits
> {code:java}
> u&'sch\\00f6n' {code}
> which is (ISO-8859-1 ASCII). 
> It's possible that some of the above dialects may support ISO-8859, but in my 
> tests with *BigQuery Standard SQL*, *MySQL*, and *Redshift* engines, the 
> following fails:
> {code:java}
> select u&'sch\\00f6n';{code}
> But this succeeds:
> {code:java}
> select 'schön'; {code}
> Test that demonstrates (add to 
> `org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java` and run from 
> there):
> {code:java}
> @Test void testBigQueryUnicode() {
>   final Function<RelBuilder, RelNode> relFn = b ->
>       b.scan("EMP")
>           .filter(
>               b.call(SqlStdOperatorTable.IN, b.field("ENAME"),
>                   b.literal("schön")))
>           .build();
>   final String expectedSql = "SELECT *\n" +
>       "FROM scott.EMP\n" +
>       "WHERE ENAME IN ('schön')";
>   relFn(relFn).withBigQuery().ok(expectedSql);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-3933) Incorrect SQL Emitted for Unicode for Several Dialects

2023-10-17 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776153#comment-17776153
 ] 

xiaogang zhou commented on CALCITE-3933:


[~shenlang] [~julianhyde] Hi guys, I went through CALCITE-6001 and found that the 
charset added by that PR can be used to decide how to convert non-ASCII 
literals. I overrode the quoteStringLiteral method in the BigQuery dialect. 

 



[jira] [Updated] (CALCITE-6046) SQL parser failed when parsing a comment string start with '&u'

2023-10-16 Thread xiaogang zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaogang zhou updated CALCITE-6046:
---
Summary: SQL parser failed when parsing a comment string start with '&u'  
(was: SQL parser failed when parsing a literal start with '&u')

> SQL parser failed when parsing a comment string start with '&u'
> ---
>
> Key: CALCITE-6046
> URL: https://issues.apache.org/jira/browse/CALCITE-6046
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: xiaogang zhou
>Priority: Major
> Fix For: 1.36.0
>
>
> quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will 
> cause the SqlLiteral 
>  
> For example, given this SQL:
>  
> {code:java}
> CREATE TABLE source (
>     a BIGINT
> ) comment '测试test'
> WITH (
>   'connector' = 'test'
> ); {code}
> After this is parsed into a SqlNode, toString produces the SQL below, which 
> cannot be parsed again.
>  
> {code:java}
> CREATE TABLE `source` (
>   `a` BIGINT
> )
> COMMENT u&'\5218\51eftest' WITH (
>   'connector' = 'test'
> ) {code}
> I think this is caused by 
> {code:java}
> public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
>   buf.append("u&'"); {code}
> I am not sure whether I misconfigured something. Is it possible to remove the 
> buf.append("u&'"); call?
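
The unparsed form is still recoverable by a reader that understands the u&' prefix. A minimal sketch of such a decoder follows, assuming the escape rules visible in the output above (backslash plus four hex digits, doubled quotes and doubled backslashes). UnicodeLiteralDecoder is a hypothetical name; it is neither Calcite's SqlParserUtil nor any dialect's parser.

```java
// Hypothetical decoder for illustration only.
final class UnicodeLiteralDecoder {
  private UnicodeLiteralDecoder() {}

  /** Decodes 'text' or u&'text' (with \xxxx hex escapes) into a Java string. */
  static String decode(String literal) {
    boolean unicode = literal.startsWith("u&'") || literal.startsWith("U&'");
    int prefix = unicode ? 3 : 1;
    if ((!unicode && !literal.startsWith("'"))
        || literal.length() < prefix + 1
        || !literal.endsWith("'")) {
      throw new IllegalArgumentException("not a string literal: " + literal);
    }
    String body = literal.substring(prefix, literal.length() - 1);
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < body.length(); i++) {
      char c = body.charAt(i);
      if (c == '\'') {
        // A quote inside the body must be doubled.
        if (i + 1 >= body.length() || body.charAt(i + 1) != '\'') {
          throw new IllegalArgumentException("unescaped quote at " + i);
        }
        out.append('\'');
        i++;
      } else if (unicode && c == '\\') {
        if (i + 1 < body.length() && body.charAt(i + 1) == '\\') {
          // Doubled backslash stands for one backslash.
          out.append('\\');
          i++;
        } else if (i + 4 < body.length()) {
          // Backslash followed by exactly four hex digits.
          out.append((char) Integer.parseInt(body.substring(i + 1, i + 5), 16));
          i += 4;
        } else {
          throw new IllegalArgumentException("truncated escape at " + i);
        }
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(decode("u&'\\5218\\51eftest'"));
  }
}
```

A grammar rule that only accepts a plain quoted string never reaches this kind of decoding, because it rejects the literal at the u& prefix.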





[jira] [Updated] (CALCITE-6046) SQL parser failed when parsing a literal start with '&u'

2023-10-16 Thread xiaogang zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaogang zhou updated CALCITE-6046:
---
Summary: SQL parser failed when parsing a literal start with '&u'  (was: 
QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will 
cause the SqlLiteral)



[jira] [Comment Edited] (CALCITE-6046) QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will cause the SqlLiteral

2023-10-16 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775755#comment-17775755
 ] 

xiaogang zhou edited comment on CALCITE-6046 at 10/16/23 1:16 PM:
--

Hi [~julianhyde] ,

The behavior I thought was wrong occurs when I use the code below 

 
{code:java}
// code placeholder

SqlParser.Config parserConfig = getCurrentSqlParserConfig(sqlDialect);
SqlParser sqlParser = SqlParser.create(sqlContent, parserConfig);
SqlNodeList sqlNodeList = sqlParser.parseStmtList(); 

sqlParser.parse(sqlNodeList.get(0)); {code}
to parse 

 
{code:java}
// code placeholder
CREATE TABLE source (
    a BIGINT
) comment '测试test'
WITH (
  'connector' = 'test'
);   {code}
and then unparse it. I get 

 

 
{code:java}
// code placeholder
CREATE TABLE `source` (
  `a` BIGINT
)
COMMENT u&'\5218\51eftest' WITH (
  'connector' = 'test'
)  {code}
which is not parsable by the Flink SQL parser template 
{code:java}
[ <COMMENT> <QUOTED_STRING> {
String p = SqlParserUtil.parseString(token.image);
comment = SqlLiteral.createCharString(p, getPos());
}] {code}
 

 

Since you mentioned that the u&' prefix is standard SQL, I think there is nothing 
wrong in Calcite. If the statement above makes sense to you, we can close 
this Calcite issue, and I will follow up in a Flink issue with the Flink team.

 



 



[jira] [Commented] (CALCITE-6046) QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will cause the SqlLiteral

2023-10-16 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775755#comment-17775755
 ] 

xiaogang zhou commented on CALCITE-6046:


Hi [~julianhyde] ,

The behavior I thought was wrong occurs when I use the code below 

 
{code:java}
// code placeholder

SqlParser.Config parserConfig = getCurrentSqlParserConfig(sqlDialect);
SqlParser sqlParser = SqlParser.create(sqlContent, parserConfig);
SqlNodeList sqlNodeList = sqlParser.parseStmtList(); 

sqlParser.parse(sqlNodeList.get(0)); {code}
to parse 

 
{code:java}
// code placeholder
CREATE TABLE source (
    a BIGINT
) comment '测试test'
WITH (
  'connector' = 'test'
);   {code}
and then unparse it. I get 

 

 
{code:java}
// code placeholder
CREATE TABLE `source` (
  `a` BIGINT
)
COMMENT u&'\5218\51eftest' WITH (
  'connector' = 'test'
)  {code}
which is not parsable by the Flink SQL parser template 
{code:java}
[ <COMMENT> <QUOTED_STRING> {
String p = SqlParserUtil.parseString(token.image);
comment = SqlLiteral.createCharString(p, getPos());
}] {code}
 

 

Since you mentioned that the u&' prefix is standard SQL, I think there is nothing 
wrong in Calcite. If the statement above makes sense to you, we can close 
this Calcite issue, and I will follow up in a Flink issue with the Flink team.

 



[jira] [Updated] (CALCITE-6046) QuoteStringLiteralUnicode returns unparsed string with u&' prefix, which will cause the SqlLiteral

2023-10-15 Thread xiaogang zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaogang zhou updated CALCITE-6046:
---
Summary: QuoteStringLiteralUnicode returns unparsed string with u&' prefix, 
which will cause the SqlLiteral  (was: quoteStringLiteralUnicode returns 
unparsed string with u&' prefix, which will cause the SqlLiteral)



[jira] [Commented] (CALCITE-6046) quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will cause the SqlLiteral

2023-10-12 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774763#comment-17774763
 ] 

xiaogang zhou commented on CALCITE-6046:


[~julianhyde] 

Although I think it would be more appropriate to replace the <QUOTED_STRING> with 
StringLiteral(), I would still like to ask you about the u&' prefix. I could not 
find it in the Literals part of

[https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical]

May I ask which dialect uses this u&' prefix?



[jira] [Commented] (CALCITE-6046) quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will cause the SqlLiteral

2023-10-11 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774311#comment-17774311
 ] 

xiaogang zhou commented on CALCITE-6046:


I understand now. The problem is that the parser should not parse the comment as 

[ <COMMENT> <QUOTED_STRING>

Instead, it should be parsed by 

StringLiteral()
 



[jira] [Commented] (CALCITE-6046) quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will cause the SqlLiteral

2023-10-11 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774304#comment-17774304
 ] 

xiaogang zhou commented on CALCITE-6046:


[~julianhyde] 

Hi, I found this problem when I used the code below to split SQL statements. The 
process is SQL string -> SqlNode -> SQL string.
{code:java}
// code placeholder
SqlParser.Config parserConfig = getCurrentSqlParserConfig(sqlDialect);
SqlParser sqlParser = SqlParser.create(sqlContent, parserConfig);
SqlNodeList sqlNodeList = sqlParser.parseStmtList(); 

sqlParser.parse(sqlNodeList.get(0));{code}
The dialect/SqlConformance is a custom one:

[https://github.com/apache/flink/blob/master/flink-table/flink-sql-parser/src/main/java/org/apache/flink/sql/parser/validate/FlinkSqlConformance.java]

 

 

Then I found that the SQL below
{code:java}

// code placeholder
CREATE TABLE source (
    a BIGINT
) comment '测试test'
WITH (
  'connector' = 'test'
);  {code}
is transformed to
{code:java}
// code placeholder
CREATE TABLE `source` (
  `a` BIGINT
)
COMMENT u&'\5218\51eftest' WITH (
  'connector' = 'test'
)  {code}
 

The SQL parser template looks like this:
{code:java}
// code placeholder
SqlCreate SqlCreateTable(Span s, boolean replace, boolean isTemporary) :
{
final SqlParserPos startPos = s.pos();
boolean ifNotExists = false;
SqlIdentifier tableName;
List constraints = new ArrayList();
SqlWatermark watermark = null;
SqlNodeList columnList = SqlNodeList.EMPTY;
   SqlCharStringLiteral comment = null;
   SqlTableLike tableLike = null;
SqlNode asQuery = null;

SqlNodeList propertyList = SqlNodeList.EMPTY;
SqlNodeList partitionColumns = SqlNodeList.EMPTY;
SqlParserPos pos = startPos;
}
{
<TABLE>

ifNotExists = IfNotExistsOpt()

tableName = CompoundIdentifier()
[
<LPAREN> { pos = getPos(); TableCreationContext ctx = new TableCreationContext();}
TableColumn(ctx)
(
<COMMA> TableColumn(ctx)
)*
{
pos = pos.plus(getPos());
columnList = new SqlNodeList(ctx.columnList, pos);
constraints = ctx.constraints;
watermark = ctx.watermark;
}
<RPAREN>
]
[ <COMMENT> <QUOTED_STRING> {
String p = SqlParserUtil.parseString(token.image);
comment = SqlLiteral.createCharString(p, getPos());
}]
[
<PARTITIONED> <BY>
partitionColumns = ParenthesizedSimpleIdentifierList()
]
[
<WITH>
propertyList = TableProperties()
]
[
<LIKE>
tableLike = SqlTableLike(getPos())
{
return new SqlCreateTableLike(startPos.plus(getPos()),
tableName,
columnList,
constraints,
propertyList,
partitionColumns,
watermark,
comment,
tableLike,
isTemporary,
ifNotExists);
}
|
<AS>
asQuery = OrderedQueryOrExpr(ExprContext.ACCEPT_QUERY)
{
return new SqlCreateTableAs(startPos.plus(getPos()),
tableName,
columnList,
constraints,
propertyList,
partitionColumns,
watermark,
comment,
asQuery,
isTemporary,
ifNotExists);
}
]
{
return new SqlCreateTable(startPos.plus(getPos()),
tableName,
columnList,
constraints,
propertyList,
partitionColumns,
watermark,
comment,
isTemporary,
ifNotExists);
}
} {code}
This gives an exception:

Caused by: org.apache.calcite.sql.parser.SqlParseException: Encountered 
"u&\'\\5218\\51eftest\'" at line 4, column 9.
Was expecting:
     ...

 


[jira] [Created] (CALCITE-6046) quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will cause the SqlLiteral

2023-10-11 Thread xiaogang zhou (Jira)
xiaogang zhou created CALCITE-6046:
--

 Summary: quoteStringLiteralUnicode returns unparsed string with 
u&' prefix, which will cause the SqlLiteral
 Key: CALCITE-6046
 URL: https://issues.apache.org/jira/browse/CALCITE-6046
 Project: Calcite
  Issue Type: Improvement
  Components: core
Affects Versions: 1.35.0
Reporter: xiaogang zhou
 Fix For: 1.36.0


quoteStringLiteralUnicode returns unparsed string with u&' prefix, which will 
cause the SqlLiteral 

 

For example, given this SQL:

 
{code:java}
CREATE TABLE source (
    a BIGINT
) comment '测试test'
WITH (
  'connector' = 'test'
); {code}
After this is parsed into a SqlNode, toString produces the SQL below, which cannot 
be parsed again.

 
{code:java}
CREATE TABLE `source` (
  `a` BIGINT
)
COMMENT u&'\5218\51eftest' WITH (
  'connector' = 'test'
) {code}
I think this is caused by 
{code:java}
public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
  buf.append("u&'"); {code}
I am not sure whether I misconfigured something. Is it possible to remove the 
buf.append("u&'"); call?





[jira] [Commented] (CALCITE-1776) Implement CORR and REGR_* aggregate functions

2023-10-10 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773902#comment-17773902
 ] 

xiaogang zhou commented on CALCITE-1776:


[~Sergey Nuyanzin] Hi Sergey, can I help resolve the conflicts for 
this issue?

> Implement CORR and REGR_* aggregate functions
> -
>
> Key: CALCITE-1776
> URL: https://issues.apache.org/jira/browse/CALCITE-1776
> Project: Calcite
>  Issue Type: New Feature
>Reporter: Dmitri Shtilman
>Assignee: Sergey Nuyanzin
>Priority: Minor
>  Labels: pull-request-available
>
> Implement correlation coefficient aggregate: CORR
> As well as the missing regression aggregates:
> REGR_SLOPE, REGR_INTERCEPT, REGR_COUNT, REGR_R2, REGR_AVGX, REGR_AVGY, 
> REGR_SXY
> For reference, REGR_SXX and REGR_SYY have been added in [CALCITE-422]





[jira] [Commented] (CALCITE-5971) Add the RelRule to rewrite the bernoulli sample as Filter

2023-09-27 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769888#comment-17769888
 ] 

xiaogang zhou commented on CALCITE-5971:


[~shenlang] Hi, do you plan to contribute this to Flink? 
Otherwise I can take it.

> Add the RelRule to rewrite the bernoulli  sample as Filter
> --
>
> Key: CALCITE-5971
> URL: https://issues.apache.org/jira/browse/CALCITE-5971
> Project: Calcite
>  Issue Type: New Feature
>  Components: core
>Reporter: LakeShen
>Assignee: LakeShen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.36.0
>
>
> For the following SQL:
> {code:java}
> select deptno from "scott".dept tablesample bernoulli(50); {code}
> We could rewrite it to:
> {code:java}
> select deptno from "scott".dept where rand() < 0.5;  {code}
> For the SQL:
> {code:java}
> select deptno from "scott".dept tablesample bernoulli(50) REPEATABLE(10);  
> {code}
> We could rewrite it to:
> {code:java}
> select deptno from "scott".dept where rand(10) < 0.5;  {code}
> This rule only rewrites TABLESAMPLE BERNOULLI, and it is similar to 
> Presto/Trino's 
> [ImplementBernoulliSampleAsFilter|https://github.com/prestodb/presto/blob/6eef062bdd3777936fa29127e728edde86a681d4/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/ImplementBernoulliSampleAsFilter.java#L47C1-L48C35]
>  rule.
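
The equivalence behind the rewrite can be sketched outside the planner: a Bernoulli sample of p percent keeps each row independently with probability p/100, which is what a filter on rand() < p/100 does. The class below is a hypothetical illustration in plain Java, not the proposed RelRule. It assumes that the REPEATABLE seed initializes one random generator for the whole scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of "TABLESAMPLE BERNOULLI(p) REPEATABLE(seed)"
// expressed as a per-row filter; not Calcite code.
final class BernoulliAsFilter {
  private BernoulliAsFilter() {}

  /** Keeps each row independently with probability percent / 100. */
  static <T> List<T> sample(List<T> rows, double percent, long seed) {
    Random random = new Random(seed);   // plays the role of rand(seed)
    double threshold = percent / 100.0; // bernoulli(50) becomes 0.5
    List<T> kept = new ArrayList<>();
    for (T row : rows) {
      if (random.nextDouble() < threshold) { // WHERE rand(seed) < threshold
        kept.add(row);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Integer> deptnos = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
      deptnos.add(i * 10);
    }
    System.out.println(sample(deptnos, 50, 10L));
  }
}
```

Because the generator is seeded, repeated calls with the same seed return the same rows, which mirrors the REPEATABLE variant of the query.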





[jira] [Commented] (CALCITE-5995) add cache to dejsonize function in JsonFunctions

2023-09-20 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767362#comment-17767362
 ] 

xiaogang zhou commented on CALCITE-5995:


I have enabled the .editorconfig and reformatted the code I edited, so I think 
it now conforms to the project's style. The tests pass in 
[https://github.com/apache/calcite/pull/3432]. Do you mind reviewing it? 
[~julianhyde] 

> add cache to dejsonize function in JsonFunctions
> 
>
> Key: CALCITE-5995
> URL: https://issues.apache.org/jira/browse/CALCITE-5995
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: xiaogang zhou
>Priority: Minor
> Fix For: 1.36.0
>
>
> I used the json_value function to parse JSON values, and I found that Calcite's 
> json_value function does not cache the dejsonized objects. This can cause 
> a performance issue in the situation below, because the dejsonize function is 
> called repeatedly and unnecessarily.  
>  
> {code:java}
> select 
> json_value(A, 'xxx'),
> json_value(A, 'yyy'),
> json_value(A, 'zzz'),...
> from some_table;
> {code}
>  
>  
> As projects like Flink use json_value to generate code for their own json_value 
> function, I think this could cause poor performance for users. So I suggest 
> introducing a cache in  
>  
> org.apache.calcite.runtime.JsonFunctions#dejsonize
>  
> This solution is common in projects like Hive:
> [https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]
>  
> Of course, this feature would be turned on only when a certain config option is 
> set. If this is acceptable, I think I can take the ticket. Thanks.
>  
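
The proposal amounts to memoizing the parse step, so that several json_value calls on the same input string parse it only once. A minimal sketch follows, assuming a small LRU cache keyed by the raw input. ParseCache is a hypothetical name, the parser is passed in as a plain function so that the example needs no JSON library, and none of this is the code from the pull request.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU memoizer for an expensive parse step; not thread-safe.
final class ParseCache<V> {
  private final Function<String, V> parser;
  private final Map<String, V> cache;
  private int parseCount;

  ParseCache(Function<String, V> parser, int maxEntries) {
    this.parser = parser;
    // Access-ordered map that evicts the least recently used entry.
    this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
      @Override protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /** Returns the cached parse result, parsing only on a cache miss. */
  V get(String input) {
    V value = cache.get(input);
    if (value == null) {
      value = parser.apply(input);
      parseCount++;
      cache.put(input, value);
    }
    return value;
  }

  /** Number of times the underlying parser actually ran. */
  int parseCount() {
    return parseCount;
  }

  public static void main(String[] args) {
    // String::length stands in for a real JSON parser.
    ParseCache<Integer> cache = new ParseCache<>(String::length, 100);
    String row = "{\"xxx\":1,\"yyy\":2,\"zzz\":3}";
    cache.get(row);
    cache.get(row);
    cache.get(row);
    System.out.println(cache.parseCount());
  }
}
```

Bounding the cache size matters in a streaming engine, where the set of distinct inputs is unbounded. An unbounded map would trade the repeated parsing cost for a memory leak.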





[jira] [Comment Edited] (CALCITE-5995) add cache to dejsonize function in JsonFunctions

2023-09-18 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766397#comment-17766397
 ] 

xiaogang zhou edited comment on CALCITE-5995 at 9/18/23 2:13 PM:
-

jsonApiCommonSyntaxWithCache(String input, String pathSpec) is used by 5 
functions.

 

JSON_EXISTS, JSON_VALUE and JSON_QUERY can be called multiple times in one query, 
so I enabled the cache for these three functions. 

 

Also, could I get some docs on how to set up an IDE for the Calcite coding 
style? [~julianhyde] 





[jira] [Commented] (CALCITE-5995) add cache to dejsonize function in JsonFunctions

2023-09-18 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766397#comment-17766397
 ] 

xiaogang zhou commented on CALCITE-5995:


jsonApiCommonSyntaxWithCache(String input, String pathSpec) is used by 5
functions.

JSON_EXISTS, JSON_VALUE and JSON_QUERY can be called multiple times in one
query, so I enabled the cache for these three functions.
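
As a rough illustration of the caching idea described above, here is a minimal sketch of a bounded LRU cache placed in front of an expensive parse step. It is not the code from the patch: the class names `ParsedJsonCache` and `ParsedJsonCacheDemo`, the `maxEntries` parameter, and the counting stand-in parser are all assumptions made for this example, and the real change would wrap Calcite's own dejsonize logic instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Bounded LRU cache in front of an expensive parse step (hypothetical helper, not Calcite API). */
final class ParsedJsonCache {
  private final Function<String, Object> parser;
  private final Map<String, Object> cache;

  ParsedJsonCache(final int maxEntries, Function<String, Object> parser) {
    this.parser = parser;
    // accessOrder = true keeps entries ordered from least to most recently used.
    this.cache = new LinkedHashMap<String, Object>(16, 0.75f, true) {
      @Override protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /** Returns the parsed form of {@code input}, calling the parser only on a cache miss. */
  synchronized Object parse(String input) {
    Object parsed = cache.get(input);
    if (parsed == null) {
      parsed = parser.apply(input);
      cache.put(input, parsed);
    }
    return parsed;
  }
}

final class ParsedJsonCacheDemo {
  public static void main(String[] args) {
    int[] parseCalls = {0};
    // Stand-in for the real parser: counts invocations and returns the trimmed text.
    ParsedJsonCache cache =
        new ParsedJsonCache(64, s -> { parseCalls[0]++; return s.trim(); });
    String row = "{\"xxx\": 1, \"yyy\": 2, \"zzz\": 3}";
    // Three json_value-style lookups against the same input string.
    cache.parse(row);
    cache.parse(row);
    cache.parse(row);
    System.out.println("parse calls: " + parseCalls[0]); // prints "parse calls: 1"
  }
}
```

The cache is keyed on the raw input string, so repeated calls with the same JSON text share one parse regardless of the path argument. The size bound keeps memory use predictable when every row carries a different document.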

 

Also, are there any docs on how to set up an IDE for Calcite's coding style? [~julianhyde]



[jira] [Commented] (CALCITE-5995) add cache to dejsonize function in JsonFunctions

2023-09-14 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765222#comment-17765222
 ] 

xiaogang zhou commented on CALCITE-5995:


Hi [~julianhyde], would you please take a quick look at
[https://github.com/apache/calcite/compare/main...zhougit86:calcite:feature/json_cach]

If this makes sense, I will go ahead, work out the details, and optimize several
JSON-related functions.



[jira] [Comment Edited] (CALCITE-5995) add cache to dejsonize function in JsonFunctions

2023-09-11 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763991#comment-17763991
 ] 

xiaogang zhou edited comment on CALCITE-5995 at 9/12/23 3:37 AM:
-

[~julianhyde] Yes, I think this is very similar to
https://issues.apache.org/jira/browse/CALCITE-5914

I don't understand how to convert the expression to a constant: the second
argument, which stands for the various JSON fields, differs between calls, and A
differs in every data row. I think the expression needs to be evaluated at
runtime. Please correct me if I am wrong.

I also tried a few alternatives to solve this issue:
 # Extract the dejsonized object in the generated code of the projection operator
(performance is not ideal, as there are a lot of conversions for Flink strings).
 # Convert multiple json_value fields to a table function using an optimization
rule (too complicated, since it has to traverse all the call and filter parts,
and there is no significant improvement compared to the cache solution).

If anybody is interested, I can attach some evidence. In brief, it turned
out that using a cache is the most economical solution.
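
To make the runtime argument above concrete, here is a small sketch of the simplest possible cache: it remembers only the most recent input. Because A changes from row to row, nothing can be folded into a constant at planning time, but the parse can still be shared by all json_value calls on the same row. The names `LastInputMemo` and `LastInputMemoDemo` and the counting stand-in parser are assumptions for illustration and do not come from Calcite or from the patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/** Remembers only the most recent input and its parsed form (hypothetical helper). */
final class LastInputMemo<T> {
  private final Function<String, T> parser;
  private String lastInput;
  private T lastParsed;
  private boolean primed;

  LastInputMemo(Function<String, T> parser) {
    this.parser = parser;
  }

  /** Parses {@code input} unless it equals the input of the previous call. */
  T parse(String input) {
    if (!primed || !Objects.equals(lastInput, input)) {
      lastParsed = parser.apply(input);
      lastInput = input;
      primed = true;
    }
    return lastParsed;
  }
}

final class LastInputMemoDemo {
  public static void main(String[] args) {
    int[] parseCalls = {0};
    // Stand-in parser that only counts how often it is invoked.
    LastInputMemo<String> memo =
        new LastInputMemo<>(s -> { parseCalls[0]++; return s; });
    List<String> rows = Arrays.asList("{\"xxx\":1}", "{\"xxx\":2}", "{\"xxx\":3}");
    String[] paths = {"xxx", "yyy", "zzz"};
    for (String row : rows) {
      for (String path : paths) {
        // Path evaluation against the parsed value is omitted in this sketch.
        memo.parse(row);
      }
    }
    // prints "9 calls, 3 parses"
    System.out.println(rows.size() * paths.length + " calls, " + parseCalls[0] + " parses");
  }
}
```

With three rows and three paths, the parser runs once per row rather than once per call. A single-entry memo like this is not thread-safe and only helps when calls on the same row are evaluated back to back, which is one reason a larger keyed cache may be preferable.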




[jira] [Comment Edited] (CALCITE-5995) add cache to dejsonize function in JsonFunctions

2023-09-11 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763991#comment-17763991
 ] 

xiaogang zhou edited comment on CALCITE-5995 at 9/12/23 2:55 AM:
-

[~julianhyde] Yes, I think this is very similar to
https://issues.apache.org/jira/browse/CALCITE-5914

However, I don't understand how to convert the expression to a constant: the
second argument, which stands for the various JSON fields, differs between
calls, and A differs in every data row. I think the expression needs to be
evaluated at runtime. Please correct me if I am wrong.

I also tried a few alternatives to solve this issue:
 # Extract the dejsonized object in the generated code of the projection operator
(performance is not ideal, as there are a lot of conversions for Flink strings).
 # Convert multiple json_value fields to a table function using an optimization
rule (too complicated, since it has to traverse all the call and filter parts,
and there is no significant improvement compared to the cache solution).

If anybody is interested, I can attach some evidence. In brief, it turned
out that using a cache is the most economical solution.




[jira] [Commented] (CALCITE-5995) add cache to dejsonize function in JsonFunctions

2023-09-11 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763991#comment-17763991
 ] 

xiaogang zhou commented on CALCITE-5995:


[~julianhyde] Yes, I think this is very similar to
https://issues.apache.org/jira/browse/CALCITE-5914

However, I don't understand how to convert the expression to a constant: the
second argument, which stands for the various JSON fields, differs between
calls, and A differs in every data row. I think the expression needs to be
evaluated at runtime. Please correct me if I am wrong.

I also tried a few alternatives to solve this issue:
 # Extract the dejsonized object in the generated code of the projection operator
(performance is not ideal, as there are a lot of conversions for Flink strings).
 # Convert multiple json_value fields to a table function using an optimization
rule (too complicated, since it has to traverse all the call and filter parts,
and there is no significant improvement compared to the cache solution).

If anybody is interested, I can attach some evidence. It turned out that
using a cache is the most economical solution.



[jira] [Updated] (CALCITE-5995) add cache to dejsonize function in JsonFunctions

2023-09-11 Thread xiaogang zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaogang zhou updated CALCITE-5995:
---
Description: 
I used the json_value function to parse JSON values and found that Calcite's
json_value function does not cache the dejsonized objects. This can cause
performance problems in the situation below, because the dejsonize function is
called repeatedly and unnecessarily on the same input.

 
{code:java}
select 
json_value(A, 'xxx'),
json_value(A, 'yyy'),
json_value(A, 'zzz'),...
from some_table;

{code}
 

 

Because projects like Flink use Calcite's json_value to generate code for their
own json_value function, I think this could hurt performance for their users. I
suggest introducing a cache in

 

org.apache.calcite.runtime.JsonFunctions#dejsonize

 

This solution is common in other projects, such as Hive:

[https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]

 

Of course, this feature could be enabled only when a certain config option is
set. If this is acceptable, I can take the ticket. Thanks.

 



[jira] [Updated] (CALCITE-5995) add cache to dejsonize function in JsonFunctions

2023-09-11 Thread xiaogang zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaogang zhou updated CALCITE-5995:
---
Description: 
I used the json_value function to parse JSON values and found that Calcite's
json_value function does not cache the dejsonized objects. This can cause
performance problems in the situation below.

 
{code:java}
select 
json_value(A, 'xxx'),
json_value(A, 'yyy'),
json_value(A, 'zzz'),...
from some_table;

{code}
 

 

Because projects like Flink use Calcite's json_value to generate code for their
own json_value function, I think this could hurt performance for their users. I
suggest introducing a cache in

 

org.apache.calcite.runtime.JsonFunctions#dejsonize

 

This solution is common in other projects, such as Hive:

[https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]

 

Of course, this feature could be enabled only when a certain config option is
set. If this is acceptable, I can take the ticket. Thanks.

 



[jira] [Created] (CALCITE-5995) add cache to dejsonize function in JsonFunctions

2023-09-11 Thread xiaogang zhou (Jira)
xiaogang zhou created CALCITE-5995:
--

         Summary: add cache to dejsonize function in JsonFunctions
             Key: CALCITE-5995
             URL: https://issues.apache.org/jira/browse/CALCITE-5995
         Project: Calcite
      Issue Type: Improvement
      Components: core
Affects Versions: 1.35.0
        Reporter: xiaogang zhou
         Fix For: 1.36.0





