[jira] [Comment Edited] (CALCITE-5995) add cache to dejsonize function in JsonFunctions

2023-09-18 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766397#comment-17766397
 ] 

xiaogang zhou edited comment on CALCITE-5995 at 9/18/23 2:13 PM:
-

jsonApiCommonSyntaxWithCache(String input, String pathSpec) is used by 5 
functions.

 

JSON_EXISTS, JSON_VALUE and JSON_QUERY can be called multiple times in one 
query, so I enabled the cache for these three functions. 
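Below is a minimal Java sketch of the wiring described above. It assumes a shared helper that looks up the parsed input in a cache before it evaluates the path, so the three functions reuse one parsed document. The names `JsonApiSketch`, `PathEvaluator`, `jsonExists`, `jsonValue` and `jsonQuery` are hypothetical and do not reproduce Calcite's JsonFunctions. The parser and the path evaluator are passed in as stand-ins. The caller supplies the cache map, and a real cache should be bounded.

```java
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: JSON_EXISTS, JSON_VALUE and JSON_QUERY all delegate to
 * one helper that parses the input only on a cache miss. Not Calcite's API.
 */
final class JsonApiSketch {
  /** Evaluates a path spec against a parsed document; null means "no match". */
  interface PathEvaluator {
    Object eval(Object document, String pathSpec);
  }

  private final Map<String, Object> cache;        // parsed documents keyed by raw input text
  private final Function<String, Object> parser;  // stand-in for dejsonize
  private final PathEvaluator evaluator;          // stand-in for JSON path evaluation

  JsonApiSketch(Map<String, Object> cache, Function<String, Object> parser,
      PathEvaluator evaluator) {
    this.cache = cache;
    this.parser = parser;
    this.evaluator = evaluator;
  }

  /** Shared helper: parses {@code input} only if it is not cached yet. */
  private Object commonSyntaxWithCache(String input, String pathSpec) {
    Object document = cache.computeIfAbsent(input, parser);
    return evaluator.eval(document, pathSpec);
  }

  boolean jsonExists(String input, String pathSpec) {
    return commonSyntaxWithCache(input, pathSpec) != null;
  }

  // Real JSON_VALUE and JSON_QUERY differ in how they treat scalar vs.
  // non-scalar results; that distinction is omitted in this sketch.
  Object jsonValue(String input, String pathSpec) {
    return commonSyntaxWithCache(input, pathSpec);
  }

  Object jsonQuery(String input, String pathSpec) {
    return commonSyntaxWithCache(input, pathSpec);
  }
}
```

In this sketch, a row that calls all three functions on the same input triggers one parse. Each later call costs one map lookup plus the path evaluation.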

 

Also, is there any documentation on how to set up an IDE for Calcite's coding 
style? [~julianhyde] 


was (Author: zhoujira86):
jsonApiCommonSyntaxWithCache(String input, String pathSpec) is used by 5 
functions.

 

JSON_EXISTS, JSON_VALUE and JSON_QUERY can be called multiple times in one 
query, so I enabled the cache for these three functions. 

 

Also, is there any documentation on how to set up an IDE for Calcite's style? [~julianhyde] 

> add cache to dejsonize function in JsonFunctions
> 
>
> Key: CALCITE-5995
> URL: https://issues.apache.org/jira/browse/CALCITE-5995
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.35.0
>Reporter: xiaogang zhou
>Priority: Minor
> Fix For: 1.36.0
>
>
> I used the json_value function to parse JSON values and found that Calcite's 
> json_value function does not cache the dejsonized objects. This can cause 
> performance issues in situations like the one below, where the dejsonize 
> function is called repeatedly and unnecessarily on the same input.  
>  
> {code:java}
> select 
> json_value(A, 'xxx'),
> json_value(A, 'yyy'),
> json_value(A, 'zzz'),...
> from some_table;
> {code}
>  
>  
> Since projects like Flink use this json_value implementation in the code they 
> generate for their own json_value function, I think this could hurt 
> performance for users. So I suggest introducing a cache in  
>  
> org.apache.calcite.runtime.JsonFunctions#dejsonize
>  
> This kind of solution is common in projects such as Hive:
> [https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]
>  
> Of course, this feature could be turned on only when a certain config option 
> is set. If this is acceptable, I can take the ticket. Thanks.
>  
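Below is a minimal Java sketch of the proposed dejsonize cache. It assumes a small bounded LRU map, built on `LinkedHashMap` in access order, and a hypothetical on/off switch that stands in for the config option mentioned above. `CachingDejsonizer` is an illustrative name, not a Calcite or Hive class. The linked Hive UDTF is cited as a precedent only, and this sketch does not reproduce its code. The sketch is also not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: memoize dejsonize results in a bounded LRU map, but only
 * when a (hypothetical) config switch is on. Not thread-safe.
 */
final class CachingDejsonizer {
  private final Function<String, Object> parser; // the underlying JSON parser
  private final boolean cacheEnabled;            // hypothetical config switch
  private final Map<String, Object> lru;

  CachingDejsonizer(Function<String, Object> parser, boolean cacheEnabled, int maxEntries) {
    this.parser = parser;
    this.cacheEnabled = cacheEnabled;
    // accessOrder = true: iteration runs from least to most recently accessed,
    // so the "eldest" entry is the least recently used one.
    this.lru = new LinkedHashMap<String, Object>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
        return size() > maxEntries; // evict once the bound is exceeded
      }
    };
  }

  Object dejsonize(String input) {
    if (!cacheEnabled) {
      return parser.apply(input); // default behavior: parse on every call
    }
    return lru.computeIfAbsent(input, parser);
  }

  int cachedEntries() {
    return lru.size();
  }
}
```

With the example query and the switch on, each distinct value of A is parsed once while it stays in the cache, instead of once per json_value call. The bound keeps memory use predictable when A is different in every row.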



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-5995) add cache to dejsonize function in JsonFunctions

2023-09-11 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763991#comment-17763991
 ] 

xiaogang zhou edited comment on CALCITE-5995 at 9/12/23 3:37 AM:
-

[~julianhyde] yes, I think this is very similar to 
https://issues.apache.org/jira/browse/CALCITE-5914

 

I don't understand how to convert the expression to a constant, since the second 
argument (the JSON path of each field) differs between calls and A is different 
in every data row. I think the expression needs to be evaluated at runtime. 
Please correct me if I am wrong.
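Because A changes from row to row, caching across rows gains little. However, the repeated calls within one row all see the same A. A hypothetical single-entry cache that remembers only the last input is enough for that pattern. `LastInputCache` is an illustrative name, and the sketch assumes that the calls for one row run one after another on the same thread.

```java
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical sketch: remember only the most recently parsed input, which is
 * enough when several json_value(A, ...) calls for the same row run back to back.
 */
final class LastInputCache {
  private final Function<String, Object> parser;
  private String lastInput;   // raw text of the most recently parsed value
  private Object lastParsed;  // its parsed form

  LastInputCache(Function<String, Object> parser) {
    this.parser = Objects.requireNonNull(parser);
  }

  Object parse(String input) {
    // Re-parse only when the input differs from the previous call. Comparing the
    // strings is expected to be cheaper than parsing them again.
    if (lastInput == null || !lastInput.equals(input)) {
      lastParsed = parser.apply(input);
      lastInput = input;
    }
    return lastParsed;
  }
}
```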

 

I also tried a few alternatives to solve this issue:
 # extract the dejsonized object in the generated code of the projection operator 
(performance is not ideal because of the many conversions for Flink strings)
 # convert multiple json_value fields into a table function using an optimization 
rule (too complicated to traverse all the calls and filter parts, and no 
significant improvement compared to the cache solution)

 

If anybody is interested, I can attach some evidence. In brief, it turned out 
that using a cache is the most economical solution. 


was (Author: zhoujira86):
[~julianhyde] yes, I think this is very similar to 
https://issues.apache.org/jira/browse/CALCITE-5914

Also, I don't understand how to convert the expression to a constant, since the 
second argument (the JSON path of each field) differs between calls and A is 
different in every data row. I think the expression needs to be evaluated at 
runtime. Please correct me if I am wrong.

 

I also tried a few alternatives to solve this issue:
 # extract the dejsonized object in the generated code of the projection operator 
(performance is not ideal because of the many conversions for Flink strings)
 # convert multiple json_value fields into a table function using an optimization 
rule (too complicated to traverse all the calls and filter parts, and no 
significant improvement compared to the cache solution)

 

If anybody is interested, I can attach some evidence. In brief, it turned out 
that using a cache is the most economical solution. 



[jira] [Comment Edited] (CALCITE-5995) add cache to dejsonize function in JsonFunctions

2023-09-11 Thread xiaogang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763991#comment-17763991
 ] 

xiaogang zhou edited comment on CALCITE-5995 at 9/12/23 2:55 AM:
-

[~julianhyde] yes, I think this is very similar to 
https://issues.apache.org/jira/browse/CALCITE-5914

Also, I don't understand how to convert the expression to a constant, since the 
second argument (the JSON path of each field) differs between calls and A is 
different in every data row. I think the expression needs to be evaluated at 
runtime. Please correct me if I am wrong.

 

I also tried a few alternatives to solve this issue:
 # extract the dejsonized object in the generated code of the projection operator 
(performance is not ideal because of the many conversions for Flink strings)
 # convert multiple json_value fields into a table function using an optimization 
rule (too complicated to traverse all the calls and filter parts, and no 
significant improvement compared to the cache solution)

 

If anybody is interested, I can attach some evidence. In brief, it turned out 
that using a cache is the most economical solution. 


was (Author: zhoujira86):
[~julianhyde] yes, I think this is very similar to 
https://issues.apache.org/jira/browse/CALCITE-5914

Also, I don't understand how to convert the expression to a constant, since the 
second argument (the JSON path of each field) differs between calls and A is 
different in every data row. I think the expression needs to be evaluated at 
runtime. Please correct me if I am wrong.

 

I also tried a few alternatives to solve this issue:
 # extract the dejsonized object in the generated code of the projection operator 
(performance is not ideal because of the many conversions for Flink strings)
 # convert multiple json_value fields into a table function using an optimization 
rule (too complicated to traverse all the calls and filter parts, and no 
significant improvement compared to the cache solution)

 

If anybody is interested, I can attach some evidence. It turned out that 
using a cache is the most economical solution. 
