[ 
https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaogang zhou updated CALCITE-5995:
-----------------------------------
    Description: 
I used the json_value function to parse JSON values and found that Calcite's 
json_value function does not cache the parsed (dejsonized) objects. This can 
cause performance problems in a situation like the one below, because the 
dejsonize function is called repeatedly and unnecessarily on the same input.

{code:sql}
select 
json_value(A, 'xxx'),
json_value(A, 'yyy'),
json_value(A, 'zzz'),...
from some_table;
{code}

Because projects such as Flink use Calcite's json_value implementation when 
generating code for their own json_value function, this can hurt performance 
for their users. I therefore suggest introducing a cache in

org.apache.calcite.runtime.JsonFunctions#dejsonize

This approach is common in other projects, for example Hive:

[https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]

Of course, the cache could be enabled only when a certain config option is 
set. If this is acceptable, I can take the ticket. Thanks.
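
A minimal sketch of the kind of cache meant here, under stated assumptions: 
ParsedJsonCache and its methods are hypothetical names, not part of Calcite, 
and the real parser is replaced by a pluggable function so the example stays 
self-contained. It keeps a small LRU map from the raw JSON string to the 
parsed object, so repeated calls on the same input parse only once.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper (not Calcite code): memoizes parsed JSON by input string. */
final class ParsedJsonCache {
  private final Function<String, Object> parser;
  private final Map<String, Object> cache;
  private int parseCount;

  ParsedJsonCache(Function<String, Object> parser, final int maxEntries) {
    this.parser = parser;
    // Access-ordered LinkedHashMap that drops the least recently used entry
    // once the map grows beyond maxEntries.
    this.cache = new LinkedHashMap<String, Object>(16, 0.75f, true) {
      @Override protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /** Returns the cached parse result, invoking the parser only on a miss. */
  synchronized Object parse(String json) {
    Object parsed = cache.get(json);
    if (parsed == null) {
      // Miss (or a parser that returned null, which is never cached as a hit).
      parsed = parser.apply(json);
      parseCount++;
      cache.put(json, parsed);
    }
    return parsed;
  }

  /** Number of times the underlying parser was actually invoked. */
  synchronized int parseCount() {
    return parseCount;
  }
}
```

With such a cache, the three json_value calls on column A in the query above 
would trigger one parse per row instead of three. Cached objects are shared 
between calls, so callers would have to treat them as read-only.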

 

  was:
I used the json_value function to parse JSON values and found that Calcite's 
json_value function does not cache the parsed (dejsonized) objects, which can 
cause performance problems in a situation like the one below.

{code:sql}
select 
json_value(A, 'xxx'),
json_value(A, 'yyy'),
json_value(A, 'zzz'),...
from some_table;
{code}

Because projects such as Flink use Calcite's json_value implementation when 
generating code for their own json_value function, this can hurt performance 
for their users. I therefore suggest introducing a cache in

org.apache.calcite.runtime.JsonFunctions#dejsonize

This approach is common in other projects, for example Hive:

[https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]

Of course, the cache could be enabled only when a certain config option is 
set. If this is acceptable, I can take the ticket. Thanks.


> add cache to dejsonize function in JsonFunctions
> ------------------------------------------------
>
>                 Key: CALCITE-5995
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5995
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.35.0
>            Reporter: xiaogang zhou
>            Priority: Minor
>             Fix For: 1.36.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
