[jira] [Comment Edited] (CALCITE-5995) add cache to dejsonize function in JsonFunctions
[ https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766397#comment-17766397 ]

xiaogang zhou edited comment on CALCITE-5995 at 9/18/23 2:13 PM:
-----------------------------------------------------------------

jsonApiCommonSyntaxWithCache(String input, String pathSpec) is used by 5 functions. JSON_EXISTS, JSON_VALUE and JSON_QUERY can be called multiple times in one query, so I enabled the cache for these three functions.

And can I get some docs on how to set up an IDE for the Calcite coding style? [~julianhyde]

was (Author: zhoujira86):
jsonApiCommonSyntaxWithCache(String input, String pathSpec) is used by 5 functions. JSON_EXISTS, JSON_VALUE and JSON_QUERY can be called multiple times in one query, so I enabled the cache for these three functions.

And can I get some docs on how to set up an IDE for the Calcite style? [~julianhyde]

> add cache to dejsonize function in JsonFunctions
> ------------------------------------------------
>
>                 Key: CALCITE-5995
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5995
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.35.0
>            Reporter: xiaogang zhou
>            Priority: Minor
>             Fix For: 1.36.0
>
>
> I used the json_value function to parse JSON values, and I found that Calcite's
> json_value function does not cache the dejsonized objects. This could cause
> performance issues in situations like the one below, because the dejsonize
> function is called repeatedly and unnecessarily.
>
> {code:java}
> select
>   json_value(A, 'xxx'),
>   json_value(A, 'yyy'),
>   json_value(A, 'zzz'),...
> from some_table;
> {code}
>
> As projects like Flink use json_value to codegen their own json_value
> function, I think this could cause bad performance for users. So I suggest
> introducing a cache in
>
> org.apache.calcite.runtime.JsonFunctions#dejsonize
>
> This solution is very common in projects like Hive:
> [https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]
>
> And of course, this feature would be turned on only when a certain config is
> set. If this is acceptable, I think I can take the ticket. thx

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
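
The caching idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the change proposed in the ticket and not Calcite or Hive code: `ParseCache`, `DejsonizeCacheDemo` and the `String::trim` stand-in parser are hypothetical names invented here, and the real dejsonize step would parse JSON rather than trim a string. The sketch shows a small LRU cache keyed by the raw input text, so that several lookups against the same row value trigger only one parse.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: a small LRU cache in front of an expensive parse step. */
final class ParseCache<V> {
  private final Function<String, V> parser; // stand-in for the real dejsonize step
  private final Map<String, V> cache;
  private int parseCount = 0;

  ParseCache(int maxEntries, Function<String, V> parser) {
    this.parser = parser;
    // accessOrder=true keeps entries ordered from least to most recently used,
    // so removeEldestEntry evicts the least recently used input.
    this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
      @Override protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /** Returns the parsed form of input, parsing only on a cache miss.
   * Not thread-safe; a null parse result is never cached. */
  V get(String input) {
    V value = cache.get(input);
    if (value == null) {
      value = parser.apply(input);
      parseCount++;
      cache.put(input, value);
    }
    return value;
  }

  /** Number of times the underlying parser was actually invoked. */
  int parseCount() {
    return parseCount;
  }
}

final class DejsonizeCacheDemo {
  public static void main(String[] args) {
    // Assumption: String::trim stands in for a real JSON parser.
    ParseCache<String> cache = new ParseCache<>(2, String::trim);
    String row = " {\"xxx\": 1, \"yyy\": 2, \"zzz\": 3} ";
    for (String path : new String[] {"xxx", "yyy", "zzz"}) {
      // A real json_value call would evaluate `path` against the parsed object.
      cache.get(row);
    }
    System.out.println(cache.parseCount()); // prints 1
  }
}
```

A bounded cache is used here so that memory stays constant when every row carries a different JSON string; whether the cache should be per query, per thread or global is a design question the sketch leaves open.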
[jira] [Comment Edited] (CALCITE-5995) add cache to dejsonize function in JsonFunctions
[ https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763991#comment-17763991 ]

xiaogang zhou edited comment on CALCITE-5995 at 9/12/23 3:37 AM:
-----------------------------------------------------------------

[~julianhyde] yes, I think this is very similar to https://issues.apache.org/jira/browse/CALCITE-5914

I don't understand how to convert the expression to a constant, as the second input, which stands for various JSON fields, is different, and A is different in every data row. I think the expression needs to be calculated at runtime. Please correct me if I am wrong.

And I tried a few alternatives to solve this issue, like:
 # extracting the dejsonized object in the generated-code projection operator (performance is not ideal, as there are a lot of conversions for Flink strings)
 # converting multiple json_value fields to a table function using an optimization rule (too complicated to traverse all the call and filter parts, and no significant improvement compared to the cache solution)

If anybody is interested, I can attach some evidence. But in brief, it turned out that using a cache is the most economical solution.

was (Author: zhoujira86):
[~julianhyde] yes, I think this is very similar to https://issues.apache.org/jira/browse/CALCITE-5914

and I don't understand how to convert the expression to a constant, as the second input, which stands for various JSON fields, is different, and A is different in every data row. I think the expression needs to be calculated at runtime. Please correct me if I am wrong.

And I tried a few alternatives to solve this issue, like:
 # extracting the dejsonized object in the generated-code projection operator (performance is not ideal, as there are a lot of conversions for Flink strings)
 # converting multiple json_value fields to a table function using an optimization rule (too complicated to traverse all the call and filter parts, and no significant improvement compared to the cache solution)

If anybody is interested, I can attach some evidence.
But in brief, it turned out that using a cache is the most economical solution.
[jira] [Comment Edited] (CALCITE-5995) add cache to dejsonize function in JsonFunctions
[ https://issues.apache.org/jira/browse/CALCITE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763991#comment-17763991 ]

xiaogang zhou edited comment on CALCITE-5995 at 9/12/23 2:55 AM:
-----------------------------------------------------------------

[~julianhyde] yes, I think this is very similar to https://issues.apache.org/jira/browse/CALCITE-5914

and I don't understand how to convert the expression to a constant, as the second input, which stands for various JSON fields, is different, and A is different in every data row. I think the expression needs to be calculated at runtime. Please correct me if I am wrong.

And I tried a few alternatives to solve this issue, like:
 # extracting the dejsonized object in the generated-code projection operator (performance is not ideal, as there are a lot of conversions for Flink strings)
 # converting multiple json_value fields to a table function using an optimization rule (too complicated to traverse all the call and filter parts, and no significant improvement compared to the cache solution)

If anybody is interested, I can attach some evidence. But in brief, it turned out that using a cache is the most economical solution.

was (Author: zhoujira86):
[~julianhyde] yes, I think this is very similar to https://issues.apache.org/jira/browse/CALCITE-5914

and I don't understand how to convert the expression to a constant, as the second input, which stands for various JSON fields, is different, and A is different in every data row. I think the expression needs to be calculated at runtime. Please correct me if I am wrong.

And I tried a few alternatives to solve this issue, like:
 # extracting the dejsonized object in the generated-code projection operator (performance is not ideal, as there are a lot of conversions for Flink strings)
 # converting multiple json_value fields to a table function using an optimization rule (too complicated to traverse all the call and filter parts, and no significant improvement compared to the cache solution)

If anybody is interested, I can attach some evidence.
But it turned out that using a cache is the most economical solution.