[ https://issues.apache.org/jira/browse/FLINK-32115 ]


    xiaogang zhou deleted comment on FLINK-32115:
    ---------------------------------------

was (Author: zhoujira86):
[~luoyuxia] Hi yuxia, can you please help review this?

> json_value support cache
> ------------------------
>
>                 Key: FLINK-32115
>                 URL: https://issues.apache.org/jira/browse/FLINK-32115
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>    Affects Versions: 1.16.1
>            Reporter: xiaogang zhou
>            Priority: Major
>
> [https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]
>  
> Hive caches previously deserialized JSON objects (see GenericUDTFJSONTuple 
> above). Could we consider using an object cache in JsonValueCallGen as well? 
>  
> This optimization can improve the performance of SQL that calls json_value 
> many times on the same column, such as
>  
> select 
> json_value(A, 'xxx'),
> json_value(A, 'yyy'),
> json_value(A, 'zzz'),
> ...
> a lot
>  
> I added a static LRU cache to SqlJsonUtils and refactored 
> jsonValueExpression1 as follows:
> {code:java}
> private static JsonValueContext jsonValueExpression1(String input) {
>     JsonValueContext parsedJsonContext = EXTRACT_OBJECT_CACHE.get(input);
>     if (parsedJsonContext != null) {
>         return parsedJsonContext;
>     }
>     try {
>         parsedJsonContext = JsonValueContext.withJavaObj(dejsonize(input));
>     } catch (Exception e) {
>         parsedJsonContext = JsonValueContext.withException(e);
>     }
>     EXTRACT_OBJECT_CACHE.put(input, parsedJsonContext);
>     return parsedJsonContext;
> } {code}
>  
> and benchmarked it as follows:
> {code:java}
> public static void main(String[] args) {
>     // The same input is parsed on every iteration, so with the cache enabled
>     // every call after the first one is a cache hit.
>     String input =
>             "{\"social\":[{\"weibo\":\"https://weibo.com/xiaoming\"},{\"github\":\"https://github.com/xiaoming\"}]}";
>     long start = System.currentTimeMillis();
>     for (int i = 0; i < 1000000; i++) {
>         Object dejsonize = jsonValueExpression1(input);
>     }
>     // Elapsed wall-clock time in milliseconds.
>     System.err.println(System.currentTimeMillis() - start);
> } {code}
>  
> The time taken by the two benchmark runs:
> ||Case||Time taken (ms)||
> |cache|33|
> |no cache|1591|
>  
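The issue does not show how EXTRACT_OBJECT_CACHE is declared. The following is a minimal sketch of one way such a static LRU cache could be built, assuming a LinkedHashMap in access order. The class name LruCacheSketch, the nested LruCache type, and the capacity of 1000 are hypothetical and are not taken from the Flink code base.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a bounded LRU cache; not the actual Flink code. */
public class LruCacheSketch {

    /** LinkedHashMap in access order that evicts the least recently used entry. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            // accessOrder = true: get() and put() move an entry to the end.
            super(16, 0.75f, true);
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Called after each insertion; returning true drops the eldest entry.
            return size() > maxEntries;
        }
    }

    // A static cache may be shared by several threads, so it is wrapped in a
    // synchronized view. A separate get() followed by put() is still not atomic;
    // the worst case is that the same input gets parsed twice.
    // The capacity of 1000 is an assumed value.
    static final Map<String, Object> EXTRACT_OBJECT_CACHE =
            Collections.synchronizedMap(new LruCache<String, Object>(1000));

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // capacity exceeded, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

A bounded map matters here because the cache key is the full JSON string: without an eviction limit, a stream of distinct inputs would grow the cache without bound.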



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
