[ https://issues.apache.org/jira/browse/KYLIN-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832355#comment-17832355 ]

pengfei.zhan edited comment on KYLIN-5766 at 3/30/24 2:19 AM:
--------------------------------------------------------------

h1. Design

Parse the SQL with JavaCC, obtain the "normalized" SQL, and use that SQL as the query cache key. Normalization works as follows:
 * Remove general comments (this is already handled in the preceding SQL parsing step).
 * Replace any run of spaces, line feeds, tabs, carriage returns, and form feeds with a single space.
 * Put exactly one space to the left and one space to the right of each of the operators "+", "-", "*", "/", "%", "=", ">=", "<=", "!=", "<>", and "||".
 * Put exactly one space to the left and one space to the right of each parenthesis, "(" and ")".
 * Remove any space to the left of a comma and put exactly one space to its right, so that "test ,test1" becomes "test, test1".
 * Leave the contents of escaped identifiers, such as `2 + 3 `, exactly as written. As a result, two queries whose escaped identifiers differ only in their internal whitespace count as different SQL and cannot hit each other's cache entries.
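The rules above can be approximated outside the parser for illustration. The sketch below is a hypothetical, regex-based stand-in for the JavaCC grammar the design relies on: the normalize() function and its tokenizer are illustrative names, the input is assumed to be comment-free already (the first rule), and treating single-quoted string literals like escaped identifiers is an assumption not stated in the design.

{code:python}
import re

# Hypothetical regex tokenizer standing in for the JavaCC parser (illustration only).
_TOKEN = re.compile(
    r"(?P<ws>\s+)"                              # spaces, \n, \t, \r, \f ...
    r"|(?P<quoted>`[^`]*`|'(?:[^']|'')*')"      # escaped identifiers (and, by assumption, string literals): kept verbatim
    r"|(?P<comma>,)"
    r"|(?P<op>>=|<=|!=|<>|\|\||[-+*/%=()])"     # listed operators and parentheses
    r"|(?P<word>[^\s`'(),\-+*/%=<>!|]+)"        # keywords, names, numbers
    r"|(?P<other>\S)"                           # any other single character, e.g. a lone '<'
)


def normalize(sql: str) -> str:
    """Build a cache key from SQL whose comments were already removed."""
    out = []
    prev_kind = None
    saw_ws = False
    for m in _TOKEN.finditer(sql):
        kind = m.lastgroup
        if kind == "ws":
            saw_ws = True  # remember the gap; it is emitted as at most one space
            continue
        if out:  # decide what separates the previous token from this one
            if kind == "comma":
                gap = ""  # no space to the left of a comma
            elif prev_kind in ("comma", "op") or kind == "op" or saw_ws:
                gap = " "  # one space around ops/parens, after a comma, or for a whitespace run
            else:
                gap = ""  # tokens that were adjacent stay adjacent
            out.append(gap)
        out.append(m.group())
        prev_kind = kind
        saw_ws = False
    return "".join(out)  # leading and trailing whitespace are dropped
{code}

Lone "<", ">", "!" and "|" are not in the operator list, so this sketch leaves their spacing as written; keyword case is also left alone because the design does not mention it.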

For example, the following two queries are identical after normalization:
{code:sql}
-- sql1
select  user   ,
count(*) from   /*comments

comments

*/        demo  group by user

-- sql2
select   user, count(*) -- comments
from  demo group       by user 
{code}

Both produce the same normalized cache key:

{code:sql}
select user, count ( * ) from demo group by user 
{code}
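Using the normalized SQL as the key means every cache lookup passes through the normalizer first, so differently formatted spellings of one query land on the same entry. The sketch below shows only that wiring: QueryCache is a hypothetical class, not Kylin's API, and the whitespace-collapsing lambda in the usage is a stand-in for the full normalizer so that the block runs on its own.

{code:python}
from typing import Callable, Dict, Generic, Optional, TypeVar

V = TypeVar("V")


class QueryCache(Generic[V]):
    """Hypothetical query result cache keyed by normalized SQL."""

    def __init__(self, normalize: Callable[[str], str]) -> None:
        self._normalize = normalize  # turns raw SQL into the cache key
        self._entries: Dict[str, V] = {}

    def put(self, sql: str, result: V) -> None:
        self._entries[self._normalize(sql)] = result

    def get(self, sql: str) -> Optional[V]:
        # Any spelling that normalizes to the same key hits the same entry.
        return self._entries.get(self._normalize(sql))


# Stand-in normalizer: applies only the whitespace rule.
cache: QueryCache[list] = QueryCache(lambda s: " ".join(s.split()))
cache.put("select user, count(*) from demo group by user", [("alice", 3)])
print(cache.get("select  user,\ncount(*)\tfrom demo  group by user"))
{code}

Because the key function is injected, a real deployment would pass the parser-based normalizer instead of the lambda.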





was (Author: JIRAUSER294653): an earlier revision of the same Design section, without the example.

> Normalize query cache key
> -------------------------
>
>                 Key: KYLIN-5766
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5766
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>    Affects Versions: 5.0-beta
>            Reporter: pengfei.zhan
>            Assignee: pengfei.zhan
>            Priority: Major
>             Fix For: 5.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
