Hi,

Since ANTLR v4 trades performance for readability and maintainability, I plan to keep using ANTLR v3 for parsing for the time being. I improved the previous ANTLR v3 version by reorganizing the grammar definition. The pull request number is 440.
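The comparisons in this thread time wide insert statements such as "insert into root.w1.l1.s1 (timestamp, d1, d2, ... , d3000) values (1000, 123.123, ..., 123.123)", with each statement repeated 100 times. A minimal Java sketch of such a harness might look like the following. It assumes nothing about IoTDB's internals: `WideInsertBenchmark`, `wideInsert`, and `timeMillis` are hypothetical names, and the timed phase is only a placeholder to be replaced with the real parse or logical-plan call.

```java
/**
 * Hypothetical helper (not part of IoTDB) for timing SQL-parsing phases
 * on wide insert statements.
 */
final class WideInsertBenchmark {

    /** Builds "insert into PATH (timestamp, d1, ..., dN) values (TS, V, ..., V)". */
    static String wideInsert(String path, int columns, long timestamp, String value) {
        StringBuilder sql = new StringBuilder("insert into ").append(path).append(" (timestamp");
        for (int i = 1; i <= columns; i++) {
            sql.append(", d").append(i);
        }
        sql.append(") values (").append(timestamp);
        for (int i = 1; i <= columns; i++) {
            sql.append(", ").append(value);
        }
        return sql.append(')').toString();
    }

    /** Runs a phase the given number of times and returns the total elapsed time in milliseconds. */
    static long timeMillis(Runnable phase, int repetitions) {
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            phase.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        for (int columns : new int[] {100, 500, 1000}) {
            String sql = wideInsert("root.w1.l1.s1", columns, 1000L, "123.123");
            // Placeholder phase (assumption): swap in the real parse or logical-plan call under test.
            Runnable phase = () -> sql.trim();
            long ms = timeMillis(phase, 100);
            System.out.println(columns + " columns: " + ms + " ms for 100 runs");
        }
    }
}
```

To reproduce separate "get tree" and "get plan" columns, the two phases would be timed independently, for example by building the trees once up front and timing only plan generation. A warm-up pass, or a dedicated harness such as JMH, would make the numbers more trustworthy than a single cold loop.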
I compared three versions: the old version (before full-digit path support), version1 (after full-digit path support; the current version), and version2 (the reorganized version). I first tested more than 50 statements, including insert, update, and alter user statements. The results show that the performance of most statements improves. The measured execution time includes AST construction and logical plan generation.

Focusing on insert statements, which are widely used in real-world workloads, I tested inserts with 100, 500, and 1000 columns. Version2 performs better than version1. The result is shown below.

I will continue working on it.

Best,
---------------------------
Yuyuan KANG

> -----Original Message-----
> From: "康愈圆" <[email protected]>
> Sent: 2019-09-20 16:35:30 (Friday)
> To: [email protected]
> Cc:
> Subject: Re: Re: Re: [jira] [Created] (IOTDB-201) Query parsing runs slower when using ANTLR v4
>
> Hi,
>
> It seems that the performance problem of parsing SQL lies in logical plan generation.
>
> I tested insert statements such as "insert into root.w1.l1.s1 (timestamp, d1, d2, ... , d3000) values (1000, 123.123, ..., 123.123)".
>
> The results show that ANTLR v3 performs better during logical plan generation. For ANTLR_v3, "get tree" means parsing the SQL into an AST, while for ANTLR_v4 it means building a ParseTree. ANTLR_v4 is actually faster in this phase. However, when the tree is analysed to generate the logical plan, the efficiency of ANTLR_v4 drops dramatically as the number of columns/points grows, which hurts the overall performance.
>
> ------------------------------------------------------
>           |      ANTLR_v3       |      ANTLR_v4
> #columns  | get tree | get plan | get tree | get plan
> ------------------------------------------------------
>      30   |       16 |        3 |       14 |        4
>     300   |       56 |        8 |       24 |       47
>    3000   |      393 |       91 |      228 |     4305
> ------------------------------------------------------
> (Unit: ms. Each statement is repeated 100 times.)
>
> Best,
> -------------------
> Yuyuan KANG
>
> > -----Original Message-----
> > From: "Xiangdong Huang" <[email protected]>
> > Sent: 2019-09-17 07:43:34 (Tuesday)
> > To: [email protected]
> > Cc:
> > Subject: Re: Re: [jira] [Created] (IOTDB-201) Query parsing runs slower when using ANTLR v4
> >
> > Hi,
> >
> > > However, if prefixPath is not a leaf node, a StringBuilder is created instead of the data being accessed by reference.
> >
> > In your example, prefixPath is a leaf node, is that right?
> >
> > Maybe incorrect API usage is what leads to the bad performance. Can we do some unit tests? E.g., implement just one or two grammar rules in both ANTLR 3 and ANTLR 4 and compare their performance?
> >
> > By the way, I noticed that Calcite uses JavaCC...
> >
> > Best,
> > -----------------------------------
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> > 康愈圆 <[email protected]> wrote on Monday, September 9, 2019, at 11:43 AM:
> >
> > > Hi,
> > >
> > > Yes, the antlr3.g file has the same detailed definitions. However, ANTLR v3 allows users to define the structure of the tree explicitly.
> > >
> > > For example, given
> > >
> > > setStorageGroup
> > >     : KW_SET KW_STORAGE KW_GROUP KW_TO prefixPath
> > >     -> ^(TOK_SET ^(TOK_STORAGEGROUP prefixPath))
> > >     ;
> > >
> > > the structure of the tree is:
> > >
> > >   'SET'
> > >     |
> > >   'STORAGEGROUP'
> > >     |
> > >   prefixPath
> > >
> > > Here prefixPath is itself a subtree. Users can analyse the AST recursively with a function like analyze(prefixPath), and the data are accessed by reference.
> > >
> > > However, ANTLR v4 removed the '->' tree-rewrite syntax, so the statement for setting a storage group is defined as
> > >
> > > setStorageGroup
> > >     : KW_SET KW_STORAGE KW_GROUP KW_TO prefixPath
> > >     ;
> > >
> > > If we need the string content of prefixPath, we can call prefixPath.getText(), which is clearer and more direct for developers.
> > > However, if prefixPath is not a leaf node, a StringBuilder is created instead of the data being accessed by reference. Although operations on a StringBuilder are faster than on a String, creating StringBuilders too frequently is a heavy overhead that cancels out this benefit and can even reduce overall performance.
> > >
> > > Currently, I think this is what leads to the problem.
> > >
> > > Best,
> > > ---------------------
> > > Yuyuan KANG
> > >
> > > > -----Original Message-----
> > > > From: "Xiangdong Huang" <[email protected]>
> > > > Sent: 2019-09-09 00:08:00 (Monday)
> > > > To: [email protected]
> > > > Cc:
> > > > Subject: Re: [jira] [Created] (IOTDB-201) Query parsing runs slower when using ANTLR v4
> > > >
> > > > Hi,
> > > >
> > > > > There are some grammar definitions that are too detailed, such as decimal numbers, which are categorized into many types. I think making the rules more general may reduce the number of getText() calls.
> > > >
> > > > One question: does the antlr3.g file have the same detailed definitions, e.g., for decimal numbers?
> > > >
> > > > Best,
> > > >
> > > > -----------------------------------
> > > > Xiangdong Huang
> > > > School of Software, Tsinghua University
> > > >
> > > > 康愈圆 <[email protected]> wrote on Thursday, September 5, 2019, at 11:11 PM:
> > > >
> > > > > Hi,
> > > > >
> > > > > I've been working on JIRA issue [IOTDB-190 switch to ANTLR v4] these days.
> > > > >
> > > > > I implemented the SQL parsing module. However, parsing efficiency seems to drop a lot when using ANTLR v4.
> > > > >
> > > > > It turns out that RuleContext.getText() is called frequently and takes more than 90% of the CPU time.
> > > > >
> > > > > The grammar definition (.g4 file) here is a continuation of the previous version (ANTLR v3).
> > > > > There are some grammar definitions that are too detailed, such as decimal numbers, which are categorized into many types. I think making the rules more general may reduce the number of getText() calls.
> > > > >
> > > > > I plan to restructure the grammar definition to improve parsing efficiency.
> > > > >
> > > > > ----
> > > > > Yuyuan KANG
> > > > >
> > > > > On 2019-09-06 13:30:00, Yuyuan KANG (Jira) <[email protected]> wrote:
> > > > > > Yuyuan KANG created IOTDB-201:
> > > > > > ---------------------------------
> > > > > >
> > > > > >          Summary: Query parsing runs slower when using ANTLR v4
> > > > > >              Key: IOTDB-201
> > > > > >              URL: https://issues.apache.org/jira/browse/IOTDB-201
> > > > > >          Project: Apache IoTDB
> > > > > >       Issue Type: Improvement
> > > > > >         Reporter: Yuyuan KANG
> > > > > >
> > > > > > The system currently uses ANTLR v3. When it is migrated to ANTLR v4 using the previous grammar definition, experimental results show that the efficiency of logical plan generation is negatively impacted.
> > > > > >
> > > > > > --
> > > > > > This message was sent by Atlassian Jira
> > > > > > (v8.3.2#803003)
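The getText() cost discussed in this thread can be illustrated without ANTLR at all. In the ANTLR 4 Java runtime, `RuleContext.getText()` returns the concatenated text of its children, building a new `StringBuilder` on every call and recursing into child rules, whereas a terminal (token) node simply returns its token's text. The dependency-free sketch below models those two access paths on a hypothetical column-list tree. `GetTextCostSketch`, `Leaf`, `Rule`, and the `builders` counter are illustrative names, not IoTDB or ANTLR classes, and the tree shape is an assumption rather than IoTDB's actual grammar.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model (not ANTLR, not IoTDB) of rule-level getText() versus leaf reference access. */
final class GetTextCostSketch {

    /** Number of StringBuilders created by Rule.getText() since the last reset. */
    static long builders = 0;

    interface Node {
        String getText();
    }

    /** Terminal node: text is stored, so reading it allocates nothing new. */
    static final class Leaf implements Node {
        final String text;
        Leaf(String text) { this.text = text; }
        @Override public String getText() { return text; }
    }

    /** Rule node: mirrors ANTLR 4's RuleContext.getText() by rebuilding its text from all children. */
    static final class Rule implements Node {
        final List<Node> children = new ArrayList<>();
        Rule add(Node child) { children.add(child); return this; }
        @Override public String getText() {
            if (children.isEmpty()) {
                return "";
            }
            builders++;
            StringBuilder sb = new StringBuilder();
            for (Node child : children) {
                sb.append(child.getText()); // recursive: nested rules allocate their own builders
            }
            return sb.toString();
        }
    }

    /** Builds a list "d1,d2,...,dN" in which each column is a rule wrapping one leaf. */
    static Rule buildColumnList(int n) {
        Rule list = new Rule();
        for (int i = 1; i <= n; i++) {
            if (i > 1) {
                list.add(new Leaf(","));
            }
            list.add(new Rule().add(new Leaf("d" + i)));
        }
        return list;
    }

    /** v4-style access: ask each column rule for its text (one StringBuilder per column). */
    static List<String> namesViaGetText(Rule list) {
        List<String> names = new ArrayList<>();
        for (Node child : list.children) {
            if (child instanceof Rule) {
                names.add(child.getText());
            }
        }
        return names;
    }

    /** v3-AST-style access: read the stored leaf text by reference (no StringBuilder). */
    static List<String> namesViaLeafReference(Rule list) {
        List<String> names = new ArrayList<>();
        for (Node child : list.children) {
            if (child instanceof Rule) {
                names.add(((Rule) child).children.get(0).getText());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Rule list = buildColumnList(3000);

        builders = 0;
        List<String> a = namesViaGetText(list);
        long viaGetText = builders;

        builders = 0;
        List<String> b = namesViaLeafReference(list);
        long viaReference = builders;

        System.out.println("same names: " + a.equals(b));
        System.out.println("StringBuilders via getText(): " + viaGetText);
        System.out.println("StringBuilders via reference: " + viaReference);
    }
}
```

Running `main` should report identical names from both paths, with 3000 StringBuilders on the `getText()` path and none on the leaf-reference path. Because each rule-level `getText()` call walks its whole subtree, calling it repeatedly on large enclosing rules multiplies this cost; in real ANTLR 4 code, reading `getText()` on the specific `TerminalNode` needed is one way to avoid rebuilding a subtree's text.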
