Hi,
It seems that the performance problem in SQL parsing lies in logical plan
generation.
I tested insert statements such as "insert into root.w1.l1.s1 (timestamp, d1,
d2, ... , d3000) values (1000, 123.123, ..., 123.123)".
The results show that ANTLR v3 performs better during logical plan generation.
For ANTLR v3, "get tree" means parsing the SQL into an AST, while for ANTLR v4
it means parsing it into a ParseTree. ANTLR v4 is actually faster in this
phase. However, when the tree is analysed to generate the logical plan, the
efficiency of ANTLR v4 drops dramatically as the number of columns/points
increases, which hurts the overall performance.
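
For anyone who wants to reproduce this kind of measurement, here is a minimal
sketch of how the test statement could be generated and a processing step
timed. The class and method names (InsertBenchmarkSketch, buildInsert,
timeMillis) are made up for illustration, and the lambda in main is only a
placeholder for the real "get tree" or "get plan" call, which is not shown in
this thread.

```java
import java.util.function.Function;

public class InsertBenchmarkSketch {

    /** Builds an insert statement with the given number of columns d1..dN. */
    public static String buildInsert(int columns) {
        StringBuilder names = new StringBuilder("timestamp");
        StringBuilder values = new StringBuilder("1000");
        for (int i = 1; i <= columns; i++) {
            names.append(", d").append(i);
            values.append(", 123.123");
        }
        return "insert into root.w1.l1.s1 (" + names + ") values (" + values + ")";
    }

    /** Runs the given step `repeats` times on the statement and returns the elapsed time in ms. */
    public static long timeMillis(String sql, int repeats, Function<String, ?> step) {
        long start = System.nanoTime();
        for (int i = 0; i < repeats; i++) {
            step.apply(sql);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        for (int columns : new int[] {30, 300, 3000}) {
            String sql = buildInsert(columns);
            // Placeholder step (assumption): replace with the parser call under test.
            long ms = timeMillis(sql, 100, s -> s.split(","));
            System.out.println(columns + " columns: " + ms + " ms for 100 repetitions");
        }
    }
}
```

Timing the tree-building step and the plan-generation step separately, with
the same statement and repeat count, gives the two columns per ANTLR version.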
------------------------------------------------------------------
              |      ANTLR v3       |      ANTLR v4
 #columns     |---------------------|---------------------
              | get tree | get plan | get tree | get plan
------------------------------------------------------------------
 30           |    16    |     3    |    14    |      4
 300          |    56    |     8    |    24    |     47
 3000         |   393    |    91    |   228    |   4305
------------------------------------------------------------------
(Unit: ms. Each statement is repeated 100 times.)
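
Going from 300 to 3000 columns multiplies the ANTLR v4 "get plan" time by
about 92 (47 ms to 4305 ms), which is close to quadratic growth, while the
ANTLR v3 "get plan" time grows by about 11 (8 ms to 91 ms), which is roughly
linear. The sketch below shows one hypothetical access pattern that would
produce quadratic cost: rebuilding the text of a whole list node once per
column, compared with reading each leaf token directly. Node is a simplified
stand-in written for this example, not an ANTLR class, and whether the real
analyzer behaves this way is an assumption, not something established here.

```java
import java.util.ArrayList;
import java.util.List;

public class GetTextCostSketch {

    /** Total number of characters appended to StringBuilders since the last reset. */
    static long copiedChars;

    /** Hypothetical stand-in for a parse-tree node; not an ANTLR class. */
    static final class Node {
        final String token;                           // leaf text, null for inner nodes
        final List<Node> children = new ArrayList<>();

        Node(String token) {
            this.token = token;
        }

        /** Leaf: returns the stored token. Inner node: rebuilds the subtree text on every call. */
        String getText() {
            if (children.isEmpty()) {
                return token == null ? "" : token;
            }
            StringBuilder sb = new StringBuilder();
            for (Node child : children) {
                String text = child.getText();
                copiedChars += text.length();
                sb.append(text);
            }
            return sb.toString();
        }
    }

    /** Builds one inner node with n single-character leaf children. */
    static Node columnList(int n) {
        Node list = new Node(null);
        for (int i = 0; i < n; i++) {
            list.children.add(new Node("d"));
        }
        return list;
    }

    /** Pattern A: call getText() on the parent once per column. */
    public static long copiedWhenReadingParent(int n) {
        Node list = columnList(n);
        copiedChars = 0;
        for (int i = 0; i < n; i++) {
            list.getText();
        }
        return copiedChars;
    }

    /** Pattern B: call getText() on each leaf, which copies nothing. */
    public static long copiedWhenReadingLeaves(int n) {
        Node list = columnList(n);
        copiedChars = 0;
        for (Node leaf : list.children) {
            leaf.getText();
        }
        return copiedChars;
    }

    public static void main(String[] args) {
        for (int n : new int[] {30, 300, 3000}) {
            System.out.println(n + " columns: parent=" + copiedWhenReadingParent(n)
                    + " leaves=" + copiedWhenReadingLeaves(n));
        }
    }
}
```

In this model, pattern A copies n * n characters in total and pattern B copies
none, so a tenfold increase in columns costs a hundredfold more work in A.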
Best,
-------------------
Yuyuan KANG
> -----Original Message-----
> From: "Xiangdong Huang" <[email protected]>
> Sent: 2019-09-17 07:43:34 (Tuesday)
> To: [email protected]
> Cc:
> Subject: Re: Re: [jira] [Created] (IOTDB-201) Query parsing runs slower when using
> ANTLR v4
>
> Hi,
>
> > However, if prefixPath is not a leaf node, a StringBuilder will be
> > created instead of reference access.
>
> In your example, prefixPath is a leaf node, is that right?
>
> Maybe an incorrect use of the API is what leads to the bad performance.
> Can we do some unit tests? E.g., implement just 1 or 2 grammars using both
> ANTLR v3 and v4 and compare the performance?
>
> By the way, I noticed that Calcite uses JavaCC...
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
> 清华大学 软件学院
>
>
> 康愈圆 <[email protected]> wrote on Mon, Sep 9, 2019 at 11:43 AM:
>
> > Hi,
> >
> > Yes, the antlr3.g file has the same detailed definitions. However, ANTLR v3
> > allows users to explicitly define the structure of the tree.
> >
> > For example,
> >
> > setStorageGroup
> > : KW_SET KW_STORAGE KW_GROUP KW_TO prefixPath
> > -> ^(TOK_SET ^(TOK_STORAGEGROUP prefixPath))
> > ;
> >
> > The resulting tree looks like this:
> >
> > 'SET'
> > |
> > 'STORAGEGROUP'
> > |
> > prefixPath
> >
> > The prefixPath is itself another tree. Users can recursively analyse an AST
> > node with a function such as analyze(prefixPath). Data is accessed by reference.
> >
> > However, in ANTLR v4 the '->' operator has been removed, so the statement for
> > setting a storage group is defined as
> >
> > setStorageGroup
> > : KW_SET KW_STORAGE KW_GROUP KW_TO prefixPath
> > ;
> >
> > If we need the string content of prefixPath, we can call
> > prefixPath.getText(), which is actually clearer and more direct for
> > developers. However, if prefixPath is not a leaf node, a StringBuilder is
> > created to assemble the text instead of accessing it by reference. Although
> > operations on a StringBuilder are faster than on a String, creating
> > StringBuilders too frequently is a heavy overhead, which cancels out the
> > benefits and even reduces the overall performance.
> >
> > Currently, I think this is what leads to the problem.
> >
> > Best,
> > ---------------------
> > Yuyuan KANG
> >
> >
> >
> > > -----Original Message-----
> > > From: "Xiangdong Huang" <[email protected]>
> > > Sent: 2019-09-09 00:08:00 (Monday)
> > > To: [email protected]
> > > Cc:
> > > Subject: Re: [jira] [Created] (IOTDB-201) Query parsing runs slower when
> > > using ANTLR v4
> > >
> > > Hi,
> > >
> > > > There are some grammar definitions that are too detailed, such as
> > > > decimal numbers, which are categorized into many types. I think making
> > > > the rules more general may reduce the number of calls to the getText()
> > > > method.
> > >
> > > One question, does the antlr3.g file have the same detailed definition,
> > > e.g., the decimal numbers?
> > >
> > > Best,
> > >
> > > -----------------------------------
> > > Xiangdong Huang
> > > School of Software, Tsinghua University
> > >
> > > 黄向东
> > > 清华大学 软件学院
> > >
> > >
> > > 康愈圆 <[email protected]> wrote on Thu, Sep 5, 2019 at 11:11 PM:
> > >
> > > > Hi,
> > > >
> > > > I've been working on JIRA issue [IOTDB-190 switch to ANTLR v4] these
> > > > days.
> > > >
> > > > I implemented the SQL parsing module. However, it seems that parsing
> > > > efficiency drops a lot when using ANTLR v4.
> > > >
> > > > It turns out that RuleContext.getText() is called frequently and takes
> > > > more than 90% of the CPU time.
> > > >
> > > > The grammar definition (.g4 file) here is a continuation of the previous
> > > > version (ANTLR v3). There are some grammar definitions that are too
> > > > detailed, such as decimal numbers, which are categorized into many
> > > > types. I think making the rules more general may reduce the number of
> > > > calls to the getText() method.
> > > >
> > > > I plan to restructure the grammar definition to improve the parsing
> > > > efficiency.
> > > >
> > > > ----
> > > > Yuyuan KANG
> > > >
> > > > At 2019-09-06 13:30:00, Yuyuan KANG (Jira) <[email protected]> wrote:
> > > > > Yuyuan KANG created IOTDB-201:
> > > > > ---------------------------------
> > > > >
> > > > > Summary: Query parsing runs slower when using ANTLR v4
> > > > > Key: IOTDB-201
> > > > > URL: https://issues.apache.org/jira/browse/IOTDB-201
> > > > > Project: Apache IoTDB
> > > > > Issue Type: Improvement
> > > > > Reporter: Yuyuan KANG
> > > > >
> > > > >
> > > > > The system now uses ANTLR v3. When it is migrated to ANTLR v4 using
> > > > > the previous grammar definition, experimental results show that the
> > > > > efficiency of logical plan generation is negatively impacted.
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > This message was sent by Atlassian Jira
> > > > > (v8.3.2#803003)
> > > >
> > > >
> >