Hi,

It seems that the performance problem of parsing SQL lies in logical plan 
generation.

I tested insert statements, such as "insert into root.w1.l1.s1 (timestamp, d1, 
d2, ... , d3000) values (1000, 123.123, ..., 123.123)".
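
For anyone who wants to reproduce the test, here is a minimal sketch of how such statements could be generated and timed. The class name, the helper names and the placeholder parse step are assumptions made for illustration; the real parser call is not shown in this thread and has to be plugged in.

```java
import java.util.StringJoiner;
import java.util.function.Consumer;

public class InsertBenchmarkSketch {

    // Builds "insert into root.w1.l1.s1 (timestamp, d1, ..., dN) values (1000, 123.123, ..., 123.123)".
    static String buildInsert(int columns) {
        StringJoiner names = new StringJoiner(", ", "(", ")");
        StringJoiner values = new StringJoiner(", ", "(", ")");
        names.add("timestamp");
        values.add("1000");
        for (int i = 1; i <= columns; i++) {
            names.add("d" + i);
            values.add("123.123");
        }
        return "insert into root.w1.l1.s1 " + names + " values " + values;
    }

    // Runs the given parse step on the statement `repeats` times and returns the elapsed milliseconds.
    static long timeMillis(Consumer<String> parse, String sql, int repeats) {
        long start = System.nanoTime();
        for (int i = 0; i < repeats; i++) {
            parse.accept(sql);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Hypothetical placeholder: replace with the ANTLR v3 or v4 parsing call under test.
        Consumer<String> parse = sql -> sql.length();
        for (int columns : new int[] {30, 300, 3000}) {
            String sql = buildInsert(columns);
            System.out.println(columns + " columns: " + timeMillis(parse, sql, 100) + " ms");
        }
    }
}
```

Timing "get tree" and "get plan" separately would need two such parse steps, one per phase.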

It shows that ANTLR v3 performs better during logical plan generation. For 
ANTLR v3, "get tree" means parsing the SQL into an AST, while for ANTLR v4 
it means building a ParseTree. ANTLR v4 is actually faster in this phase. 
However, when the tree is analysed to generate the logical plan, the 
efficiency of ANTLR v4 drops dramatically as the number of columns/points 
increases, which negatively affects the overall performance. 
------------------------------------------------------------------
            ANTLR v3                    ANTLR v4
#columns    get tree    get plan        get tree    get plan
------------------------------------------------------------------
30          16          3               14          4
300         56          8               24          47
3000        393         91              228         4305
------------------------------------------------------------------
(Unit: ms. Each statement is repeated 100 times.)
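
To illustrate why "get plan" can become expensive, here is a small self-contained sketch. It does not use ANTLR itself: the Leaf and Rule classes are hypothetical stand-ins, loosely modelled on the behaviour discussed in this thread, where getText() on a non-leaf node concatenates the text of its children through a new StringBuilder, while a leaf returns its text by reference. It only counts StringBuilder allocations and makes no attempt to reproduce the timings above.

```java
import java.util.ArrayList;
import java.util.List;

public class GetTextCostSketch {

    // Counts how many StringBuilders the sketch allocates.
    static long buildersCreated = 0;

    interface Node {
        String getText();
    }

    // A leaf (token) returns its text by reference, with no allocation.
    static final class Leaf implements Node {
        private final String text;

        Leaf(String text) {
            this.text = text;
        }

        public String getText() {
            return text;
        }
    }

    // A non-leaf (rule) node rebuilds its text from its children on every call.
    static final class Rule implements Node {
        final List<Node> children = new ArrayList<>();

        Rule add(Node child) {
            children.add(child);
            return this;
        }

        public String getText() {
            buildersCreated++;
            StringBuilder sb = new StringBuilder();
            for (Node child : children) {
                sb.append(child.getText());
            }
            return sb.toString();
        }
    }

    // Wraps a token in `depth` nested rule nodes, like a number that passes
    // through several increasingly specific grammar rules.
    static Node nested(String token, int depth) {
        Node node = new Leaf(token);
        for (int i = 0; i < depth; i++) {
            node = new Rule().add(node);
        }
        return node;
    }

    // Calls getText() once per column value and reports the allocations made.
    static long buildersFor(int columns, int depth) {
        Rule values = new Rule();
        for (int i = 0; i < columns; i++) {
            values.add(nested("123.123", depth));
        }
        buildersCreated = 0;
        for (Node value : values.children) {
            value.getText();
        }
        return buildersCreated;
    }

    public static void main(String[] args) {
        for (int columns : new int[] {30, 300, 3000}) {
            System.out.println(columns + " columns: "
                + buildersFor(columns, 1) + " builders at depth 1, "
                + buildersFor(columns, 4) + " builders at depth 4");
        }
    }
}
```

In this model the number of allocations grows with both the number of columns and the nesting depth of the rules, which is why flattening overly detailed rules, or reading the token text directly, should reduce the overhead.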

Best,
-------------------
Yuyuan KANG

> -----Original Message-----
> From: "Xiangdong Huang" <[email protected]>
> Sent: 2019-09-17 07:43:34 (Tuesday)
> To: [email protected]
> Cc: 
> Subject: Re: Re: [jira] [Created] (IOTDB-201) Query parsing runs slower when using 
> ANTLR v4
> 
> Hi,
> 
> > However, if prefixPath is not a leaf node, a StringBuilder will be
> > created instead of reference access.
> 
> In your example, prefixPath is a leaf node, is that right?
> 
> Maybe incorrect use of the API is what leads to the bad performance.
> Can we do some unit tests? E.g., implement just 1 or 2 grammar rules with
> both ANTLR v3 and v4 and compare the performance?
> 
> By the way, I noticed that Calcite uses JavaCC...
> 
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
> 
>  黄向东
> 清华大学 软件学院
> 
> 
> 康愈圆 <[email protected]> wrote on Monday, 2019-09-09 at 11:43 AM:
> 
> > Hi,
> >
> > Yes, the antlr3.g file has the same detailed definitions. However, ANTLR v3
> > allows users to explicitly define the structure of the tree.
> >
> > For example,
> >
> > setStorageGroup
> >   : KW_SET KW_STORAGE KW_GROUP KW_TO prefixPath
> >   -> ^(TOK_SET ^(TOK_STORAGEGROUP prefixPath))
> >   ;
> >
> > the structure of the tree is like:
> >
> >             'SET'
> >               |
> >         'STORAGEGROUP'
> >               |
> >          prefixPath
> >
> > The prefixPath is itself another tree. Users can recursively analyse an AST
> > node with a function like analyze(prefixPath). Data is accessed by reference.
> >
> > However, ANTLR v4 no longer supports the '->' rewrite operator, so the
> > statement for setting a storage group is defined as
> >
> > setStorageGroup
> >   : KW_SET KW_STORAGE KW_GROUP KW_TO prefixPath
> >   ;
> >
> > If we need the string content of prefixPath, we can call
> > prefixPath.getText(), which is clearer and more direct for developers.
> > However, if prefixPath is not a leaf node, getText() creates a
> > StringBuilder to concatenate the text of its children instead of returning
> > a reference. Although operations on a StringBuilder are faster than on a
> > String, creating StringBuilders too frequently is a heavy overhead, which
> > cancels out the benefit and even reduces the overall performance.
> >
> > Currently, I think this is what leads to the problem.
> >
> > Best,
> > ---------------------
> > Yuyuan KANG
> >
> >
> >
> > > -----Original Message-----
> > > From: "Xiangdong Huang" <[email protected]>
> > > Sent: 2019-09-09 00:08:00 (Monday)
> > > To: [email protected]
> > > Cc:
> > > Subject: Re: [jira] [Created] (IOTDB-201) Query parsing runs slower when
> > > using ANTLR v4
> > >
> > > Hi,
> > >
> > > > There are some grammar definitions that are too detailed, such as
> > > > decimal numbers, which are categorized into many types. I think making
> > > > the rules more general may reduce the number of getText() calls.
> > >
> > > One question: does the antlr3.g file have the same detailed definitions,
> > > e.g., for the decimal numbers?
> > >
> > > Best,
> > >
> > > -----------------------------------
> > > Xiangdong Huang
> > > School of Software, Tsinghua University
> > >
> > >  黄向东
> > > 清华大学 软件学院
> > >
> > >
> > > 康愈圆 <[email protected]> wrote on Thursday, 2019-09-05 at 11:11 PM:
> > >
> > > > Hi,
> > > >
> > > > I've been working on JIRA issue [IOTDB-190 switch to ANTLR v4] these
> > > > days.
> > > >
> > > > I implemented the SQL parsing module. However, it seems that the
> > > > parsing efficiency drops considerably when using ANTLR v4.
> > > >
> > > > It turns out that RuleContext.getText() is called frequently and takes
> > > > more than 90% of the CPU time.
> > > >
> > > > The grammar definition (.g4 file) here is a continuation of the previous
> > > > version (ANTLR v3). There are some grammar definitions that are too
> > > > detailed, such as decimal numbers, which are categorized into many
> > > > types. I think making the rules more general may reduce the number of
> > > > getText() calls.
> > > >
> > > > I plan to restructure the grammar definition to improve the parsing
> > > > efficiency.
> > > >
> > > > ----
> > > > Yuyuan KANG
> > > >
> > > > On 2019-09-06 13:30:00, Yuyuan KANG (Jira) <[email protected]> wrote:
> > > > > Yuyuan KANG created IOTDB-201:
> > > > > ---------------------------------
> > > > >
> > > > >              Summary: Query parsing runs slower when using ANTLR v4
> > > > >                  Key: IOTDB-201
> > > > >                  URL: https://issues.apache.org/jira/browse/IOTDB-201
> > > > >              Project: Apache IoTDB
> > > > >           Issue Type: Improvement
> > > > >             Reporter: Yuyuan KANG
> > > > >
> > > > >
> > > > > The system currently uses ANTLR v3. When it is migrated to ANTLR v4
> > > > > with the previous grammar definition, experiments show that the
> > > > > efficiency of logical plan generation is negatively impacted.
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > This message was sent by Atlassian Jira
> > > > > (v8.3.2#803003)
> > > >
> > > >
> >
