I am working on https://issues.apache.org/jira/browse/CALCITE-5541
(upgrading JavaCC) and the Babel parser is having problems deducing
whether a keyword is reserved. Investigating this, I took a look at
the generated code, and found something interesting.

Here are the NonReservedKeyWord and NonReservedKeyWord0of3 methods in Babel
(babel/build/javacc/javaCCMain/org/apache/calcite/sql/parser/babel/SqlBabelParserImpl.java):
  final public String NonReservedKeyWord() throws ParseException {
    if (jj_2_1116(2)) {
      NonReservedKeyWord0of3();
    } else if (jj_2_1117(2)) {
      NonReservedKeyWord1of3();
    } else if (jj_2_1118(2)) {
      NonReservedKeyWord2of3();
    } else {
      jj_consume_token(-1);
      throw new ParseException();
    }
    {if ("" != null) return unquotedIdentifier();}
    throw new Error("Missing return statement in function");
  }

  /** @see #NonReservedKeyWord */
  final public void NonReservedKeyWord0of3() throws ParseException {
    if (jj_2_1119(2)) {
      jj_consume_token(A);
    } else if (jj_2_1120(2)) {
      jj_consume_token(ACTION);
    } else if (jj_2_1121(2)) {
      jj_consume_token(ADMIN);
    ...

And here are the same methods in Core
(core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java):
  final public String NonReservedKeyWord() throws ParseException {
    switch ((jj_ntk==-1)?jj_ntk_f():jj_ntk) {
    case A:
    case ACTION:
    case ADMIN:
    case APPLY:
    ...
    case YEARS:{
      NonReservedKeyWord0of3();
      break;
    }
    case ABSENT:
    ...
    case ZONE:{
      NonReservedKeyWord1of3();
      break;
    }
    ...
    default:
      jj_la1[436] = jj_gen;
      jj_consume_token(-1);
      throw new ParseException();
    }
    {if ("" != null) return unquotedIdentifier();}
    throw new Error("Missing return statement in function");
  }

  /** @see #NonReservedKeyWord */
  final public void NonReservedKeyWord0of3() throws ParseException {
    switch ((jj_ntk==-1)?jj_ntk_f():jj_ntk) {
    case A:{
      jj_consume_token(A);
      break;
    }
    case ACTION:{
      jj_consume_token(ACTION);
      break;
    }
    case ADMIN:{
      jj_consume_token(ADMIN);
      break;
    }
    ...
Both of the above are generated using JavaCC 7.0.13. Other parsers,
such as Server, look similar to Core. Under JavaCC 4.0, all parsers
generate a 'switch'.

In all parsers we split the non-reserved keywords into 3 rules (0of3,
1of3, 2of3) due to the size restrictions noted in
https://issues.apache.org/jira/browse/CALCITE-2405.
I was puzzled why Core generates a 'switch' while Babel generates a
chain of 'if' ... 'else if' tests. At first I thought it was because
Babel has more keywords, but some experiments eliminated that
possibility. I also disproved the hypothesis that it is because Babel
allows extra characters in identifiers (see
https://issues.apache.org/jira/browse/CALCITE-5668). My current
hypothesis is that Babel needs to use lookahead in order to determine
whether a non-reserved keyword can be converted to an identifier.
But whatever the reason, something seems to be very different about
the Babel grammar. Given how frequently identifiers occur when parsing
SQL, I would not be surprised if the Babel parser is significantly
slower than the regular parser under JavaCC 7.0.13.
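
To make the performance concern concrete, here is a minimal sketch of the
two dispatch shapes. It is not generated code: the class DispatchSketch,
its token-kind values, and the lookahead() helper are hypothetical
stand-ins, and a real jj_2_N(2) call likely does more work than the single
comparison used here. The sketch only counts how many alternatives are
tried before a match.

```java
/** Hypothetical illustration only; not produced by JavaCC. */
class DispatchSketch {
  // Stand-in token kinds. The real constants are defined by the
  // generated parser and have different values.
  static final int A = 0;
  static final int ACTION = 1;
  static final int ADMIN = 2;
  static final int APPLY = 3;

  /** Number of probes made by the most recent call to chained(). */
  static int probes;

  /** Stand-in for a lookahead call such as jj_2_1119(2): one probe per try. */
  static boolean lookahead(int expected, int actual) {
    probes++;
    return expected == actual;
  }

  /** Babel-style shape: try each alternative in turn until one matches. */
  static int chained(int kind) {
    probes = 0;
    if (lookahead(A, kind)) {
      return A;
    } else if (lookahead(ACTION, kind)) {
      return ACTION;
    } else if (lookahead(ADMIN, kind)) {
      return ADMIN;
    } else if (lookahead(APPLY, kind)) {
      return APPLY;
    }
    throw new IllegalArgumentException("not a keyword: " + kind);
  }

  /** Core-style shape: a single switch on the next token kind. */
  static int switched(int kind) {
    switch (kind) {
    case A:
    case ACTION:
    case ADMIN:
    case APPLY:
      return kind;
    default:
      throw new IllegalArgumentException("not a keyword: " + kind);
    }
  }

  public static void main(String[] args) {
    chained(A);
    System.out.println("probes for first alternative: " + probes);
    chained(APPLY);
    System.out.println("probes for last alternative: " + probes);
  }
}
```

In this sketch the chained form needs one probe for the first alternative
and four for the last, so its cost grows with the position of the keyword
in the rule, while the switch form does not depend on position. Whether
the real parsers show a measurable difference would need a benchmark.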
In my opinion, that is not a bug that should prevent us from upgrading
JavaCC, especially given that JavaCC 4.0 has a performance bug that is
affecting all of our parser variants.
Julian