Re: [HACKERS] scanner/parser minimization

Heikki Linnakangas Sat, 02 Mar 2013 10:48:32 -0800

On 02.03.2013 17:09, Tom Lane wrote:

Greg Stark <st...@mit.edu> writes:

Regarding yytransition I think the problem is we're using flex to
implement keyword recognition which is usually not what it's used for.
Usually people use flex to handle syntax things like quoting and
numeric formats. All identifiers are handled by flex as equivalent.
Then the last step in the scanner for identifiers is to look up the
identifier in a hash table and return the keyword token if it's a
keyword. That would massively simplify the scanner tables.


Uh ... no.  I haven't looked into why the flex tables are so large,
but this theory is just wrong.  See ScanKeywordLookup().
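For readers following the thread, the scheme being discussed (one flex rule for all identifiers, then a separate keyword lookup) can be sketched roughly as below. This is only an illustration: the Keyword struct, the TOK_* values, the three-entry table and lookup_keyword() are made-up names, not PostgreSQL's code, and the sketch assumes the identifier has already been downcased and that the table is sorted so a binary search works.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes and keyword table, for illustration only. */
enum { TOK_IDENT = 0, TOK_FROM, TOK_SELECT, TOK_WHERE };

typedef struct
{
	const char *name;			/* keyword in lower case */
	int			token;			/* token code returned to the grammar */
} Keyword;

/* Must stay sorted by name (strcmp order) for the binary search below. */
static const Keyword keywords[] = {
	{"from", TOK_FROM},
	{"select", TOK_SELECT},
	{"where", TOK_WHERE}
};

/*
 * Called after the scanner has matched a generic identifier.
 * Returns the keyword's token code, or TOK_IDENT if it is not a keyword.
 * Assumes ident is already lower case.
 */
static int
lookup_keyword(const char *ident)
{
	size_t		lo = 0;
	size_t		hi = sizeof(keywords) / sizeof(keywords[0]);

	assert(ident != NULL);

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;
		int			cmp = strcmp(ident, keywords[mid].name);

		if (cmp == 0)
			return keywords[mid].token;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return TOK_IDENT;
}
```

With a lookup of this kind the flex tables carry no per-keyword states, so keywords by themselves would not explain a large yy_transition array.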

Interestingly, the yy_transition array generated by flex used to be much smaller:


8.3: 22072 elements
8.4: 62623 elements
master: 64535 elements

The big jump between 8.3 and 8.4 was caused by the introduction of the Unicode escapes: U&'foo' [UESCAPE 'x'] . And in particular, the "error rule" for the UESCAPE, which we use to avoid backtracking.

I experimented with a patch that uses two extra flex states to shorten the error rules, see attached. The idea is that after lexing a Unicode literal like "U&'foo'", you enter a new state, in which you check whether a "UESCAPE 'x'" follows. This slashes the size of the array to 36581 elements.
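To make the two-step idea concrete outside of flex, here is a rough hand-written C sketch of the decision the new state makes once the closing quote has been seen. It is not code from the patch: uescape_after_literal() and is_space() are hypothetical names, and whitespace is simplified to plain blank characters, whereas the scanner's real whitespace definition is broader.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Simplified whitespace test; an assumption for this sketch only. */
static int
is_space(char c)
{
	return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
}

/*
 * Hypothetical helper: "rest" is the input following the closing quote of
 * a U&'...' literal.  If a complete  UESCAPE 'x'  clause follows, return x
 * and set *consumed to the number of bytes the clause occupies.  Otherwise
 * return the default escape character '\\' and set *consumed to 0, which
 * corresponds to throwing back everything, i.e. yyless(0).
 */
static char
uescape_after_literal(const char *rest, size_t *consumed)
{
	static const char kw[] = "uescape";
	const char *p = rest;
	size_t		i;

	assert(rest != NULL && consumed != NULL);
	*consumed = 0;

	/* skip whitespace between the literal and a possible UESCAPE */
	while (is_space(*p))
		p++;

	/* case-insensitive match of the keyword; stops at the first mismatch */
	for (i = 0; kw[i] != '\0'; i++)
	{
		if (tolower((unsigned char) p[i]) != kw[i])
			return '\\';
	}
	p += i;

	while (is_space(*p))
		p++;

	/* expect: quote, one character that is not a quote, quote */
	if (p[0] == '\'' && p[1] != '\0' && p[1] != '\'' && p[2] == '\'')
	{
		*consumed = (size_t) (p + 3 - rest);
		return p[1];
	}
	return '\\';
}
```

For example, given " UESCAPE '!'" the helper returns '!' and reports 12 bytes consumed, while given " FROM t" it returns the default backslash and consumes nothing. In the flex version the same decision only needs the short uescape and uescapefail patterns in the new state, rather than repeating them after every possible literal ending.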


- Heikki

*** a/src/backend/parser/scan.l
--- b/src/backend/parser/scan.l
***************
*** 162,168 ****
--- 162,170 ----
  %x xq
  %x xdolq
  %x xui
+ %x xuiend
  %x xus
+ %x xusend
  %x xeu
  
  /*
***************
*** 279,295 ****
  /* Unicode escapes */
  uescape			[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']{quote}
  /* error rule to avoid backup */
! uescapefail		("-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*"-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*|[uU][eE][sS][cC][aA][pP]|[uU][eE][sS][cC][aA]|[uU][eE][sS][cC]|[uU][eE][sS]|[uU][eE]|[uU])
  
  /* Quoted identifier with Unicode escapes */
  xuistart		[uU]&{dquote}
- xuistop1		{dquote}{whitespace}*{uescapefail}?
- xuistop2		{dquote}{whitespace}*{uescape}
  
  /* Quoted string with Unicode escapes */
  xusstart		[uU]&{quote}
! xusstop1		{quote}{whitespace}*{uescapefail}?
! xusstop2		{quote}{whitespace}*{uescape}
  
  /* error rule to avoid backup */
  xufailed		[uU]&
--- 281,297 ----
  /* Unicode escapes */
  uescape			[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']{quote}
  /* error rule to avoid backup */
! uescapefail		[uU][eE][sS][cC][aA][pP][eE]{whitespace}*"-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*|[uU][eE][sS][cC][aA][pP]|[uU][eE][sS][cC][aA]|[uU][eE][sS][cC]|[uU][eE][sS]|[uU][eE]|[uU]
  
  /* Quoted identifier with Unicode escapes */
  xuistart		[uU]&{dquote}
  
  /* Quoted string with Unicode escapes */
  xusstart		[uU]&{quote}
! 
! xustop1		{uescapefail}?
! xustop2		{uescape}
! 
  
  /* error rule to avoid backup */
  xufailed		[uU]&
***************
*** 536,549 ****
  					yylval->str = litbufdup(yyscanner);
  					return SCONST;
  				}
! <xus>{xusstop1} {
! 					/* throw back all but the quote */
  					yyless(1);
  					BEGIN(INITIAL);
  					yylval->str = litbuf_udeescape('\\', yyscanner);
  					return SCONST;
  				}
! <xus>{xusstop2} {
  					BEGIN(INITIAL);
  					yylval->str = litbuf_udeescape(yytext[yyleng-2], yyscanner);
  					return SCONST;
--- 538,558 ----
  					yylval->str = litbufdup(yyscanner);
  					return SCONST;
  				}
! <xus>{quotestop} |
! <xus>{quotefail} {
  					yyless(1);
+ 					BEGIN(xusend);
+ 				}
+ <xusend>{whitespace}
+ <xusend>{other} |
+ <xusend>{xustop1} {
+ 					/* throw back everything */
+ 					yyless(0);
  					BEGIN(INITIAL);
  					yylval->str = litbuf_udeescape('\\', yyscanner);
  					return SCONST;
  				}
! <xusend>{xustop2} {
  					BEGIN(INITIAL);
  					yylval->str = litbuf_udeescape(yytext[yyleng-2], yyscanner);
  					return SCONST;
***************
*** 702,708 ****
  					yylval->str = ident;
  					return IDENT;
  				}
! <xui>{xuistop1}	{
  					char		   *ident;
  
  					BEGIN(INITIAL);
--- 711,723 ----
  					yylval->str = ident;
  					return IDENT;
  				}
! <xui>{dquote} {
! 					yyless(1);
! 					BEGIN(xuiend);
! 				}
! <xuiend>{whitespace} { }
! <xuiend>{other} |
! <xuiend>{xustop1}	{
  					char		   *ident;
  
  					BEGIN(INITIAL);
***************
*** 712,722 ****
  					if (yyextra->literallen >= NAMEDATALEN)
  						truncate_identifier(ident, yyextra->literallen, true);
  					yylval->str = ident;
! 					/* throw back all but the quote */
! 					yyless(1);
  					return IDENT;
  				}
! <xui>{xuistop2}	{
  					char		   *ident;
  
  					BEGIN(INITIAL);
--- 727,736 ----
  					if (yyextra->literallen >= NAMEDATALEN)
  						truncate_identifier(ident, yyextra->literallen, true);
  					yylval->str = ident;
! 					/* throw back everything that follows the end quote */
  					return IDENT;
  				}
! <xuiend>{xustop2}	{
  					char		   *ident;
  
  					BEGIN(INITIAL);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
