On 02.03.2013 17:09, Tom Lane wrote:
> Greg Stark <st...@mit.edu> writes:
>> Regarding yytransition I think the problem is we're using flex to
>> implement keyword recognition which is usually not what it's used for.
>> Usually people use flex to handle syntax things like quoting and
>> numeric formats. All identifiers are handled by flex as equivalent.
>> Then the last step in the scanner for identifiers is to look up the
>> identifier in a hash table and return the keyword token if it's a
>> keyword. That would massively simplify the scanner tables.
>
> Uh ... no.  I haven't looked into why the flex tables are so large,
> but this theory is just wrong.  See ScanKeywordLookup().
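
For readers unfamiliar with the scheme under discussion (one generic identifier rule in flex, followed by a table lookup that decides keyword versus identifier), here is a minimal sketch. It is only an illustration: the token codes, the keyword list and the function name keyword_or_ident are hypothetical, and it uses a binary search over a sorted array where Greg's description mentions a hash table.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; a real grammar defines many more. */
enum { TOK_IDENT = 0, TOK_FROM, TOK_SELECT, TOK_WHERE };

typedef struct
{
    const char *name;
    int         token;
} Keyword;

/* Must stay sorted by name so that the binary search works. */
static const Keyword keywords[] = {
    {"from", TOK_FROM},
    {"select", TOK_SELECT},
    {"where", TOK_WHERE},
};

/* Return the keyword token for text, or TOK_IDENT if it is not a keyword. */
static int
keyword_or_ident(const char *text)
{
    char        folded[16];
    size_t      len = strlen(text);
    size_t      lo = 0;
    size_t      hi = sizeof(keywords) / sizeof(keywords[0]);
    size_t      i;

    if (len >= sizeof(folded))
        return TOK_IDENT;       /* longer than any keyword in the table */

    /* Keywords are matched case-insensitively, so fold to lower case. */
    for (i = 0; i < len; i++)
        folded[i] = (char) tolower((unsigned char) text[i]);
    folded[len] = '\0';

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;
        int         cmp = strcmp(folded, keywords[mid].name);

        if (cmp == 0)
            return keywords[mid].token;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return TOK_IDENT;
}
```

With a lookup like this, the flex rules never mention individual keywords, so the keyword list does not contribute to the size of the generated tables.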

Interestingly, the yy_transition array generated by flex used to be much smaller:

8.3: 22072 elements
8.4: 62623 elements
master: 64535 elements

The big jump between 8.3 and 8.4 was caused by the introduction of Unicode escapes: U&'foo' [UESCAPE 'x']. In particular, it comes from the "error rule" for UESCAPE, which we use to avoid backtracking.

I experimented with a patch that uses two extra flex states to shorten the error rules; see attached. The idea is that after lexing a Unicode literal like "U&'foo'", you enter a new state in which you check whether a "UESCAPE 'x'" clause follows. This cuts the size of the array to 36581 elements.
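
The two-phase idea can be sketched outside flex as a plain C function. This is a simplified illustration, not the code in the patch: the name scan_uescape_clause is hypothetical, whitespace is taken to be whatever isspace() accepts (so comments are not treated as whitespace), and the escape character is not validated.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Called with s pointing just past the closing quote of a U&'...' literal,
 * which corresponds to entering the extra "end" state.
 *
 * If optional whitespace followed by UESCAPE 'x' comes next, store x in
 * *escape and return the number of characters consumed.  Otherwise keep
 * the default escape character and return 0, meaning that everything is
 * thrown back to be scanned again in the initial state.
 */
static size_t
scan_uescape_clause(const char *s, char *escape)
{
    static const char kw[] = "uescape";
    const char *p = s;
    size_t      i;

    *escape = '\\';             /* default when no UESCAPE clause follows */

    while (isspace((unsigned char) *p))
        p++;

    /* Case-insensitive match of the keyword; stops at the first mismatch. */
    for (i = 0; kw[i] != '\0'; i++)
    {
        if (tolower((unsigned char) p[i]) != kw[i])
            return 0;
    }
    p += i;

    while (isspace((unsigned char) *p))
        p++;

    /* Expect a quote, one character that is not a quote, and a quote. */
    if (p[0] != '\'' || p[1] == '\0' || p[1] == '\'' || p[2] != '\'')
        return 0;

    *escape = p[1];
    return (size_t) (p + 3 - s);
}
```

Because the check starts only after the literal has been closed, the failure cases no longer have to be spelled out as suffixes of every rule that ends a quoted string or identifier, which is what made the error rules so long.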

- Heikki
*** a/src/backend/parser/scan.l
--- b/src/backend/parser/scan.l
***************
*** 162,168 ****
--- 162,170 ----
  %x xq
  %x xdolq
  %x xui
+ %x xuiend
  %x xus
+ %x xusend
  %x xeu
  
  /*
***************
*** 279,295 ****
  /* Unicode escapes */
  uescape			[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']{quote}
  /* error rule to avoid backup */
! uescapefail		("-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*"-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*|[uU][eE][sS][cC][aA][pP]|[uU][eE][sS][cC][aA]|[uU][eE][sS][cC]|[uU][eE][sS]|[uU][eE]|[uU])
  
  /* Quoted identifier with Unicode escapes */
  xuistart		[uU]&{dquote}
- xuistop1		{dquote}{whitespace}*{uescapefail}?
- xuistop2		{dquote}{whitespace}*{uescape}
  
  /* Quoted string with Unicode escapes */
  xusstart		[uU]&{quote}
! xusstop1		{quote}{whitespace}*{uescapefail}?
! xusstop2		{quote}{whitespace}*{uescape}
  
  /* error rule to avoid backup */
  xufailed		[uU]&
--- 281,297 ----
  /* Unicode escapes */
  uescape			[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']{quote}
  /* error rule to avoid backup */
! uescapefail		[uU][eE][sS][cC][aA][pP][eE]{whitespace}*"-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*|[uU][eE][sS][cC][aA][pP]|[uU][eE][sS][cC][aA]|[uU][eE][sS][cC]|[uU][eE][sS]|[uU][eE]|[uU]
  
  /* Quoted identifier with Unicode escapes */
  xuistart		[uU]&{dquote}
  
  /* Quoted string with Unicode escapes */
  xusstart		[uU]&{quote}
! 
! xustop1		{uescapefail}?
! xustop2		{uescape}
! 
  
  /* error rule to avoid backup */
  xufailed		[uU]&
***************
*** 536,549 ****
  					yylval->str = litbufdup(yyscanner);
  					return SCONST;
  				}
! <xus>{xusstop1} {
! 					/* throw back all but the quote */
  					yyless(1);
  					BEGIN(INITIAL);
  					yylval->str = litbuf_udeescape('\\', yyscanner);
  					return SCONST;
  				}
! <xus>{xusstop2} {
  					BEGIN(INITIAL);
  					yylval->str = litbuf_udeescape(yytext[yyleng-2], yyscanner);
  					return SCONST;
--- 538,558 ----
  					yylval->str = litbufdup(yyscanner);
  					return SCONST;
  				}
! <xus>{quotestop} |
! <xus>{quotefail} {
  					yyless(1);
+ 					BEGIN(xusend);
+ 				}
+ <xusend>{whitespace}
+ <xusend>{other} |
+ <xusend>{xustop1} {
+ 					/* throw back everything */
+ 					yyless(0);
  					BEGIN(INITIAL);
  					yylval->str = litbuf_udeescape('\\', yyscanner);
  					return SCONST;
  				}
! <xusend>{xustop2} {
  					BEGIN(INITIAL);
  					yylval->str = litbuf_udeescape(yytext[yyleng-2], yyscanner);
  					return SCONST;
***************
*** 702,708 ****
  					yylval->str = ident;
  					return IDENT;
  				}
! <xui>{xuistop1}	{
  					char		   *ident;
  
  					BEGIN(INITIAL);
--- 711,723 ----
  					yylval->str = ident;
  					return IDENT;
  				}
! <xui>{dquote} {
! 					yyless(1);
! 					BEGIN(xuiend);
! 				}
! <xuiend>{whitespace} { }
! <xuiend>{other} |
! <xuiend>{xustop1}	{
  					char		   *ident;
  
  					BEGIN(INITIAL);
***************
*** 712,722 ****
  					if (yyextra->literallen >= NAMEDATALEN)
  						truncate_identifier(ident, yyextra->literallen, true);
  					yylval->str = ident;
! 					/* throw back all but the quote */
! 					yyless(1);
  					return IDENT;
  				}
! <xui>{xuistop2}	{
  					char		   *ident;
  
  					BEGIN(INITIAL);
--- 727,736 ----
  					if (yyextra->literallen >= NAMEDATALEN)
  						truncate_identifier(ident, yyextra->literallen, true);
  					yylval->str = ident;
! 					/* throw back everything that follows the end quote */
  					return IDENT;
  				}
! <xuiend>{xustop2}	{
  					char		   *ident;
  
  					BEGIN(INITIAL);