Re: Sync ECPG scanner with core
I wrote: > On 11/14/18, Tom Lane wrote: >> Alvaro Herrera writes: >>> Maybe time to compile the ecpg test cases during "make check-world"? >> >> I'm dubious that the lexer is a significant part of that, though I could >> be wrong ... > > If it were, it'd be much easier to try a Flex flag other than the > default, most compact representation, something like -Cfe. That'd be a > prerequisite for no-backtracking to matter anyway. That's easy enough > to test, so I'll volunteer to do that sometime. The preproc phase of make check in the ecpg dir only takes 150ms. Compiling and linking the test executables takes another 2s, and running the tests takes another 5s on my machine. Even without setting up and tearing down the temp instance, preproc is trivial. -John Naylor
Re: Sync ECPG scanner with core
On 11/14/18, Tom Lane wrote: > Looks like good stuff, pushed with a small additional amount of work. Thanks. >> -Does ECPG still need its own functions for mm_alloc() and >> mm_strdup(), if we have equivalents in src/common? > > The error handling isn't the same, so I don't think we can just replace > them with the src/common functions. It is, however, potentially worth > our trouble to fix things so that within pgc.l, palloc() and pstrdup() > are macros for mm_alloc/mm_strdup, thereby further reducing the diffs > to the core lexer. > > I'd be pretty tempted to also change all the hard references to > base_yylval to go through a pointer variable, so that diffs like > this go away: > > -yylval->str = pstrdup(yytext); > +base_yylval.str = mm_strdup(yytext); > > I believe that that'd result in more compact code, too --- on most > machines, references to global variables are more expensive in > code bytes than indirecting through a local pointer variable. I'll look into that as time permits. On 11/14/18, Tom Lane wrote: > Alvaro Herrera writes: >> On 2018-Nov-13, Tom Lane wrote: >>> More generally, I wonder if it'd be practical to make pgc.l backup-free >>> altogether. I'm not sure that the resulting gain in lexing speed would >>> really be of interest to anyone, > >> Maybe time to compile the ecpg test cases during "make check-world"? > > I'm dubious that the lexer is a significant part of that, though I could > be wrong ... If it were, it'd be much easier to try a Flex flag other than the default, most compact representation, something like -Cfe. That'd be a prerequisite for no-backtracking to matter anyway. That's easy enough to test, so I'll volunteer to do that sometime. -John Naylor
Re: Sync ECPG scanner with core
Alvaro Herrera writes: > On 2018-Nov-13, Tom Lane wrote: >> More generally, I wonder if it'd be practical to make pgc.l backup-free >> altogether. I'm not sure that the resulting gain in lexing speed would >> really be of interest to anyone, > Maybe time to compile the ecpg test cases during "make check-world"? I'm dubious that the lexer is a significant part of that, though I could be wrong ... regards, tom lane
Re: Sync ECPG scanner with core
On 2018-Nov-13, Tom Lane wrote: > More generally, I wonder if it'd be practical to make pgc.l backup-free > altogether. I'm not sure that the resulting gain in lexing speed would > really be of interest to anyone, Maybe time to compile the ecpg test cases during "make check-world"? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Sync ECPG scanner with core
John Naylor writes: > Commit 21c8ee7946 was one of several that sync'd the backend scanner > with the psql scanner, with the commit message stating "Someday it'd > be nice to make ecpg's pgc.l more easily diff'able too, but today is > not that day." Attached is a set of patches to make progress towards > that. The ECPG tests pass. There's probably more that could be done, > but this seems like a good base to start from. Looks like good stuff, pushed with a small additional amount of work. > -base_yyerror() is used once, but every other call site uses > mmerror(), so use that instead. Yeah, I think this is a bug, though minor. The other places that deal with unexpected EOF use mmfatal(PARSE_ERROR, ...) though, so I did it like that. > In the course of doing this, I noticed a few other possible ECPG > scanner cleanups to consider, although the payoff might not justify > the work involved: > -Copy process_integer_literal() from the core scanner (maybe only if > decimal_fail is also brought over). I went ahead and did both parts of that, mainly because as things stood the ecpg lexer would produce a different token sequence for "1..10" than the core would. I don't think this really matters right at the moment, but if for instance ecpg decided to insert spaces where it thought the token boundaries were, it would matter. > -Match core's handling of unicode escapes, even though pgc.l is not > backup-free. Yeah, the amount of remaining diffs due to the omission of the anti-backup rules is a bit annoying and confusing. More generally, I wonder if it'd be practical to make pgc.l backup-free altogether. I'm not sure that the resulting gain in lexing speed would really be of interest to anyone, but just in terms of using uniform lexer design rules, it might be worthwhile. I tried a run with -b just to see, and indeed there are a bunch of backup cases in the ECPG-specific rules, so it'd take some work. > -Does ECPG still need its own functions for mm_alloc() and > mm_strdup(), if we have equivalents in src/common? The error handling isn't the same, so I don't think we can just replace them with the src/common functions. It is, however, potentially worth our trouble to fix things so that within pgc.l, palloc() and pstrdup() are macros for mm_alloc/mm_strdup, thereby further reducing the diffs to the core lexer. I'd be pretty tempted to also change all the hard references to base_yylval to go through a pointer variable, so that diffs like this go away: -yylval->str = pstrdup(yytext); +base_yylval.str = mm_strdup(yytext); I believe that that'd result in more compact code, too --- on most machines, references to global variables are more expensive in code bytes than indirecting through a local pointer variable. > -Use reentrant APIs? Hm. There's no functional value in that for ecpg, but maybe worth doing anyway to get rid of diffs like -addlit(yytext, yyleng, yyscanner); +addlit(yytext, yyleng); Guess it depends on how nitpicky you're feeling. In principle, reducing these remaining diffs would ease future maintenance of the lexers, but they're not something we change often. regards, tom lane
Re: Sync ECPG scanner with core
It seems the pach tester is confused by the addition of the demonstration diff file. I'm reattaching just the patchset to see if it turns green. -John Naylor From 107e3c8a0b65b0196ea4370a724c8b2a1b0fdf79 Mon Sep 17 00:00:00 2001 From: John Naylor Date: Sun, 30 Sep 2018 12:51:41 +0700 Subject: [PATCH v1 1/4] First pass at syncing ECPG scanner with the core scanner. Adjust whitespace and formatting, clean up some comments, and move the block of whitespace rules. --- src/backend/parser/scan.l | 2 +- src/fe_utils/psqlscan.l | 2 +- src/interfaces/ecpg/preproc/pgc.l | 773 -- 3 files changed, 408 insertions(+), 369 deletions(-) diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l index 950b8b8591..a2454732a1 100644 --- a/src/backend/parser/scan.l +++ b/src/backend/parser/scan.l @@ -192,7 +192,7 @@ extern void core_yyset_column(int column_no, yyscan_t yyscanner); * XXX perhaps \f (formfeed) should be treated as a newline as well? * * XXX if you change the set of whitespace characters, fix scanner_isspace() - * to agree, and see also the plpgsql lexer. + * to agree. */ space [ \t\n\r\f] diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l index fdf49875a7..25253b54ea 100644 --- a/src/fe_utils/psqlscan.l +++ b/src/fe_utils/psqlscan.l @@ -151,7 +151,7 @@ extern void psql_yyset_column(int column_no, yyscan_t yyscanner); * XXX perhaps \f (formfeed) should be treated as a newline as well? * * XXX if you change the set of whitespace characters, fix scanner_isspace() - * to agree, and see also the plpgsql lexer. + * to agree. */ space [ \t\n\r\f] diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l index 0792118cfe..b96f17ca20 100644 --- a/src/interfaces/ecpg/preproc/pgc.l +++ b/src/interfaces/ecpg/preproc/pgc.l @@ -108,16 +108,19 @@ static struct _if_value * We use exclusive states for quoted strings, extended comments, * and to eliminate parsing troubles for numeric strings. * Exclusive states: - * bit string literal - * extended C-style comments in C - * extended C-style comments in SQL - * delimited identifiers (double-quoted identifiers) - thomas 1997-10-27 - * hexadecimal numeric string - thomas 1997-11-16 - * standard quoted strings - thomas 1997-07-30 - * standard quoted strings in C - michael - * extended quoted strings (support backslash escape sequences) - * national character quoted strings + * bit string literal + * extended C-style comments in C + * extended C-style comments in SQL + * delimited identifiers (double-quoted identifiers) + * + * hexadecimal numeric string + * standard quoted strings + * extended quoted strings (support backslash escape sequences) + * national character quoted strings + * standard quoted strings in C * $foo$ quoted strings + * + * * quoted identifier with Unicode escapes * quoted string with Unicode escapes */ @@ -138,6 +141,48 @@ static struct _if_value %x xui %x xus +/* + * In order to make the world safe for Windows and Mac clients as well as + * Unix ones, we accept either \n or \r as a newline. A DOS-style \r\n + * sequence will be seen as two successive newlines, but that doesn't cause + * any problems. SQL-style comments, which start with -- and extend to the + * next newline, are treated as equivalent to a single whitespace character. + * + * NOTE a fine point: if there is no newline following --, we will absorb + * everything to the end of the input as a comment. This is correct. Older + * versions of Postgres failed to recognize -- as a comment if the input + * did not end with a newline. + * + * XXX perhaps \f (formfeed) should be treated as a newline as well? + * + * XXX if you change the set of whitespace characters, fix ecpg_isspace() + * to agree. + */ + +space [ \t\n\r\f] +horiz_space [ \t\f] +newline [\n\r] +non_newline [^\n\r] + +comment ("--"{non_newline}*) + +whitespace ({space}+|{comment}) + +/* + * SQL requires at least one newline in the whitespace separating + * string literals that are to be concatenated. Silly, but who are we + * to argue? Note that {whitespace_with_newline} should not have * after + * it, whereas {whitespace} should generally have a * after it... + */ + +horiz_whitespace ({horiz_space}|{comment}) +whitespace_with_newline ({horiz_whitespace}*{newline}{whitespace}*) + +quote ' +quotestop {quote}{whitespace}* +quotecontinue {quote}{whitespace_with_newline}{quote} +quotefail {quote}{whitespace}*"-" + /* Bit string */ xbstart [bB]{quote} @@ -216,17 +261,17 @@ xdcinside ({xdcqq}|{xdcqdq}|{xdcother}) * The "extended comment" syntax closely resembles allowable operator syntax. * The tricky part here is to get lex to recognize a string starting with * slash-star as a comment, when interpreting it as an operator would produce - * a longer match --- remember lex will prefer a longer match! Also, if we + * a longer match
Re: Sync ECPG scanner with core
On 9/30/18, Michael Paquier wrote: > It would be nice to add this patch to the next commit fest: > https://commitfest.postgresql.org/20/ Done, thanks. -John Naylor
Re: Sync ECPG scanner with core
On Sun, Sep 30, 2018 at 04:32:53PM +0700, John Naylor wrote: > Commit 21c8ee7946 was one of several that sync'd the backend scanner > with the psql scanner, with the commit message stating "Someday it'd > be nice to make ecpg's pgc.l more easily diff'able too, but today is > not that day." Attached is a set of patches to make progress towards > that. The ECPG tests pass. There's probably more that could be done, > but this seems like a good base to start from. It would be nice to add this patch to the next commit fest: https://commitfest.postgresql.org/20/ -- Michael signature.asc Description: PGP signature