Re: [HACKERS] Speeding up the Postgres lexer
On Wed, 2005-05-25 at 11:28 +1000, Neil Conway wrote: Tom Lane wrote: Dunno. Depending on such a thing would require depending on a new flex version, and seeing that the flex guys haven't put out a new release since the badly broken 2.5.31 more than 2 years ago, I wouldn't hold my breath waiting for one we can use. It should be easy enough to check for the presence of the flag at configure-time and only use it if flex implements it. Prior to reading Simon's message I was actually thinking of hacking up the same thing :), so I agree it is probably worth doing. Neil, I'll be doing this after the beta freeze is announced. If you wanted to do this before then, I wouldn't stop you. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Speeding up the Postgres lexer
On Mon, 2005-05-23 at 22:43 -0400, Tom Lane wrote: Based on the comments so far in this thread, I'll go ahead and commit the patch, with some comments attached of course --- in particular a big head comment to run flex with -b and see that lex.backup says something to this effect. flex counts the number of times it needs to back up in a variable called num_backing_up. It tests this to see whether to generate code differently. ISTM you could have a -x switch to make it generate an error if num_backing_up 0. Would we use the -x switch if we had it? If so I'll submit a patch here first, then on to GNU. It'll take a while before it comes back round to us though. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Speeding up the Postgres lexer
Simon Riggs [EMAIL PROTECTED] writes: Would we use the -x switch if we had it? Dunno. Depending on such a thing would require depending on a new flex version, and seeing that the flex guys haven't put out a new release since the badly broken 2.5.31 more than 2 years ago, I wouldn't hold my breath waiting for one we can use. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Speeding up the Postgres lexer
Tom Lane wrote: Dunno. Depending on such a thing would require depending on a new flex version, and seeing that the flex guys haven't put out a new release since the badly broken 2.5.31 more than 2 years ago, I wouldn't hold my breath waiting for one we can use. It should be easy enough to check for the presence of the flag at configure-time and only use it if flex implements it. Prior to reading Simon's message I was actually thinking of hacking up the same thing :), so I agree it is probably worth doing. -Neil ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Speeding up the Postgres lexer
This seems fine. I don't think the lexer changes enough for us to have issues with new cases. I think adding some comments to explain why we are doing it is enough, and perhaps a test case that can be reproduced later for testing. --- Tom Lane wrote: I was doing some profiles recently that showed that on simple statements (like INSERT with a few column values) the basic parser (flex/bison) was taking up a noticeable percentage of the total CPU time. We already have done some things to speed up the lexer, like use -CF option for large-but-fast tables. I read in the flex manual about how you can make it faster yet if you can avoid the need for backup states. It's not terribly well documented, but I eventually worked out how to do it, and find that it seems to make the lexer about a third faster. Basically what you have to do is provide alternate rules that allow the lexer to match *some* rule on any prefix of a complete token. For instance the rule for {real} implies that given 123.456E+ if the next character is not a digit, then the {real} rule fails, and the lexer has to backtrack to 123.456 which it can recognize as a {decimal}. You can add a rule that does match in this situation, throws back the two unwanted characters with yyless(), and returns the number as the {decimal} rule would do. The next call will carry on lexing the E as an identifier, etc. (Note: this is going to end up being a syntax error because the SQL grammar never allows a number to immediately precede an identifier; but that's not the lexer's concern, and so I don't think that we should make the extra rule throw an error immediately.) What I'm wondering is whether this is really worth doing or not. There are currently just two parts of the lexer rules that are affected --- the {real} rule illustrated above, and the rules that allow quoted strings to be split across lines as the SQL spec requires. But the patches are still pretty ugly, and what's really annoying is that there doesn't seem to be any way to get flex to complain if someone later makes a change that breaks the no-backup-cases property again. A prototype patch (completely undocumented :-() is attached. Any thoughts about whether it's worth pressing forward with? regards, tom lane *** src/backend/parser/scan.l.origFri Mar 11 21:15:20 2005 --- src/backend/parser/scan.l Mon May 23 01:05:54 2005 *** *** 148,163 * validate the contents. */ xbstart [bB]{quote} ! xbstop {quote} xbinside[^']* xbcat {quote}{whitespace_with_newline}{quote} /* Hexadecimal number */ xhstart [xX]{quote} ! xhstop {quote} xhinside[^']* xhcat {quote}{whitespace_with_newline}{quote} /* National character */ --- 148,165 * validate the contents. */ xbstart [bB]{quote} ! xbstop {quote}{whitespace}* xbinside[^']* xbcat {quote}{whitespace_with_newline}{quote} + xbconfused {quote}{whitespace}*- /* Hexadecimal number */ xhstart [xX]{quote} ! xhstop {quote}{whitespace}* xhinside[^']* xhcat {quote}{whitespace_with_newline}{quote} + xhconfused {quote}{whitespace}*- /* National character */ *** *** 169,180 */ quote ' xqstart {quote} ! xqstop {quote} xqdouble{quote}{quote} xqinside[^\\']+ xqescape[\\][^0-7] xqoctesc[\\][0-7]{1,3} xqcat {quote}{whitespace_with_newline}{quote} /* $foo$ style quotes (dollar quoting) * The quoted string starts with $foo$ where foo is an optional string --- 171,183 */ quote ' xqstart {quote} ! xqstop {quote}{whitespace}* xqdouble{quote}{quote} xqinside[^\\']+ xqescape[\\][^0-7] xqoctesc[\\][0-7]{1,3} xqcat {quote}{whitespace_with_newline}{quote} + xqconfused {quote}{whitespace}*- /* $foo$ style quotes (dollar quoting) * The quoted string starts with $foo$ where foo is an optional string *** *** 185,190 --- 188,194 dolq_start [A-Za-z\200-\377_] dolq_cont [A-Za-z\200-\377_0-9] dolqdelim \$({dolq_start}{dolq_cont}*)?\$ + dolqfailed \${dolq_start}{dolq_cont}* dolqinside [^$]+ /* Double quote *** *** 247,253
Re: [HACKERS] Speeding up the Postgres lexer
On Mon, 2005-05-23 at 12:31 -0400, Tom Lane wrote: doesn't seem to be any way to get flex to complain if someone later makes a change that breaks the no-backup-cases property again. After some digging, there is a -b option will generate a file called lex.backup if any backup-states exist. The file is said to contain information that would help you remove backup-states. It seems straightforward to test for the existence of that file in the build process? Or perhaps add a special test state --enable-flextest to perform the test during the build. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Speeding up the Postgres lexer
Tom Lane wrote: [snip - flex is slowed down by backtracking - how to fix ] What I'm wondering is whether this is really worth doing or not. There are currently just two parts of the lexer rules that are affected --- the {real} rule illustrated above, and the rules that allow quoted strings to be split across lines as the SQL spec requires. But the patches are still pretty ugly, and what's really annoying is that there doesn't seem to be any way to get flex to complain if someone later makes a change that breaks the no-backup-cases property again. I would be more concerned if there were not reasonable alternatives to many bulk parsed inserts (COPY, prepared statement). But I do think it's worth it, even so ... not all client interfaces support prepared statements (notoriously PHP, although I understand KL has sent patches to fix that) and not all inserts are suitable for COPY. cheers andrew ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Speeding up the Postgres lexer
What I'm wondering is whether this is really worth doing or not. There are currently just two parts of the lexer rules that are affected --- the {real} rule illustrated above, and the rules that allow quoted strings to be split across lines as the SQL spec requires. But the patches are still pretty ugly, and what's really annoying is that there doesn't seem to be any way to get flex to complain if someone later makes a change that breaks the no-backup-cases property again. I was just thinking that if there's not a switch, it's prone to error again. However, the lexer isn't touched anywhere near as much as the grammar is right? So just put a large comment/warning/reminder at the top to test for non-backup states. I'm definitely in favour of a 1/3 speedup of the lexer. Chris ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Speeding up the Postgres lexer
But I do think it's worth it, even so ... not all client interfaces support prepared statements (notoriously PHP, although I understand KL has sent patches to fix that) and not all inserts are suitable for COPY. There is now pg_prepare/pg_execute/pg_query_params in PHP, however you could always have just used straight SQL PREPARE and EXECUTE commands. Chris ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Speeding up the Postgres lexer
Simon Riggs [EMAIL PROTECTED] writes: On Mon, 2005-05-23 at 12:31 -0400, Tom Lane wrote: doesn't seem to be any way to get flex to complain if someone later makes a change that breaks the no-backup-cases property again. After some digging, there is a -b option will generate a file called lex.backup if any backup-states exist. The file is said to contain information that would help you remove backup-states. Right, reading that file is how I learned to fix it ... It seems straightforward to test for the existence of that file in the build process? Or perhaps add a special test state --enable-flextest to perform the test during the build. Well, the problem is that the success state isn't that the file doesn't exist or is empty or anything so easy as that. What you want is for it to say No backing up. and it seems a tad too ugly to code such a test into the makefiles. I'd not want to bet that the success message is spelled the same way in every flex version, or is locale-independent. Based on the comments so far in this thread, I'll go ahead and commit the patch, with some comments attached of course --- in particular a big head comment to run flex with -b and see that lex.backup says something to this effect. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Speeding up the Postgres lexer
Based on the comments so far in this thread, I'll go ahead and commit the patch, with some comments attached of course --- in particular a big head comment to run flex with -b and see that lex.backup says something to this effect. Add it to the release check-list. Chris ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Speeding up the Postgres lexer
Christopher Kings-Lynne [EMAIL PROTECTED] writes: However, the lexer isn't touched anywhere near as much as the grammar is right? Yeah --- if you look at the CVS history, changes that affect the flex rules (and not just the text of the C-code actions) are really rare these days. If there were a lot of churn I'd think this proposal was completely impractical, but there's not much anymore. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq