Re: [HACKERS] Speeding up the Postgres lexer

2005-06-06 Thread Simon Riggs
On Wed, 2005-05-25 at 11:28 +1000, Neil Conway wrote:
 Tom Lane wrote:
  Dunno.  Depending on such a thing would require depending on a new flex
  version, and seeing that the flex guys haven't put out a new release
  since the badly broken 2.5.31 more than 2 years ago, I wouldn't hold
  my breath waiting for one we can use.
 
 It should be easy enough to check for the presence of the flag at 
 configure-time and only use it if flex implements it. Prior to reading 
 Simon's message I was actually thinking of hacking up the same thing :), 
 so I agree it is probably worth doing.

Neil,

I'll be doing this after the beta freeze is announced. If you wanted to
do this before then, I wouldn't stop you.

Best Regards, Simon Riggs




---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Speeding up the Postgres lexer

2005-05-24 Thread Simon Riggs
On Mon, 2005-05-23 at 22:43 -0400, Tom Lane wrote:
 Based on the comments so far in this thread, I'll go ahead and commit
 the patch, with some comments attached of course --- in particular a big
 head comment to run flex with -b and see that lex.backup says something
 to this effect.

flex counts the number of times it needs to back up in a variable called
num_backing_up. It tests this to see whether to generate code
differently. ISTM you could have a -x switch to make it generate an
error if num_backing_up  0.

Would we use the -x switch if we had it? If so I'll submit a patch here
first, then on to GNU. It'll take a while before it comes back round to
us though.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Speeding up the Postgres lexer

2005-05-24 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Would we use the -x switch if we had it?

Dunno.  Depending on such a thing would require depending on a new flex
version, and seeing that the flex guys haven't put out a new release
since the badly broken 2.5.31 more than 2 years ago, I wouldn't hold
my breath waiting for one we can use.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Speeding up the Postgres lexer

2005-05-24 Thread Neil Conway

Tom Lane wrote:

Dunno.  Depending on such a thing would require depending on a new flex
version, and seeing that the flex guys haven't put out a new release
since the badly broken 2.5.31 more than 2 years ago, I wouldn't hold
my breath waiting for one we can use.


It should be easy enough to check for the presence of the flag at 
configure-time and only use it if flex implements it. Prior to reading 
Simon's message I was actually thinking of hacking up the same thing :), 
so I agree it is probably worth doing.


-Neil

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Speeding up the Postgres lexer

2005-05-23 Thread Bruce Momjian

This seems fine.  I don't think the lexer changes enough for us to have
issues with new cases.  I think adding some comments to explain why we
are doing it is enough, and perhaps a test case that can be reproduced
later for testing.

---

Tom Lane wrote:
 I was doing some profiles recently that showed that on simple statements
 (like INSERT with a few column values) the basic parser (flex/bison) was
 taking up a noticeable percentage of the total CPU time.  We already
 have done some things to speed up the lexer, like use -CF option for
 large-but-fast tables.  I read in the flex manual about how you can make
 it faster yet if you can avoid the need for backup states.  It's not
 terribly well documented, but I eventually worked out how to do it, and
 find that it seems to make the lexer about a third faster.  Basically
 what you have to do is provide alternate rules that allow the lexer to
 match *some* rule on any prefix of a complete token.  For instance the
 rule for {real} implies that given
   123.456E+
 if the next character is not a digit, then the {real} rule fails, and
 the lexer has to backtrack to
   123.456
 which it can recognize as a {decimal}.  You can add a rule that does
 match in this situation, throws back the two unwanted characters with
 yyless(), and returns the number as the {decimal} rule would do.
 The next call will carry on lexing the E as an identifier, etc.
 (Note: this is going to end up being a syntax error because the SQL
 grammar never allows a number to immediately precede an identifier;
 but that's not the lexer's concern, and so I don't think that we should
 make the extra rule throw an error immediately.)
 
 What I'm wondering is whether this is really worth doing or not.
 There are currently just two parts of the lexer rules that are affected
 --- the {real} rule illustrated above, and the rules that allow quoted
 strings to be split across lines as the SQL spec requires.  But the
 patches are still pretty ugly, and what's really annoying is that there
 doesn't seem to be any way to get flex to complain if someone later
 makes a change that breaks the no-backup-cases property again.
 
 A prototype patch (completely undocumented :-() is attached.  Any
 thoughts about whether it's worth pressing forward with?
 
   regards, tom lane
 
 *** src/backend/parser/scan.l.origFri Mar 11 21:15:20 2005
 --- src/backend/parser/scan.l Mon May 23 01:05:54 2005
 ***
 *** 148,163 
* validate the contents.
*/
   xbstart [bB]{quote}
 ! xbstop  {quote}
   xbinside[^']*
   xbcat   {quote}{whitespace_with_newline}{quote}
   
   /* Hexadecimal number
*/
   xhstart [xX]{quote}
 ! xhstop  {quote}
   xhinside[^']*
   xhcat   {quote}{whitespace_with_newline}{quote}
   
   /* National character
*/
 --- 148,165 
* validate the contents.
*/
   xbstart [bB]{quote}
 ! xbstop  {quote}{whitespace}*
   xbinside[^']*
   xbcat   {quote}{whitespace_with_newline}{quote}
 + xbconfused  {quote}{whitespace}*-
   
   /* Hexadecimal number
*/
   xhstart [xX]{quote}
 ! xhstop  {quote}{whitespace}*
   xhinside[^']*
   xhcat   {quote}{whitespace_with_newline}{quote}
 + xhconfused  {quote}{whitespace}*-
   
   /* National character
*/
 ***
 *** 169,180 
*/
   quote   '
   xqstart {quote}
 ! xqstop  {quote}
   xqdouble{quote}{quote}
   xqinside[^\\']+
   xqescape[\\][^0-7]
   xqoctesc[\\][0-7]{1,3}
   xqcat   {quote}{whitespace_with_newline}{quote}
   
   /* $foo$ style quotes (dollar quoting)
* The quoted string starts with $foo$ where foo is an optional string
 --- 171,183 
*/
   quote   '
   xqstart {quote}
 ! xqstop  {quote}{whitespace}*
   xqdouble{quote}{quote}
   xqinside[^\\']+
   xqescape[\\][^0-7]
   xqoctesc[\\][0-7]{1,3}
   xqcat   {quote}{whitespace_with_newline}{quote}
 + xqconfused  {quote}{whitespace}*-
   
   /* $foo$ style quotes (dollar quoting)
* The quoted string starts with $foo$ where foo is an optional string
 ***
 *** 185,190 
 --- 188,194 
   dolq_start  [A-Za-z\200-\377_]
   dolq_cont   [A-Za-z\200-\377_0-9]
   dolqdelim   \$({dolq_start}{dolq_cont}*)?\$
 + dolqfailed  \${dolq_start}{dolq_cont}*
   dolqinside  [^$]+
   
   /* Double quote
 ***
 *** 247,253 
   
   

Re: [HACKERS] Speeding up the Postgres lexer

2005-05-23 Thread Simon Riggs
On Mon, 2005-05-23 at 12:31 -0400, Tom Lane wrote:
 doesn't seem to be any way to get flex to complain if someone later
 makes a change that breaks the no-backup-cases property again.

After some digging, there is a  -b option will generate a file called
lex.backup if any backup-states exist. The file is said to contain
information that would help you remove backup-states. 

It seems straightforward to test for the existence of that file in the
build process? Or perhaps add a special test state --enable-flextest to
perform the test during the build.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Speeding up the Postgres lexer

2005-05-23 Thread Andrew Dunstan



Tom Lane wrote:


[snip - flex is slowed down by backtracking  - how to fix ]

What I'm wondering is whether this is really worth doing or not.
There are currently just two parts of the lexer rules that are affected
--- the {real} rule illustrated above, and the rules that allow quoted
strings to be split across lines as the SQL spec requires.  But the
patches are still pretty ugly, and what's really annoying is that there
doesn't seem to be any way to get flex to complain if someone later
makes a change that breaks the no-backup-cases property again.

 




I would be more concerned if there were not reasonable alternatives to 
many bulk parsed inserts (COPY, prepared statement).


But I do think it's worth it, even so ... not all client interfaces 
support prepared statements (notoriously PHP, although I understand KL 
has sent patches to fix that) and not all inserts are suitable for COPY.


cheers

andrew

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Speeding up the Postgres lexer

2005-05-23 Thread Christopher Kings-Lynne

What I'm wondering is whether this is really worth doing or not.
There are currently just two parts of the lexer rules that are affected
--- the {real} rule illustrated above, and the rules that allow quoted
strings to be split across lines as the SQL spec requires.  But the
patches are still pretty ugly, and what's really annoying is that there
doesn't seem to be any way to get flex to complain if someone later
makes a change that breaks the no-backup-cases property again.


I was just thinking that if there's not a switch, it's prone to error again.

However, the lexer isn't touched anywhere near as much as the grammar is 
 right?  So just put a large comment/warning/reminder at the top to 
test for non-backup states.


I'm definitely in favour of a 1/3 speedup of the lexer.

Chris

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Speeding up the Postgres lexer

2005-05-23 Thread Christopher Kings-Lynne
But I do think it's worth it, even so ... not all client interfaces 
support prepared statements (notoriously PHP, although I understand KL 
has sent patches to fix that) and not all inserts are suitable for COPY.


There is now pg_prepare/pg_execute/pg_query_params in PHP, however you 
could always have just used straight SQL PREPARE and EXECUTE commands.


Chris

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Speeding up the Postgres lexer

2005-05-23 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2005-05-23 at 12:31 -0400, Tom Lane wrote:
 doesn't seem to be any way to get flex to complain if someone later
 makes a change that breaks the no-backup-cases property again.

 After some digging, there is a  -b option will generate a file called
 lex.backup if any backup-states exist. The file is said to contain
 information that would help you remove backup-states. 

Right, reading that file is how I learned to fix it ...

 It seems straightforward to test for the existence of that file in the
 build process? Or perhaps add a special test state --enable-flextest to
 perform the test during the build.

Well, the problem is that the success state isn't that the file
doesn't exist or is empty or anything so easy as that.  What you
want is for it to say

No backing up.

and it seems a tad too ugly to code such a test into the makefiles.
I'd not want to bet that the success message is spelled the same way
in every flex version, or is locale-independent.

Based on the comments so far in this thread, I'll go ahead and commit
the patch, with some comments attached of course --- in particular a big
head comment to run flex with -b and see that lex.backup says something
to this effect.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Speeding up the Postgres lexer

2005-05-23 Thread Christopher Kings-Lynne

Based on the comments so far in this thread, I'll go ahead and commit
the patch, with some comments attached of course --- in particular a big
head comment to run flex with -b and see that lex.backup says something
to this effect.


Add it to the release check-list.

Chris

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] Speeding up the Postgres lexer

2005-05-23 Thread Tom Lane
Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 However, the lexer isn't touched anywhere near as much as the grammar is 
   right?

Yeah --- if you look at the CVS history, changes that affect the flex
rules (and not just the text of the C-code actions) are really rare
these days.  If there were a lot of churn I'd think this proposal was
completely impractical, but there's not much anymore.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq