[HACKERS] Fix pgstattuple/pgstatindex to use regclass-type as the argument
Hi,

As I wrote before, I'd like to clean up the pgstattuple functions so that they all accept the same expressions for specifying the target relation.

Re: [HACKERS] [RFC] pgstattuple/pgstatindex enhancement
http://www.postgresql.org/message-id/511ee19b.5010...@uptime.jp

My goal is to allow specifying a relation/index with several expressions, 'relname', 'schemaname.relname' and oid, in all pgstattuple functions. pgstatindex() does not accept an oid so far. I have found that backward compatibility can be kept when the arguments (text and/or oid) are replaced with the regclass type. The regclass type seems to be more appropriate here.

So, I cleaned up those three functions, pgstattuple(), pgstatindex() and pg_relpages(), to accept a regclass-type argument instead of text and/or oid, as the test cases show:

select * from pgstatindex('test_pkey');
select * from pgstatindex('public.test_pkey');
select * from pgstatindex('myschema.test_pkey');
select * from pgstatindex('myschema.test_pkey'::regclass::oid);

With the attached patch, all functions in the pgstattuple module can accept the same expression to specify the target relation/index.

Any comments or suggestions?

Regards,
--
Satoshi Nagayasu
Uptime Technologies, LLC. http://www.uptime.jp

pgstattuple_regclass_v1.diff.gz
Description: application/gzip

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] scanner/parser minimization
On 02.03.2013 17:09, Tom Lane wrote:
> Greg Stark writes:
>> Regarding yytransition I think the problem is we're using flex to
>> implement keyword recognition which is usually not what it's used for.
>> Usually people use flex to handle syntax things like quoting and
>> numeric formats. All identifiers are handled by flex as equivalent.
>> Then the last step in the scanner for identifiers is to look up the
>> identifier in a hash table and return the keyword token if it's a
>> keyword. That would massively simplify the scanner tables.
>
> Uh ... no. I haven't looked into why the flex tables are so large,
> but this theory is just wrong. See ScanKeywordLookup().

Interestingly, the yy_transition array generated by flex used to be
much smaller:

8.3:    22072 elements
8.4:    62623 elements
master: 64535 elements

The big jump between 8.3 and 8.4 was caused by the introduction of the
unicode escapes: U&'foo' [UESCAPE 'x']. And in particular, the "error
rule" for UESCAPE, which we use to avoid backtracking.

I experimented with a patch that uses two extra flex states to shorten
the error rules, see attached. The idea is that after lexing a unicode
literal like "U&'foo'", you enter a new state, in which you check
whether an "UESCAPE 'x'" follows. This slashes the size of the array to
36581 elements.

- Heikki

*** a/src/backend/parser/scan.l
--- b/src/backend/parser/scan.l
***************
*** 162,168 ****
--- 162,170 ----
  %x xq
  %x xdolq
  %x xui
+ %x xuiend
  %x xus
+ %x xusend
  %x xeu

  /*
***************
*** 279,295 ****
  /* Unicode escapes */
  uescape	[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']{quote}
  /* error rule to avoid backup */
! uescapefail	("-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*"-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*|[uU][eE][sS][cC][aA][pP]|[uU][eE][sS][cC][aA]|[uU][eE][sS][cC]|[uU][eE][sS]|[uU][eE]|[uU])

  /* Quoted identifier with Unicode escapes */
  xuistart	[uU]&{dquote}
- xuistop1	{dquote}{whitespace}*{uescapefail}?
- xuistop2	{dquote}{whitespace}*{uescape}

  /* Quoted string with Unicode escapes */
  xusstart	[uU]&{quote}
! xusstop1	{quote}{whitespace}*{uescapefail}?
! xusstop2	{quote}{whitespace}*{uescape}

  /* error rule to avoid backup */
  xufailed	[uU]&
--- 281,297 ----
  /* Unicode escapes */
  uescape	[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']{quote}
  /* error rule to avoid backup */
! uescapefail	[uU][eE][sS][cC][aA][pP][eE]{whitespace}*"-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*|[uU][eE][sS][cC][aA][pP]|[uU][eE][sS][cC][aA]|[uU][eE][sS][cC]|[uU][eE][sS]|[uU][eE]|[uU]

  /* Quoted identifier with Unicode escapes */
  xuistart	[uU]&{dquote}

  /* Quoted string with Unicode escapes */
  xusstart	[uU]&{quote}
!
! xustop1	{uescapefail}?
! xustop2	{uescape}
!
  /* error rule to avoid backup */
  xufailed	[uU]&
***************
*** 536,549 ****
  		yylval->str = litbufdup(yyscanner);
  		return SCONST;
  		}
! {xusstop1}	{
! 		/* throw back all but the quote */
  		yyless(1);
  		BEGIN(INITIAL);
  		yylval->str = litbuf_udeescape('\\', yyscanner);
  		return SCONST;
  		}
! {xusstop2}	{
  		BEGIN(INITIAL);
  		yylval->str = litbuf_udeescape(yytext[yyleng-2], yyscanner);
  		return SCONST;
--- 538,558 ----
  		yylval->str = litbufdup(yyscanner);
  		return SCONST;
  		}
! {quotestop} |
! {quotefail}	{
  		yyless(1);
+ 		BEGIN(xusend);
+ 		}
+ {whitespace}
+ {other} |
+ {xustop1}	{
+ 		/* throw back everything */
+ 		yyless(0);
  		BEGIN(INITIAL);
  		yylval->str = litbuf_udeescape('\\', yyscanner);
  		return SCONST;
  		}
! {xustop2}	{
  		BEGIN(INITIAL);
  		yylval->str = litbuf_udeescape(yytext[yyleng-2], yyscanner);
  		return SCONST;
***************
*** 702,708 ****
  		yylval->str = ident;
  		return IDENT;
  		}
! {xuistop1}	{
  		char	   *ident;

  		BEGIN(INITIAL);
--- 711,723 ----
  		yylval->str = ident;
  		return IDENT;
  		}
! {dquote}	{
! 		yyless(1);
! 		BEGIN(xuiend);
! 		}
! {whitespace}	{ }
! {other} |
! {xustop1}	{
  		char	   *ident;

  		BEGIN(INITIAL);
***************
*** 712,722 ****
  		if (yyextra->literallen >= NAMEDATALEN)
  			truncate_identifier(ident, yyextra->literallen, true);
  		yylval->str = ident;
! 		/* throw back all but the quote */
! 		yyless(1);
  		return IDENT;
  		}
! {xuistop2}	{
  		char	   *ident;

  		BEGIN(INITIAL);
--- 727,736 ----
  		if (yyextra->literallen >= NAMEDATALEN)
  			truncate_identifier(ident, yyextra->literallen, true);
  		yylval->str = ident;
! 		/* throw back everything that follows the end quote */
  		return IDENT;
  		}
! {xustop2}	{
  		char	   *ident;

  		BEGIN(INITIAL);
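The two-state idea in Heikki's patch can be sketched outside flex. The following toy Python illustration is not the actual scan.l logic (the regexes and the function name are invented): the literal itself is scanned first, and then, in a separate "end" state, an optional UESCAPE tail is either consumed or everything after the closing quote is thrown back.

```python
import re

# Toy two-phase scan of a Unicode-escaped literal: match the bare U&'...'
# literal first, then check in a second step whether an UESCAPE 'x' tail
# follows, instead of encoding every possible failure prefix of UESCAPE
# into one giant error rule. All names and regexes here are invented.

LITERAL = re.compile(r"[uU]&'([^']*)'")
UESCAPE_TAIL = re.compile(r"\s*[uU][eE][sS][cC][aA][pP][eE]\s*'([^'])'")

def scan_unicode_literal(text, pos=0):
    """Return (body, escape_char, next_pos), or None if no literal here."""
    m = LITERAL.match(text, pos)
    if m is None:
        return None
    body, pos = m.group(1), m.end()
    # "xusend"-style state: does an UESCAPE clause follow?
    tail = UESCAPE_TAIL.match(text, pos)
    if tail:
        return body, tail.group(1), tail.end()
    # No UESCAPE: "throw back everything" after the closing quote and
    # fall back to the default escape character.
    return body, "\\", pos
```

The point of the second phase is that failure is cheap: if no UESCAPE follows, scanning simply resumes at the position right after the closing quote, with no long error rule needed.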
Re: [HACKERS] scanner/parser minimization
Robert Haas writes:
> On Thu, Feb 28, 2013 at 4:09 PM, Tom Lane wrote:
>> I believe however that it's possible to extract an idea of which
>> tokens the parser believes it can see next at any given parse state.
>> (I've seen code for this somewhere on the net, but am too lazy to go
>> searching for it again right now.) So we could imagine a rule along
>> the lines of "if IDENT is allowed as a next token, and $KEYWORD is
>> not, then return IDENT not the keyword's own token".
>>
>> This might be unworkable from a speed standpoint, depending on how
>> expensive it is to make the determination about allowable next symbols.
>> But it seems worth looking into.

> Interesting idea. But wouldn't that change the semantics of the
> grammar in some places? In particular, keywords would generally
> become less-reserved than they are now, but in a context-dependent
> way.

Yeah, Greg already raised that point. It shouldn't change the semantics
of the grammar *today*, but it would become much harder to tell whether
future changes create any conflicts; we'd lose the mechanical
verification we get from bison with the current method. That might be
sufficient reason not to pursue the idea.

It's possible that static analysis of the grammar could replace bison
verification; that is, assuming we have a way to identify all states in
which both IDENT and some keyword are allowable, we could generate a
report listing which keywords are potentially going to be considered
non-reserved, and then watch for changes in that report. But this would
require writing even more code that we don't have today. I suspect also
that we'd still need a notion of fully reserved keywords that don't get
flipped to IDENT even if that's a legal next symbol.

			regards, tom lane
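The rule under discussion could be dispatched roughly as in the sketch below. Everything here is invented for illustration: the per-state allowed-token sets are stand-ins for information that would really have to be extracted from the generated bison tables.

```python
# Sketch of "return IDENT when the parser could accept IDENT here but not
# this keyword's own token". The keyword map and the allowed-next-token
# sets per parse state are hypothetical.

KEYWORD_TOKENS = {"select": "SELECT", "between": "BETWEEN"}

# Hypothetical allowed-next-token sets for two parse states.
ALLOWED_NEXT = {
    "statement_start": {"SELECT"},   # only a statement keyword fits here
    "column_alias": {"IDENT"},       # only an identifier fits here
}

def classify(word, state):
    """Return the token to hand to the parser for this word."""
    keyword_token = KEYWORD_TOKENS.get(word.lower())
    allowed = ALLOWED_NEXT[state]
    if (keyword_token is not None
            and keyword_token not in allowed
            and "IDENT" in allowed):
        return "IDENT"   # demote: the keyword is not expected here
    return keyword_token or "IDENT"
```

Under this toy rule, "select" used where only an alias can appear comes back as IDENT, while at statement start it is still the SELECT token; the open question above is how cheaply the allowed sets can be computed for real.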
Re: [HACKERS] scanner/parser minimization
On Thu, Feb 28, 2013 at 4:09 PM, Tom Lane wrote:
> I believe however that it's possible to extract an idea of which
> tokens the parser believes it can see next at any given parse state.
> (I've seen code for this somewhere on the net, but am too lazy to go
> searching for it again right now.) So we could imagine a rule along
> the lines of "if IDENT is allowed as a next token, and $KEYWORD is
> not, then return IDENT not the keyword's own token".
>
> This might be unworkable from a speed standpoint, depending on how
> expensive it is to make the determination about allowable next symbols.
> But it seems worth looking into.

Interesting idea. But wouldn't that change the semantics of the grammar
in some places? In particular, keywords would generally become
less-reserved than they are now, but in a context-dependent way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] scanner/parser minimization
Greg Stark writes:
> Regarding yytransition I think the problem is we're using flex to
> implement keyword recognition which is usually not what it's used for.
> Usually people use flex to handle syntax things like quoting and
> numeric formats. All identifiers are handled by flex as equivalent.
> Then the last step in the scanner for identifiers is to look up the
> identifier in a hash table and return the keyword token if it's a
> keyword. That would massively simplify the scanner tables.

Uh ... no. I haven't looked into why the flex tables are so large, but
this theory is just wrong. See ScanKeywordLookup().

			regards, tom lane
Re: [HACKERS] scanner/parser minimization
Regarding yytransition, I think the problem is we're using flex to
implement keyword recognition, which is usually not what it's used for.
Usually people use flex to handle syntax things like quoting and
numeric formats. All identifiers are handled by flex as equivalent.
Then the last step in the scanner for identifiers is to look up the
identifier in a hash table and return the keyword token if it's a
keyword. That would massively simplify the scanner tables.

This hash table can be heavily optimized because it's a static lookup.
c.f. http://www.gnu.org/software/gperf/

In theory this is more expensive, since it needs to do a strcmp in
addition to scanning the identifier to determine where the token ends.
But I suspect in practice the smaller tables might outweigh that cost.

On Thu, Feb 28, 2013 at 9:09 PM, Tom Lane wrote:
> I believe however that it's possible to extract an idea of which
> tokens the parser believes it can see next at any given parse state.
> (I've seen code for this somewhere on the net, but am too lazy to go
> searching for it again right now.) So we could imagine a rule along
> the lines of "if IDENT is allowed as a next token, and $KEYWORD is
> not, then return IDENT not the keyword's own token".

That's a pretty cool idea. I'm afraid it might be kind of slow to
produce that list and load it into a hash table for every ident token,
though. I suppose if we request it only when the ident is a keyword of
some type, then it wouldn't actually kick in often.

That would also imply we could simply use IDENT everywhere we currently
have col_name_keyword or type_function_name. Just having a rule that
accepts a keyword in some location would implicitly make it not be
accepted as an IDENT when unquoted there. That might make the
documentation a bit trickier, and it would make it harder for users to
make their code forward-compatible. It would also make it harder for
hackers to notice when we've accidentally narrowed the allowable
identifiers in more cases than they expect.

--
greg
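The scheme described above -- lex every identifier uniformly, then do one static keyword lookup -- can be sketched as follows. A Python dict stands in for the perfect-hash function gperf would generate, and the keyword list is a small made-up fragment:

```python
# Keyword classification done after the scanner, not inside it: the lexer
# returns every identifier as one token type, and this static lookup
# decides whether it is actually a keyword. A real implementation would
# use a perfect hash (e.g. gperf output) instead of a dict.

KEYWORDS = {
    "select": "SELECT",
    "from": "FROM",
    "where": "WHERE",
}

def identifier_token(ident):
    """Classify one already-scanned identifier, case-insensitively."""
    return KEYWORDS.get(ident.lower(), "IDENT")
```

The flex automaton then only needs one generic identifier rule, which is where the hoped-for shrinkage of the transition tables would come from; the cost moves to one hash probe per identifier.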
Re: [HACKERS] Materialized views WIP patch
Greg Stark wrote:
> Ants Aasma wrote:
>> To give another example of potential future update semantics, if
>> we were to allow users manually maintaining materialized view
>> contents using DML commands, one would expect TRUNCATE to mean
>> "make this matview empty", not "make this matview unavailable".
>
> Wouldn't that just be a regular table then though? How is that a
> materialized view?
>
> If anything someone might expect truncate to delete any rows from
> the source table that appear in the view. But I think it's likely
> that even if materialized views were updateable truncate wouldn't
> be one of the updateable operations.

Yeah, the only way it would make sense to truncate MV contents from a
user-written maintenance trigger (assuming we might have such a thing
some day) would be if you decided that the change to the underlying
data was so severe that you effectively needed to REFRESH, and there
would probably be a better way to deal with that.

Two other reasons that this might not be a problem are:

(1) Any DML against the MV would need to be limited to some context
fired by the underlying changes. If we allow changes to the MV outside
of that without it being part of some "updateable MV" feature
(reversing the direction of flow of changes), the MV could not be
trusted at all. If you're going to do that, just use a table.

(2) I can think of a couple not-too-horrible syntax tricks we could use
to escape from the corner we're worried about painting ourselves into.

All of that said, a combination of Heikki's previous suggestion that
maybe REFRESH could be used, and my noticing that both TRUNCATE and
REFRESH create a new heap for an MV -- it's just a question of whether
we then run the MV's query to fill it with data -- led to this thought:

REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA

This sort of mirrors the CREATE MATERIALIZED VIEW style (which was
based on CREATE TABLE AS), and WITH NO DATA puts the MV into the
unscannable state either way. I can change the parser to make this
literally just the new spelling of TRUNCATE MATERIALIZED VIEW without
dashing my hopes of pushing the patch tomorrow. (The patch has been
ready to go for weeks other than this syntax issue and documentation,
which needs to refer to whatever syntax is chosen.)

Barring objections, I will use the above and push tomorrow. The only
issues which have been raised which will not be addressed at that point
are:

(1) The suggestion that ALTER MATERIALIZED VIEW name ALTER column
support changing the collation isn't something I can see how to do
without complication and risk beyond what is appropriate at this point
in the release cycle. It will be left off, at least for now. To get a
new collation, you will need to drop the MV and re-create it with a
query which specifies the collation for the result column.

(2) The sepgsql changes are still waiting for a decision from
security-focused folks. I have two patches for that contrib module
ready based on my best reading of things -- one which uses table
security labels and one which instead uses new matview labels. When we
get a call on which is preferred, I suspect that one patch or the other
will be good as-is or with minimal change.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Materialized views WIP patch
On Fri, Mar 1, 2013 at 3:01 PM, Ants Aasma wrote:
> To give another example of potential future update semantics, if we
> were to allow users manually maintaining materialized view contents
> using DML commands, one would expect TRUNCATE to mean "make this
> matview empty", not "make this matview unavailable".

Wouldn't that just be a regular table then though? How is that a
materialized view?

If anything someone might expect truncate to delete any rows from the
source table that appear in the view. But I think it's likely that even
if materialized views were updateable, truncate wouldn't be one of the
updateable operations.

--
greg
Re: [HACKERS] sql_drop Event Trigger
Alvaro Herrera writes:
> Dimitri Fontaine wrote:
>> The good news is that the patch to do that has already been sent on
>> this list, and got reviewed in detail by Álvaro, who offered
>> incremental changes. Version 3 of that patch is to be found in:
>>
>> http://www.postgresql.org/message-id/m2fw19n1hr@2ndquadrant.fr
>
> Here's a v4 of that patch. I added support for DROP OWNED, and made
> the object name and schema name available to the pg_dropped_objects
> function.

Thanks!

> Do we want some more stuff provided by pg_dropped_objects? We now have
> classId, objectId, objectSubId, object name, schema name. One further
> thing I think we need is the object's type, i.e. a simple untranslated
> string "table", "view", "operator" and so on. AFAICT this requires a
> nearly-duplicate of getObjectDescription.

About what missing information to add, please review:

http://wiki.postgresql.org/wiki/Event_Triggers#How_to_expose_Information_to_Event_Triggers_Functions

I'd like us to provide the same set of extra information from within
classic Event Triggers and in the records returned by the function.
Maybe a good idea would be to create a datatype for that, publish a
single TG_DETAILS variable of that type from within Event Triggers, and
have the pg_dropped_objects() function return a setof that type?

About the implementation and the getObjectDescription() remark, please
have a look at your latest revision of the other patch in the series:

http://www.postgresql.org/message-id/20130109165829.gb4...@alvh.no-ip.org

I think it provides exactly what you need here, and you already cleaned
up my work for getting at that.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support