[HACKERS] Fix pgstattuple/pgstatindex to use regclass-type as the argument

2013-03-02 Thread Satoshi Nagayasu
Hi,

As I wrote before, I'd like to clean up the pgstattuple functions so
that they all accept the same expressions.

Re: [HACKERS] [RFC] pgstattuple/pgstatindex enhancement
http://www.postgresql.org/message-id/511ee19b.5010...@uptime.jp

My goal is to allow a relation or index to be specified by any of
several expressions ('relname', 'schemaname.relname', or an oid) in
all pgstattuple functions; so far, pgstatindex() does not accept an oid.

I have found that backward compatibility can be kept when the existing
arguments (text and/or oid) are replaced with the regclass type, which
seems more appropriate here.

So I cleaned up these three functions, pgstattuple(), pgstatindex(),
and pg_relpages(), to accept a regclass-type argument instead of text
and/or oid, as the following test cases show:

 select * from pgstatindex('test_pkey');
 select * from pgstatindex('public.test_pkey');
 select * from pgstatindex('myschema.test_pkey');
 select * from pgstatindex('myschema.test_pkey'::regclass::oid);

With the attached patch, all functions in the pgstattuple module
accept the same expressions for specifying the target relation/index.
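
To illustrate the shape of the change (just a sketch, not the patch
itself; the C symbol name below is an assumption for illustration),
the SQL-level declaration for pg_relpages() goes from text to regclass
roughly like this:

  CREATE OR REPLACE FUNCTION pg_relpages(IN relname regclass)
  RETURNS bigint
  AS 'MODULE_PATHNAME', 'pg_relpages'
  LANGUAGE C STRICT;

  -- all of these now resolve to the same relation
  SELECT pg_relpages('test');
  SELECT pg_relpages('public.test');
  SELECT pg_relpages('public.test'::regclass::oid);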

Any comments or suggestions?

Regards,
-- 
Satoshi Nagayasu 
Uptime Technologies, LLC. http://www.uptime.jp


pgstattuple_regclass_v1.diff.gz
Description: application/gzip



Re: [HACKERS] scanner/parser minimization

2013-03-02 Thread Heikki Linnakangas

On 02.03.2013 17:09, Tom Lane wrote:
> Greg Stark  writes:
>> Regarding yytransition I think the problem is we're using flex to
>> implement keyword recognition which is usually not what it's used for.
>> Usually people use flex to handle syntax things like quoting and
>> numeric formats. All identifiers are handled by flex as equivalent.
>> Then the last step in the scanner for identifiers is to look up the
>> identifier in a hash table and return the keyword token if it's a
>> keyword. That would massively simplify the scanner tables.
>
> Uh ... no.  I haven't looked into why the flex tables are so large,
> but this theory is just wrong.  See ScanKeywordLookup().


Interestingly, the yy_transition array generated by flex used to be much 
smaller:


8.3: 22072 elements
8.4: 62623 elements
master: 64535 elements

The big jump between 8.3 and 8.4 was caused by the introduction of the
Unicode escapes, U&'foo' [UESCAPE 'x'], and in particular by the "error
rule" for UESCAPE, which we use to avoid backtracking.


I experimented with a patch that uses two extra flex states to shorten
the error rules; see attached. The idea is that after lexing a Unicode
literal like "U&'foo'", you enter a new state in which you check
whether a "UESCAPE 'x'" follows. This slashes the size of the array to
36581 elements.


- Heikki
*** a/src/backend/parser/scan.l
--- b/src/backend/parser/scan.l
***
*** 162,168 
--- 162,170 
  %x xq
  %x xdolq
  %x xui
+ %x xuiend
  %x xus
+ %x xusend
  %x xeu
  
  /*
***
*** 279,295 
  /* Unicode escapes */
  uescape			[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']{quote}
  /* error rule to avoid backup */
! uescapefail		("-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*"-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*|[uU][eE][sS][cC][aA][pP]|[uU][eE][sS][cC][aA]|[uU][eE][sS][cC]|[uU][eE][sS]|[uU][eE]|[uU])
  
  /* Quoted identifier with Unicode escapes */
  xuistart		[uU]&{dquote}
- xuistop1		{dquote}{whitespace}*{uescapefail}?
- xuistop2		{dquote}{whitespace}*{uescape}
  
  /* Quoted string with Unicode escapes */
  xusstart		[uU]&{quote}
! xusstop1		{quote}{whitespace}*{uescapefail}?
! xusstop2		{quote}{whitespace}*{uescape}
  
  /* error rule to avoid backup */
  xufailed		[uU]&
--- 281,297 
  /* Unicode escapes */
  uescape			[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']{quote}
  /* error rule to avoid backup */
! uescapefail		[uU][eE][sS][cC][aA][pP][eE]{whitespace}*"-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*|[uU][eE][sS][cC][aA][pP]|[uU][eE][sS][cC][aA]|[uU][eE][sS][cC]|[uU][eE][sS]|[uU][eE]|[uU]
  
  /* Quoted identifier with Unicode escapes */
  xuistart		[uU]&{dquote}
  
  /* Quoted string with Unicode escapes */
  xusstart		[uU]&{quote}
! 
! xustop1		{uescapefail}?
! xustop2		{uescape}
! 
  
  /* error rule to avoid backup */
  xufailed		[uU]&
***
*** 536,549 
  	yylval->str = litbufdup(yyscanner);
  	return SCONST;
  }
! {xusstop1} {
! 	/* throw back all but the quote */
  	yyless(1);
  	BEGIN(INITIAL);
  	yylval->str = litbuf_udeescape('\\', yyscanner);
  	return SCONST;
  }
! {xusstop2} {
  	BEGIN(INITIAL);
  	yylval->str = litbuf_udeescape(yytext[yyleng-2], yyscanner);
  	return SCONST;
--- 538,558 
  	yylval->str = litbufdup(yyscanner);
  	return SCONST;
  }
! {quotestop} |
! {quotefail} {
  	yyless(1);
+ 	BEGIN(xusend);
+ }
+ {whitespace}
+ {other} |
+ {xustop1} {
+ 	/* throw back everything */
+ 	yyless(0);
  	BEGIN(INITIAL);
  	yylval->str = litbuf_udeescape('\\', yyscanner);
  	return SCONST;
  }
! {xustop2} {
  	BEGIN(INITIAL);
  	yylval->str = litbuf_udeescape(yytext[yyleng-2], yyscanner);
  	return SCONST;
***
*** 702,708 
  	yylval->str = ident;
  	return IDENT;
  }
! {xuistop1}	{
  	char		   *ident;
  
  	BEGIN(INITIAL);
--- 711,723 
  	yylval->str = ident;
  	return IDENT;
  }
! {dquote} {
! 	yyless(1);
! 	BEGIN(xuiend);
! }
! {whitespace} { }
! {other} |
! {xustop1}	{
  	char		   *ident;
  
  	BEGIN(INITIAL);
***
*** 712,722 
  	if (yyextra->literallen >= NAMEDATALEN)
  		truncate_identifier(ident, yyextra->literallen, true);
  	yylval->str = ident;
! 	/* throw back all but the quote */
! 	yyless(1);
  	return IDENT;
  }
! {xuistop2}	{
  	char		   *ident;
  
  	BEGIN(INITIAL);
--- 727,736 
  	if (yyextra->literallen >= NAMEDATALEN)
  		truncate_identifier(ident, yyextra->literallen, true);
  	yylval->str = ident;
! 	/* throw back everything that follows the end quote */
  	return IDENT;
  }
! {xustop2}	{
  	char		   *ident;
  
  	BEGIN(INITIAL);


Re: [HACKERS] scanner/parser minimization

2013-03-02 Thread Tom Lane
Robert Haas  writes:
> On Thu, Feb 28, 2013 at 4:09 PM, Tom Lane  wrote:
>> I believe however that it's possible to extract an idea of which
>> tokens the parser believes it can see next at any given parse state.
>> (I've seen code for this somewhere on the net, but am too lazy to go
>> searching for it again right now.)  So we could imagine a rule along
>> the lines of "if IDENT is allowed as a next token, and $KEYWORD is
>> not, then return IDENT not the keyword's own token".
>> 
>> This might be unworkable from a speed standpoint, depending on how
>> expensive it is to make the determination about allowable next symbols.
>> But it seems worth looking into.

> Interesting idea.  But wouldn't that change the semantics of the
> grammar in some places?  In particular, keywords would generally
> become less-reserved than they are now, but in a context-dependent
> way.

Yeah, Greg already raised that point.  It shouldn't change the semantics
of the grammar *today*, but it would become much harder to tell whether
future changes create any conflicts; we'd lose the mechanical
verification we get from bison with the current method.  That might be
sufficient reason not to pursue the idea.

It's possible that static analysis of the grammar could replace bison
verification; that is, assuming we have a way to identify all states in
which both IDENT and some-keyword are allowable, we could generate a
report listing which keywords are potentially going to be considered
non-reserved, and then watch for changes in that report.  But this would
require writing even more code that we don't have today.

I suspect also that we'd still need a notion of fully reserved keywords
that don't get flipped to IDENT even if that's a legal next symbol.

regards, tom lane




Re: [HACKERS] scanner/parser minimization

2013-03-02 Thread Robert Haas
On Thu, Feb 28, 2013 at 4:09 PM, Tom Lane  wrote:
> I believe however that it's possible to extract an idea of which
> tokens the parser believes it can see next at any given parse state.
> (I've seen code for this somewhere on the net, but am too lazy to go
> searching for it again right now.)  So we could imagine a rule along
> the lines of "if IDENT is allowed as a next token, and $KEYWORD is
> not, then return IDENT not the keyword's own token".
>
> This might be unworkable from a speed standpoint, depending on how
> expensive it is to make the determination about allowable next symbols.
> But it seems worth looking into.

Interesting idea.  But wouldn't that change the semantics of the
grammar in some places?  In particular, keywords would generally
become less-reserved than they are now, but in a context-dependent
way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] scanner/parser minimization

2013-03-02 Thread Tom Lane
Greg Stark  writes:
> Regarding yytransition I think the problem is we're using flex to
> implement keyword recognition which is usually not what it's used for.
> Usually people use flex to handle syntax things like quoting and
> numeric formats. All identifiers are handled by flex as equivalent.
> Then the last step in the scanner for identifiers is to look up the
> identifier in a hash table and return the keyword token if it's a
> keyword. That would massively simplify the scanner tables.

Uh ... no.  I haven't looked into why the flex tables are so large,
but this theory is just wrong.  See ScanKeywordLookup().

regards, tom lane




Re: [HACKERS] scanner/parser minimization

2013-03-02 Thread Greg Stark
Regarding yytransition I think the problem is we're using flex to
implement keyword recognition which is usually not what it's used for.
Usually people use flex to handle syntax things like quoting and
numeric formats. All identifiers are handled by flex as equivalent.
Then the last step in the scanner for identifiers is to look up the
identifier in a hash table and return the keyword token if it's a
keyword. That would massively simplify the scanner tables.

This hash table can be heavily optimized because it's a static lookup.
cf. http://www.gnu.org/software/gperf/

In theory this is more expensive since it needs to do a strcmp in
addition to scanning the identifier to determine whether the token
ends. But I suspect in practice the smaller tables might outweigh
that cost.


On Thu, Feb 28, 2013 at 9:09 PM, Tom Lane  wrote:
> I believe however that it's possible to extract an idea of which
> tokens the parser believes it can see next at any given parse state.
> (I've seen code for this somewhere on the net, but am too lazy to go
> searching for it again right now.)  So we could imagine a rule along
> the lines of "if IDENT is allowed as a next token, and $KEYWORD is
> not, then return IDENT not the keyword's own token".

That's a pretty cool idea. I'm afraid it might be kind of slow to
produce that list and load it into a hash table for every ident token
though. I suppose if you can request it only if the ident is a keyword
of some type then it wouldn't actually kick in often.

That would also imply we could simply use IDENT everywhere we
currently have col_name_keyword or type_function_name. Just by having
a rule that accepts a keyword that would implicitly make it not be
accepted as an IDENT when unquoted in that location. That might make
the documentation a bit trickier and it would make it harder for users
to make their code forward-compatible. It would also make it harder
for hackers to determine when we've accidentally narrowed the
allowable identifiers for more cases than they expect.
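
To make the compatibility hazard concrete: ILIKE is currently a
type_func_name keyword, so the first statement below is rejected,
while under the scheme above it would presumably start being accepted
as an ordinary identifier (and could then break again later if the
grammar around it changes):

  CREATE TABLE t (ilike int);     -- syntax error today
  CREATE TABLE t ("ilike" int);   -- accepted today (quoted identifier)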

-- 
greg




Re: [HACKERS] Materialized views WIP patch

2013-03-02 Thread Kevin Grittner
Greg Stark  wrote:
> Ants Aasma  wrote:

>> To give another example of potential future update semantics, if
>> we were to allow users manually maintaining materialized view
>> contents using DML commands, one would expect TRUNCATE to mean
>> "make this matview empty", not "make this matview unavailable".
>
> Wouldn't that just be a regular table then though? How is that a
> materialized view?
>
> If anything someone might expect truncate to delete any rows from
> the source table that appear in the view. But I think it's likely
> that even if materialized views were updateable truncate wouldn't
> be one of the updateable operations.

Yeah, the only way it would make sense to truncate MV contents from
a user-written maintenance trigger (assuming we might have such a
thing some day) would be if you decided that the change to the
underlying data was so severe that you effectively needed to
REFRESH, and there would probably be a better way to go about
dealing with that.

Two other reasons that this might not be a problem are:

(1)  Any DML against the MV would need to be limited to some
context fired by the underlying changes.  If we allow changes to
the MV outside of that without it being part of some "updateable
MV" feature (reversing the direction of flow of changes), the MV
could not be trusted at all.  If you're going to do that, just use
a table.

(2)  I can think of a couple not-too-horrible syntax tricks we
could use to escape from the corner we're worried about painting
ourselves into.

All of that said, a combination of Heikki's previous suggestion that
maybe REFRESH could be used, and my noticing that both TRUNCATE and
REFRESH create a new heap for an MV (it's just a question of whether
we then run the MV's query to fill it with data), led to this thought:

REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA

This sort of mirrors the CREATE MATERIALIZED VIEW style (which was
based on CREATE TABLE AS) and WITH NO DATA puts the MV into the
unscannable state either way.  I can change the parser to make this
literally just the new spelling of TRUNCATE MATERIALIZED VIEW
without dashing my hopes of pushing the patch tomorrow.  (The patch
has been ready to go for weeks other than this syntax issue and
documentation which needs to refer to whatever syntax is chosen.)
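
For concreteness, usage of the proposed syntax would look about like
this (the table and view names are made up for illustration):

  CREATE MATERIALIZED VIEW order_totals AS
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id
  WITH NO DATA;                                         -- created unscannable

  REFRESH MATERIALIZED VIEW order_totals;               -- populate it
  REFRESH MATERIALIZED VIEW order_totals WITH NO DATA;  -- back to unscannable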

Barring objections, I will use the above and push tomorrow.

The only issues which have been raised which will not be addressed
at that point are:

(1)  The suggestion that ALTER MATERIALIZED VIEW name ALTER column
support changing the collation isn't something I can see how to do
without complication and risk beyond what is appropriate at this
point in the release cycle.  It will be left off, at least for now.
 To get a new collation, you will need to drop the MV and re-create
it with a query which specifies the collation for the result
column.

(2)  The sepgsql changes are still waiting for a decision from
security-focused folks.  I have two patches for that contrib module
ready based on my best reading of things -- one which uses table
security labels and one which instead uses new matview labels. 
When we get a call on which is preferred, I suspect that one patch
or the other will be good as-is or with minimal change.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Materialized views WIP patch

2013-03-02 Thread Greg Stark
On Fri, Mar 1, 2013 at 3:01 PM, Ants Aasma  wrote:
> . To give another example of potential future
> update semantics, if we were to allow users manually maintaining
> materialized view contents using DML commands, one would expect
> TRUNCATE to mean "make this matview empty", not "make this matview
> unavailable".

Wouldn't that just be a regular table then though? How is that a
materialized view?

If anything someone might expect truncate to delete any rows from the
source table that appear in the view. But I think it's likely that
even if materialized views were updateable truncate wouldn't be one of
the updateable operations.


-- 
greg




Re: [HACKERS] sql_drop Event Trigger

2013-03-02 Thread Dimitri Fontaine
Alvaro Herrera  writes:

> Dimitri Fontaine escribió:
>
>> The good news is that the patch to do that has already been sent on this
>> list, and got reviewed in detail by Álvaro, who did offer incremental
>> changes. Version 3 of that patch is to be found in:
>> 
>>   http://www.postgresql.org/message-id/m2fw19n1hr@2ndquadrant.fr
>
> Here's a v4 of that patch.  I added support for DROP OWNED, and added
> object name and schema name available to the pg_dropped_objects
> function.

Thanks!

> Do we want some more stuff provided by pg_dropped_objects?  We now have
> classId, objectId, objectSubId, object name, schema name.  One further
> thing I think we need is the object's type, i.e. a simple untranslated
> string "table", "view", "operator" and so on.  AFAICT this requires a
> nearly-duplicate of getObjectDescription.

About what missing information to add, please review:

  
http://wiki.postgresql.org/wiki/Event_Triggers#How_to_expose_Information_to_Event_Triggers_Functions

I'd like us to provide the same set of extra information from within
classic Event Triggers and in the records returned by the function.

Maybe a good idea would be to create a datatype for that, publish a
single TG_DETAILS variable of that type from within Event Triggers,
and have the pg_dropped_objects() function return a setof that type?
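
For illustration, a sql_drop trigger consuming pg_dropped_objects()
might look about like this (just a sketch against the proposal being
discussed here; the exact column names are assumptions):

  CREATE FUNCTION log_drops() RETURNS event_trigger
  LANGUAGE plpgsql AS $$
  DECLARE
    obj record;
  BEGIN
    FOR obj IN SELECT * FROM pg_dropped_objects()
    LOOP
      RAISE NOTICE 'dropped: classid=% objid=% objsubid=% name=%.%',
        obj.classid, obj.objid, obj.objsubid,
        obj.schema_name, obj.object_name;
    END LOOP;
  END;
  $$;

  CREATE EVENT TRIGGER log_drops_trg ON sql_drop
    EXECUTE PROCEDURE log_drops();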

About the implementation and the getObjectDescription() remark, please
have a look at your latest revision of the other patch in the series, 

  http://www.postgresql.org/message-id/20130109165829.gb4...@alvh.no-ip.org

I think it provides exactly what you need here, and you already
cleaned up my work for getting at that…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

