Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)

2013-04-05 Thread Kevin Grittner
Kevin Grittner  wrote:

> I think a large part of what is at issue here stems from a bad
> name for the new bool field I added to the RelationData structure
> -- instead of rd_isscannable it should probably be called
> something like rd_ispopulated.  The current name led to some
> fuzzy thinking on my part when it was referenced, and both the
> name and a couple ill-considered uses of it probably contributed
> to concerns about how it was generated and used.
>
> I will post a draft patch today to see whether concerns abate.
> Basically, the name change should help make clear that this is
> not intended to be the only way to determine whether a matview is
> scannable.  Second, there is at least on (and probably more)
> direct tests of this field which should use a function for a
> scannability test.  For 9.3, that will just wrap a test of this
> bool, but it makes clear what the longer-term intent is, and help
> ensure that things don't get missed when patches are written in
> later releases.  Third, some comments need to be corrected and
> added.
>
> Hopefully it can help get us all onto the same page.  If not, it
> should at least better focus the discussion.

Attached is a firt cut at drawing a bright line between the notion
of whether a matview has been *populated* and whether it is
*scannable*.  In 9.3 one is true if and only if the other is, but
I plan on doing work such that this will no longer be true in the
next release, and not making a clear distinction now has been
confusing everyone, including me.

This patch is light on functional changes, and heavier on name and
comment changes.  It does fix the lack of locking for the
user-visible pg_relation_is_scannable() function, and it does
correct the performance regression in pg_dump, but those should be
the only user-visible changes.  Internally I also tried to correct
a modularity violation complained of by Tom.

Vacuum still needs to be taught not to truncate away the first page
of a matview, but that seemed like it was better left for a
separate patch, and if we decide not to use the current technique
for identifying a non-populated matview, it owuld be one more thing
to undo.

It passed `make check-world` and `make installcheck-world`, and a
dump and load of the regression database works.  I got sidetracked
with some support issues, so I didn't have time to dig around for
other weak comments, but I think I'm close on all other changes.

Comments?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***
*** 381,393  cluster_rel(Oid tableOid, Oid indexOid, bool recheck, bool verbose,
  		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
  
  	/*
! 	 * Quietly ignore the request if the a materialized view is not scannable.
! 	 * No harm is done because there is nothing no data to deal with, and we
! 	 * don't want to throw an error if this is part of a multi-relation
! 	 * request -- for example, CLUSTER was run on the entire database.
  	 */
  	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
! 		!OldHeap->rd_isscannable)
  	{
  		relation_close(OldHeap, AccessExclusiveLock);
  		return;
--- 381,394 
  		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
  
  	/*
! 	 * Quietly ignore the request if this is a materialized view which has not
! 	 * been populated from its query. No harm is done because there is no data
! 	 * to deal with, and we don't want to throw an error if this is part of a
! 	 * multi-relation request -- for example, CLUSTER was run on the entire
! 	 * database.
  	 */
  	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
! 		!OldHeap->rd_ispopulated)
  	{
  		relation_close(OldHeap, AccessExclusiveLock);
  		return;
***
*** 923,929  copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
  
  	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW)
  		/* Make sure the heap looks good even if no rows are written. */
! 		SetRelationIsScannable(NewHeap);
  
  	/*
  	 * Scan through the OldHeap, either in OldIndex order or sequentially;
--- 924,930 
  
  	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW)
  		/* Make sure the heap looks good even if no rows are written. */
! 		SetMatViewToPopulated(NewHeap);
  
  	/*
  	 * Scan through the OldHeap, either in OldIndex order or sequentially;
*** a/src/backend/commands/createas.c
--- b/src/backend/commands/createas.c
***
*** 417,423  intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
  
  	if (into->relkind == RELKIND_MATVIEW && !into->skipData)
  		/* Make sure the heap looks good even if no rows are written. */
! 		SetRelationIsScannable(intoRelationDesc);
  
  	/*
  	 * Check INSERT permission on the constructed table.
--- 417,423 
  
  	if (into->relkind == RELKIND_MATVIEW && !into->skipData)
  		/* Make sure the heap looks good even if

Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)

2013-04-05 Thread Kevin Grittner
Kevin Grittner  wrote:

> I think I need to review the whole thread again to make sure I
> wasn't too quick to concede the point.

On a fresh reading of this, I think a large part of what is at
issue here stems from a bad name for the new bool field I added to
the RelationData structure -- instead of rd_isscannable it should
probably be called something like rd_ispopulated.  The current name
led to some fuzzy thinking on my part when it was referenced, and
both the name and a couple ill-considered uses of it probably
contributed to concerns about how it was generated and used.

I will post a draft patch today to see whether concerns abate. 
Basically, the name change should help make clear that this is not
intended to be the only way to determine whether a matview is
scannable.  Second, there is at least on (and probably more) direct
tests of this field which should use a function for a scannability
test.  For 9.3, that will just wrap a test of this bool, but it
makes clear what the longer-term intent is, and help ensure that
things don't get missed when patches are written in later releases.
Third, some comments need to be corrected and added.

Hopefully it can help get us all onto the same page.  If not, it
should at least better focus the discussion.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)

2013-04-05 Thread Kevin Grittner
Noah Misch  wrote:

> The SQL commands I cited as responsible for creating or removing
> the fork all make a new relfilenode anyway.  Thus, "add" actually
> means creating the fork with the new relfilenode, and "remove"
> actually means omitting the fork from the new relfilenode.  The
> association between relfilenodes and relations is, of course,
> transactional.

The same argument applies to the currently-committed code.  The
goal is to not change a matview between zero-length and non-zero
length once the heap exists; but to only have this state change
when REFRESH replaces the heap in the style of CLUSTER, TRUNCATE,
ALTER TABLE with heap rewrite, or the new VACUUM FULL.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)

2013-04-05 Thread Kevin Grittner
Tom Lane  wrote:

> I realize that there's no other (easy) way to make unlogged
> matviews reset to an invalid state on crash, but that doesn't
> make this design choice less of a disaster.  It boxes us into
> something that's entirely unable to support transitions between
> scannable and unscannable states by any means short of a complete
> rewrite of the matview contents; which seems fundamentally
> incompatible with any sort of incremental update scenario.

That assertion makes absolutely no sense.  Once we design (in a
future release cycle) a way for users to declare how current a view
must be to be usable, there is no reason the
pg_relation_is_scannable() function cannot be modified to use those
mechanisms.  I continue to believe that it is a bad idea to ever
allow a matview to be scanned when it is not a materialization of
its query, but that does *not* mean that all matviews that are a
materialization of the query would need to be considered scannable.
My working assumption was that the isscannable field in relcache
would be the first of many tests once we get there; if the matview
actually does represent some materialization of data, the function
would then proceed to check whether it is current enough to use,

It does mean that you could not start incremental maintenance on a
matview without first generating a base by running the query to
create initial data (as part of CREATE or REFRESH), but that seems
like a *good* thing to me.  To do otherwise would make no more
sense than to try to recover a database using just WAL files
without a base backup to apply them to.

> And I remain of the opinion that it's going to box us into not
> being able to fix the problems because of pg_upgrade on-disk-
> compatibility issues.

This argument also seems bogus to me.  Since it is a valid heap
either way, the *worst case* would be recommending that after
upgrading users take some special action on any materialized views
which were not scannable; and I doubt that we would need to do
that.  If you see some risk I'm missing, please elaborate.

> We will be far better off to drop unlogged matviews until we can
> come up with a better design.  If that's so far off that no one
> can see it happening, well, that's tough.  Leaving the door open
> for incremental maintenance is more important.

I've been looking at what is needed for incremental maintenance,
and I'm not seeing the problem.  Since you can't incrementally
update a view which has never been populated, the only way in which
it could create a problem in terms of transactional behavior, I
think, is that this would make it harder to not hold an
AccessExclusiveLock when transitioning between not having
materialized data and having it (or vice versa), and I'm dubious we
can avoid such a lock anyway.  I don't see where it creates any
problems for performing incremental updates once we are in a state
which can allow them.

> This really needs to be catalog state, not filesystem state.

That may be true, but the arguments in this post are so off-base
that I'm wondering whether it really is.  When I read some earlier
posts I was convinced, but now I think I need to review the whole
thread again to make sure I wasn't too quick to concede the point.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)

2013-04-04 Thread Alvaro Herrera
Tom Lane escribió:
> Noah Misch  writes:

> > A slight variation on the committed approach would be to add a "_scannable"
> > relation fork.
> 
> Not very transaction-safe, I think (consider crash midway through a
> transaction that adds or removes the fork), and in any case orders of
> magnitude more expensive than looking at a pg_class field.  This really
> needs to be catalog state, not filesystem state.

We could revive the pg_class_nt patch proposed a decade ago, perhaps ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)

2013-04-04 Thread Tom Lane
Noah Misch  writes:
> On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote:
>> No.  This is an absolute disaster.  It's taking something we have always
>> considered to be an irrelevant implementation detail and making it into
>> user-visible DDL state, despite the fact that it doesn't begin to satisfy
>> basic transactional behaviors.  We *need* to get rid of that aspect of
>> things.  If you must have scannability state in version 0, okay, but
>> it has to be a catalog property not this.

> In large part, this ended up outside the catalogs due to key limitations of
> the startup process.

Yeah, I realize that there's no other (easy) way to make unlogged
matviews reset to an invalid state on crash, but that doesn't make this
design choice less of a disaster.  It boxes us into something that's
entirely unable to support transitions between scannable and unscannable
states by any means short of a complete rewrite of the matview contents;
which seems fundamentally incompatible with any sort of incremental
update scenario.  And I remain of the opinion that it's going to box us
into not being able to fix the problems because of pg_upgrade
on-disk-compatibility issues.  We will be far better off to drop
unlogged matviews until we can come up with a better design.  If that's
so far off that no one can see it happening, well, that's tough.
Leaving the door open for incremental maintenance is more important.

> A slight variation on the committed approach would be to add a "_scannable"
> relation fork.

Not very transaction-safe, I think (consider crash midway through a
transaction that adds or removes the fork), and in any case orders of
magnitude more expensive than looking at a pg_class field.  This really
needs to be catalog state, not filesystem state.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)

2013-04-04 Thread Noah Misch
Subject updated to account for the wider topics now appearing.

On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote:
> What I'd actually rather see us spending time on right now is making
> some provision for incremental updates, which I will boldly propose
> could be supported by user-written triggers on the underlying tables
> if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
> matviews, and allowed them to operate on a matview's contents just like
> it was a table.  Now admittedly that would foreclose allowing matviews
> to be updatable in the updatable-view sense, but that's a feature I
> would readily give up if it meant users could build incremental update
> mechanisms this year and not two years down the road.

I can't see taking MVs in the direction of allowing arbitrary DML; that's what
tables are for.  Users wishing to do that should keep using current methods:

  CREATE VIEW mv_impl AS SELECT ...;
  CREATE TABLE mv AS SELECT * FROM mv_impl;
  -- REFRESH analogue: fancier approaches exist
  BEGIN; TRUNCATE mv; INSERT INTO mv SELECT * FROM mv_impl; COMMIT;

If anything, I'd help these users by introducing mechanisms for obtaining a
TRUNCATE;INSERT with the bells and whistles of REFRESH MATERIALIZED VIEW.
Namely, bulk index rebuilds, skipping WAL, and frozen tuples.

> > ... Making sure that
> > the heap has at least one page if data has been generated seems
> > like a not-entirely-unreasonable way to do that, although there
> > remains at least one vacuum bug to fix if we keep it, in addition
> > to Tom's concerns.
> 
> No.  This is an absolute disaster.  It's taking something we have always
> considered to be an irrelevant implementation detail and making it into
> user-visible DDL state, despite the fact that it doesn't begin to satisfy
> basic transactional behaviors.  We *need* to get rid of that aspect of
> things.  If you must have scannability state in version 0, okay, but
> it has to be a catalog property not this.

In large part, this ended up outside the catalogs due to key limitations of
the startup process.  This isn't the first time we've arranged an unusual
dance for this reason, and it probably won't be the last.  Sure, we could take
out unlogged MVs to evade the problem, but re-adding them will mean either
restoring relfilenode-based bookkeeping or attacking the startup process
limitation directly.  There exist fundamental challenges to a direct attack,
like the potential inconsistency of system catalogs themselves.  We could
teach pg_relation_is_scannable() that unlogged MVs are always non-scannable
during recovery, then somehow update system catalogs in all databases at the
end of recovery.  Not a project for 9.3, and I wouldn't be surprised to see it
mushroom in complexity.  The currently-committed approach is a good one given
the applicable constraints.

A slight variation on the committed approach would be to add a "_scannable"
relation fork.  The fork would either be absent (not scannable if an MV) or
empty (possibly-scannable).  Create it in CREATE MATERIALIZED VIEW sans WITH
NO DATA and in REFRESH MATERIALIZED VIEW.  Remove it in TRUNCATE.  When the
startup process reinitializes an unlogged relation, it would also remove any
_scannable fork.  This has a few advantages over the current approach: VACUUM
won't need a special case, and pg_upgrade will be in a better position to blow
away all traces if we introduce a better approach.  The disadvantage is an
extra inode per healthy MV.  (Though it does not lead to a 9.3 solution, I'll
note that an always-present relation metapage would help here.)

Note that I said "possibly-scannable" -- the relfilenode-based indicator
(whether the committed approach or something else) doesn't need to remain the
only input to the question of scannability.  If 9.5 introduces the concept of
age-based scannability expiration, the applicable timestamp could go in
pg_class, and pg_relation_is_scannable() could check both that and the
relfilenode-based indicator.


Concerning the original $SUBJECT, I would look at fixing the performance
problem by having pg_relation_is_scannable() use an algorithm like this:

1. Grab the pg_class entry from syscache.  If it's not found, return NULL.
2. If it's not a matview, return false.
3. Lock the matview and try to open a relcache entry.  Return NULL on failure.
4. Return the scannability as reported by the relcache.

This boils down to the CASE statement you noted upthread, except putting the
fast-exit logic in the function rather than in its caller(s).

nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers