Re: [HACKERS] Relation locking and relcache load (was Re: Going for all green buildfarm results)

2006-07-31 Thread Tom Lane
I wrote:
 ... This means the only thing stopping us from
 taking lock before we invoke relcache is lack of knowledge about the
 rel's relisshared status.

While digging through all the places that open relcache entries,
I've realized that there's another problem, specifically the way that
we lock indexes.  The current theory is that index_open() takes no lock,
and then we establish a lock just for the duration of an index scan.
The comments for index_open explain:

 *Note: we acquire no lock on the index. A lock is not needed when
 *simply examining the index reldesc; the index's schema information
 *is considered to be protected by the lock that the caller had better
 *be holding on the parent relation. Some type of lock should be
 *obtained on the index before physically accessing it, however.
 *This is handled automatically for most uses by index_beginscan
 *and index_endscan for scan cases, or by ExecOpenIndices and
 *ExecCloseIndices for update cases. Other callers will need to
 *obtain their own locks.

However, the lionfish failure makes the folly of this approach evident
(it was in fact index_open that failed in that example, unless my theory
about it is all wrong).  If we're going to move to lock-before-open then
we've got to fix index_open too.  I envision making index_open just
about like heap_open, ie add a lockmode parameter, and then get rid of
the separate lock step in index_beginscan.

There is one small problem with doing that, which is this code in
ExecOpenIndices:

indexDesc = index_open(indexOid);

if (indexDesc-rd_am-amconcurrent)
LockRelation(indexDesc, RowExclusiveLock);
else
LockRelation(indexDesc, AccessExclusiveLock);

IOW you need to already have the index open to find out what sort of
lock to take on it.

I have a modest proposal for fixing that: let's get rid of amconcurrent,
so that RowExclusiveLock is always the right lock to take here.  All of
the currently supported AMs have amconcurrent = true, and so does the
proposed bitmap index patch, and given the project's current focus on
high concurrent performance I cannot imagine that we'd accept a future
patch to add a nonconcurrent index type.  So let's just legislate that
all AMs have to support concurrent updates (or at least that they can't
rely on the main system to protect them from the case).

Any objections?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Relation locking and relcache load (was Re: Going for all green buildfarm results)

2006-07-31 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 I think the best solution for this might be to put the responsibility
 for creating system catalogs' toast tables into the bootstrap phase
 instead of making initdb do it afterwards.  This would be a Good Thing
 anyway since currently we are incapable of dealing with bootstrap-time
 insertions of values large enough to need toasting.  I'm imagining
 adding macros to the include/catalog/*.h files along the lines of

Would this make it much more difficult to support user-defined indexes
on system catalogs?  It looks like we don't support that at the moment
but as we see larger Postgres installations it seems likely we'll need
to.  I don't really consider myself a very heavy Postgres user but I've
got databases w/  30k entries in pg_class and near 300k in
pg_attribute...  Those aren't shared but it sounds like you were talking
about all of them above anyway.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Relation locking and relcache load (was Re: Going for all green buildfarm results)

2006-07-31 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 * Tom Lane ([EMAIL PROTECTED]) wrote:
 I think the best solution for this might be to put the responsibility
 for creating system catalogs' toast tables into the bootstrap phase
 instead of making initdb do it afterwards.

 Would this make it much more difficult to support user-defined indexes
 on system catalogs?

AFAICS the problems with that are orthogonal to this.  You'll never have
user-defined (as in added after initdb) indexes on shared catalogs,
because there is no way to update their pg_class descriptions in all
databases at once.  For non-shared catalogs there's nothing except
access permissions stopping you from adding ordinary indexes now.
We don't support partial or functional indexes on system catalogs,
but the implementation reasons for that are unrelated to what I'm doing.

 It looks like we don't support that at the moment
 but as we see larger Postgres installations it seems likely we'll need
 to.  I don't really consider myself a very heavy Postgres user but I've
 got databases w/  30k entries in pg_class and near 300k in
 pg_attribute...

And are you seeing any performance issues related to lack of indexes?
For the system catalogs we understand the access patterns pretty well
(I think), and I thought we pretty much had the right indexes on them
already.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Relation locking and relcache load (was Re: Going for all green buildfarm results)

2006-07-31 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  * Tom Lane ([EMAIL PROTECTED]) wrote:
  I think the best solution for this might be to put the responsibility
  for creating system catalogs' toast tables into the bootstrap phase
  instead of making initdb do it afterwards.
 
  Would this make it much more difficult to support user-defined indexes
  on system catalogs?
 
 AFAICS the problems with that are orthogonal to this.  You'll never have
 user-defined (as in added after initdb) indexes on shared catalogs,
 because there is no way to update their pg_class descriptions in all
 databases at once.

Ok.

 For non-shared catalogs there's nothing except
 access permissions stopping you from adding ordinary indexes now.

I had thought this might be the case since I had some recollection of
indexes on catalogs either being speculated about or suggested on
-perform.  The error-message isn't entirely clear about this fact
though:

src/backend/catalog/index.c:495 (or so)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg(user-defined indexes on system catalog tables are not supported)));

 And are you seeing any performance issues related to lack of indexes?

Depends on the eye of the beholder to some extent I suppose.

 For the system catalogs we understand the access patterns pretty well
 (I think), and I thought we pretty much had the right indexes on them
 already.

The case that I was specifically thinking about was the relowner in
pg_class not being indexed.

tsf= explain analyze select cl.relname from pg_authid a join pg_class
cl on (a.oid = cl.relowner) where a.rolname = 'postgres';
   QUERY PLAN   


 Hash Join  (cost=2.54..1970.25 rows=383 width=64) (actual
time=0.113..77.950 rows=223 loops=1)
   Hash Cond: (outer.relowner = inner.oid)
   -  Seq Scan on pg_class cl  (cost=0.00..1881.59 rows=16459 width=68)
(actual time=0.036..46.607 rows=17436 loops=1)
   -  Hash  (cost=2.54..2.54 rows=1 width=4) (actual time=0.057..0.057
rows=1 loops=1)
 -  Seq Scan on pg_authid a  (cost=0.00..2.54 rows=1 width=4)
(actual time=0.047..0.050 rows=1 loops=1)
   Filter: (rolname = 'postgres'::name)
 Total runtime: 78.358 ms
(7 rows)

It's not exactly *slow* but an index might speed it up.  I was trying to
create one and couldn't figure out the right incantation to make it
happen.  'allow_system_table_mods = true' wasn't working in
postgresql.conf (it wouldn't start) for some reason...

Other system-catalog queries that I've been a little unhappy about the
performance of (though I don't know if indexes would help, so this is
really just me complaining) are: initial table list in ODBC w/ Access
(takes *forever* when you have alot of tables...); schema/table lists in
phppgadmin when there are alot of schemas/tables; information_schema
queries (try looking at information_schema.columns for a given table
when you've got alot of tables...  over 10x slower than looking at
pg_class/pg_attribute directly, 3 seconds vs. 200ms, or so).

Thanks,

Stephen


signature.asc
Description: Digital signature


[HACKERS] Relation locking and relcache load (was Re: Going for all green buildfarm results)

2006-07-30 Thread Tom Lane
I wrote:
 I bet Alvaro's spotted the problem.  ALTER INDEX RENAME doesn't seem to
 take any lock on the index's parent table, only on the index itself.
 That means that a query on onek could be trying to read the pg_class
 entries for onek's indexes concurrently with someone trying to commit
 a pg_class update to rename an index.  If the query manages to visit
 the new and old versions of the row in that order, and the commit
 happens between, *neither* of the versions would look valid.  MVCC
 doesn't save us because this is all SnapshotNow.

On further reflection, it seems this is a special case of the general
problem that we ought to lock a relation *before* we attempt to load
the relcache entry, not after.  We've seen previous complaints about
this, for instance a process erroring out with a message about
pg_trigger having a different number of entries than it expected because
someone had committed an ADD/DROP TRIGGER between it reading pg_class
and reading pg_trigger.

A long time ago (30 Oct 2002) I wrote:

 Thinking further, it's really kinda bogus that LockRelation() works on
 an already-opened Relation; if possible we should acquire the lock
 before attempting to create a relcache entry.  (We only need to know the
 OID and the relisshared status before we can make a locktag, so it'd be
 possible to acquire the lock using only the contents of the pg_class row.)
 Not sure how much code restructuring might be involved to make this
 happen, but it'd be worth thinking about for 7.4.

but that never got off the back burner.  The current example shows that
read the pg_class row and then lock doesn't work anyway, because we
might fail to find any version of the pg_class row that passes
SnapshotNow.

One thing that has improved since 7.3 is that we only do relcache loads
by OID, never by name.  This means the only thing stopping us from
taking lock before we invoke relcache is lack of knowledge about the
rel's relisshared status.  Given that the set of shared relations is
pretty small, and fixed in any given backend version, it wouldn't be
unreasonable to handle this by keeping a hardwired list of shared
relation OIDs somewhere in the backend.

Anyway I really think we need to move to a coding rule that says thou
shalt not access a relcache entry without *already* having some kind
of lock on the relation.

Comments?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Relation locking and relcache load (was Re: Going for all green buildfarm results)

2006-07-30 Thread Tom Lane
I wrote:
 ... This means the only thing stopping us from
 taking lock before we invoke relcache is lack of knowledge about the
 rel's relisshared status.  Given that the set of shared relations is
 pretty small, and fixed in any given backend version, it wouldn't be
 unreasonable to handle this by keeping a hardwired list of shared
 relation OIDs somewhere in the backend.

On further investigation, there is one small stumbling block in that
plan.  We currently have hard-wired OIDs for shared catalogs and their
indexes ... but not for their toast tables (nor the indexes thereon).

I think the best solution for this might be to put the responsibility
for creating system catalogs' toast tables into the bootstrap phase
instead of making initdb do it afterwards.  This would be a Good Thing
anyway since currently we are incapable of dealing with bootstrap-time
insertions of values large enough to need toasting.  I'm imagining
adding macros to the include/catalog/*.h files along the lines of

TOAST_CATALOG(pg_class,4242,4243)

where the numbers are hand-assigned OIDs for the toast table and index.
The existing mechanisms for creating the .bki bootstrap script would
turn these into commands in the .bki script.

We could then get rid of ALTER TABLE ... CREATE TOAST TABLE as a SQL
command altogether, which seems like a good thing to me.  Anyone want to
argue for keeping it?  There really shouldn't be any case where a user
needs to invoke it.

Thoughts, objections?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match