Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-14 Thread Robert Haas
On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com wrote:
 This patch is problematic because formally indexes used by syscaches needs to
 be unique, this one is not though because of 0/InvalidOids entries for
 nailed/shared catalog entries. Those values aren't allowed to be queried 
 though.

That's not the only reason it's not unique.  Take a look at
GetRelFileNode().  We really only guarantee that database OID,
tablespace OID, relfilenode, backend-ID, taken as a four-tuple, is
unique.  You could have the same relfilenode in different tablespaces,
or even within the same tablespace with different backend-IDs.  The
latter might not matter for you because you're presumably disregarding
temp tables, but the former probably does.  It's an uncommon scenario
because we normally set relid = relfilenode, and of course relid is
unique across the database, but the table gets rewritten then you end
up with relid != relfilenode, and I don't think there's anything at
that point that will prevent the new relfilenode from being chosen as
some other relations relfilenode, as long as it's in a different
tablespace.

I think the solution may be to create a specialized cache for this,
rather than relying on the general syscache infrastructure.  You might
look at, e.g., attoptcache.c for an example.  That would allow you to
build a cache that is aware of things like the relmapper
infrastructure, and the fact that temp tables are ignorable for your
purposes.  But I think you will need to include at least the
tablespace OID in the key along with the relfilenode to make it
bullet-proof.

I haven't read through the patch series far enough to know what this
is being used for yet, but my fear is that you're using it to handle
mapping a relflenode extracted from the WAL stream back to a relation
OID.  The problem with that is that relfilenode assignments obey
transaction semantics.  So, if someone begins a transaction, truncates
a table, inserts a tuple, and commits, the heap_insert record is going
to refer to a relfilenode that, according to the system catalogs,
doesn't exist.  This is similar to one of the worries in my other
email, so I won't belabor the point too much more here...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-14 Thread Andres Freund
On Thursday, June 14, 2012 08:50:51 PM Robert Haas wrote:
 On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  This patch is problematic because formally indexes used by syscaches
  needs to be unique, this one is not though because of 0/InvalidOids
  entries for nailed/shared catalog entries. Those values aren't allowed
  to be queried though.
 That's not the only reason it's not unique.  Take a look at
 GetRelFileNode().  We really only guarantee that database OID,
 tablespace OID, relfilenode, backend-ID, taken as a four-tuple, is
 unique.  You could have the same relfilenode in different tablespaces,
 or even within the same tablespace with different backend-IDs.  The
 latter might not matter for you because you're presumably disregarding
 temp tables, but the former probably does.  It's an uncommon scenario
 because we normally set relid = relfilenode, and of course relid is
 unique across the database, but the table gets rewritten then you end
 up with relid != relfilenode, and I don't think there's anything at
 that point that will prevent the new relfilenode from being chosen as
 some other relations relfilenode, as long as it's in a different
 tablespace.
 
 I think the solution may be to create a specialized cache for this,
 rather than relying on the general syscache infrastructure.  You might
 look at, e.g., attoptcache.c for an example.  That would allow you to
 build a cache that is aware of things like the relmapper
 infrastructure, and the fact that temp tables are ignorable for your
 purposes.  But I think you will need to include at least the
 tablespace OID in the key along with the relfilenode to make it
 bullet-proof.
Yes, the tablespace OID should definitely be in there. Need to read up on the 
details of an own cache. Once more I didn't want to put in more work before 
discussing it here.

 I haven't read through the patch series far enough to know what this
 is being used for yet, but my fear is that you're using it to handle
 mapping a relflenode extracted from the WAL stream back to a relation
 OID.  The problem with that is that relfilenode assignments obey
 transaction semantics.  So, if someone begins a transaction, truncates
 a table, inserts a tuple, and commits, the heap_insert record is going
 to refer to a relfilenode that, according to the system catalogs,
 doesn't exist.  This is similar to one of the worries in my other
 email, so I won't belabor the point too much more here...
Well, yes. We *need* to do the mapping back from the relfilenode to a table. 
The idea is that by the fact that the receiving side, be it a full cluster or 
just a catalog one, has a fully synchronized catalog in which DDL gets applied 
correctly, inside the transaction just as in the sending side, it should never 
be wrong to do that mapping.
It probably is necessary to make the syscache lookup/infrastructure to use an 
MVCCish snapshot though. No idea how hard that would be yet. Might be a good 
argument for your argument of using a specialized cache.

Lets sidetrack this till we have a tender agreement on how to handle DDL ;). I 
am aware of the issues with rollbacks, truncate et al...

Thanks,

Andres

-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 4:51 PM, Andres Freund and...@2ndquadrant.com wrote:
 Lets sidetrack this till we have a tender agreement on how to handle DDL ;). I
 am aware of the issues with rollbacks, truncate et al...

Agreed; I will write up my thoughts about DDL on the other thread.  I
think that's a key thing we need to figure out; once we understand how
we're handling that, the correct design for this will probably fall
out pretty naturally.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-14 Thread Christopher Browne
On Thu, Jun 14, 2012 at 5:00 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jun 14, 2012 at 4:51 PM, Andres Freund and...@2ndquadrant.com wrote:
 Lets sidetrack this till we have a tender agreement on how to handle DDL ;). 
 I
 am aware of the issues with rollbacks, truncate et al...

 Agreed; I will write up my thoughts about DDL on the other thread.  I
 think that's a key thing we need to figure out; once we understand how
 we're handling that, the correct design for this will probably fall
 out pretty naturally.

I wonder if *an* answer (if not forcibly a perfect one) is to provide
a capturable injection point.

A possible shape of this would be to have a function to which you pass
a DDL statement, at which point, it does two things:
a) Runs the DDL, to make the requested change, and
b) Captures the DDL in a convenient form in the WAL log so that it may
be detected and replayed at the right point in processing.

That's not a solution for capturing DDL automatically, but it's
something that would be useful-ish even today, for systems like Slony
and Londiste, and this would be natural to connect to Dimitri's Event
Triggers.

That also fits with the desire to have components that are (at least
hopefully) usable to other existing replication systems.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

This patch is problematic because formally indexes used by syscaches needs to
be unique, this one is not though because of 0/InvalidOids entries for
nailed/shared catalog entries. Those values aren't allowed to be queried though.

It might be nicer to add infrastructure to do this properly, I just don't have
a clue what the best way for this would be.
---
 src/backend/utils/cache/syscache.c |   11 +++
 src/include/catalog/indexing.h |2 ++
 src/include/utils/syscache.h   |1 +
 3 files changed, 14 insertions(+)

diff --git a/src/backend/utils/cache/syscache.c 
b/src/backend/utils/cache/syscache.c
index c365ec7..9cfb013 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -588,6 +588,17 @@ static const struct cachedesc cacheinfo[] = {
},
1024
},
+   {RelationRelationId,/* RELFILENODE */
+   ClassRelfilenodeIndexId,
+   1,
+   {
+   Anum_pg_class_relfilenode,
+   0,
+   0,
+   0
+   },
+   1024
+   },
{RewriteRelationId, /* RULERELNAME */
RewriteRelRulenameIndexId,
2,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 450ec25..5c9419b 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -106,6 +106,8 @@ DECLARE_UNIQUE_INDEX(pg_class_oid_index, 2662, on pg_class 
using btree(oid oid_o
 #define ClassOidIndexId  2662
 DECLARE_UNIQUE_INDEX(pg_class_relname_nsp_index, 2663, on pg_class using 
btree(relname name_ops, relnamespace oid_ops));
 #define ClassNameNspIndexId  2663
+DECLARE_INDEX(pg_class_relfilenode_index, 2844, on pg_class using 
btree(relfilenode oid_ops));
+#define ClassRelfilenodeIndexId  2844
 
 DECLARE_UNIQUE_INDEX(pg_collation_name_enc_nsp_index, 3164, on pg_collation 
using btree(collname name_ops, collencoding int4_ops, collnamespace oid_ops));
 #define CollationNameEncNspIndexId 3164
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index d59dd4e..63a5042 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -73,6 +73,7 @@ enum SysCacheIdentifier
RANGETYPE,
RELNAMENSP,
RELOID,
+   RELFILENODE,
RULERELNAME,
STATRELATTINH,
TABLESPACEOID,
-- 
1.7.10.rc3.3.g19a6c.dirty


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers