Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-14 Thread Christopher Browne
On Thu, Jun 14, 2012 at 5:00 PM, Robert Haas  wrote:
> On Thu, Jun 14, 2012 at 4:51 PM, Andres Freund  wrote:
>> Lets sidetrack this till we have a tender agreement on how to handle DDL ;). 
>> I
>> am aware of the issues with rollbacks, truncate et al...
>
> Agreed; I will write up my thoughts about DDL on the other thread.  I
> think that's a key thing we need to figure out; once we understand how
> we're handling that, the correct design for this will probably fall
> out pretty naturally.

I wonder if *an* answer (if not forcibly a perfect one) is to provide
a capturable injection point.

A possible shape of this would be to have a function to which you pass
a DDL statement, at which point, it does two things:
a) Runs the DDL, to make the requested change, and
b) Captures the DDL in a convenient form in the WAL log so that it may
be detected and replayed at the right point in processing.

That's not a solution for capturing DDL automatically, but it's
something that would be useful-ish even today, for systems like Slony
and Londiste, and this would be natural to connect to Dimitri's "Event
Triggers."

That also fits with the desire to have components that are (at least
hopefully) usable to other existing replication systems.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 4:51 PM, Andres Freund  wrote:
> Lets sidetrack this till we have a tender agreement on how to handle DDL ;). I
> am aware of the issues with rollbacks, truncate et al...

Agreed; I will write up my thoughts about DDL on the other thread.  I
think that's a key thing we need to figure out; once we understand how
we're handling that, the correct design for this will probably fall
out pretty naturally.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-14 Thread Andres Freund
On Thursday, June 14, 2012 08:50:51 PM Robert Haas wrote:
> On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund  
wrote:
> > This patch is problematic because formally indexes used by syscaches
> > needs to be unique, this one is not though because of 0/InvalidOids
> > entries for nailed/shared catalog entries. Those values aren't allowed
> > to be queried though.
> That's not the only reason it's not unique.  Take a look at
> GetRelFileNode().  We really only guarantee that  tablespace OID, relfilenode, backend-ID>, taken as a four-tuple, is
> unique.  You could have the same relfilenode in different tablespaces,
> or even within the same tablespace with different backend-IDs.  The
> latter might not matter for you because you're presumably disregarding
> temp tables, but the former probably does.  It's an uncommon scenario
> because we normally set relid = relfilenode, and of course relid is
> unique across the database, but the table gets rewritten then you end
> up with relid != relfilenode, and I don't think there's anything at
> that point that will prevent the new relfilenode from being chosen as
> some other relations relfilenode, as long as it's in a different
> tablespace.
> 
> I think the solution may be to create a specialized cache for this,
> rather than relying on the general syscache infrastructure.  You might
> look at, e.g., attoptcache.c for an example.  That would allow you to
> build a cache that is aware of things like the relmapper
> infrastructure, and the fact that temp tables are ignorable for your
> purposes.  But I think you will need to include at least the
> tablespace OID in the key along with the relfilenode to make it
> bullet-proof.
Yes, the tablespace OID should definitely be in there. Need to read up on the 
details of an own cache. Once more I didn't want to put in more work before 
discussing it here.

> I haven't read through the patch series far enough to know what this
> is being used for yet, but my fear is that you're using it to handle
> mapping a relflenode extracted from the WAL stream back to a relation
> OID.  The problem with that is that relfilenode assignments obey
> transaction semantics.  So, if someone begins a transaction, truncates
> a table, inserts a tuple, and commits, the heap_insert record is going
> to refer to a relfilenode that, according to the system catalogs,
> doesn't exist.  This is similar to one of the worries in my other
> email, so I won't belabor the point too much more here...
Well, yes. We *need* to do the mapping back from the relfilenode to a table. 
The idea is that by the fact that the receiving side, be it a full cluster or 
just a catalog one, has a fully synchronized catalog in which DDL gets applied 
correctly, inside the transaction just as in the sending side, it should never 
be wrong to do that mapping.
It probably is necessary to make the syscache lookup/infrastructure to use an 
MVCCish snapshot though. No idea how hard that would be yet. Might be a good 
argument for your argument of using a specialized cache.

Lets sidetrack this till we have a tender agreement on how to handle DDL ;). I 
am aware of the issues with rollbacks, truncate et al...

Thanks,

Andres

-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-14 Thread Robert Haas
On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund  wrote:
> This patch is problematic because formally indexes used by syscaches needs to
> be unique, this one is not though because of 0/InvalidOids entries for
> nailed/shared catalog entries. Those values aren't allowed to be queried 
> though.

That's not the only reason it's not unique.  Take a look at
GetRelFileNode().  We really only guarantee that , taken as a four-tuple, is
unique.  You could have the same relfilenode in different tablespaces,
or even within the same tablespace with different backend-IDs.  The
latter might not matter for you because you're presumably disregarding
temp tables, but the former probably does.  It's an uncommon scenario
because we normally set relid = relfilenode, and of course relid is
unique across the database, but the table gets rewritten then you end
up with relid != relfilenode, and I don't think there's anything at
that point that will prevent the new relfilenode from being chosen as
some other relations relfilenode, as long as it's in a different
tablespace.

I think the solution may be to create a specialized cache for this,
rather than relying on the general syscache infrastructure.  You might
look at, e.g., attoptcache.c for an example.  That would allow you to
build a cache that is aware of things like the relmapper
infrastructure, and the fact that temp tables are ignorable for your
purposes.  But I think you will need to include at least the
tablespace OID in the key along with the relfilenode to make it
bullet-proof.

I haven't read through the patch series far enough to know what this
is being used for yet, but my fear is that you're using it to handle
mapping a relflenode extracted from the WAL stream back to a relation
OID.  The problem with that is that relfilenode assignments obey
transaction semantics.  So, if someone begins a transaction, truncates
a table, inserts a tuple, and commits, the heap_insert record is going
to refer to a relfilenode that, according to the system catalogs,
doesn't exist.  This is similar to one of the worries in my other
email, so I won't belabor the point too much more here...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers