Re: [HACKERS] xml_is_document and selective pg_re_throw

2012-06-13 Thread Nikhil Sontakke
 No, I don't see any particular risk there.  The places that might throw
 ERRCODE_INVALID_XML_DOCUMENT are sufficiently few (as in, exactly one,
 in this usage) that we can have reasonable confidence we know what the
 system state is when we catch that error.


Hmm, I was writing some code in which I happened to hold an LWLock when
this function was called. The first catch/rethrow reset the
InterruptHoldoffCount value. A subsequent release of that LWLock then tripped
the Assert(InterruptHoldoffCount > 0) inside RESUME_INTERRUPTS() -- roughly
the sequence sketched below.

I know holding an lwlock like this might not be a good idea, but this
behavior got me thinking about other possible issues.
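To illustrate, the sequence is roughly the following (a simplified sketch, not
the actual xml.c code; "SomeLock" is a placeholder):

/*
 * Sketch of the failure sequence described above; illustrative only.
 */
LWLockAcquire(SomeLock, LW_EXCLUSIVE);   /* HOLD_INTERRUPTS(): holdoff count is now 1 */

PG_TRY();
{
    elog(ERROR, "invalid XML document"); /* errfinish() resets InterruptHoldoffCount
                                          * to 0 before longjmp'ing back here */
}
PG_CATCH();
{
    FlushErrorState();                   /* error swallowed, but the count stays 0 */
}
PG_END_TRY();

LWLockRelease(SomeLock);                 /* RESUME_INTERRUPTS() now trips
                                          * Assert(InterruptHoldoffCount > 0) */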

Regards,
Nikhils


  A better way would have been to modify xml_parse to take an additional
  boolean argument to_rethrow and not to rethrow if that is false?

 We could do that, but it would greatly complicate xml_parse IMO, since
 it still needs its own PG_TRY block to handle other error cases, and
 only one of those error cases ought to optionally return failure instead
 of re-throwing.

regards, tom lane



Re: [HACKERS] hint bit i/o reduction

2012-06-13 Thread Amit Kapila
 1) Keep track of the # of times the last transaction id was repeatedly seen
 in tqual.c (resetting as soon as a new xid is touched).  We can do this
 just for xmin, or separately for both xmin and xmax.

Will this be done when we see a new xid during a scan of a tuple? If yes, then
won't this logic have an impact in case there are very few records per
transaction in the database?
In that case the new optimization won't help much, but it still adds the new
instructions.

 We can also try to be smart and disable
 the 'avoid setting the hint bit' logic if the page is already dirty.

What's the benefit of not setting the hint bit for an already-dirty page?

-Original Message-
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Merlin Moncure
Sent: Wednesday, June 13, 2012 4:28 AM
To: PostgreSQL-development
Subject: [HACKERS] hint bit i/o reduction

hackers,

A while back I made an effort implementing a backend local hint bit
cache with the intention of mitigating i/o impact for scan heavy
workloads that involve moving a lot of records around per transaction.
 The basic concept was to keep some backend private memory that
tracked the resolution of recently seen transactions and worked in a
similar fashion to the hint bits:

if (!commit_hint_bit_set && hint_bit_cache(xid) == committed)
{

The other interesting aspect was that, if the bit was found in the
cache, the bit was set on the tuple but the page was not dirtied.  If
the xid was not found in the cache, the bit was set and the page was
dirtied normally.  From a strictly performance standpoint, limited
testing seemed to indicate it more or less fixed the hint bit i/o
problems.  Still, injecting extra logic (especially a cache lookup)
into the visibility routines is not to be taken lightly, and maybe
there's a better way.  There simply has to be a way to ameliorate hint
bit i/o without spending a lot of cycles and without impacting other
workloads too much.  Unfortunately, assuming the CRC patch makes it in,
the whole branch of tricks along the lines of 'set the bit but avoid
dirtying the page' isn't going to fly.  I think, at the end of the day,
we have to avoid setting the bit, but only in cases where we are fairly
sure we would prefer not to do that.
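For reference, the earlier backend-local cache idea amounts to something like
this sketch (names and sizing are illustrative, not from that patch):

#define XID_STATUS_CACHE_SIZE 256

typedef struct XidStatusEnt
{
	TransactionId xid;
	bool          committed;
} XidStatusEnt;

static XidStatusEnt xid_status_cache[XID_STATUS_CACHE_SIZE];

/* returns true and sets *committed on a cache hit, false on a miss */
static bool
xid_status_cache_lookup(TransactionId xid, bool *committed)
{
	XidStatusEnt *ent = &xid_status_cache[xid % XID_STATUS_CACHE_SIZE];

	if (TransactionIdIsValid(ent->xid) && TransactionIdEquals(ent->xid, xid))
	{
		*committed = ent->committed;
		return true;
	}
	return false;
}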

I'm thinking something fairly simple would get some good mileage:

1) Keep track of the # of times the last transaction id was repeatedly seen
in tqual.c (resetting as soon as a new xid is touched).  We can do this
just for xmin, or separately for both xmin and xmax.
2) right after checking the hint bit (assuming it wasn't set), test to
see if our xid is the 'last xid', and iff #times_seen_same >
#some_number (say, 100 or 1000), use it as if the hint bit was set.
#some_number can be set to some fairly conservative defined value,
or perhaps we can be sneaky and try to adjust it based on things like
how many unforced clog faults we're doing -- maybe even keeping some
accounting at the page level.  We can also try to be smart and disable
the 'avoid setting the hint bit' logic if the page is already dirty.
(A rough sketch of this idea follows below.)
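A minimal sketch of what 1) and 2) could look like around the visibility check
(names are made up; this is an illustration, not a patch):

/* backend-local state, per the idea above */
static TransactionId last_seen_xid = InvalidTransactionId;
static int           last_seen_xid_count = 0;

/*
 * Track how often we keep resolving the same xid; past some threshold, the
 * caller would set the hint bit on the tuple but skip dirtying the page.
 */
static bool
seen_xid_often_enough(TransactionId xid)
{
	if (TransactionIdEquals(xid, last_seen_xid))
		last_seen_xid_count++;
	else
	{
		last_seen_xid = xid;
		last_seen_xid_count = 0;
	}
	return last_seen_xid_count > 100;
}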

The presumption here is that if you're seeing the same xid over and
over, there simply isn't a lot of value in recording it in page after
page after page.  It's tempting to suggest that there is already an
'is this xid the same as the last one AKA last_transaction_id' at the
clog visibility checking level, but that doesn't help for the
following reasons:

1) while it does avoid the jump out to clog, it occurs at the wrong
place in the sequence of visibility checking (after you've wasted
cycles in TransactionIdIsInProgress() etc),
2) being at the clog level, it can't influence the visibility-level
decision of whether or not to set the bit, and
3) it's not inline.

I see two potential downsides here:
1) maybe the extra logic in visibility is too expensive (but i doubt it)
2) you're missing early opportunities to set the all visible bit which
in turn will influence index only scans.

The above might be a small price to pay for saving all the i/o, and sooner or
later, if the records are long-lived, vacuum will swing around and tag them
anyway.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minimising windows installer password confusion

2012-06-13 Thread Dave Page
On Wed, Jun 13, 2012 at 2:12 AM, Craig Ringer
cr...@postnewspapers.com.au wrote:

 Users don't remember passwords, though. It's one of those constants, and is
 why practically every web site etc out there offers password recovery.

 The installer IMO needs to store the postgres account password in a registry
 key with permissions set so that only users with local admin rights (ie: who
 can use the installer) can view it. I don't like the idea of storing a
 password, but it's only going to be accessible if you already have rights to
 the registry as local admin, in which case the attacker can just reset it
 themselves (or root your machine). So long as the installer warns that the
 password shouldn't be one you use elsewhere because it can be recovered from
 your computer, I don't see a problem.

The idea of storing the password in clear text in the registry gives
me nervous twitches. Whilst it should be secure if done as you
suggest, a simple mistake could leave it vulnerable and give us an
embarrassing security issue to deal with. It also doesn't help us in
the cases where users have another installation of PostgreSQL from
somewhere that doesn't store the password (which is likely to be the
case for years to come, even if it was our installer that was used
previously).

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minimising windows installer password confusion

2012-06-13 Thread Dave Page
On Wed, Jun 13, 2012 at 2:18 AM, Craig Ringer
cr...@postnewspapers.com.au wrote:
 On 06/12/2012 08:08 PM, Dave Page wrote:

 Some background: By default the installer will use 'postgres' for both
 the service (OS) account, and the database superuser account. It will
 use the same password for both (though, users have complete control at
 the command line if they want it, which is why the account names are
 shown in the installer). Unlike *nix installations, the service
 account must have a password, which is required during both
 installation and upgrade to configure the PostgreSQL service. We
 therefore ask for the password during both installation and upgrade.
 If the user account exists (which can happen if there has previously
 been an installation of Postgres on the system), the user must specify
 the existing password for the account during installation (and of
 course, during all upgrades). This is where users normally get stuck,
 as they've forgotten the password they set in the past.

 They may not even have set it, because the EnterpriseDB installer can be run
 unattended and may've been used by some 3rd party package.

 I'd be interested in seeing what Microsoft installers that create isolated
 user accounts do. I think .NET creates one for ASP, so that'd be one option
 to look into.

They tend to use the localsystem or networkservice accounts for most
things, which don't have passwords. The reason we don't do that is
that since the early days of 8.0 we've said we want to enable users to
use the service account as if it were a regular account, so that they
can do things like access network resources (useful for server-side
copy for example).

I wasn't overly convinced back then that that was necessary, and I'm
still not now. I suppose we potentially could use the networkservice
account by default, and let advanced users override that on the
command line if they need to. Then remembering the password becomes
their responsibility.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] errdetail_busy_db plural forms

2012-06-13 Thread Peter Eisentraut
I was looking for missing use of gettext plural forms, which led me to
errdetail_busy_db().

While we can't do much about this:

errdetail("There are %d other session(s) and %d prepared transaction(s) using the database.",
          notherbackends, npreparedxacts);

I think it's still worth pluralizing the other cases

errdetail("There are %d other session(s) using the database.",
          notherbackends);

and

errdetail("There are %d prepared transaction(s) using the database.",
          npreparedxacts);

Especially the other sessions case is probably the one most seen by users.

So I propose the attached patch.

diff --git i/src/backend/commands/dbcommands.c w/src/backend/commands/dbcommands.c
index b7224bd..c9b80ad 100644
--- i/src/backend/commands/dbcommands.c
+++ w/src/backend/commands/dbcommands.c
@@ -1804,20 +1804,21 @@ static bool get_db_info(const char *name, LOCKMODE lockmode,
 static int
 errdetail_busy_db(int notherbackends, int npreparedxacts)
 {
-	/*
-	 * We don't worry about singular versus plural here, since the English
-	 * rules for that don't translate very well.  But we can at least avoid
-	 * the case of zero items.
-	 */
 	if (notherbackends > 0 && npreparedxacts > 0)
+		/* We don't deal with singular versus plural here, since gettext
+		 * doesn't support multiple plurals in one string. */
 		errdetail("There are %d other session(s) and %d prepared transaction(s) using the database.",
 				  notherbackends, npreparedxacts);
 	else if (notherbackends > 0)
-		errdetail("There are %d other session(s) using the database.",
-				  notherbackends);
+		errdetail_plural("There is %d other session using the database.",
+						 "There are %d other sessions using the database.",
+						 notherbackends,
+						 notherbackends);
 	else
-		errdetail("There are %d prepared transaction(s) using the database.",
-				  npreparedxacts);
+		errdetail_plural("There is %d prepared transaction using the database.",
+						 "There are %d prepared transactions using the database.",
+						 npreparedxacts,
+						 npreparedxacts);
 	return 0;			/* just to keep ereport macro happy */
 }
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minimising windows installer password confusion

2012-06-13 Thread Dave Page
On Wed, Jun 13, 2012 at 3:07 AM, Craig Ringer
cr...@postnewspapers.com.au wrote:
 On 06/13/2012 01:19 AM, Sachin Srivastava wrote:


 On Tue, Jun 12, 2012 at 7:43 PM, Dave Page dp...@pgadmin.org wrote:
 
    On Tue, Jun 12, 2012 at 2:57 PM, Robert Haas robertmh...@gmail.com wrote:


      What we need is to display a different dialogue based on the
    situation.
     
      If the account already exists, we should say Please enter the
      password for the existing postgres account.  If you do not know the
      password, you can reset it using the Windows control panel.


 Why using the windows control panel ?

Because when I wrote the email I was looking for a simple solution
that wouldn't require writing code that has the potential to fail
depending on how the user's environment is configured (the user account
stuff tends to go wrong in weird ways, for example when used on
domains in unusual (or high security) configurations). We're spending a
lot of effort at the moment getting the 9.2 buildfarm together, and
updating all the StackBuilder add-on packages (think multiple
man-months) - I'm trying not to add to that too much.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minimising windows installer password confusion

2012-06-13 Thread Florian Pflug
On Jun13, 2012, at 11:10 , Dave Page wrote:
 On Wed, Jun 13, 2012 at 2:12 AM, Craig Ringer
 cr...@postnewspapers.com.au wrote:
 
 Users don't remember passwords, though. It's one of those constants, and is
 why practically every web site etc out there offers password recovery.
 
 The installer IMO needs to store the postgres account password in a registry
 key with permissions set so that only users with local admin rights (ie: who
 can use the installer) can view it. I don't like the idea of storing a
 password, but it's only going to be accessible if you already have rights to
 the registry as local admin, in which case the attacker can just reset it
 themselves (or root your machine). So long as the installer warns that the
 password shouldn't be one you use elsewhere because it can be recovered from
 your computer, I don't see a problem.
 
 The idea of storing the password in clear text in the registry gives
 me nervous twitches. Whilst is should be secure if done as you
 suggest, a) a simple mistake could leave it vulnerable and give us an
 embarrassing security issue to deal with. It also doesn't help us in
 the cases where users have another installation of PostgreSQL from
 somewhere that doesn't store the password (which is likely to be the
 case for years to come, even if it was our installer that was used
 previously).

Hm, doesn't the registry already contain the postgres service account's
password? AFAIR, on windows you cannot really impersonate an account without
knowing its password, which is the reason why a) the password of a user
account is stored in the registry if you enable auto-logon and b) you need
to know the service account's password to create a service.

Some googling brought up a tool called isvcpwd[1] which seems to be able to
change service account passwords without breaking services. Judging from a brief
glance over the source code, it does so by iterating over all services
domain-wide, and adjusting the service definition of those which rely on the
modified account(s). So that seems to support the theory that the passwords
are stored in the individual machine's registries.

Some further googling indicates that, yes, the service account passwords
are stored in the registry, but are only accessible to the LocalSystem
account [2]. Querying them from the postgres installer thus isn't really an
option. But what you could do, I guess, is offer the user the ability to
change the password, and use the approach from [1] to update the affected
service definitions afterwards.

best regards,
Florian Pflug

[1] https://www.itefix.no/i2/isvcpwd
[2] 
http://www.windowsnetworking.com/kbase/windowstips/windowsnt/registrytips/miscellaneous/LSASecrets.html


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] initdb and fsync

2012-06-13 Thread Peter Eisentraut
On tis, 2012-06-12 at 21:09 -0700, Jeff Davis wrote:
 On Sun, 2012-03-25 at 19:59 -0700, Jeff Davis wrote:
  On Sat, 2012-03-17 at 17:48 +0100, Cédric Villemain wrote:
   I agree with Andres.
   
   
   I believe we should use sync_file_range (_before?) with linux.
   
   And we can use posix_fadvise_dontneed on other kernels.
   
  OK, updated patch attached. sync_file_range() is preferred,
  posix_fadvise() is a fallback.
  
 
 Rebased patch attached. No other changes.

The --help output for the -N option was copy-and-pasted wrongly.

The message issued when using -N is also a bit content-free.  Maybe
something like

"Running in nosync mode.  The data directory might become corrupt if the
operating system crashes.\n"

Which leads to the question, how does one get out of this state?  Is
running sync(1) enough?  Is starting the postgres server enough?

There are no updates to the initdb man page included in your patch,
which would be a suitable place to discuss this briefly.
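For reference, the preferred-call-with-fallback arrangement under discussion
looks roughly like this sketch (assuming the usual configure-provided
HAVE_SYNC_FILE_RANGE / USE_POSIX_FADVISE macros; this is not the patch itself):

#include <fcntl.h>		/* sync_file_range() on Linux (with _GNU_SOURCE) */
#include <unistd.h>

/*
 * Hint the kernel to start writeback for a file descriptor; best-effort only.
 */
static void
pre_sync_fd(int fd)
{
#if defined(HAVE_SYNC_FILE_RANGE)
	/* Linux: initiate writeback of the whole file without waiting */
	(void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
#elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
	/* elsewhere: dropping clean pages forces dirty ones to be queued first */
	(void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
}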



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-13 Thread Andres Freund
Hi everyone,

This mail contains the high-level design description of how our prototype of
in-core logical replication works. The individual patches will be posted as
replies to this email. I obviously welcome all sorts of productive comments on
both the individual patches and the architecture.

Unless somebody objects I will add most of the individual patches marked as RFC
to the current commitfest. I hope that with comments stemming from that round we
can get several of the patches into the first or second commitfest. As soon as
the design is clear/accepted we will try very hard to get the following patches
into the second or third round.

If anybody disagrees with the procedural way we are trying to do this, please
raise a hand now.

I tried to find the right balance between keeping the description short enough
that anybody will read the design docs and verbose enough that it is
understandable. I can go into much more detail in any part if wanted.

Please, keep in mind that those patches are *RFC* and a prototype and not
intended to be applied as-is. There is a short description of the individual
patches and their relevancy at the end of the email.

Greetings,

Andres



=== Design goals for logical replication === :
- in core
- fast
- async
- robust
- multi-master
- modular
- as unintrusive as possible implementation wise
- basis for other technologies (sharding, replication into other DBMSs, ...)

For reasons why we think this is an important set of features please check out
the presentation from the in-core replication summit at pgcon:
http://wiki.postgresql.org/wiki/File:BDR_Presentation_PGCon2012.pdf

While you may argue that most of the above design goals are already provided by
various trigger based replication solutions like Londiste or Slony, we think
that that's not enough for various reasons:

- not in core (and thus less trustworthy)
- duplication of writes due to an additional log
- performance in general (check the end of the above presentation)
- complex to use because there is no native administration interface

We want to emphasize that this proposed architecture is based on the experience
of developing a minimal prototype which we developed with the above goals in
mind. While we obviously hope that a good part of it is reusable for the
community, we definitely do *not* expect that the community accepts this
as-is. It is intended to be the basis upon which we, the community, can build
and design future logical replication.

=== Basic architecture === :
Very broadly speaking there are several major pieces common to most approaches
to replication:
1. Source data generation
2. Transportation of that data
3. Applying the changes
4. Conflict resolution


1.:

We need a change stream that contains all required changes in the correct
order. The requirement that this stream reflect changes across multiple
concurrent backends raises obvious concurrency and scalability issues.
Reusing the WAL stream for this seems a good choice since it is needed anyway
and addresses those issues already, and it further means that we don't incur
duplicate writes and locks beyond those already performed for its maintenance.
Any other stream-generating component would introduce additional scalability
issues.

Unfortunately, in this case, the WAL is mostly a physical representation of the
changes and thus does not, by itself, contain the necessary information in a
convenient format to create logical changesets.

The biggest problem is that interpreting tuples in the WAL stream requires an
up-to-date system catalog and needs to be done in a compatible backend and
architecture. The requirement of an up-to-date catalog could be solved by
adding more data to the WAL stream, but it seems likely that that would
require relatively intrusive and complex changes. Instead we chose to require a
synchronized catalog at the decoding site. That adds some complexity to use
cases like replicating into a different database or cross-version
replication. For those it is relatively straightforward to develop a proxy pg
instance that only contains the catalog and does the transformation to textual
changes.

This also is the solution to the other big problem, the need to work around
architecture/version specific binary formats. The alternative, producing
cross-version, cross-architecture compatible binary changes, or even more so
textual changes, all the time seems to be prohibitively expensive, both from a
cpu and a storage POV and also from the point of view of implementation effort.

The catalog on the site where changes originate can *not* be used for the
decoding because at the time we decode the WAL 

[HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

This patch is problematic because formally indexes used by syscaches need to
be unique; this one is not, though, because of 0/InvalidOid entries for
nailed/shared catalog entries. Those values aren't allowed to be queried, though.

It might be nicer to add infrastructure to do this properly, I just don't have
a clue what the best way to do that would be.
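For context, a lookup through the new cache would presumably look like this
sketch (SearchSysCache1/ReleaseSysCache are the existing syscache API; the
RELFILENODE identifier is what this patch adds):

	HeapTuple	tuple;

	tuple = SearchSysCache1(RELFILENODE, ObjectIdGetDatum(relfilenode));
	if (HeapTupleIsValid(tuple))
	{
		Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);

		/* ... use classForm->relname etc. here ... */
		ReleaseSysCache(tuple);
	}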
---
 src/backend/utils/cache/syscache.c |   11 +++
 src/include/catalog/indexing.h |2 ++
 src/include/utils/syscache.h   |1 +
 3 files changed, 14 insertions(+)

diff --git a/src/backend/utils/cache/syscache.c 
b/src/backend/utils/cache/syscache.c
index c365ec7..9cfb013 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -588,6 +588,17 @@ static const struct cachedesc cacheinfo[] = {
},
1024
},
+   {RelationRelationId,/* RELFILENODE */
+   ClassRelfilenodeIndexId,
+   1,
+   {
+   Anum_pg_class_relfilenode,
+   0,
+   0,
+   0
+   },
+   1024
+   },
{RewriteRelationId, /* RULERELNAME */
RewriteRelRulenameIndexId,
2,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 450ec25..5c9419b 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -106,6 +106,8 @@ DECLARE_UNIQUE_INDEX(pg_class_oid_index, 2662, on pg_class using btree(oid oid_o
 #define ClassOidIndexId  2662
 DECLARE_UNIQUE_INDEX(pg_class_relname_nsp_index, 2663, on pg_class using btree(relname name_ops, relnamespace oid_ops));
 #define ClassNameNspIndexId  2663
+DECLARE_INDEX(pg_class_relfilenode_index, 2844, on pg_class using btree(relfilenode oid_ops));
+#define ClassRelfilenodeIndexId  2844
 
 DECLARE_UNIQUE_INDEX(pg_collation_name_enc_nsp_index, 3164, on pg_collation using btree(collname name_ops, collencoding int4_ops, collnamespace oid_ops));
 #define CollationNameEncNspIndexId 3164
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index d59dd4e..63a5042 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -73,6 +73,7 @@ enum SysCacheIdentifier
RANGETYPE,
RELNAMENSP,
RELOID,
+   RELFILENODE,
RULERELNAME,
STATRELATTINH,
TABLESPACEOID,
-- 
1.7.10.rc3.3.g19a6c.dirty


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

Adds a singly and a doubly linked list which can easily be embedded into other
data structures and can be used without any additional allocations.

Problematic: It requires USE_INLINE to be used. It could be remade to fall back
to externally defined functions if that is not available, but that hardly
seems sensible at this day and age. Besides, the speed hit would be noticeable,
and it's only used in new code, which could be disabled on machines - given
they still exist - without proper support for inline functions.
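A usage sketch of the interface this header provides (hypothetical caller; the
ilist_* names are from the patch below):

typedef struct MyItem
{
	int          value;
	ilist_d_node node;		/* embedded node, no separate allocation needed */
} MyItem;

static void
example(void)
{
	ilist_d_head  head;
	ilist_d_node *cur;
	MyItem        a = {1}, b = {2};

	ilist_d_init(&head);
	ilist_d_push_back(&head, &a.node);
	ilist_d_push_back(&head, &b.node);

	ilist_d_foreach(cur, &head)
	{
		MyItem *item = ilist_container(MyItem, node, cur);
		/* item->value is 1, then 2 */
	}
}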
---
 src/include/utils/ilist.h |  253 +
 1 file changed, 253 insertions(+)
 create mode 100644 src/include/utils/ilist.h

diff --git a/src/include/utils/ilist.h b/src/include/utils/ilist.h
new file mode 100644
index 000..81d540e
--- /dev/null
+++ b/src/include/utils/ilist.h
@@ -0,0 +1,253 @@
+#ifndef ILIST_H
+#define ILIST_H
+
+#ifdef __GNUC__
+#define unused_attr __attribute__((unused))
+#else
+#define unused_attr
+#endif
+
+#ifndef USE_INLINE
+#error a compiler supporting static inlines is required
+#endif
+
+#include assert.h
+
+typedef struct ilist_d_node ilist_d_node;
+
+struct ilist_d_node
+{
+   ilist_d_node* prev;
+   ilist_d_node* next;
+};
+
+typedef struct
+{
+   ilist_d_node head;
+} ilist_d_head;
+
+typedef struct ilist_s_node ilist_s_node;
+
+struct ilist_s_node
+{
+   ilist_s_node* next;
+};
+
+typedef struct
+{
+   ilist_s_node head;
+} ilist_s_head;
+
+#ifdef ILIST_DEBUG
+void ilist_d_check(ilist_d_head* head);
+#else
+static inline void ilist_d_check(ilist_d_head* head)
+{
+}
+#endif
+
+static inline void ilist_d_init(ilist_d_head *head)
+{
+   head->head.next = head->head.prev = &head->head;
+   ilist_d_check(head);
+}
+
+/*
+ * adds a node at the beginning of the list
+ */
+static inline void ilist_d_push_front(ilist_d_head *head, ilist_d_node *node)
+{
+   node->next = head->head.next;
+   node->prev = &head->head;
+   node->next->prev = node;
+   head->head.next = node;
+   ilist_d_check(head);
+}
+
+
+/*
+ * adds a node at the end of the list
+ */
+static inline void ilist_d_push_back(ilist_d_head *head, ilist_d_node *node)
+{
+   node->next = &head->head;
+   node->prev = head->head.prev;
+   node->prev->next = node;
+   head->head.prev = node;
+   ilist_d_check(head);
+}
+
+
+/*
+ * adds a node after another *in the same list*
+ */
+static inline void ilist_d_add_after(unused_attr ilist_d_head *head, 
ilist_d_node *after, ilist_d_node *node)
+{
+   node->prev = after;
+   node->next = after->next;
+   after->next = node;
+   node->next->prev = node;
+   ilist_d_check(head);
+}
+
+/*
+ * adds a node after another *in the same list*
+ */
+static inline void ilist_d_add_before(unused_attr ilist_d_head *head, 
ilist_d_node *before, ilist_d_node *node)
+{
+   node->prev = before->prev;
+   node->next = before;
+   before->prev = node;
+   node->prev->next = node;
+   ilist_d_check(head);
+}
+
+
+/*
+ * removes a node from a list
+ */
+static inline void ilist_d_remove(unused_attr ilist_d_head *head, ilist_d_node 
*node)
+{
+   ilist_d_check(head);
+   node->prev->next = node->next;
+   node->next->prev = node->prev;
+   ilist_d_check(head);
+}
+
+/*
+ * removes the first node from a list or returns NULL
+ */
+static inline ilist_d_node* ilist_d_pop_front(ilist_d_head *head)
+{
+   ilist_d_node* ret;
+
+   if (&head->head == head->head.next)
+   return NULL;
+
+   ret = head->head.next;
+   ilist_d_remove(head, head->head.next);
+   return ret;
+}
+
+
+static inline bool ilist_d_has_next(ilist_d_head *head, ilist_d_node *node)
+{
+   return node->next != &head->head;
+}
+
+static inline bool ilist_d_has_prev(ilist_d_head *head, ilist_d_node *node)
+{
+   return node->prev != &head->head;
+}
+
+static inline bool ilist_d_is_empty(ilist_d_head *head)
+{
+   return head->head.next == head->head.prev;
+}
+
+#define ilist_d_front(type, membername, ptr) (&((ptr)->head) == (ptr)->head.next) ? \
+	NULL : ilist_container(type, membername, (ptr)->head.next)
+
+#define ilist_d_front_unchecked(type, membername, ptr) ilist_container(type, membername, (ptr)->head.next)
+
+#define ilist_d_back(type, membername, ptr)  (&((ptr)->head) == (ptr)->head.prev) ? \
+	NULL : ilist_container(type, membername, (ptr)->head.prev)
+
+#define ilist_container(type, membername, ptr) ((type*)((char*)(ptr) - offsetof(type, membername)))
+
+#define ilist_d_foreach(name, ptr) for(name = (ptr)->head.next;\
+	 name != &(ptr)->head; \
+	 name = name->next)
+
+#define ilist_d_foreach_modify(name, nxt, ptr) for(name = (ptr)->head.next,	\
+	 nxt = name->next;	\
+	 name != &(ptr)->head	\
+ 

[HACKERS] [PATCH 02/16] Add zeroRecPtr as a shortcut for initializing a local variable to {0, 0}

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

This is locally defined in lots of places and would get introduced frequently
in the next commits. It is expected that this can be defined in a header-only
manner as soon as the XLogInsert scalability groundwork from Heikki gets in.
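
A trivial usage sketch (XLByteEQ is the existing comparison macro; the point is
just to replace ad-hoc {0, 0} initializers):

	XLogRecPtr	restart_lsn;

	restart_lsn = zeroRecPtr;		/* instead of a local {0, 0} literal */

	if (XLByteEQ(restart_lsn, zeroRecPtr))
	{
		/* nothing received/applied yet */
	}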
---
 src/backend/access/transam/xlogutils.c |1 +
 src/include/access/xlogdefs.h  |1 +
 2 files changed, 2 insertions(+)

diff --git a/src/backend/access/transam/xlogutils.c 
b/src/backend/access/transam/xlogutils.c
index 6ddcc59..3a2462b 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -51,6 +51,7 @@ typedef struct xl_invalid_page
 
 static HTAB *invalid_page_tab = NULL;
 
+XLogRecPtr zeroRecPtr = {0, 0};
 
 /* Report a reference to an invalid page */
 static void
diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index 5e6d7e6..2768427 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -35,6 +35,7 @@ typedef struct XLogRecPtr
uint32  xrecoff;/* byte offset of location in 
log file */
 } XLogRecPtr;
 
+extern XLogRecPtr zeroRecPtr;
 #define XLogRecPtrIsInvalid(r) ((r).xrecoff == 0)
 
 
-- 
1.7.10.rc3.3.g19a6c.dirty


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

The individual changes need to be identified by an xid. The xid can be a
subtransaction or a toplevel one; at commit those can be reintegrated by doing
a k-way mergesort between the individual transactions.

Callbacks for apply_begin, apply_change and apply_commit are provided to
retrieve complete transactions.
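
For orientation, the xid-to-transaction lookup underneath this uses the
backend's dynahash facility, roughly like the following sketch (hash_search is
the existing API; this is not code from the patch):

	ApplyCacheTXNByIdEnt *ent;
	bool		found;

	ent = (ApplyCacheTXNByIdEnt *)
		hash_search(cache->by_txn, (void *) &xid, HASH_ENTER, &found);
	if (!found)
		ent->txn = ApplyCacheGetTXN(cache);	/* first change seen for this xid */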

Missing:
- spill-to-disk
- correct subtransaction merge, current behaviour is simple/wrong
- DDL handling (?)
- resource usage controls
---
 src/backend/replication/Makefile |2 +
 src/backend/replication/logical/Makefile |   19 ++
 src/backend/replication/logical/applycache.c |  380 ++
 src/include/replication/applycache.h |  185 +
 4 files changed, 586 insertions(+)
 create mode 100644 src/backend/replication/logical/Makefile
 create mode 100644 src/backend/replication/logical/applycache.c
 create mode 100644 src/include/replication/applycache.h

diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 9d9ec87..ae7f6b1 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,8 @@ override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
 OBJS = walsender.o walreceiverfuncs.o walreceiver.o basebackup.o \
repl_gram.o syncrep.o
 
+SUBDIRS = logical
+
 include $(top_srcdir)/src/backend/common.mk
 
 # repl_scanner is compiled as part of repl_gram
diff --git a/src/backend/replication/logical/Makefile 
b/src/backend/replication/logical/Makefile
new file mode 100644
index 000..2eadab8
--- /dev/null
+++ b/src/backend/replication/logical/Makefile
@@ -0,0 +1,19 @@
+#-
+#
+# Makefile--
+#Makefile for src/backend/replication/logical
+#
+# IDENTIFICATION
+#src/backend/replication/logical/Makefile
+#
+#-
+
+subdir = src/backend/replication/logical
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
+
+OBJS = applycache.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/logical/applycache.c 
b/src/backend/replication/logical/applycache.c
new file mode 100644
index 000..b73b0ba
--- /dev/null
+++ b/src/backend/replication/logical/applycache.c
@@ -0,0 +1,380 @@
+/*-
+ *
+ * applycache.c
+ *
+ * PostgreSQL logical replay cache management
+ *
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/replication/applycache.c
+ *
+ */
+#include postgres.h
+
+#include access/heapam.h
+#include access/xact.h
+#include catalog/pg_class.h
+#include catalog/pg_control.h
+#include replication/applycache.h
+
+#include utils/ilist.h
+#include utils/memutils.h
+#include utils/relcache.h
+#include utils/syscache.h
+
+const Size max_memtries = 1<<16;
+
+const size_t max_cached_changes = 1024;
+const size_t max_cached_tuplebufs = 1024; /* ~8MB */
+const size_t max_cached_transactions = 512;
+
+typedef struct ApplyCacheTXNByIdEnt
+{
+   TransactionId xid;
+   ApplyCacheTXN* txn;
+} ApplyCacheTXNByIdEnt;
+
+static ApplyCacheTXN* ApplyCacheGetTXN(ApplyCache *cache);
+static void ApplyCacheReturnTXN(ApplyCache *cache, ApplyCacheTXN* txn);
+
+static ApplyCacheTXN* ApplyCacheTXNByXid(ApplyCache*, TransactionId xid, bool 
create);
+
+
+ApplyCache*
+ApplyCacheAllocate(void)
+{
+   ApplyCache* cache = (ApplyCache*)malloc(sizeof(ApplyCache));
+   HASHCTL hash_ctl;
+
+   if (!cache)
+   elog(ERROR, "Could not allocate the ApplyCache");
+
+   memset(&hash_ctl, 0, sizeof(hash_ctl));
+
+   cache->context = AllocSetContextCreate(TopMemoryContext,
+  "ApplyCache",
+  ALLOCSET_DEFAULT_MINSIZE,
+  ALLOCSET_DEFAULT_INITSIZE,
+  ALLOCSET_DEFAULT_MAXSIZE);
+
+   hash_ctl.keysize = sizeof(TransactionId);
+   hash_ctl.entrysize = sizeof(ApplyCacheTXNByIdEnt);
+   hash_ctl.hash = tag_hash;
+   hash_ctl.hcxt = cache->context;
+
+   cache->by_txn = hash_create("ApplyCacheByXid", 1000, &hash_ctl,
+   HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+
+   cache->nr_cached_transactions = 0;
+   cache->nr_cached_changes = 0;
+   cache->nr_cached_tuplebufs = 0;
+
+   ilist_d_init(&cache->cached_transactions);
+   ilist_d_init(&cache->cached_changes);
+   ilist_s_init(&cache->cached_tuplebufs);
+
+   return cache;
+}
+
+void ApplyCacheFree(ApplyCache* cache)
+{
+   /* FIXME: check for in-progress transactions */
+   /* FIXME: clean up cached transaction */
+   /* FIXME: clean up cached changes */
+   /* FIXME: clean 

[HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

The previous coding could miss xlog writeouts at several places. E.g. when wal
was written out by the background writer or even after a commit if
synchronous_commit=off.
This could lead to delays in sending data to the standby of up to 7 seconds.

To fix this, move the responsibility for notification to the layer where the
necessary information is actually present. We take some care not to do the
notification while we hold contended locks like WALInsertLock or WALWriteLock.

Document the preexisting fact that we rely on SetLatch to be safe from within
signal handlers and critical sections.

This removes the temporary bandaid from 2c8a4e9be2730342cbca85150a2a9d876aa77ff6
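
Condensed, the new pattern is: mark that a wakeup is needed while still inside
the critical section, and only set the walsenders' latches once the contended
lock has been released. A sketch of how such a request/process pair could be
implemented (not necessarily how the patch does it; WalSndWakeup is the
existing function that sets the walsenders' latches):

static volatile bool wake_wal_senders = false;

void
WalSndWakeupRequest(void)
{
	wake_wal_senders = true;	/* cheap: just remember that a wakeup is due */
}

void
WalSndWakeupProcess(void)
{
	if (wake_wal_senders)
	{
		wake_wal_senders = false;
		WalSndWakeup();			/* set the latches now that locks are dropped */
	}
}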
---
 src/backend/access/transam/twophase.c |   21 -
 src/backend/access/transam/xact.c |7 --
 src/backend/access/transam/xlog.c |   24 +--
 src/backend/port/unix_latch.c |3 +++
 src/backend/port/win32_latch.c|4 
 src/backend/replication/walsender.c   |   41 -
 src/include/replication/walsender.h   |2 ++
 7 files changed, 66 insertions(+), 36 deletions(-)

diff --git a/src/backend/access/transam/twophase.c 
b/src/backend/access/transam/twophase.c
index b94fae3..bdb7bcd 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1042,13 +1042,6 @@ EndPrepare(GlobalTransaction gxact)
 
/* If we crash now, we have prepared: WAL replay will fix things */
 
-   /*
-* Wake up all walsenders to send WAL up to the PREPARE record 
immediately
-* if replication is enabled
-*/
-   if (max_wal_senders > 0)
-   WalSndWakeup();
-
/* write correct CRC and close file */
if ((write(fd, statefile_crc, sizeof(pg_crc32))) != sizeof(pg_crc32))
{
@@ -2045,13 +2038,6 @@ RecordTransactionCommitPrepared(TransactionId xid,
/* Flush XLOG to disk */
XLogFlush(recptr);
 
-   /*
-* Wake up all walsenders to send WAL up to the COMMIT PREPARED record
-* immediately if replication is enabled
-*/
-   if (max_wal_senders > 0)
-   WalSndWakeup();
-
/* Mark the transaction committed in pg_clog */
TransactionIdCommitTree(xid, nchildren, children);
 
@@ -2133,13 +2119,6 @@ RecordTransactionAbortPrepared(TransactionId xid,
XLogFlush(recptr);
 
/*
-* Wake up all walsenders to send WAL up to the ABORT PREPARED record
-* immediately if replication is enabled
-*/
-   if (max_wal_senders > 0)
-   WalSndWakeup();
-
-   /*
 * Mark the transaction aborted in clog.  This is not absolutely 
necessary
 * but we may as well do it while we are here.
 */
diff --git a/src/backend/access/transam/xact.c 
b/src/backend/access/transam/xact.c
index 8f00186..3cc2bfa 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1141,13 +1141,6 @@ RecordTransactionCommit(void)
XLogFlush(XactLastRecEnd);
 
/*
-* Wake up all walsenders to send WAL up to the COMMIT record
-* immediately if replication is enabled
-*/
-   if (max_wal_senders  0)
-   WalSndWakeup();
-
-   /*
 * Now we may update the CLOG, if we wrote a COMMIT record above
 */
if (markXidCommitted)
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index bcb71c4..166efb0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1017,6 +1017,8 @@ begin:;
 
END_CRIT_SECTION();
 
+   /* wakeup the WalSnd now that we released the WALWriteLock */
+   WalSndWakeupProcess();
return RecPtr;
}
 
@@ -1218,6 +1220,9 @@ begin:;
 
END_CRIT_SECTION();
 
+   /* wakeup the WalSnd now that we are outside contended locks */
+   WalSndWakeupProcess();
+
return RecPtr;
 }
 
@@ -1822,6 +1827,10 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool 
xlog_switch)
	if (finishing_seg || (xlog_switch && last_iteration))
{
issue_xlog_fsync(openLogFile, openLogId, 
openLogSeg);
+
+   /* signal that we need to wakeup WalSnd later */
+   WalSndWakeupRequest();
+
LogwrtResult.Flush = LogwrtResult.Write;
/* end of page */
 
if (XLogArchivingActive())
@@ -1886,6 +1895,9 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool 
xlog_switch)
openLogOff = 0;
}
issue_xlog_fsync(openLogFile, openLogId, openLogSeg);
+
+  

[HACKERS] [PATCH 07/16] Log enough data into the wal to reconstruct logical changes from it if wal_level=logical

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

This adds a new wal_level value 'logical'

Missing cases:
- heap_multi_insert
- primary key changes for updates
- no primary key
- LOG_NEWPAGE
---
 src/backend/access/heap/heapam.c|  135 ---
 src/backend/access/transam/xlog.c   |1 +
 src/backend/catalog/index.c |   74 +
 src/bin/pg_controldata/pg_controldata.c |2 +
 src/include/access/xlog.h   |3 +-
 src/include/catalog/index.h |4 +
 6 files changed, 207 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9519e73..9149d53 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -52,6 +52,7 @@
 #include access/xact.h
 #include access/xlogutils.h
 #include catalog/catalog.h
+#include catalog/index.h
 #include catalog/namespace.h
 #include miscadmin.h
 #include pgstat.h
@@ -1937,10 +1938,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId 
cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr  recptr;
-   XLogRecData rdata[3];
+   XLogRecData rdata[4];
Pagepage = BufferGetPage(buffer);
uint8   info = XLOG_HEAP_INSERT;
 
+   /*
+    * For the logical replication case we need the tuple even if we're
+    * doing a full page write. We could alternatively store a pointer into
+    * the fpw though.
+    * For that to work we add another rdata entry for the buffer in that
+    * case.
+    */
+   bool    need_tuple = wal_level == WAL_LEVEL_LOGICAL;
+
xlrec.all_visible_cleared = all_visible_cleared;
xlrec.target.node = relation-rd_node;
xlrec.target.tid = heaptup-t_self;
@@ -1960,18 +1970,32 @@ heap_insert(Relation relation, HeapTuple tup, CommandId 
cid,
 */
rdata[1].data = (char *) xlhdr;
rdata[1].len = SizeOfHeapHeader;
-   rdata[1].buffer = buffer;
+   rdata[1].buffer = need_tuple ? InvalidBuffer : buffer;
rdata[1].buffer_std = true;
rdata[1].next = (rdata[2]);
 
/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
rdata[2].data = (char *) heaptup-t_data + 
offsetof(HeapTupleHeaderData, t_bits);
rdata[2].len = heaptup-t_len - offsetof(HeapTupleHeaderData, 
t_bits);
-   rdata[2].buffer = buffer;
+   rdata[2].buffer = need_tuple ? InvalidBuffer : buffer;
rdata[2].buffer_std = true;
rdata[2].next = NULL;
 
/*
+* add record for the buffer without actual content thats 
removed if
+* fpw is done for that buffer
+*/
+   if(need_tuple){
+   rdata[2].next = (rdata[3]);
+
+   rdata[3].data = NULL;
+   rdata[3].len = 0;
+   rdata[3].buffer = buffer;
+   rdata[3].buffer_std = true;
+   rdata[3].next = NULL;
+   }
+
+   /*
 * If this is the single and first tuple on page, we can reinit 
the
 * page instead of restoring the whole thing.  Set flag, and 
hide
 * buffer references from XLogInsert.
@@ -1980,7 +2004,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId 
cid,
PageGetMaxOffsetNumber(page) == FirstOffsetNumber)
{
info |= XLOG_HEAP_INIT_PAGE;
-   rdata[1].buffer = rdata[2].buffer = InvalidBuffer;
+   rdata[1].buffer = rdata[2].buffer = rdata[3].buffer = 
InvalidBuffer;
}
 
recptr = XLogInsert(RM_HEAP_ID, info, rdata);
@@ -2568,7 +2592,9 @@ l1:
{
xl_heap_delete xlrec;
XLogRecPtr  recptr;
-   XLogRecData rdata[2];
+   XLogRecData rdata[4];
+
+   bool need_tuple = wal_level == WAL_LEVEL_LOGICAL &&
+       relation->rd_id >= FirstNormalObjectId;
 
xlrec.all_visible_cleared = all_visible_cleared;
xlrec.target.node = relation-rd_node;
@@ -2584,6 +2610,73 @@ l1:
rdata[1].buffer_std = true;
rdata[1].next = NULL;
 
+   /*
+* XXX: We could decide not to log changes when the origin is 
not the
+* local node, that should reduce redundant logging.
+*/
+   if(need_tuple){
+   xl_heap_header xlhdr;
+
+   Oid indexoid = InvalidOid;
+   int16 pknratts;
+   

[HACKERS] [PATCH 12/16] Add state to keep track of logical replication

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

In order to have restartable replication with minimal additional writes it's
very useful to know up to which point we have replayed/received changes from a
foreign node.

One representation of that is the lsn of changes at the originating cluster.

We need to keep track of the point up to which we received data and up to where
we applied data.

For that we added a field 'origin_lsn' to commit records. This allows to keep
track of the apply position with crash recovery with minimal additional io. We
only added the field to non-compact commit records to reduce the overhead in
case logical replication is not used.

Checkpoints need to keep track of the apply/receive positions as well because
otherwise it would be hard to determine the lsn from where to restart
receive/apply after a shutdown/crash if no changes happened since the last
shutdown/crash.

While running, the startup process, the walreceiver and a (future) apply process
will need a coherent picture of those two states, so add shared memory state to
keep track of it. Currently this is represented in the walreceiver's shared
memory segment. This will likely need to change.

During crash recovery/physical replication the origin_lsn field of commit
records is used to update the shared memory, and thus the next checkpoint's,
notion of the apply state.

Missing:

- For correct crash recovery we need more state than the 'apply lsn' because
  transactions on the originating side can overlap. At the lsn we just applied,
  many other transactions can be in progress. To correctly handle that we need
  to keep track of the oldest start lsn of any transaction currently being
  reassembled (c.f. ApplyCache). Then we can start to reassemble the ApplyCache
  up from that point and throw away any transaction which committed before the
  recorded/recovered apply lsn.
  It should be sufficient to store that knowledge in shared memory and
  checkpoint records.
---
 src/backend/access/transam/xact.c  |   22 -
 src/backend/access/transam/xlog.c  |   73 
 src/backend/replication/walreceiverfuncs.c |8 +++
 src/include/access/xact.h  |1 +
 src/include/catalog/pg_control.h   |   13 -
 src/include/replication/walreceiver.h  |   13 +
 6 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xact.c 
b/src/backend/access/transam/xact.c
index dc30a17..40ac965 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -39,11 +39,13 @@
 #include replication/logical.h
 #include replication/syncrep.h
 #include replication/walsender.h
+#include replication/walreceiver.h
 #include storage/lmgr.h
 #include storage/predicate.h
 #include storage/procarray.h
 #include storage/sinvaladt.h
 #include storage/smgr.h
+#include storage/spin.h
 #include utils/combocid.h
 #include utils/guc.h
 #include utils/inval.h
@@ -1015,7 +1017,8 @@ RecordTransactionCommit(void)
/*
 * Do we need the long commit record? If not, use the compact 
format.
 */
-   if (nrels > 0 || nmsgs > 0 || RelcacheInitFileInval || forceSyncCommit)
+   if (nrels > 0 || nmsgs > 0 || RelcacheInitFileInval || forceSyncCommit ||
+   (wal_level == WAL_LEVEL_LOGICAL &&
+    current_replication_origin_id != guc_replication_origin_id))
{
XLogRecData rdata[4];
int lastrdata = 0;
@@ -1037,6 +1040,8 @@ RecordTransactionCommit(void)
xlrec.nrels = nrels;
xlrec.nsubxacts = nchildren;
xlrec.nmsgs = nmsgs;
+   xlrec.origin_lsn = current_replication_origin_lsn;
+
rdata[0].data = (char *) (xlrec);
rdata[0].len = MinSizeOfXactCommit;
rdata[0].buffer = InvalidBuffer;
@@ -4575,6 +4580,21 @@ xact_redo_commit_internal(TransactionId xid, RepNodeId 
originating_node,
LWLockRelease(XidGenLock);
}
 
+   /*
+    * record where we're at wrt recovery. We need that to know from where on
+    * to restart applying logical change records
+    */
+   if (LogicalWalReceiverActive() && !XLByteEQ(origin_lsn, zeroRecPtr))
+   {
+   /*
+* probably we don't need the locking because no lcr receiver 
can run
+* yet.
+*/
+   SpinLockAcquire(&WalRcv->mutex);
+   WalRcv->mm_applyState[originating_node] = origin_lsn;
+   SpinLockRelease(&WalRcv->mutex);
+   }
+
if (standbyState == STANDBY_DISABLED)
{
/*
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 0622726..20a4611 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c

[HACKERS] [PATCH 09/16] Decode wal (with wal_level=logical) into changes in an ApplyCache instance

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

This requires an up-to-date catalog and can thus only be run on a replica.

Missing:
- HEAP_NEWPAGE support
- HEAP2_MULTI_INSERT support
- DDL integration. *No* ddl, including TRUNCATE is possible atm
---
 src/backend/replication/logical/Makefile |2 +-
 src/backend/replication/logical/decode.c |  439 ++
 src/include/replication/decode.h |   23 ++
 3 files changed, 463 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/replication/logical/decode.c
 create mode 100644 src/include/replication/decode.h

diff --git a/src/backend/replication/logical/Makefile 
b/src/backend/replication/logical/Makefile
index 2eadab8..7dd9663 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -14,6 +14,6 @@ include $(top_builddir)/src/Makefile.global
 
 override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
 
-OBJS = applycache.o
+OBJS = applycache.o decode.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/logical/decode.c 
b/src/backend/replication/logical/decode.c
new file mode 100644
index 000..7e07d50
--- /dev/null
+++ b/src/backend/replication/logical/decode.c
@@ -0,0 +1,439 @@
+/*-
+ *
+ * decode.c
+ *
+ * Decodes wal records from an xlogreader.h callback into an applycache
+ *
+ *
+ * Portions Copyright (c) 2010-2012, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/replication/logical/decode.c
+ *
+ */
+#include postgres.h
+
+#include access/heapam.h
+#include access/transam.h
+#include access/xlog_internal.h
+#include access/xact.h
+
+#include replication/applycache.h
+#include replication/decode.h
+
+#include utils/memutils.h
+#include utils/syscache.h
+#include utils/lsyscache.h
+
+static void DecodeXLogTuple(char* data, Size len,
+HeapTuple table, ApplyCacheTupleBuf* tuple);
+
+static void DecodeInsert(ApplyCache *cache, XLogRecordBuffer* buf);
+
+static void DecodeUpdate(ApplyCache *cache, XLogRecordBuffer* buf);
+
+static void DecodeDelete(ApplyCache *cache, XLogRecordBuffer* buf);
+
+static void DecodeNewpage(ApplyCache *cache, XLogRecordBuffer* buf);
+static void DecodeMultiInsert(ApplyCache *cache, XLogRecordBuffer* buf);
+
+static void DecodeCommit(ApplyCache* cache, XLogRecordBuffer* buf, 
TransactionId xid,
+TransactionId *sub_xids, int nsubxacts);
+
+
+void DecodeRecordIntoApplyCache(ApplyCache *cache, XLogRecordBuffer* buf)
+{
+   XLogRecord* r = buf->record;
+   uint8 info = r->xl_info & ~XLR_INFO_MASK;
+
+   switch (r->xl_rmid)
+   {
+   case RM_HEAP_ID:
+   {
+   info &= XLOG_HEAP_OPMASK;
+   switch (info)
+   {
+   case XLOG_HEAP_INSERT:
+   DecodeInsert(cache, buf);
+   break;
+
+   /* no guarantee that we get an HOT update 
again, so handle it as a normal update*/
+   case XLOG_HEAP_HOT_UPDATE:
+   case XLOG_HEAP_UPDATE:
+   DecodeUpdate(cache, buf);
+   break;
+
+   case XLOG_HEAP_NEWPAGE:
+   DecodeNewpage(cache, buf);
+   break;
+
+   case XLOG_HEAP_DELETE:
+   DecodeDelete(cache, buf);
+   break;
+   default:
+   break;
+   }
+   break;
+   }
+   case RM_HEAP2_ID:
+   {
+   info &= XLOG_HEAP_OPMASK;
+   switch (info)
+   {
+   case XLOG_HEAP2_MULTI_INSERT:
+   /* this also handles the 
XLOG_HEAP_INIT_PAGE case */
+   DecodeMultiInsert(cache, buf);
+   break;
+   default:
+   /* everything else here is just 
physical stuff were not interested in */
+   break;
+   }
+   break;
+   }
+
+   case RM_XACT_ID:
+   {
+   switch (info)
+   {
+   case XLOG_XACT_COMMIT:
+   {
+   TransactionId *sub_xids;
+   xl_xact_commit *xlrec = 
(xl_xact_commit*)buf-record_data;
+
+   

[HACKERS] [PATCH 11/16] Add infrastructure for manipulating multiple streams of wal on a segment handling level

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

For that, add a 'node_id' parameter to most commands dealing with wal
segments. A node_id that is 'InvalidMultimasterNodeId' references local wal;
every other node_id refers to wal in a new pg_lcr directory.

Using duplicated code would reduce the impact of that change, but the long-term
code-maintenance burden outweighs that by a fair bit.

Besides the decision to add a 'node_id' parameter to several functions the
changes in this patch are fairly mechanical.
---
 src/backend/access/transam/xlog.c   |   54 ---
 src/backend/replication/basebackup.c|4 +-
 src/backend/replication/walreceiver.c   |2 +-
 src/backend/replication/walsender.c |9 +++--
 src/bin/initdb/initdb.c |1 +
 src/bin/pg_resetxlog/pg_resetxlog.c |2 +-
 src/include/access/xlog.h   |2 +-
 src/include/access/xlog_internal.h  |   13 +--
 src/include/replication/logical.h   |2 +
 src/include/replication/walsender_private.h |2 +-
 10 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 504b4d0..0622726 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -635,8 +635,8 @@ static bool XLogCheckBuffer(XLogRecData *rdata, bool 
doPageWrites,
 static bool AdvanceXLInsertBuffer(bool new_segment);
 static bool XLogCheckpointNeeded(uint32 logid, uint32 logseg);
 static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch);
-static bool InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
-  bool find_free, int *max_advance,
+static bool InstallXLogFileSegment(RepNodeId node_id, uint32 *log, uint32 *seg,
+  char *tmppath, bool find_free, int 
*max_advance,
   bool use_lock);
 static int XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
 int source, bool notexistOk);
@@ -1736,8 +1736,8 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool 
xlog_switch)
 
/* create/use new log file */
use_existent = true;
-   openLogFile = XLogFileInit(openLogId, openLogSeg,
-  
use_existent, true);
+   openLogFile = XLogFileInit(InvalidMultimasterNodeId, 
openLogId,
+  openLogSeg, use_existent, 
true);
openLogOff = 0;
}
 
@@ -2376,6 +2376,9 @@ XLogNeedsFlush(XLogRecPtr record)
  * place.  This should be TRUE except during bootstrap log creation.  The
  * caller must *not* hold the lock at call.
  *
+ * node_id: if != InvalidMultimasterNodeId this xlog file is actually a LCR
+ * file
+ *
  * Returns FD of opened file.
  *
  * Note: errors here are ERROR not PANIC because we might or might not be
@@ -2384,8 +2387,8 @@ XLogNeedsFlush(XLogRecPtr record)
  * in a critical section.
  */
 int
-XLogFileInit(uint32 log, uint32 seg,
-bool *use_existent, bool use_lock)
+XLogFileInit(RepNodeId node_id, uint32 log, uint32 seg,
+ bool *use_existent, bool use_lock)
 {
charpath[MAXPGPATH];
chartmppath[MAXPGPATH];
@@ -2396,7 +2399,7 @@ XLogFileInit(uint32 log, uint32 seg,
int fd;
int nbytes;
 
-   XLogFilePath(path, ThisTimeLineID, log, seg);
+   XLogFilePath(path, ThisTimeLineID, node_id, log, seg);
 
/*
 * Try to use existent file (checkpoint maker may have created it 
already)
@@ -2425,6 +2428,11 @@ XLogFileInit(uint32 log, uint32 seg,
 */
elog(DEBUG2, creating and filling new WAL file);
 
+   /*
+* FIXME: to be safe we need to create tempfile in the pg_lcr directory 
if
+* its actually an lcr file because pg_lcr might be in a different
+* partition.
+*/
snprintf(tmppath, MAXPGPATH, XLOGDIR /xlogtemp.%d, (int) getpid());
 
unlink(tmppath);
@@ -2493,7 +2501,7 @@ XLogFileInit(uint32 log, uint32 seg,
installed_log = log;
installed_seg = seg;
max_advance = XLOGfileslop;
-   if (!InstallXLogFileSegment(installed_log, installed_seg, tmppath,
+   if (!InstallXLogFileSegment(node_id, installed_log, installed_seg, 
tmppath,
*use_existent, 
max_advance,
use_lock))
{
@@ -2548,7 +2556,7 @@ XLogFileCopy(uint32 log, uint32 seg,
/*
 * Open the source file
 */
-   XLogFilePath(path, srcTLI, srclog, srcseg);
+   XLogFilePath(path, srcTLI, 

[HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

One solution to avoid loops when doing wal based logical replication in
topologies which are more complex than one unidirectional transport is
introducing the concept of a 'origin_id' into the wal stream. Luckily there is
some padding in the XLogRecord struct that allows us to add that field without
further bloating the struct.
This solution was chosen because it allows for just about any topology and is
unobtrusive.

This adds a new configuration parameter multimaster_node_id which determines
the id used for wal originating in one cluster.

When applying changes from WAL received from another cluster, code can set the variable
current_replication_origin_id. This is a global variable because passing it
through everything which can generate WAL would be far too intrusive.
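
To make the intended usage a bit more concrete, a minimal sketch using only the
variables this patch introduces (the call sites shown here are illustrative, not
part of the patch):

/*
 * Illustrative only: an apply process tags the WAL it generates with the
 * id of the node the change originally came from, so decoding can filter
 * it out again and no change loops back to its origin.
 */
static void
apply_remote_commit_sketch(RepNodeId origin_node, XLogRecPtr origin_lsn)
{
	current_replication_origin_id = origin_node;
	current_replication_origin_lsn = origin_lsn;

	/* ... apply the decoded changes; WAL written here carries origin_node ... */

	/* back to locally originated WAL */
	current_replication_origin_id = InvalidMultimasterNodeId;
}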
---
 src/backend/access/transam/xact.c |   48 +++--
 src/backend/access/transam/xlog.c |3 +-
 src/backend/access/transam/xlogreader.c   |2 ++
 src/backend/replication/logical/Makefile  |2 +-
 src/backend/replication/logical/logical.c |   19 ++
 src/backend/utils/misc/guc.c  |   19 ++
 src/backend/utils/misc/postgresql.conf.sample |3 ++
 src/include/access/xlog.h |4 +--
 src/include/access/xlogdefs.h |2 ++
 src/include/replication/logical.h |   22 
 10 files changed, 110 insertions(+), 14 deletions(-)
 create mode 100644 src/backend/replication/logical/logical.c
 create mode 100644 src/include/replication/logical.h

diff --git a/src/backend/access/transam/xact.c 
b/src/backend/access/transam/xact.c
index 3cc2bfa..dc30a17 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -36,8 +36,9 @@
 #include libpq/be-fsstubs.h
 #include miscadmin.h
 #include pgstat.h
-#include replication/walsender.h
+#include replication/logical.h
 #include replication/syncrep.h
+#include replication/walsender.h
 #include storage/lmgr.h
 #include storage/predicate.h
 #include storage/procarray.h
@@ -4545,12 +4546,13 @@ xactGetCommittedChildren(TransactionId **ptr)
  * actions for which the order of execution is critical.
  */
 static void
-xact_redo_commit_internal(TransactionId xid, XLogRecPtr lsn,
- TransactionId *sub_xids, int 
nsubxacts,
- SharedInvalidationMessage 
*inval_msgs, int nmsgs,
- RelFileNode *xnodes, int 
nrels,
- Oid dbId, Oid tsId,
- uint32 xinfo)
+xact_redo_commit_internal(TransactionId xid, RepNodeId originating_node,
+  XLogRecPtr lsn, XLogRecPtr origin_lsn,
+  TransactionId *sub_xids, int nsubxacts,
+  SharedInvalidationMessage *inval_msgs, int nmsgs,
+  RelFileNode *xnodes, int nrels,
+  Oid dbId, Oid tsId,
+  uint32 xinfo)
 {
TransactionId max_xid;
int i;
@@ -4659,8 +4661,13 @@ xact_redo_commit_internal(TransactionId xid, XLogRecPtr 
lsn,
  * Utility function to call xact_redo_commit_internal after breaking down xlrec
  */
 static void
<<<<<<< HEAD
 xact_redo_commit(xl_xact_commit *xlrec,
 TransactionId xid, XLogRecPtr lsn)
+===
+xact_redo_commit(xl_xact_commit *xlrec, RepNodeId originating_node,
+   TransactionId xid, 
XLogRecPtr lsn)
>>>>>>> Introduce the concept that wal has a 'origin' node
 {
TransactionId *subxacts;
SharedInvalidationMessage *inval_msgs;
@@ -4670,18 +4677,26 @@ xact_redo_commit(xl_xact_commit *xlrec,
/* invalidation messages array follows subxids */
inval_msgs = (SharedInvalidationMessage *) 
(subxacts[xlrec-nsubxacts]);
 
<<<<<<< HEAD
xact_redo_commit_internal(xid, lsn, subxacts, xlrec-nsubxacts,
  inval_msgs, 
xlrec-nmsgs,
  xlrec-xnodes, 
xlrec-nrels,
  xlrec-dbId,
  xlrec-tsId,
  xlrec-xinfo);
+===
+   xact_redo_commit_internal(xid, originating_node, lsn, xlrec-origin_lsn,
+ subxacts, xlrec-nsubxacts, inval_msgs,
+ xlrec-nmsgs, xlrec-xnodes, xlrec-nrels,
+ xlrec-dbId, xlrec-tsId, xlrec-xinfo);
>>>>>>> Introduce the concept that wal has a 'origin' node
 }
 
 /*
  * Utility function to call xact_redo_commit_internal  for compact form of 
message.
  */
 static void
<<<<<<< HEAD
 

[HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

Features:
- streaming reading/writing
- filtering
- reassembly of records

Reusing the ReadRecord infrastructure in situations where the code that wants
to do so is not tightly integrated into xlog.c is rather hard and would require
changes to fairly integral parts of the recovery code, which doesn't seem to be
a good idea.

Missing:
- compressing the stream when removing uninteresting records
- writing out correct CRCs
- validating CRCs
- separating reader/writer
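
To give an idea of the intended usage, a sketch only (XLogReaderAllocate(),
XLogReaderRead(), XLogReaderReset() and the curptr/endptr/needs_input/
needs_output/private_data fields appear in the code below; the callback member
names are assumptions based on the apply-side code in patch 15):

/*
 * Sketch of driving the reader over a WAL range; not part of the patch.
 */
static void
read_wal_range_sketch(XLogRecPtr from, XLogRecPtr to, void *private_data)
{
	XLogReaderState *state = XLogReaderAllocate();

	/* assumed callback member names */
	state->read_page = replay_read_page;
	state->is_record_interesting = replay_record_is_interesting;
	state->finished_record = replay_finished_record;
	state->writeout_data = replay_writeout_data;
	state->private_data = private_data;

	state->startptr = from;
	state->curptr = from;
	state->endptr = to;

	do
	{
		XLogReaderRead(state);
		/*
		 * needs_input: caller should make more WAL available and call again;
		 * needs_output: caller should consume/flush the assembled output.
		 */
	} while (state->needs_input || state->needs_output);

	XLogReaderReset(state);
}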
---
 src/backend/access/transam/Makefile |2 +-
 src/backend/access/transam/xlogreader.c |  914 +++
 src/include/access/xlogreader.h |  173 ++
 3 files changed, 1088 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/access/transam/xlogreader.c
 create mode 100644 src/include/access/xlogreader.h

diff --git a/src/backend/access/transam/Makefile 
b/src/backend/access/transam/Makefile
index f82f10e..660b5fc 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = clog.o transam.o varsup.o xact.o rmgr.o slru.o subtrans.o multixact.o \
-   twophase.o twophase_rmgr.o xlog.o xlogfuncs.o xlogutils.o
+   twophase.o twophase_rmgr.o xlog.o xlogfuncs.o xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/access/transam/xlogreader.c 
b/src/backend/access/transam/xlogreader.c
new file mode 100644
index 000..6f15d66
--- /dev/null
+++ b/src/backend/access/transam/xlogreader.c
@@ -0,0 +1,914 @@
+/*-
+ *
+ * xlogreader.c
+ *
+ * A somewhat generic xlog read interface
+ *
+ * Portions Copyright (c) 2010-2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *   src/backend/access/transam/readxlog.c
+ *
+ *-
+ *
+ * FIXME:
+ * * CRC computation
+ * * separation of reader/writer
+ */
+
+#include postgres.h
+
+#include access/xlog_internal.h
+#include access/transam.h
+#include catalog/pg_control.h
+#include access/xlogreader.h
+
+/* FIXME */
+#include replication/walsender_private.h
+#include replication/walprotocol.h
+
+//#define VERBOSE_DEBUG
+
+XLogReaderState* XLogReaderAllocate(void)
+{
+   XLogReaderState* state = (XLogReaderState*)malloc(sizeof(XLogReaderState));
+   int i;
+
+   if (!state)
+       goto oom;
+
+   memset(&state->buf.record, 0, sizeof(XLogRecord));
+   state->buf.record_data_size = XLOG_BLCKSZ*8;
+   state->buf.record_data =
+       malloc(state->buf.record_data_size);
+
+   if (!state->buf.record_data)
+       goto oom;
+
+   if (!state)
+       goto oom;
+
+   for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++)
+   {
+       state->buf.bkp_block_data[i] =
+           malloc(BLCKSZ);
+
+       if (!state->buf.bkp_block_data[i])
+           goto oom;
+   }
+   XLogReaderReset(state);
+   return state;
+
+oom:
+   elog(ERROR, "could not allocate memory for XLogReaderState");
+   return 0;
+}
+
+void XLogReaderReset(XLogReaderState* state)
+{
+   state->in_record = false;
+   state->in_bkp_blocks = 0;
+   state->in_bkp_block_header = false;
+   state->in_skip = false;
+   state->remaining_size = 0;
+   state->nbytes = 0;
+   state->incomplete = false;
+   state->initialized = false;
+   state->needs_input = false;
+   state->needs_output = false;
+}
+
+static inline bool
+XLogReaderHasInput(XLogReaderState* state, Size size)
+{
+   XLogRecPtr tmp = state->curptr;
+   XLByteAdvance(tmp, size);
+   if (XLByteLE(state->endptr, tmp))
+       return false;
+   return true;
+}
+
+static inline bool
+XLogReaderHasOutput(XLogReaderState* state, Size size){
+   if (state->nbytes + size > MAX_SEND_SIZE)
+       return false;
+   return true;
+}
+
+static inline bool
+XLogReaderHasSpace(XLogReaderState* state, Size size)
+{
+   XLogRecPtr tmp = state->curptr;
+   XLByteAdvance(tmp, size);
+   if (XLByteLE(state->endptr, tmp))
+       return false;
+   else if (state->nbytes + size > MAX_SEND_SIZE)
+       return false;
+   return true;
+}
+
+void
+XLogReaderRead(XLogReaderState* state)
+{
+   XLogRecord* temp_record;
+
+   state->needs_input = false;
+   state->needs_output = false;
+
+   /*
+    * Do some basic sanity checking and setup if we're starting anew.
+    */
+   if (!state->initialized)
+   {
+       state->initialized = true;
+       /*
+        * we need to start reading at the beginning of the page to understand
+        * what we are currently reading. We will skip over that because we
+        * check curptr < startptr

[HACKERS] [PATCH 16/16] current version of the design document

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

---
 src/backend/replication/logical/DESIGN |  209 
 1 file changed, 209 insertions(+)
 create mode 100644 src/backend/replication/logical/DESIGN

diff --git a/src/backend/replication/logical/DESIGN 
b/src/backend/replication/logical/DESIGN
new file mode 100644
index 000..2cf08ff
--- /dev/null
+++ b/src/backend/replication/logical/DESIGN
@@ -0,0 +1,209 @@
+=== Design goals for logical replication ===:
+- in core
+- fast
+- async
+- robust
+- multi-master
+- modular
+- as unintrusive as possible implementation wise
+- basis for other technologies (sharding, replication into other DBMSs, ...)
+
+For reasons why we think this is an important set of features please check out
+the presentation from the in-core replication summit at pgcon:
+http://wiki.postgresql.org/wiki/File:BDR_Presentation_PGCon2012.pdf
+
+While you may argue that most of the above design goals are already provided by
+various trigger-based replication solutions like Londiste or Slony, we think
+that that's not enough for various reasons:
+
+- not in core (and thus less trustworthy)
+- duplication of writes due to an additional log
+- performance in general (check the end of the above presentation)
+- complex to use because there is no native administration interface
+
+We want to emphasize that this proposed architecture is based on the experience
+of developing a minimal prototype which we built with the above goals in
+mind. While we obviously hope that a good part of it is reusable for the
+community we definitely do *not* expect that the community accepts this
+as-is. It is intended to be the basis upon which we, the community, can build
+and design the future logical replication.
+
+=== Basic architecture ===:
+Very broadly speaking there are several major pieces common to most approaches
+to replication:
+1. Source data generation
+2. Transportation of that data
+3. Applying the changes
+4. Conflict resolution
+
+
+1.:
+
+As we need a change stream that contains all required changes in the correct
+order, and that stream has to reflect changes made across multiple concurrent
+backends, generating it raises obvious concurrency and scalability issues.
+Reusing the WAL stream for this seems a good choice since it is needed anyway,
+already addresses those issues, and further means that we don't duplicate the
+writes and locks already performed for its maintenance. Any other
+stream-generating component would introduce additional scalability issues.
+
+Unfortunately, in this case, the WAL is mostly a physical representation of the
+changes and thus does not, by itself, contain the necessary information in a
+convenient format to create logical changesets.
+
+The biggest problem is that interpreting tuples in the WAL stream requires an
+up-to-date system catalog and needs to be done in a compatible backend and
+architecture. The requirement of an up-to-date catalog could be solved by
+adding more data to the WAL stream, but it seems likely that that would
+require relatively intrusive and complex changes. Instead we chose to require a
+synchronized catalog at the decoding site. That adds some complexity to use
+cases like replicating into a different database or cross-version
+replication. For those it is relatively straightforward to develop a proxy pg
+instance that only contains the catalog and does the transformation to textual
+changes.
+
+This also is the solution to the other big problem, the need to work around
+architecture/version specific binary formats. The alternative, producing
+cross-version, cross-architecture compatible binary changes or even more so
+textual changes all the time, seems to be prohibitively expensive, both from a
+cpu and a storage POV and also from the point of implementation effort.
+
+The catalog on the site where changes originate can *not* be used for the
+decoding because at the time we decode the WAL the catalog may have changed
+from the state it was in when the WAL was generated. A possible solution for
+this would be to have a fully versioned catalog but that again seems to be
+rather complex and intrusive.
+
+For some operations (UPDATE, DELETE) and corner-cases (e.g. full page writes)
+additional data needs to be logged, but the additional amount of data isn't
+that big. Requiring a primary-key for any change but INSERT seems to be a
+sensible thing for now. The required changes are fully contained in heapam.c
+and are pretty simple so far.
+
+2.:
+
+For transport of the non-decoded data from the originating site to the decoding
+site we decided to 

[HACKERS] [PATCH 15/16] Activate/Implement the apply process which applies received updates from another node

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

One apply process currently can only apply changes from one database in another
cluster (with a specific node_id).

Currently synchronous_commit=off is statically set in the apply process because
after a crash we can safely recover all changes which we didn't apply, so there
is no point in incurring the overhead of synchronous commits. This might be
problematic in combination with synchronous replication.

Missing/Todo:
- The foreign node_id currently is hardcoded (2, 1 depending on the local id)
  as is the database (postgres). This obviously needs to change.
- Proper mainloop with error handling, PROCESS_INTERRUPTS and everything
- Start multiple apply processes (per node_id per database)
- Possibly switch databases during runtime?
---
 src/backend/postmaster/bgworker.c |   10 +-
 src/backend/replication/logical/logical.c |  194 +
 src/include/replication/logical.h |3 +
 3 files changed, 198 insertions(+), 9 deletions(-)

diff --git a/src/backend/postmaster/bgworker.c 
b/src/backend/postmaster/bgworker.c
index 8144050..bbb7e86 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -52,6 +52,7 @@
 #include postmaster/bgworker.h
 #include postmaster/fork_process.h
 #include postmaster/postmaster.h
+#include replication/logical.h
 #include storage/bufmgr.h
 #include storage/ipc.h
 #include storage/latch.h
@@ -91,8 +92,6 @@ static void bgworker_sigterm_handler(SIGNAL_ARGS);
 
 NON_EXEC_STATIC void BgWorkerMain(int argc, char *argv[]);
 
-static bool do_logicalapply(void);
-
 /
  *   BGWORKER CODE
  /
@@ -394,10 +393,3 @@ NumBgWorkers(void)
return numWorkers;
 #endif
 }
-
-static bool
-do_logicalapply(void)
-{
-   elog(LOG, "doing logical apply");
-   return false;
-}
diff --git a/src/backend/replication/logical/logical.c 
b/src/backend/replication/logical/logical.c
index 4f34488..7fadafe 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -13,7 +13,201 @@
  *
  */
 #include postgres.h
+
+#include access/xlogreader.h
+
+#include replication/applycache.h
 #include replication/logical.h
+#include replication/apply.h
+#include replication/decode.h
+#include replication/walreceiver.h
+/*FIXME: XLogRead*/
+#include replication/walsender_private.h
+
+#include storage/ipc.h
+#include storage/proc.h
+
 int guc_replication_origin_id = InvalidMultimasterNodeId;
 RepNodeId current_replication_origin_id = InvalidMultimasterNodeId;
 XLogRecPtr current_replication_origin_lsn = {0, 0};
+
+static XLogReaderState* xlogreader_state = 0;
+
+static bool
+replay_record_is_interesting(XLogReaderState* state, XLogRecord* r)
+{
+   /*
+* we filtered in the sender, so no filtering necessary atm.
+*
+* If we want to introduce per-table filtering after a proxy or such 
this
+* would be the place.
+*/
+   return true;
+}
+
+static void
+replay_writeout_data(XLogReaderState* state, char* data, Size len)
+{
+   /* no data needs to persist after this */
+   return;
+}
+
+static void
+replay_finished_record(XLogReaderState* state, XLogRecordBuffer* buf)
+{
+   ReaderApplyState* apply_state = state->private_data;
+   ApplyCache *cache = apply_state->apply_cache;
+
+   DecodeRecordIntoApplyCache(cache, buf);
+}
+
+static void
+replay_read_page(XLogReaderState* state, char* cur_page, XLogRecPtr startptr)
+{
+   XLogPageHeader page_header;
+
+   Assert((startptr.xrecoff % XLOG_BLCKSZ) == 0);
+
+   /* FIXME: more sensible/efficient implementation */
+   XLogRead(cur_page, receiving_from_node_id, startptr, XLOG_BLCKSZ);
+
+   page_header = (XLogPageHeader)cur_page;
+
+   if (page_header->xlp_magic != XLOG_PAGE_MAGIC)
+   {
+       elog(FATAL, "page header magic %x, should be %x at %X/%X", page_header->xlp_magic,
+            XLOG_PAGE_MAGIC, startptr.xlogid, startptr.xrecoff);
+   }
+}
+
+bool
+do_logicalapply(void)
+{
+   XLogRecPtr *from;
+   XLogRecPtr *to;
+   ReaderApplyState *apply_state;
+   int res;
+
+   static bool initialized = false;
+
+   if (!initialized)
+   {
+   /*
+* FIXME: We need a sensible implementation for choosing this.
+*/
+   if (guc_replication_origin_id == 1)
+   {
+   receiving_from_node_id = 2;
+   }
+   else
+   {
+   receiving_from_node_id = 1;
+   }
+   }
+
+   ResetLatch(&MyProc->procLatch);
+
+   SpinLockAcquire(&WalRcv->mutex);
+   from = &WalRcv->mm_applyState[receiving_from_node_id];
+   to = &WalRcv->mm_receiveState[receiving_from_node_id];
+   

[HACKERS] [PATCH 14/16] Add module to apply changes from an apply-cache using low-level functions

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

We decided to use low level functions to do the apply instead of producing sql
statements containing the data (or using prepared statements) because both the
text conversion and the full executor overhead introduce significant
overhead which is unnecessary if you're using the same version of pg on the same
architecture.

There are loads of use cases though that require different methods of applying
- so the part doing the applying from an ApplyCache is just a bunch of
well abstracted callbacks getting passed all the required knowledge to change
the data representation into other formats.

Missing:

- TOAST handling. For physical apply not much needs to be done because the
  toast inserts will have been made beforehand. There needs to be an option in
  ApplyCache that helps reassembling TOAST datums to make it easier to write
  apply modules which convert to text.
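
As a sketch of how these callbacks plug in (the begin/change/commit member
names of ApplyCache used here are assumptions; only private_data and the
callback functions themselves appear in the patch below):

/*
 * Sketch only: register the low-level apply callbacks with an ApplyCache.
 */
static void
setup_low_level_apply_sketch(ApplyCache *cache, ApplyApplyCacheState *state)
{
	cache->begin = apply_begin_txn;		/* opens a local transaction */
	cache->change = apply_change;		/* performs the heap operation */
	cache->commit = apply_commit_txn;	/* commits and restores state */
	cache->private_data = state;
}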
---
 src/backend/replication/logical/Makefile |2 +-
 src/backend/replication/logical/apply.c  |  313 ++
 src/include/replication/apply.h  |   24 +++
 3 files changed, 338 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/replication/logical/apply.c
 create mode 100644 src/include/replication/apply.h

diff --git a/src/backend/replication/logical/Makefile 
b/src/backend/replication/logical/Makefile
index c2d6d82..d0e0b13 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -14,6 +14,6 @@ include $(top_builddir)/src/Makefile.global
 
 override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
 
-OBJS = applycache.o decode.o logical.o
+OBJS = apply.o applycache.o decode.o logical.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/logical/apply.c 
b/src/backend/replication/logical/apply.c
new file mode 100644
index 000..646bd54
--- /dev/null
+++ b/src/backend/replication/logical/apply.c
@@ -0,0 +1,313 @@
+/*-
+ *
+ * logical.c
+ *
+ * Support functions for logical/multimaster replication
+ *
+ *
+ * Portions Copyright (c) 2010-2012, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/replication/logical.c
+ *
+ */
+#include postgres.h
+
+#include access/xact.h
+#include access/heapam.h
+#include access/genam.h
+
+#include catalog/pg_control.h
+#include catalog/index.h
+
+#include executor/executor.h
+
+#include replication/applycache.h
+#include replication/apply.h
+
+#include utils/rel.h
+#include utils/snapmgr.h
+#include utils/lsyscache.h
+
+
+
+static void
+UserTableUpdateIndexes(Relation heapRel, HeapTuple heapTuple);
+
+
+void apply_begin_txn(ApplyCache* cache, ApplyCacheTXN* txn)
+{
+   ApplyApplyCacheState *state = cache->private_data;
+
+   state->original_resource_owner = CurrentResourceOwner;
+
+   PreventTransactionChain(true, "Apply Process cannot be started inside a txn");
+
+   StartTransactionCommand();
+
+   PushActiveSnapshot(GetTransactionSnapshot());
+}
+
+void apply_commit_txn(ApplyCache* cache, ApplyCacheTXN* txn)
+{
+   ApplyApplyCacheState *state = cache->private_data;
+
+   current_replication_origin_lsn = txn->lsn;
+
+   PopActiveSnapshot();
+   CommitTransactionCommand();
+
+
+   /*
+    * for some reason after (Start|Commit)TransactionCommand we lose our
+    * resource owner, restore it.
+    * XXX: is that correct?
+    */
+   CurrentResourceOwner = state->original_resource_owner;
+
+   current_replication_origin_lsn.xlogid = 0;
+   current_replication_origin_lsn.xrecoff = 0;
+}
+
+
+void apply_change(ApplyCache* cache, ApplyCacheTXN* txn, ApplyCacheTXN* subtxn, ApplyCacheChange* change)
+{
+   /* for inserting */
+   Relation    tuple_rel;
+
+   tuple_rel = heap_open(HeapTupleGetOid(change->table), RowExclusiveLock);
+
+   switch (change->action)
+   {
+   case APPLY_CACHE_CHANGE_INSERT:
+   {
+#ifdef VERBOSE_DEBUG
+       elog(LOG, "INSERT");
+#endif
+       simple_heap_insert(tuple_rel, &change->newtuple->tuple);
+
+       UserTableUpdateIndexes(tuple_rel, &change->newtuple->tuple);
+       break;
+   }
+   case APPLY_CACHE_CHANGE_UPDATE:
+   {
+   Oid indexoid = InvalidOid;
+   int16 pknratts;
+   int16 pkattnum[INDEX_MAX_KEYS];
+   Oid pktypoid[INDEX_MAX_KEYS];
+   Oid pkopclass[INDEX_MAX_KEYS];
+
+   ScanKeyData cur_skey[INDEX_MAX_KEYS];
+   int i;
+   bool isnull;
+   TupleDesc desc = RelationGetDescr(tuple_rel);
+
+   Relation index_rel;
+
+   HeapTuple old_tuple;
+   bool found = false;
+
+

[HACKERS] [PATCH 13/16] Introduction of pair of logical walreceiver/sender

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

A logical WALReceiver is started directly by Postmaster when we enter PM_RUN
state and the new parameter multimaster_conninfo is set. For now only one of
those is started, but the code doesn't rely on that. In future multiple ones
should be allowed.

To transfer that data a new command, START_LOGICAL_REPLICATION is introduced in
the walsender reusing most of the infrastructure for START_REPLICATION. The
former uses the same on-the-wire format as the latter.

To make initialization possible, IDENTIFY_SYSTEM returns two new columns: node_id
returning the multimaster_node_id and last_checkpoint returning the RedoRecPtr.

The walreceiver writes that data into the previously introduced pg_lcr/$node_id
directory.

Future Directions/TODO:
- pass node_ids we're interested in to START_LOGICAL_REPLICATION to allow
  complex topologies
- allow to pass filters to reduce the transfer volume
- compress the transferred data by actually removing uninteresting records
  instead of replacing them by NOOP records. This adds some complexities
  because we still need to map the received lsn to the requested lsn so we know
  where to restart transferring data and such.
- check that wal on the sending side was generated with WAL_LEVEL_LOGICAL
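
For orientation, a minimal client-side sketch of the conversation (plain libpq,
the way libpqwalreceiver drives it; the exact result columns and the argument
syntax of START_LOGICAL_REPLICATION are assumptions here):

/*
 * Sketch only: connect as a replication client, run IDENTIFY_SYSTEM (which
 * per the commit message now also reports node_id and last_checkpoint) and
 * start streaming logical changes.  Error handling trimmed.
 */
#include <stdio.h>
#include "libpq-fe.h"

int
main(void)
{
	PGconn	   *conn;
	PGresult   *res;

	conn = PQconnectdb("host=origin dbname=postgres replication=true");
	if (PQstatus(conn) != CONNECTION_OK)
		return 1;

	res = PQexec(conn, "IDENTIFY_SYSTEM");
	if (PQresultStatus(res) == PGRES_TUPLES_OK)
		printf("system %s, timeline %s\n",
			   PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1));
	PQclear(res);

	/* argument syntax assumed; on success this enters CopyBoth mode */
	res = PQexec(conn, "START_LOGICAL_REPLICATION 0/1000000");
	PQclear(res);

	PQfinish(conn);
	return 0;
}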
---
 src/backend/postmaster/postmaster.c|   10 +-
 .../libpqwalreceiver/libpqwalreceiver.c|  104 -
 src/backend/replication/repl_gram.y|   19 +-
 src/backend/replication/repl_scanner.l |1 +
 src/backend/replication/walreceiver.c  |  165 +++-
 src/backend/replication/walreceiverfuncs.c |1 +
 src/backend/replication/walsender.c|  422 +++-
 src/backend/utils/misc/guc.c   |9 +
 src/backend/utils/misc/postgresql.conf.sample  |1 +
 src/include/nodes/nodes.h  |1 +
 src/include/nodes/replnodes.h  |   10 +
 src/include/replication/logical.h  |4 +
 src/include/replication/walreceiver.h  |9 +-
 13 files changed, 624 insertions(+), 132 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c 
b/src/backend/postmaster/postmaster.c
index 71cfd6d..13e9592 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1449,6 +1449,11 @@ ServerLoop(void)
kill(AutoVacPID, SIGUSR2);
}
 
+   /* Restart walreceiver process in certain states only. */
+   if (WalReceiverPID == 0 && pmState == PM_RUN &&
+       LogicalWalReceiverActive())
+   WalReceiverPID = StartWalReceiver();
+
/* Check all the workers requested are running. */
if (pmState == PM_RUN)
StartBackgroundWorkers();
@@ -2169,7 +2174,8 @@ pmdie(SIGNAL_ARGS)
/* and the walwriter too */
if (WalWriterPID != 0)
signal_child(WalWriterPID, SIGTERM);
-
+   if (WalReceiverPID != 0)
+   signal_child(WalReceiverPID, SIGTERM);
/*
 * If we're in recovery, we can't kill the 
startup process
 * right away, because at present doing so does 
not release
@@ -2421,6 +2427,8 @@ reaper(SIGNAL_ARGS)
PgArchPID = pgarch_start();
if (PgStatPID == 0)
PgStatPID = pgstat_start();
+   if (WalReceiverPID == 0 && LogicalWalReceiverActive())
+   WalReceiverPID = StartWalReceiver();
StartBackgroundWorkers();
 
/* at this point we are really open for business */
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c 
b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 979b66b..0ea3fce 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -46,7 +46,8 @@ static PGconn *streamConn = NULL;
 static char *recvBuf = NULL;
 
 /* Prototypes for interface functions */
-static bool libpqrcv_connect(char *conninfo, XLogRecPtr startpoint);
+static bool libpqrcv_connect(char *conninfo, XLogRecPtr* redo, XLogRecPtr* 
where_at, bool startedDuringRecovery);
+static bool libpqrcv_start(char *conninfo, XLogRecPtr* startpoint, bool 
startedDuringRecovery);
 static bool libpqrcv_receive(int timeout, unsigned char *type,
 char **buffer, int *len);
 static void libpqrcv_send(const char *buffer, int nbytes);
@@ -63,10 +64,12 @@ void
 _PG_init(void)
 {
/* Tell walreceiver how to reach us */
-   if (walrcv_connect != NULL || walrcv_receive != 

[HACKERS] [PATCH 05/16] Preliminary: Introduce Bgworker process

2012-06-13 Thread Andres Freund
From: Simon Riggs si...@2ndquadrant.com

Early prototype that allows for just 1 bgworker which calls a function called
do_applyprocess().  Expect major changes in this, but not in ways that would
affect the apply process.
---
 src/backend/postmaster/Makefile   |4 +-
 src/backend/postmaster/bgworker.c |  403 +
 src/backend/postmaster/postmaster.c   |   91 --
 src/backend/tcop/postgres.c   |5 +
 src/backend/utils/init/miscinit.c |5 +-
 src/backend/utils/init/postinit.c |3 +-
 src/backend/utils/misc/guc.c  |   37 ++-
 src/backend/utils/misc/postgresql.conf.sample |4 +
 src/include/postmaster/bgworker.h |   29 ++
 9 files changed, 550 insertions(+), 31 deletions(-)
 create mode 100644 src/backend/postmaster/bgworker.c
 create mode 100644 src/include/postmaster/bgworker.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3056b09..7b23353 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgwriter.o fork_process.o pgarch.o pgstat.o postmaster.o \
-   startup.o syslogger.o walwriter.o checkpointer.o
+OBJS = autovacuum.o bgworker.o bgwriter.o fork_process.o pgarch.o pgstat.o \
+   postmaster.o startup.o syslogger.o walwriter.o checkpointer.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c 
b/src/backend/postmaster/bgworker.c
new file mode 100644
index 000..8144050
--- /dev/null
+++ b/src/backend/postmaster/bgworker.c
@@ -0,0 +1,403 @@
+/*-
+ *
+ * bgworker.c
+ *
+ * PostgreSQL Integrated Worker Daemon
+ *
+ * Background workers can execute arbitrary user code. A shared library
+ * can request creation of a worker using RequestAddinBGWorkerProcess().
+ *
+ * The worker process is forked from the postmaster and then attaches
+ * to shared memory similarly to an autovacuum worker and finally begins
+ * executing the supplied WorkerMain function.
+ *
+ * If the fork() call fails in the postmaster, it will try again later.
+ * Note that the failure can only be transient (fork failure due to
+ * high load, memory pressure, too many processes, etc); more permanent
+ * problems, like failure to connect to a database, are detected later in the
+ * worker and dealt with just by having the worker exit normally.  Postmaster
+ * will launch a new worker again later.
+ *
+ * Note that there can be more than one worker in a database concurrently.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/bgworker.c
+ *
+ *-
+ */
+#include postgres.h
+
+#include signal.h
+#include sys/types.h
+#include sys/time.h
+#include time.h
+#include unistd.h
+
+#include access/heapam.h
+#include access/reloptions.h
+#include access/transam.h
+#include access/xact.h
+#include catalog/dependency.h
+#include catalog/namespace.h
+#include catalog/pg_database.h
+#include commands/dbcommands.h
+#include commands/vacuum.h
+#include libpq/pqsignal.h
+#include miscadmin.h
+#include pgstat.h
+#include postmaster/bgworker.h
+#include postmaster/fork_process.h
+#include postmaster/postmaster.h
+#include storage/bufmgr.h
+#include storage/ipc.h
+#include storage/latch.h
+#include storage/pmsignal.h
+#include storage/proc.h
+#include storage/procsignal.h
+#include storage/sinvaladt.h
+#include tcop/tcopprot.h
+#include utils/fmgroids.h
+#include utils/lsyscache.h
+#include utils/memutils.h
+#include utils/ps_status.h
+#include utils/rel.h
+#include utils/snapmgr.h
+#include utils/syscache.h
+#include utils/timestamp.h
+#include utils/tqual.h
+
+
+/*
+ * GUC parameters
+ */
+int            MaxWorkers;
+
+static int bgworker_addin_request = 0;
+static bool bgworker_addin_request_allowed = true;
+
+/* Flags to tell if we are in a worker process */
+static bool am_bgworker = false;
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+static volatile sig_atomic_t got_SIGTERM = false;
+
+static void bgworker_sigterm_handler(SIGNAL_ARGS);
+
+NON_EXEC_STATIC void BgWorkerMain(int argc, char *argv[]);
+
+static bool do_logicalapply(void);
+
+/
+ *   BGWORKER CODE
+ /
+
+/* SIGTERM: time to die */
+static void
+bgworker_sigterm_handler(SIGNAL_ARGS)
+{
+   int save_errno = 

Re: [HACKERS] Ability to listen on two unix sockets

2012-06-13 Thread Honza Horak

On 06/10/2012 12:37 AM, Peter Eisentraut wrote:

On sön, 2012-06-10 at 00:25 +0200, Andres Freund wrote:

We already have the ability to configure the Unix socket

directory in

postgresql.conf.  All you need to do is turn that into a list.

That doesn't help libpq using clients.

There is no mandate here to do anything about that.

Well, Martin Pitt/the debian package is patching exactly that. Youre
saying
that everything that needs to be done to make that easier is to make
unix_socket_dir a list. So there seems to be some disparity there ;)


The Debian package doesn't need any change or assistance, really.  You
can change the default location of the socket by patching
pg_config_manual.h, and that's a one-liner.  The only thing that would
make that slightly easier or better is making it a configure option, but
that was intentionally decided against in the old days (not by me).


Since systemd with its PrivateTmp feature is going to be used in more 
and more distros, there will probably be a bigger need to solve the 
inaccessible default unix socket directory /tmp in the future.


Thus, I'd like to ask if anybody is aware of any issue connected with 
simply patching pg_config_manual.h, same as Debian already does? For 
example, is there any piece of software that simply relies on the /tmp 
location of the socket and doesn't use libpq for the communication?


Regards,
Honza



Re: [HACKERS] hint bit i/o reduction

2012-06-13 Thread Merlin Moncure
On Wed, Jun 13, 2012 at 3:57 AM, Amit Kapila amit.kap...@huawei.com wrote:
 1) Keep track # times the last transaction id was repeatedly seen in
 tqual.c (resetting as soon as a new xid was touched.  We can do this
 just for xmin, or separately for both xmin and xmax.

 Will this be done when we see a new xid duricg scan of tuple, if yes then
 Won't this logic impact incase there are very few records per transaction in
 database.
 As in that case the new optimization won't help much, but it adds the new
 instructions.

Yes, but only in the unhinted case -- in oltp workloads tuples get
hinted fairly quickly so I doubt this would be a huge impact.  Hinted
scans will work exactly as they do now.  In the unhinted case for OLTP
a  few tests are added but that's noise compared to the other stuff
going on.

 We can also try to be smart and disable
 the 'avoid setting the hint bit' logic if the page is already dirty.

 Whats the benefit for not setting hint bit for a already dirty page.

Nothing -- we are disabling the '*avoid* setting the hint bit' logic
(double negative there).  In other words we would be disabling the
setting of hint bits if we see the same transaction many times in a
row, but only if the page isn't dirty.
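
Sketched out, the bookkeeping being described would look something like this
(all names are hypothetical; this is not from a patch):

/*
 * Hypothetical sketch: track how many times in a row the same xmin was
 * resolved in tqual.c and skip setting the hint bit once the run is long
 * enough, but only if the page isn't dirty already.
 */
static TransactionId last_seen_xid = InvalidTransactionId;
static int           last_seen_xid_count = 0;

#define XID_RUN_THRESHOLD   100     /* would need benchmarking */

static bool
avoid_setting_hint_bit(TransactionId xid, bool page_already_dirty)
{
	if (TransactionIdEquals(xid, last_seen_xid))
		last_seen_xid_count++;
	else
	{
		last_seen_xid = xid;
		last_seen_xid_count = 0;
	}

	/* if the page is dirty anyway the hint bit costs no extra I/O */
	if (page_already_dirty)
		return false;

	/* same xid seen many times in a row: leave the page clean */
	return last_seen_xid_count >= XID_RUN_THRESHOLD;
}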

merlin



Re: [HACKERS] Ability to listen on two unix sockets

2012-06-13 Thread Honza Horak

On 06/10/2012 03:41 PM, Robert Haas wrote:

On Sun, Jun 10, 2012 at 8:36 AM, Tom Lanet...@sss.pgh.pa.us  wrote:

Peter Eisentrautpete...@gmx.net  writes:

On lör, 2012-06-09 at 18:26 -0400, Tom Lane wrote:

That's not actually quite the same thing as what I suggest above.
Currently, unix_socket_directory *overrides* the compiled-in choice.
I'm suggesting that it would be better to invent a list that is *added
to* the compiled-in choice.  If we think it would be best to still be
able to override that, then I'd vote for keeping unix_socket_directory
as is, and then adding a list named something like
secondary_socket_directories.   But if we just turn
unix_socket_directory into a list, I think the lack of separation
between primary and secondary directories will be confusing.



By that logic, any list-valued parameter should be split into a primary
and secondary setting.


Well, no: the key point here is that there will be one directory that is
special because it's the one baked into libpq.  I agree that for the
purposes of the backend in isolation, we might as well just have a list.
What's less clear is whether, when considering the backend+client
ecosystem as a whole, the special status of the configure-time socket
directory ought to be reflected in the way we set up the GUCs.  I have
to admit that I'm not totally sold on either approach.


I think we should consider this in the context of allowing both
additional UNIX sockets and additional TCP ports.  In the case of TCP
ports, it's clearly no good to turn port into a list, because one
port number has to be primary, since it goes into the PID file and
also affects the naming of any UNIX sockets created.  If we add
secondary_socket_dirs, I think we will also need secondary_ports.  One
idea might be to have one new GUC that serves both purposes:

additional_sockets = '12345, /foo'

I'm sure there are other ways to skin the cat as well.


Going through the thread, I'd like to sum it up, choosing the approach with 
the fewest potential issues, and would like to find a consensus if possible.


It seems unix_socket_directory could be turned into a list and probably 
renamed to unix_socket_directories, since it would be confusing if a 
list value were named in the singular. On the other hand, we probably don't want to 
specify listening ports together with additional unix sockets in one 
configuration option, so it seems better to add a new configuration 
option to distinguish the primary listening port from additional ports.
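
To illustrate what the list form means for the backend, a simplified standalone 
sketch (create_unix_socket() here is a hypothetical stand-in for the real 
StreamServerPort() path in the backend):

/* Sketch only: iterate over a comma-separated unix_socket_directories value. */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

static int
create_unix_socket(const char *dir)     /* hypothetical stand-in */
{
	printf("would listen on %s/.s.PGSQL.5432\n", dir);
	return 0;
}

static void
create_all_unix_sockets(const char *guc_value)
{
	char   *copy = strdup(guc_value);
	char   *tok;

	for (tok = strtok(copy, ","); tok != NULL; tok = strtok(NULL, ","))
	{
		while (*tok == ' ')             /* trim leading spaces */
			tok++;
		create_unix_socket(tok);
	}
	free(copy);
}

int
main(void)
{
	create_all_unix_sockets("/tmp, /var/run/postgresql");
	return 0;
}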


Regards,
Honza



Re: [HACKERS] hint bit i/o reduction

2012-06-13 Thread Amit Kapila
 Yes, but only in the unhinted case -- in oltp workloads tuples get
 hinted fairly quickly so I doubt this would be a huge impact.  Hinted
 scans will work exactly as they do now.  In the unhinted case for OLTP
 a  few tests are added but that's noise compared to the other stuff
 going on.

I believe the design you have proposed is for the unhinted cases only, meaning
that if the tuple already has its hint bit set then it will not go through the
new logic. Is that right?

In unhinted cases, there can be 2 ways hint bit can be set
1. It gets the transaction status from Clog memory buffers
2. It needs to do I/O to get the transaction status from Clog.

So, I think it will add overhead in case 1, where the processing is fast, but
now we will add some new instructions to it.
However, I think if you have already tested that scenario then there
should not be any problem.

-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com] 
Sent: Wednesday, June 13, 2012 6:50 PM
To: Amit Kapila
Cc: PostgreSQL-development
Subject: Re: [HACKERS] hint bit i/o reduction

On Wed, Jun 13, 2012 at 3:57 AM, Amit Kapila amit.kap...@huawei.com wrote:
 1) Keep track # times the last transaction id was repeatedly seen in
 tqual.c (resetting as soon as a new xid was touched.  We can do this
 just for xmin, or separately for both xmin and xmax.

 Will this be done when we see a new xid duricg scan of tuple, if yes then
 Won't this logic impact incase there are very few records per transaction
in
 database.
 As in that case the new optimization won't help much, but it adds the new
 instructions.

Yes, but only in the unhinted case -- in oltp workloads tuples get
hinted fairly quickly so I doubt this would be a huge impact.  Hinted
scans will work exactly as they do now.  In the unhinted case for OLTP
a  few tests are added but that's noise compared to the other stuff
going on.

 We can also try to be smart and disable
 the 'avoid setting the hint bit' logic if the page is already dirty.

 Whats the benefit for not setting hint bit for a already dirty page.

Nothing -- we are disabling the '*avoid* setting the hint bit' logic
(double negative there).  In other words we would be disabling the
setting of hint bits if we see the same transaction many times in a
row, but only if the page isn't dirty.

merlin




Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink

2012-06-13 Thread Amit Kapila
I have created the patch by including fallback_application_name for dblink
as well.
In it I have used "dblink" as the fallback_application_name.

Please let me know your suggestions regarding the same.
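
For reference, the general shape such a change takes on the client side (a 
sketch, not the attached patch itself):

/*
 * Sketch: a client like dblink/pgbench/oid2name can supply a
 * fallback_application_name by passing it as an extra keyword to
 * PQconnectdbParams().
 */
#include "libpq-fe.h"

static PGconn *
connect_with_fallback_appname(const char *conninfo)
{
	const char *keywords[] = {"dbname", "fallback_application_name", NULL};
	const char *values[]   = {conninfo, "dblink", NULL};

	/* expand_dbname = 1 lets the dbname slot carry a full conninfo string */
	return PQconnectdbParams(keywords, values, 1);
}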

-Original Message-
From: Robert Haas [mailto:robertmh...@gmail.com] 
Sent: Wednesday, June 13, 2012 12:13 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] WIP patch for Todo Item : Provide
fallback_application_name in contrib/pgbench, oid2name, and dblink

On Mon, Jun 11, 2012 at 5:30 AM, Amit Kapila amit.kap...@huawei.com wrote:
 As per the previous discussion in link below, it seems that fallback
 application name needs to be provided for only

 pgbench and oid2name.


http://archives.postgresql.org/message-id/w2g9837222c1004070216u3bc46b3ahbdd
fdffdbfb46...@mail.gmail.com



 However the title of Todo Item suggests it needs to be done for dblink as
 well.

 Please suggest if it needs to be done for dblink, if yes then what should
be
 fallback_application_name for it?

Why not 'dblink'?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


fallback_application_name.patch
Description: Binary data



Re: [HACKERS] [PATCH 16/16] current version of the design document

2012-06-13 Thread Merlin Moncure
On Wed, Jun 13, 2012 at 6:28 AM, Andres Freund and...@2ndquadrant.com wrote:
 +synchronized catalog at the decoding site. That adds some complexity to use
 +cases like replicating into a different database or cross-version
 +replication. For those it is relatively straight-forward to develop a proxy 
 pg
 +instance that only contains the catalog and does the transformation to 
 textual
 +changes.

wow.  Anyways, could you elaborate a little on how this proxy
instance concept would work?  Let's take the case where I have N
small-ish schema-identical database shards that I want to aggregate
into a single warehouse -- something that HS/SR currently can't do.
There's a lot of ways to do that obviously but assuming the warehouse
would have to have a unique schema, could it be done in your
architecture?

merlin



Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

2012-06-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The code for this is as attached.  Note that I'd rip out the normal-path
 tracking of line boundaries; it seems better to have a second scan of
 the data in the error case and save the cycles in non-error cases.

 Really?!

Um ... do you have a problem with that idea, and if so what?  It would
be considerably more complicated to do it without a second pass.

regards, tom lane
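
For reference, the second pass being discussed is just a rescan of the input up 
to the point of the error, roughly (a sketch, not the committed code):

/*
 * Sketch of the "second scan only on error" approach: given the whole input
 * and a pointer to where the lexer failed, derive the line number (and the
 * start of that line) by rescanning from the beginning.
 */
static int
error_line_number(const char *input, const char *error_pos,
				  const char **line_start)
{
	int         line = 1;
	const char *p;

	*line_start = input;
	for (p = input; p < error_pos; p++)
	{
		if (*p == '\n')
		{
			line++;
			*line_start = p + 1;
		}
	}
	return line;
}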



Re: [HACKERS] [PATCH 16/16] current version of the design document

2012-06-13 Thread Andres Freund
Hi Merlin,

On Wednesday, June 13, 2012 04:21:12 PM Merlin Moncure wrote:
 On Wed, Jun 13, 2012 at 6:28 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  +synchronized catalog at the decoding site. That adds some complexity to
  use +cases like replicating into a different database or cross-version
  +replication. For those it is relatively straight-forward to develop a
  proxy pg +instance that only contains the catalog and does the
  transformation to textual +changes.
 wow.  Anyways, could you elaborate on a little on how this proxy
 instance concept would work?
To do the decoding into another form you need an up2date catalog + correct 
binaries. So the idea would be to have a minimal instance which is just a copy 
of the database with all the tables with an oid < FirstNormalObjectId i.e. 
only the catalog tables. Then you can apply all xlog changes on system tables 
using the existing infrastructure for HS (or use the command trigger 
equivalent we need to build for BDR) and decode everything else into the 
ApplyCache just as done in the patch. Then you would fill out the callbacks 
for the ApplyCache (see patch 14/16 and 15/16 for an example) to do whatever 
you want with the data. I.e. generate plain sql statements or run some 
transform procedure.

 Let's take the case where I have N small-ish schema identical database
 shards that I want to aggregate into a single warehouse -- something that
 HS/SR currently can't do.
 There's a lot of ways to do that obviously but assuming the warehouse
 would have to have a unique schema, could it be done in your
 architecture?
Not sure what you mean by the warehouse having a unique schema? It has the 
same schema as the OLTP counterparts? That would obviously be the easy case if 
you take care and guarantee uniqueness of keys upfront. That basically would 
be trivial ;)
It gets a bit more complex if you need to transform the data for the 
warehouse. I don't plan to put in work to make that possible without some C 
coding (filling out the callbacks and doing the work in there). It shouldn't 
need much though.

Does that answer your question?

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

2012-06-13 Thread Robert Haas
On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The code for this is as attached.  Note that I'd rip out the normal-path
 tracking of line boundaries; it seems better to have a second scan of
 the data in the error case and save the cycles in non-error cases.

 Really?!

 Um ... do you have a problem with that idea, and if so what?  It would
 be considerably more complicated to do it without a second pass.

Could you explain how it's broken now, and why it will be hard to fix?
 People may well want to use a cast to JSON within an exception block
as a way of testing whether strings are valid JSON.  We should not
assume that the cost of an exception is totally irrelevant, because
this might be iterated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] hint bit i/o reduction

2012-06-13 Thread Merlin Moncure
On Wed, Jun 13, 2012 at 9:02 AM, Amit Kapila amit.kap...@huawei.com wrote:
 Yes, but only in the unhinted case -- in oltp workloads tuples get
 hinted fairly quickly so I doubt this would be a huge impact.  Hinted
 scans will work exactly as they do now.  In the unhinted case for OLTP
 a  few tests are added but that's noise compared to the other stuff
 going on.

 I believe the design you have purposed is for the unhinted cases only, means
 if the tuple has already hint set then it will not go to new logic. Is that
 right?

 In unhinted cases, there can be 2 ways hint bit can be set
 1. It gets the transaction status from Clog memory buffers
 2. It needs to do I/O to get the transaction status from Clog.

 So, I think it will add overhead in case-1 where the processing is fast, but
 now we will add some new instructions to it.

yeah -- but you still have to call:
*) TransactionIdIsCurrentTransactionId(), which adds a couple of tests
(and a bsearch in the presence of subtransactions)
*) TransactionIdIsInProgress() which adds a few tests and a out of
line call in the typical case
*) TransactionIdDidCommit() which adds a few tests and two out of line
calls in the typical case
and, while setting the hint it:
*) Several more tests plus:
*) TransactionIdGetCommitLSN()  another out of line call and a test
*) XLogNeedsFlush() two more out of line calls plus a few tests
*) SetBufferCommitInfoNeedsSave out of line call, several more tests

adding a few instructions to the above probably won't be measurable
(naturally it would be tested to confirm that).

merlin



Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

2012-06-13 Thread Andres Freund
On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:
 On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  The code for this is as attached.  Note that I'd rip out the
  normal-path tracking of line boundaries; it seems better to have a
  second scan of the data in the error case and save the cycles in
  non-error cases.
  
  Really?!
  
  Um ... do you have a problem with that idea, and if so what?  It would
  be considerably more complicated to do it without a second pass.
 
 Could you explain how it's broken now, and why it will be hard to fix?
  People may well want to use a cast to JSON within an exception block
 as a way of testing whether strings are valid JSON.  We should not
 assume that the cost of an exception is totally irrelevant, because
 this might be iterated.
Exception blocks/subtransactions already are considerably expensive. I have a 
hard time believing this additional cost would be measureable.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Ability to listen on two unix sockets

2012-06-13 Thread Florian Pflug
On Jun13, 2012, at 15:14 , Honza Horak wrote:
 Since systemd with it's PrivateTmp feature is going to be used in more and 
 more distros, there will probably be a bigger need to solve in-accessible 
 default unix socket directory /tmp in the future.
 
 Thus, I'd like to ask if anybody is aware of any issue connected with simply 
 patching pg_config_manual.h, same as Debian does it already? For example, is 
 there any piece of software, that simply rely on /tmp location of the socket 
 and doesn't use libpg for the communication?

I've used a self-compiled postgres version on debian for years, and debian's way 
of doing things is a major PITA in that situation. You end up having to specify 
the full path to the socket directory *everywhere*, because otherwise things 
start to break if you recompile something and it suddenly happens to use the 
debian-supplied libpq instead of your own.

Supporting sockets in multiple directories would solve that, once and for all.

best regards,
Florian Pflug




Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

2012-06-13 Thread Robert Haas
On Wed, Jun 13, 2012 at 11:06 AM, Andres Freund and...@2ndquadrant.com wrote:
 On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:
 On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  The code for this is as attached.  Note that I'd rip out the
  normal-path tracking of line boundaries; it seems better to have a
  second scan of the data in the error case and save the cycles in
  non-error cases.
 
  Really?!
 
  Um ... do you have a problem with that idea, and if so what?  It would
  be considerably more complicated to do it without a second pass.

 Could you explain how it's broken now, and why it will be hard to fix?
  People may well want to use a cast to JSON within an exception block
 as a way of testing whether strings are valid JSON.  We should not
 assume that the cost of an exception is totally irrelevant, because
 this might be iterated.
 Exception blocks/subtransctions already are considerably expensive. I have a
 hard time believing this additional cost would be measureable.

According to my testing, the main cost of an exception block catching
a division-by-zero error is that of generating the error message,
primarily sprintf()-type stuff.  The cost of scanning a multi-megabyte
string seems likely to be much larger.

Mind you, I'm not going to spend a lot of time crying into my beer if
it turns out that there's no other reasonable way to implement this,
but I do think that it's entirely appropriate to ask why it's not
possible to do better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

2012-06-13 Thread Andres Freund
On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote:
 On Wed, Jun 13, 2012 at 11:06 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:
  On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
   Robert Haas robertmh...@gmail.com writes:
   On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
   The code for this is as attached.  Note that I'd rip out the
   normal-path tracking of line boundaries; it seems better to have a
   second scan of the data in the error case and save the cycles in
   non-error cases.
   
   Really?!
   
   Um ... do you have a problem with that idea, and if so what?  It would
   be considerably more complicated to do it without a second pass.
  
  Could you explain how it's broken now, and why it will be hard to fix?
   People may well want to use a cast to JSON within an exception block
  as a way of testing whether strings are valid JSON.  We should not
  assume that the cost of an exception is totally irrelevant, because
  this might be iterated.
  
  Exception blocks/subtransctions already are considerably expensive. I
  have a hard time believing this additional cost would be measureable.
 
 According to my testing, the main cost of an exception block catching
 a division-by-zero error is that of generating the error message,
 primarily sprintf()-type stuff.  The cost of scanning a multi-megabyte
 string seems likely to be much larger.
True. I ignored that there doesn't have to be an xid assigned yet... I still 
think it's not very relevant though.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] 9.3devel branch

2012-06-13 Thread Robert Haas
Seeing as we're about to start a CommitFest, I think it's about time
to create the 9.3devel branch.

I'm happy to do it, but I believe it's usually Tom's gig.

Tom?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 07/16] Log enough data into the wal to reconstruct logical changes from it if wal_level=logical

2012-06-13 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:
 
 This adds a new wal_level value 'logical'
 
 Missing cases:
 - heap_multi_insert
 - primary key changes for updates
 - no primary key
 - LOG_NEWPAGE
 
First, Wow!
 
I look forward to the point where we can replace our trigger-based
replication with this!  Your missing cases for primary key issues
would not cause us any pain for our current system, since we require
a primary key and don't support updates to PKs for replicated
tables. While I don't expect that the first cut of this will be able
to replace our replication-related functionality, I'm interested in
making sure it can be extended in that direction, so I have a couple
things to consider:
 
(1)  For our usage, with dozens of source databases feeding into
multiple aggregate databases and interfaces, DDL replication is not
of much if any interest.  It should be easy enough to ignore as long
as it is low volume, so that doesn't worry me too much; but if I'm
missing something and you run across any logical WAL logging for DDL
which does generate a lot of WAL traffic, it would be nice to have a
way to turn that off at generation time rather than filtering it or
ignoring it later.  (Probably won't be an issue, just a heads-up.)
 
(2)  To match the functionality we now have, we would need the
logical stream to include the *before* image of the whole tuple for
each row updated or deleted.  I understand that this is not needed
for the use cases you are initially targeting; I just hope the
design leaves this option open without needing to disturb other use
cases.  Perhaps this would require yet another wal_level value. 
Perhaps rather than testing the current value directly for
determining whether to log something, the GUC processing could set
some booleans for faster testing and less code churn when the
initial implementation is expanded to support other use cases (like
ours).
 
(3)  Similar to point 2, it would be extremely desirable to be able
to determine table name and columns names for the tuples in a stream
from that stream, without needing to query a hot standby or similar
digging into other sources of information.  Not only will the
various source databases all have different OID values for the same
objects, and the aggregate targets have different values from each
other and the sources, but some targets don't have the tables at
all.  I'm talking about our database transaction repository and the
interfaces to business partners which we currently drive off of the
same transaction stream which drives replication.
 
Would it be helpful or just a distraction if I were to provide a
more detailed description of our whole replication / transaction
store / interface area?
 
If it would be useful, I could also describe some other replication
patterns I have seen over the years.  In particular, one which might
be interesting is where subsets of the data are distributed to
multiple standalone machines which have intermittent or unreliable
connections to a central site, which periodically collects data from
all the remote sites, recalculates distribution, and sends
transactions back out to those remote sites to add, remove, and
update rows based on the distribution rules and the new data.
 
-Kevin



Re: [HACKERS] [PATCH 16/16] current version of the design document

2012-06-13 Thread Merlin Moncure
On Wed, Jun 13, 2012 at 9:40 AM, Andres Freund and...@2ndquadrant.com wrote:
 Hi Merlin,

 On Wednesday, June 13, 2012 04:21:12 PM Merlin Moncure wrote:
 On Wed, Jun 13, 2012 at 6:28 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  +synchronized catalog at the decoding site. That adds some complexity to
  use +cases like replicating into a different database or cross-version
  +replication. For those it is relatively straight-forward to develop a
  proxy pg +instance that only contains the catalog and does the
  transformation to textual +changes.
 wow.  Anyways, could you elaborate a little on how this proxy
 instance concept would work?
 To do the decoding into another form you need an up2date catalog + correct
 binaries. So the idea would be to have a minimal instance which is just a copy
 of the database with all the tables with an oid < FirstNormalObjectId i.e.
 only the catalog tables. Then you can apply all xlog changes on system tables
 using the existing infrastructure for HS (or use the command trigger
 equivalent we need to build for BDR) and decode everything else into the
 ApplyCache just as done in the patch. Then you would fill out the callbacks
 for the ApplyCache (see patch 14/16 and 15/16 for an example) to do whatever
 you want with the data. I.e. generate plain sql statements or run some
 transform procedure.

 Let's take the case where I have N small-ish schema identical database
 shards that I want to aggregate into a single warehouse -- something that
 HS/SR currently can't do.
 There's a lot of ways to do that obviously but assuming the warehouse
 would have to have a unique schema, could it be done in your
 architecture?
 Not sure what you mean by the warehouse having a unique schema? It has the
 same schema as the OLTP counterparts? That would obviously be the easy case if
 you take care and guarantee uniqueness of keys upfront. That basically would
 be trivial ;)

by unique I meant 'not the same as the shards' -- presumably this
would mean one of
a) each shard's data would be in a private schema folder
or
b) you'd have one set of tables but decorated with an extra shard
identifying column that would need to be present in all keys to get around
uniqueness issues
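
e.g. for (b), roughly (made-up example):

CREATE TABLE data (
    shard_id  int    NOT NULL,   -- identifies the originating shard
    id        bigint NOT NULL,   -- shard-local key
    payload   text,
    PRIMARY KEY (shard_id, id)
);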

 It gets a bit more complex if you need to transform the data for the
 warehouse. I don't plan to put in work to make that possible without some C
 coding (filling out the callbacks and doing the work in there). It shouldn't
 need much though.

 Does that answer your question?

yes.  Do you envision it would be possible to wrap the ApplyCache
callbacks in a library that could be exposed as an extension?  For
example, a library that would stick the replication data into a queue
that a userland (non C) process could walk, transform, etc?   I know
that's vague -- my general thrust here is that I find the
transformation features particularly interesting and I'm wondering how
much C coding would be needed to access them in the long term.

merlin



Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-13 Thread Robert Haas
On Wed, Jun 13, 2012 at 7:27 AM, Andres Freund and...@2ndquadrant.com wrote:
 Unless somebody objects I will add most of the individual patches marked as RFC to the
 current commitfest. I hope that with comments stemming from that round we can
 get several of the patches into the first or second commitfest. As soon as the
 design is clear/accepted we will try very hard to get the following patches
 into the second or third round.

I made a logical replication topic within the CommitFest for this
patch series.  I think you should add them all there.  I have some
substantive thoughts about the design as well, which I will write up
in a separate email.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

2012-06-13 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote:
 According to my testing, the main cost of an exception block catching
 a division-by-zero error is that of generating the error message,
 primarily sprintf()-type stuff.  The cost of scanning a multi-megabyte
 string seems likely to be much larger.

 True. I ignored that there doesn't have to be an xid assigned yet... I still 
 think its not very relevant though.

I don't think it's relevant either.  In the first place, I don't think
that errors occurring multi megabytes into a JSON blob are going to
occur often enough to be performance critical.  In the second place,
removing cycles from the non-error case is worth something too, probably
more than making the error case faster is worth.

In any case, the proposed scheme for providing context requires that
you know where the error is before you can identify the context.  I
considered schemes that would keep track of the last N characters or
line breaks in case one of them proved to be the one we need.  That
would add enough cycles to non-error cases to almost certainly not be
desirable.  I also considered trying to back up, but that doesn't look
promising either for arbitrary multibyte encodings.

regards, tom lane



Re: [HACKERS] 9.3devel branch

2012-06-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Seeing as we're about to start a CommitFest, I think it's about time
 to create the 9.3devel branch.

 I'm happy to do it, but I believe it's usually Tom's gig.

There are a couple open issues (like reversion of that
checkpoint-skipping patch) that probably ought to get dealt with
before we branch, just to save double patching.  Barring objections
I'll cut the branch tonight or tomorrow though.

regards, tom lane



Re: [HACKERS] [PATCH 07/16] Log enough data into the wal to reconstruct logical changes from it if wal_level=logical

2012-06-13 Thread Andres Freund
On Wednesday, June 13, 2012 05:27:06 PM Kevin Grittner wrote:
 Andres Freund and...@2ndquadrant.com wrote:
  This adds a new wal_level value 'logical'
  
  Missing cases:
  - heap_multi_insert
  - primary key changes for updates
  - no primary key
  - LOG_NEWPAGE
 
 First, Wow!
Thanks ;) I hope you will still be convinced after reading some of the code :P

 I look forward to the point where we can replace our trigger-based
 replication with this!  Your missing cases for primary key issues
 would not cause us any pain for our current system, since we require
 a primary key and don't support updates to PKs for replicated
 tables. While I don't expect that the first cut of this will be able
 to replace our replication-related functionality, I'm interested in
 making sure it can be extended in that direction, so I have a couple
 things to consider:
Ok.

 (1)  For our usage, with dozens of source databases feeding into
 multiple aggregate databases and interfaces, DDL replication is not
 of much if any interest.  It should be easy enough to ignore as long
 as it is low volume, so that doesn't worry me too much; but if I'm
 missing something any you run across any logical WAL logging for DDL
 which does generate a lot of WAL traffic, it would be nice to have a
 way to turn that off at generation time rather than filtering it or
 ignoring it later.  (Probably won't be an issue, just a head-up.)
I don't really see a problem there. I don't yet have a mental image of the 
API/Parameters to START_LOGICAL_REPLICATION to specify filters on the source 
side, but this should be possible.

 (2)  To match the functionality we now have, we would need the
 logical stream to include the *before* image of the whole tuple for
 each row updated or deleted.  I understand that this is not needed
 for the use cases you are initially targeting; I just hope the
 design leaves this option open without needing to disturb other use
 cases.  Perhaps this would require yet another wal_level value.
 Perhaps rather than testing the current value directly for
 determining whether to log something, the GUC processing could set
 some booleans for faster testing and less code churn when the
 initial implementation is expanded to support other use cases (like
 ours).
Hm. I don't see a big problem implementing this although I have to say that I 
am a bit hesitant to do this without in-core users of it for fear of silent 
breakage. WAL is kind of a central thing... ;). But then, the implementation 
should be relatively easy.
I don't see a need to break the wal level down into some booleans: changing 
the test from wal_level >= WAL_LEVEL_LOGICAL into something else shouldn't 
result in any measurable difference in that codepath.
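
If it ever turns out to be useful, the boolean variant Kevin describes would
just be something along these lines (names invented, purely illustrative;
WAL_LEVEL_LOGICAL is the value this patch series adds):

extern int wal_level;

bool    XLogLogsLogicalInfo = false;    /* invented flag name */

void
UpdateWalLevelFlags(void)
{
    XLogLogsLogicalInfo = (wal_level >= WAL_LEVEL_LOGICAL);
}

Call sites would then test the flag instead of comparing wal_level directly.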

I definitely have the use-case of replicating into databases where you need to 
do some transformation in mind.

 (3)  Similar to point 2, it would be extremely desirable to be able
 to determine table name and columns names for the tuples in a stream
 from that stream, without needing to query a hot standby or similar
 digging into other sources of information.  Not only will the
 various source databases all have different OID values for the same
 objects, and the aggregate targets have different values from each
 other and the sources, but some targets don't have the tables at
 all.  I'm talking about our database transaction repository and the
 interfaces to business partners which we currently drive off of the
 same transaction stream which drives replication.
I don't foresee this as a realistic thing. I think the required changes would 
be way too intrusive for too little gain. The performance and space 
requirements would probably also be rather noticeable. I think you will have 
to live with the mentioned 'proxy' pg instances which only contain the catalog 
(which normally isn't very big anyway) doing the decoding into your target 
database (which then doesn't need the same oids).
For how I imagine those proxy instances check my mail to merlin earlier today, 
I described it in some more detail there.
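
Roughly speaking, "only the catalog" means the relations below
FirstNormalObjectId (16384), i.e. what you get from something like:

SELECT relname, relkind
FROM pg_class
WHERE oid < 16384   -- FirstNormalObjectId
ORDER BY relname;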
If I can find the time I should possibly develop (another) prototype of such a 
proxy instance. I don't foresee it needing much more infrastructure/code.

Is that a problem for your use-case? If yes, why?

 Would it be helpful or just a distraction if I were to provide a
 more detailed description of our whole replication / transaction
 store / interface area?
If you have it ready, yes. Otherwise I think I have a good enough image 
already. I tried to listen to you at pgcon and I have developed similar things 
before.
To be honest, I would prefer you spending the time on checking some of the 
code if not ;)

 If it would be useful, I could also describe some other replication
 patterns I have seen over the years.  In particular, one which might
 be interesting is where subsets of the data are distributed to
 multiple standalone machines which have intermittent or unreliable
 connections to a central site, which 

Re: [HACKERS] 9.3devel branch

2012-06-13 Thread Robert Haas
On Wed, Jun 13, 2012 at 11:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Seeing as we're about to start a CommitFest, I think it's about time
 to create the 9.3devel branch.

 I'm happy to do it, but I believe it's usually Tom's gig.

 There are a couple open issues (like reversion of that
 checkpoint-skipping patch) that probably ought to get dealt with
 before we branch, just to save double patching.  Barring objections
 I'll cut the branch tonight or tomorrow though.

Yeah, that'd be nice to get cleaned up.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

2012-06-13 Thread Robert Haas
On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote:
 According to my testing, the main cost of an exception block catching
 a division-by-zero error is that of generating the error message,
 primarily sprintf()-type stuff.  The cost of scanning a multi-megabyte
 string seems likely to be much larger.

 True. I ignored that there doesn't have to be an xid assigned yet... I still
 think its not very relevant though.

 I don't think it's relevant either.  In the first place, I don't think
 that errors occurring multi megabytes into a JSON blob are going to
 occur often enough to be performance critical.  In the second place,
 removing cycles from the non-error case is worth something too, probably
 more than making the error case faster is worth.

 In any case, the proposed scheme for providing context requires that
 you know where the error is before you can identify the context.  I
 considered schemes that would keep track of the last N characters or
 line breaks in case one of them proved to be the one we need.  That
 would add enough cycles to non-error cases to almost certainly not be
 desirable.  I also considered trying to back up, but that doesn't look
 promising either for arbitrary multibyte encodings.

Oh, I see.  :-(

Well, I guess I'll have to suck it up, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 16/16] current version of the design document

2012-06-13 Thread Andres Freund
Hi,

On Wednesday, June 13, 2012 05:39:36 PM Merlin Moncure wrote:
 On Wed, Jun 13, 2012 at 9:40 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  Let's take the case where I have N small-ish schema identical database
  shards that I want to aggregate into a single warehouse -- something
  that HS/SR currently can't do.
  There's a lot of ways to do that obviously but assuming the warehouse
  would have to have a unique schema, could it be done in your
  architecture?
  
  Not sure what you mean by the warehouse having a unique schema? It has
  the same schema as the OLTP counterparts? That would obviously be the
  easy case if you take care and guarantee uniqueness of keys upfront.
  That basically would be trivial ;)
 
 by unique I meant 'not the same as the shards' -- presumably this
 would mean one of
 a) each shard's data would be in a private schema folder
 or
 b) you'd have one set of tables but decorated with an extra shard
 identifying column that would to be present in all keys to get around
 uniqueness issues
I think it would have to mean a) and that you have N of those logical import 
processes hanging around. We really need an identical TupleDesc to do the 
decoding.

  It gets a bit more complex if you need to transform the data for the
  warehouse. I don't plan to put in work to make that possible without some
  C coding (filling out the callbacks and doing the work in there). It
  shouldn't need much though.
  
  Does that answer your question?
 yes.  Do you envision it would be possible to wrap the ApplyCache
 callbacks in a library that could be exposed as an extension?  For
 example, a library that would stick the replication data into a queue
 that a userland (non C) process could walk, transform, etc?   I know
 that's vague -- my general thrust here is that I find the
 transformation features particularly interesting and I'm wondering how
 much C coding would be needed to access them in the long term.
I can definitely imagine the callbacks calling some wrapper around a higher-
level language. Not sure how that fits into an extension (if you mean it as in 
CREATE EXTENSION) though. I don't think you will be able to start the 
replication process from inside a normal backend. I imagine something like 
specifying a shared object + parameters in the config or such.
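
As a very rough sketch (all names here are invented; the real callback
signatures are whatever patch 14/16 ends up with, and whether SPI is usable
at that point is an open question), the glue could be as dumb as appending
each decoded change, already rendered as text, to a queue table that a
non-C consumer polls:

#include "postgres.h"
#include "catalog/pg_type.h"
#include "executor/spi.h"
#include "utils/builtins.h"

static void
queue_textual_change(const char *change_as_text)
{
    Oid   argtypes[1] = { TEXTOID };
    Datum values[1];

    values[0] = CStringGetTextDatum(change_as_text);

    /* error checking omitted for brevity */
    SPI_connect();
    SPI_execute_with_args("INSERT INTO replication_queue(change) VALUES ($1)",
                          1, argtypes, values, NULL, false, 0);
    SPI_finish();
}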

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-13 Thread Andres Freund
On Wednesday, June 13, 2012 05:53:31 PM Robert Haas wrote:
 On Wed, Jun 13, 2012 at 7:27 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  Unless somebody objects I will add most of the individual patches marked as RFC
  to the current commitfest. I hope that with comments stemming from that
  round we can get several of the patches into the first or second
  commitfest. As soon as the design is clear/accepted we will try very
  hard to get the following patches into the second or third round.
 
 I made a logical replication topic within the CommitFest for this
 patch series.  I think you should add them all there.  
Thanks. Added all but the bgworker patch (which is not ready) and the 
WalSndWakeup (different category) one which is not really relevant to the 
topic.

I have the feeling I am due quite some reviewing this round...

 I have some substantive thoughts about the design as well, which I will
 write up in a separate email.
Thanks. Looking forward to it. At least now, before I have read it.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Re: [GENERAL] pg_upgrade from 9.0.7 to 9.1.3: duplicate key pg_authid_oid_index

2012-06-13 Thread Bruce Momjian
On Mon, Jun 04, 2012 at 10:16:45AM -0400, Bruce Momjian wrote:
  I think the checks that are actually needed here are (1) bootstrap
  superusers are named the same, and (2) there are no roles other than the
  bootstrap superuser in the new cluster.
 
 You are right that it is more complex than I stated, but given the
 limited feedback I got on the pg_upgrade/plpython, I figured people
 didn't want to hear the details.  Here they are:
 
 There are three failure modes for pg_upgrade:
 
 1.  check failure
 2.  schema restore failure
 3.  silent failure/corruption
 
 Of course, the later items are worse than the earlier ones.  The
 reporter got a schema restore failure while still following the
 pg_upgrade instructions.  My initial patch changed that #2 error to a #1
 error.  Tom is right that creating users in the new cluster (against
 instructions), can still generate a #2 error if a new/old pg_authid.oid
 match, and they are not the install user, but seeing that is something
 that is against the instructions, I was going to leave that as a #2.

Applied and back-patched to Postgres 9.1.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

2012-06-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 In any case, the proposed scheme for providing context requires that
 you know where the error is before you can identify the context.  I
 considered schemes that would keep track of the last N characters or
 line breaks in case one of them proved to be the one we need.  That
 would add enough cycles to non-error cases to almost certainly not be
 desirable.  I also considered trying to back up, but that doesn't look
 promising either for arbitrary multibyte encodings.

 Oh, I see.  :-(

Attached is a complete proposed patch for this, with some further
adjustments to make the output look a bit more like what we use for
SQL error context printouts (in particular, "..." at both ends of the
excerpt when appropriate).

One thing I did not touch, but am less than happy with, is the wording
of this detail message:

errdetail("Expected string, number, object, array, true, false, or 
null, but found \"%s\".",
  token),

This seems uselessly verbose to me.  It could be argued that enumerating
all the options is helpful to rank JSON novices ... but unless you know
exactly what an object is and why that's different from the other
options, it's not all that helpful.  I'm inclined to think that
something like this would be better:

errdetail("Expected JSON value, but found \"%s\".",
  token),

Thoughts?

regards, tom lane


diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index e79c294..59335db 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*** typedef struct	/* state of JSON lexe
*** 43,50 
  	char	   *token_start;	/* start of current token within input */
  	char	   *token_terminator; /* end of previous or current token */
  	JsonValueType token_type;	/* type of current token, once it's known */
- 	int			line_number;	/* current line number (counting from 1) */
- 	char	   *line_start;		/* start of current line within input (BROKEN!!) */
  } JsonLexContext;
  
  typedef enum	/* states of JSON parser */
--- 43,48 
*** static void json_lex_string(JsonLexConte
*** 78,83 
--- 76,82 
  static void json_lex_number(JsonLexContext *lex, char *s);
  static void report_parse_error(JsonParseStack *stack, JsonLexContext *lex);
  static void report_invalid_token(JsonLexContext *lex);
+ static int report_json_context(JsonLexContext *lex);
  static char *extract_mb_char(char *s);
  static void composite_to_json(Datum composite, StringInfo result,
  			  bool use_line_feeds);
*** json_validate_cstring(char *input)
*** 185,192 
  	/* Set up lexing context. */
  	lex.input = input;
  	lex.token_terminator = lex.input;
- 	lex.line_number = 1;
- 	lex.line_start = input;
  
  	/* Set up parse stack. */
  	stacksize = 32;
--- 184,189 
*** json_lex(JsonLexContext *lex)
*** 335,345 
  	/* Skip leading whitespace. */
  	s = lex->token_terminator;
  	while (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r')
- 	{
- 		if (*s == '\n')
- 			lex->line_number++;
  		s++;
- 	}
  	lex->token_start = s;
  
  	/* Determine token type. */
--- 332,338 
*** json_lex(JsonLexContext *lex)
*** 350,356 
  		{
  			/* End of string. */
  			lex->token_start = NULL;
! 			lex->token_terminator = NULL;
  		}
  		else
  		{
--- 343,349 
  		{
  			/* End of string. */
  			lex->token_start = NULL;
! 			lex->token_terminator = s;
  		}
  		else
  		{
*** json_lex(JsonLexContext *lex)
*** 397,403 
  			/*
  			 * We got some sort of unexpected punctuation or an otherwise
  			 * unexpected character, so just complain about that one
! 			 * character.
  			 */
  			lex->token_terminator = s + 1;
  			report_invalid_token(lex);
--- 390,397 
  			/*
  			 * We got some sort of unexpected punctuation or an otherwise
  			 * unexpected character, so just complain about that one
! 			 * character.  (It can't be multibyte because the above loop
! 			 * will advance over any multibyte characters.)
  			 */
  			lex->token_terminator = s + 1;
  			report_invalid_token(lex);
*** json_lex_string(JsonLexContext *lex)
*** 443,453 
  				lex->token_terminator = s;
  report_invalid_token(lex);
  			}
  			ereport(ERROR,
  	(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  	 errmsg("invalid input syntax for type json"),
! 	 errdetail("line %d: Character with value \"0x%02x\" must be escaped.",
! 	 lex->line_number, (unsigned char) *s)));
  		}
  		else if (*s == '\\')
  		{
--- 437,449 
  				lex->token_terminator = s;
  report_invalid_token(lex);
  			}
+ 			lex->token_terminator = s + pg_mblen(s);
  			ereport(ERROR,
  	(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  	 errmsg("invalid input syntax for type json"),
! 	 errdetail("Character with value 0x%02x must be escaped.",
! 			   

[HACKERS] uncataloged tables are a vestigial husk

2012-06-13 Thread Robert Haas
While working on some code today, I noticed that RELKIND_UNCATALOGED
appears to serve no useful purpose.  In the few places where we check
for it at all, we treat it in exactly the same way as
RELKIND_RELATION.  It seems that it's only purpose is to serve as a
placeholder inside each newly-created relcache entry until the real
relkind is filled in.  But this seems pretty silly, because
RelationBuildLocalRelation(), where the relcache entry is created, is
called in only one place, heap_create(), which already knows the
relkind.  So, essentially, right now, we're relying on the callers of
heap_create() to pass in a relkind and then, after heap_create()
returns, stick that same relkind into the relcache entry before
inserting the pg_class tuple.  The only place where that doesn't
happen is in the bootstrap code, which instead allows
RELKIND_UNCATALOGED to stick around in the relcache entry even though
we have RELKIND_RELATION in the pg_class tuple. But we don't actually
rely on that for anything, so it seems this is just an unnecessary
complication.

The attached patch cleans it up by removing RELKIND_UNCATALOGED and
teaching RelationBuildLocalRelation() to set the relkind itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


remove-relkind-uncataloged.patch
Description: Binary data



Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

2012-06-13 Thread Robert Haas
On Wed, Jun 13, 2012 at 12:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 In any case, the proposed scheme for providing context requires that
 you know where the error is before you can identify the context.  I
 considered schemes that would keep track of the last N characters or
 line breaks in case one of them proved to be the one we need.  That
 would add enough cycles to non-error cases to almost certainly not be
 desirable.  I also considered trying to back up, but that doesn't look
 promising either for arbitrary multibyte encodings.

 Oh, I see.  :-(

 Attached is a complete proposed patch for this, with some further
 adjustments to make the output look a bit more like what we use for
 SQL error context printouts (in particular, ... at both ends of the
 excerpt when appropriate).

I like some of these changes - in particular, the use of errcontext(),
but some of them still seem off.

! DETAIL:  Token ' is invalid.
! CONTEXT:  JSON data, line 1: '...

This doesn't make sense to me.

! DETAIL:  Character with value 0x0a must be escaped.
! CONTEXT:  JSON data, line 1: abc
! ...

This seems an odd way to present this, especially if the goal is to
NOT include the character needing escaping in the log unescaped, which
I thought was the point of saying 0x0a.

! DETAIL:  \u must be followed by four hexadecimal digits.
! CONTEXT:  JSON data, line 1: \u000g...

Why does this end in ... even though there's nothing further in the
input string?

 One thing I did not touch, but am less than happy with, is the wording
 of this detail message:

        errdetail(Expected string, number, object, array, true, false, or 
 null, but found \%s\.,
                  token),

 This seems uselessly verbose to me.  It could be argued that enumerating
 all the options is helpful to rank JSON novices ... but unless you know
 exactly what an object is and why that's different from the other
 options, it's not all that helpful.  I'm inclined to think that
 something like this would be better:

        errdetail(Expected JSON value, but found \%s\.,
                  token),

 Thoughts?

Good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] initdb and fsync

2012-06-13 Thread Jeff Davis
On Wed, 2012-06-13 at 13:53 +0300, Peter Eisentraut wrote:
 The --help output for the -N option was copy-and-pasted wrongly.
 
 The message issued when using -N is also a bit content-free.  Maybe
 something like
 
 Running in nosync mode.  The data directory might become corrupt if the
 operating system crashes.\n

Thank you, fixed.

 Which leads to the question, how does one get out of this state?  Is
 running sync(1) enough?  Is starting the postgres server enough?

sync(1) calls sync(2), and the man page says:

According to the standard specification  (e.g.,  POSIX.1-2001),  sync()
schedules the writes, but may return before the actual writing is done.
However, since version 1.3.20 Linux does actually  wait.   (This  still
does not guarantee data integrity: modern disks have large caches.)

So it looks like sync is enough if you are on linux *and* you have any
unprotected write cache disabled.

I don't think starting the postgres server is enough.

Before, I think we were safe because we could assume that the OS would
flush the buffers before you had time to store any important data. But
now, that window can be much larger.
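
So on such a system the recipe for someone who uses -N and then decides to
keep the cluster would presumably be something like:

$ initdb -N -D /path/to/data    # skip fsync while creating the cluster
$ sync                          # flush OS buffers before trusting the data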

 There are no updates to the initdb man page included in your patch,
 which would be a suitable place to discuss this briefly.

Thank you, I added that.

Regards,
Jeff Davis


initdb-fsync-20120613.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

2012-06-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I like some of these changes - in particular, the use of errcontext(),
 but some of them still seem off.

 ! DETAIL:  Token ' is invalid.
 ! CONTEXT:  JSON data, line 1: '...

 This doesn't make sense to me.

Well, the input is two single quotes and the complaint is that the first
one of them doesn't constitute a valid JSON token.  What would you
expect to see instead?

I considered putting double quotes around the excerpt text, but we do
not do that in SQL error-context reports, and it seems likely to just be
confusing given the prevalence of double quotes in JSON data.  But happy
to consider opposing arguments.

Another thing that could be done here is to print the rest of the line,
rather than "...", if there's not very much of it.  I'm not sure that's
an improvement though.  The advantage of the proposed design is that the
point where the excerpt ends is exactly where the error was detected;
lacking an error cursor, I don't see how else to present that info.

 ! DETAIL:  Character with value 0x0a must be escaped.
 ! CONTEXT:  JSON data, line 1: abc
 ! ...

 This seems an odd way to present this, especially if the goal is to
 NOT include the character needing escaping in the log unescaped, which
 I thought was the point of saying 0x0a.

Do you think it would be better to present something that isn't what the
user typed?  Again, I don't see an easy improvement here.  If you don't
want newlines in the logged context, what will we do for something like

{"foo": {
"bar":44
}
]

There basically isn't any useful context to present unless we are
willing to back up several lines, and I don't think it'll be more
readable if we escape all the newlines.

 ! DETAIL:  \u must be followed by four hexadecimal digits.
 ! CONTEXT:  JSON data, line 1: \u000g...

 Why does this end in ... even though there's nothing further in the
 input string?

There is something further, namely the trailing double quote.

The examples in the regression tests are not really designed to show
cases where this type of error reporting is an improvement, because
there's hardly any context around the error sites.

regards, tom lane



Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-13 Thread Steve Singer

On 12-06-13 07:27 AM, Andres Freund wrote:

Its also available in the 'cabal-rebasing' branch on
git.postgresql.org/users/andresfreund/postgres.git . That branch will modify
history though.



That branch has a merge error in f685a11ce43b9694cbe61ffa42e396c9fbc32b05


gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -I../../../../src/include 
-D_GNU_SOURCE -c -o xact.o xact.c

xact.c:4684: error: expected identifier or ‘(’ before ‘’ token
xact.c:4690:46: warning: character constant too long for its type
xact.c:4712:46: warning: character constant too long for its type
xact.c:4719: error: expected identifier or ‘(’ before ‘’ token
xact.c:4740:46: warning: character constant too long for its type
make[4]: *** [xact.o] Error 1





Re: [HACKERS] uncataloged tables are a vestigial husk

2012-06-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 While working on some code today, I noticed that RELKIND_UNCATALOGED
 appears to serve no useful purpose.  In the few places where we check
 for it at all, we treat it in exactly the same way as
 RELKIND_RELATION.  It seems that it's only purpose is to serve as a
 placeholder inside each newly-created relcache entry until the real
 relkind is filled in.

I suspect that it had some actual usefulness back in Berkeley days.
But now that catalogs are created with the correct relkind to start
with during initdb, I agree it's probably just inertia keeping that
around.

 The attached patch cleans it up by removing RELKIND_UNCATALOGED and
 teaching RelationBuildLocalRelation() to set the relkind itself.

I think there are probably some places to fix in the docs too.

regards, tom lane



Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-13 Thread Andres Freund
On Wednesday, June 13, 2012 07:11:40 PM Steve Singer wrote:
 On 12-06-13 07:27 AM, Andres Freund wrote:
  Its also available in the 'cabal-rebasing' branch on
  git.postgresql.org/users/andresfreund/postgres.git . That branch will
  modify history though.
 
 That branch has a merge error in f685a11ce43b9694cbe61ffa42e396c9fbc32b05
 
 
 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
 -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
 -Wformat-security -fno-strict-aliasing -fwrapv -I../../../../src/include
 -D_GNU_SOURCE -c -o xact.o xact.c
 xact.c:4684: error: expected identifier or ‘(’ before ‘’ token
 xact.c:4690:46: warning: character constant too long for its type
 xact.c:4712:46: warning: character constant too long for its type
 xact.c:4719: error: expected identifier or ‘(’ before ‘’ token
 xact.c:4740:46: warning: character constant too long for its type
 make[4]: *** [xact.o] Error 1
Aw crap. Will fix that. Sorry.

Andres



Re: [HACKERS] uncataloged tables are a vestigial husk

2012-06-13 Thread Robert Haas
On Wed, Jun 13, 2012 at 1:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The attached patch cleans it up by removing RELKIND_UNCATALOGED and
 teaching RelationBuildLocalRelation() to set the relkind itself.

 I think there are probably some places to fix in the docs too.

catalogs.sgml doesn't include it in the list of possible relkinds,
since it never hits the disk.  And grep -i uncatalog doc/src/sgml
comes up empty.  Where else should I be looking?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Is cachedFetchXidStatus provably valid?

2012-06-13 Thread Merlin Moncure
It's probably an academic concern, but what happens if a backend saves
off cachedFetchXidStatus and then sleeps for a very long time.  During
that time an xid wraparound happens and the backend wakes up and
happens to read another unhinted tuple with the same xid and a
different commit status.  This is obviously incredibly unlikely, but
shouldn't cachedFetchXid be cleared at some appropriate point --
perhaps end of transaction?
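
Something as simple as this would seem to do (function name invented,
placement hypothetical; cachedFetchXid is static in transam.c so it would
have to live there):

static void
AtEOXact_TransamCache(void)
{
    /* forget the cached xid so a post-wraparound xid can't falsely match */
    cachedFetchXid = InvalidTransactionId;
}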

merlin



Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-13 Thread Andres Freund
On Wednesday, June 13, 2012 07:11:40 PM Steve Singer wrote:
 On 12-06-13 07:27 AM, Andres Freund wrote:
  Its also available in the 'cabal-rebasing' branch on
  git.postgresql.org/users/andresfreund/postgres.git . That branch will
  modify history though.
 
 That branch has a merge error in f685a11ce43b9694cbe61ffa42e396c9fbc32b05
 
 
 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
 -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
 -Wformat-security -fno-strict-aliasing -fwrapv -I../../../../src/include
 -D_GNU_SOURCE -c -o xact.o xact.c
 xact.c:4684: error: expected identifier or ‘(’ before ‘’ token
 xact.c:4690:46: warning: character constant too long for its type
 xact.c:4712:46: warning: character constant too long for its type
 xact.c:4719: error: expected identifier or ‘(’ before ‘’ token
 xact.c:4740:46: warning: character constant too long for its type
 make[4]: *** [xact.o] Error 1
Hrmpf. I reordered the patch series a last time to be more clear and I somehow 
didn't notice this. I compiled  tested the now pushed head 
(7e0340a3bef927f79b3d97a11f94ede4b911560c) and will submit an updated patch 
[10/16].

Andres



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-13 Thread Andres Freund
The previous mail contained a patch with a mismerge caused by reordering 
commits. Corrected version attached.

Thanks to Steve Singer for noticing this quickly.

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From efb28b1f931a30738faac83703810b598a82a6ee Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sat, 9 Jun 2012 14:49:34 +0200
Subject: [PATCH] Introduce the concept that wal has a 'origin' node

One solution to avoid loops when doing wal based logical replication in
topologies which are more complex than one unidirectional transport is
introducing the concept of an 'origin_id' into the wal stream. Luckily there is
some padding in the XLogRecord struct that allows us to add that field without
further bloating the struct.
This solution was chosen because it allows for just about any topology and is
unobtrusive.

This adds a new configuration parameter multimaster_node_id which determines
the id used for wal originating in one cluster.

When applying changes from wal from another cluster code can set the variable
current_replication_origin_id. This is a global variable because passing it
through everything which can generate wal would be far too intrusive.
---
 src/backend/access/transam/xact.c |   54 +
 src/backend/access/transam/xlog.c |3 +-
 src/backend/access/transam/xlogreader.c   |2 +
 src/backend/replication/logical/Makefile  |2 +-
 src/backend/replication/logical/logical.c |   19 +
 src/backend/utils/misc/guc.c  |   19 +
 src/backend/utils/misc/postgresql.conf.sample |3 ++
 src/include/access/xlog.h |4 +-
 src/include/access/xlogdefs.h |2 +
 src/include/replication/logical.h |   22 ++
 10 files changed, 100 insertions(+), 30 deletions(-)
 create mode 100644 src/backend/replication/logical/logical.c
 create mode 100644 src/include/replication/logical.h

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 3cc2bfa..79ec19a 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -36,8 +36,9 @@
 #include "libpq/be-fsstubs.h"
 #include "miscadmin.h"
 #include "pgstat.h"
-#include "replication/walsender.h"
+#include "replication/logical.h"
 #include "replication/syncrep.h"
+#include "replication/walsender.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "storage/procarray.h"
@@ -4545,12 +4546,13 @@ xactGetCommittedChildren(TransactionId **ptr)
  * actions for which the order of execution is critical.
  */
 static void
-xact_redo_commit_internal(TransactionId xid, XLogRecPtr lsn,
-		  TransactionId *sub_xids, int nsubxacts,
-		  SharedInvalidationMessage *inval_msgs, int nmsgs,
-		  RelFileNode *xnodes, int nrels,
-		  Oid dbId, Oid tsId,
-		  uint32 xinfo)
+xact_redo_commit_internal(TransactionId xid, RepNodeId originating_node,
+  XLogRecPtr lsn, XLogRecPtr origin_lsn,
+  TransactionId *sub_xids, int nsubxacts,
+  SharedInvalidationMessage *inval_msgs, int nmsgs,
+  RelFileNode *xnodes, int nrels,
+  Oid dbId, Oid tsId,
+  uint32 xinfo)
 {
 	TransactionId max_xid;
 	int			i;
@@ -4659,8 +4661,8 @@ xact_redo_commit_internal(TransactionId xid, XLogRecPtr lsn,
  * Utility function to call xact_redo_commit_internal after breaking down xlrec
  */
 static void
-xact_redo_commit(xl_xact_commit *xlrec,
- TransactionId xid, XLogRecPtr lsn)
+xact_redo_commit(xl_xact_commit *xlrec, RepNodeId originating_node,
+			TransactionId xid, XLogRecPtr lsn)
 {
 	TransactionId *subxacts;
 	SharedInvalidationMessage *inval_msgs;
@@ -4670,27 +4672,26 @@ xact_redo_commit(xl_xact_commit *xlrec,
 	/* invalidation messages array follows subxids */
 	inval_msgs = (SharedInvalidationMessage *) &(subxacts[xlrec->nsubxacts]);
 
-	xact_redo_commit_internal(xid, lsn, subxacts, xlrec->nsubxacts,
-			  inval_msgs, xlrec->nmsgs,
-			  xlrec->xnodes, xlrec->nrels,
-			  xlrec->dbId,
-			  xlrec->tsId,
-			  xlrec->xinfo);
+	xact_redo_commit_internal(xid, originating_node, lsn, xlrec->origin_lsn,
+	  subxacts, xlrec->nsubxacts, inval_msgs,
+	  xlrec->nmsgs, xlrec->xnodes, xlrec->nrels,
+	  xlrec->dbId, xlrec->tsId, xlrec->xinfo);
 }
 
 /*
  * Utility function to call xact_redo_commit_internal  for compact form of message.
  */
 static void
-xact_redo_commit_compact(xl_xact_commit_compact *xlrec,
-		 TransactionId xid, XLogRecPtr lsn)
+xact_redo_commit_compact(xl_xact_commit_compact *xlrec, RepNodeId originating_node,
+			TransactionId xid, XLogRecPtr lsn)
 {
-	xact_redo_commit_internal(xid, lsn, xlrec->subxacts, xlrec->nsubxacts,
-			  NULL, 0,	

Re: [HACKERS] uncataloged tables are a vestigial husk

2012-06-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jun 13, 2012 at 1:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The attached patch cleans it up by removing RELKIND_UNCATALOGED and
 teaching RelationBuildLocalRelation() to set the relkind itself.

 I think there are probably some places to fix in the docs too.

 catalogs.sgml doesn't include it in the list of possible relkinds,
 since it never hits the disk.  And grep -i uncatalog doc/src/sgml
 comes up empty.  Where else should I be looking?

Huh.  Okay, there probably isn't anyplace then.  I'm surprised we didn't
list it in catalogs.sgml, though.

regards, tom lane



Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-13 Thread Andres Freund
Hi,

The patch as of yet doesn't contain how you actually can use the prototype... 
Obviously at this point it's not very convenient:

I have two config files:
Node 1:
port = 5501
wal_level = logical
max_wal_senders = 10
wal_keep_segments = 200
multimaster_conninfo = 'port=5502 host=/tmp'
multimaster_node_id = 1

Node 2:
port = 5502
wal_level = logical
max_wal_senders = 10
wal_keep_segments = 200
multimaster_conninfo = 'port=5501 host=/tmp'
multimaster_node_id = 2

after initdb'ing the first cluster (initdb required):
$ ~/src/postgresql/build/assert/src/backend/postgres -D 
~/tmp/postgres/bdr/1/datadir/ -c 
config_file=~/tmp/postgres/bdr/1/postgresql.conf -c 
hba_file=~/tmp/postgres/bdr/1/pg_hba.conf -c 
ident_file=~/tmp/postgres/bdr/1/pg_ident.conf

$ psql -p 5501 -U andres postgres
CREATE TABLE data(id serial primary key, data bigint);
ALTER SEQUENCE data_id_seq INCREMENT 2;
SELECT setval('data_id_seq', 1);

shutdown cluster

$ rsync -raxv --delete /home/andres/tmp/postgres/bdr/1/datadir/* 
/home/andres/tmp/postgres/bdr/2/datadir

start both clusters, which should sync after some output.

$ psql -p 5501 -U andres postgres
SELECT setval('data_id_seq', 2);



Re: [HACKERS] [RFC][PATCH] BDR Prototype startup instructions (not prematurely sent this time)

2012-06-13 Thread Andres Freund
Hi,

The patch as of yet doesn't contain how you actually can use the prototype... 
Obviously at this point it's not very convenient:

I have two config files:
Node 1:
port = 5501
wal_level = logical
max_wal_senders = 10
wal_keep_segments = 200
multimaster_conninfo = 'port=5502 host=/tmp'
multimaster_node_id = 1

Node 2:
port = 5502
wal_level = logical
max_wal_senders = 10
wal_keep_segments = 200
multimaster_conninfo = 'port=5501 host=/tmp'
multimaster_node_id = 2

after initdb'ing the first cluster (initdb required):
$ ~/src/postgresql/build/assert/src/backend/postgres -D 
~/tmp/postgres/bdr/1/datadir/ -c 
config_file=~/tmp/postgres/bdr/1/postgresql.conf -c 
hba_file=~/tmp/postgres/bdr/1/pg_hba.conf -c 
ident_file=~/tmp/postgres/bdr/1/pg_ident.conf

$ psql -p 5501 -U andres postgres
CREATE TABLE data(id serial primary key, data bigint);
ALTER SEQUENCE data_id_seq INCREMENT 2;
SELECT setval('data_id_seq', 1);

shutdown cluster

$ rsync -raxv --delete /home/andres/tmp/postgres/bdr/1/datadir/* 
/home/andres/tmp/postgres/bdr/2/datadir

start both clusters which should sync after some output.


$ psql -p 5501 -U andres postgres

INSERT INTO data(data) VALUES (random()*1) RETURNING *;
INSERT INTO data(data) VALUES (random()*1) RETURNING *;
INSERT INTO data(data) VALUES (random()*1) RETURNING *;
INSERT INTO data(data) VALUES (random()*1) RETURNING *;

$ psql -p 5502 -U andres postgres
SELECT setval('data_id_seq', 2);

INSERT INTO data(data) VALUES (random()*1) RETURNING *;
INSERT INTO data(data) VALUES (random()*1) RETURNING *;
INSERT INTO data(data) VALUES (random()*1) RETURNING *;
INSERT INTO data(data) VALUES (random()*1) RETURNING *;
INSERT INTO data(data) VALUES (random()*1) RETURNING *;

postgres=# SELECT * FROM data;
 id | data 
+--
  3 |  396
  5 | 2522
  7 |  275
  9 | 9632
 11 | 1176
  4 | 6977
  6 | 7339
  8 | 6383
 10 | 4600
 12 | 8878
 14 | 1987
(11 rows)

$ psql -p 5501 -U andres postgres
 id | data 
+--
  3 |  396
  5 | 2522
  7 |  275
  9 | 9632
 11 | 1176
  4 | 6977
  6 | 7339
  8 | 6383
 10 | 4600
 12 | 8878
 14 | 1987
(11 rows)

DELETE FROM data;

$ psql -p 5502 -U andres postgres
SELECT * FROM data;
 id | data 
+--
(0 rows)


There is not much more you can do at this time. You can run an unmodified 
pgbench if you set it up before the rsync and prohibit vacuum (-n), since 
the vacuum actually does a truncate on the history table.

Hope that helps the brave souls already trying this,

Andres



Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

2012-06-13 Thread Robert Haas
On Wed, Jun 13, 2012 at 1:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I like some of these changes - in particular, the use of errcontext(),
 but some of them still seem off.

 ! DETAIL:  Token ' is invalid.
 ! CONTEXT:  JSON data, line 1: '...

 This doesn't make sense to me.

 Well, the input is two single quotes and the complaint is that the first
 one of them doesn't constitute a valid JSON token.  What would you
 expect to see instead?

Oh, I see.

 Another thing that could be done here is to print the rest of the line,
 rather than ..., if there's not very much of it.  I'm not sure that's
 an improvement though.  The advantage of the proposed design is that the
 point where the excerpt ends is exactly where the error was detected;
 lacking an error cursor, I don't see how else to present that info.

OK.


 ! DETAIL:  Character with value 0x0a must be escaped.
 ! CONTEXT:  JSON data, line 1: abc
 ! ...

 This seems an odd way to present this, especially if the goal is to
 NOT include the character needing escaping in the log unescaped, which
 I thought was the point of saying 0x0a.

 Do you think it would be better to present something that isn't what the
 user typed?  Again, I don't see an easy improvement here.  If you don't
 want newlines in the logged context, what will we do for something like

        {foo: {
                bar:44
                }
        ]

 There basically isn't any useful context to present unless we are
 willing to back up several lines, and I don't think it'll be more
 readable if we escape all the newlines.

Hmm.  If your plan is to trace back to the opening brace you were
expecting to match, I don't think that's going to work either.  What
if there are three pages (or 3MB) of data in between?

 The examples in the regression tests are not really designed to show
 cases where this type of error reporting is an improvement, because
 there's hardly any context around the error sites.

Yeah, true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Re: [COMMITTERS] pgsql: Add ERROR msg for GLOBAL/LOCAL TEMP is not yet implemented

2012-06-13 Thread Noah Misch
On Mon, Jun 11, 2012 at 09:18:39PM -0400, Robert Haas wrote:
 I guess the
 remaining question is whether to do it only for LOCAL TEMP tables or
 also for GLOBAL TEMP ones.  A survey of what other products do might
 be of some value.

Thanks for investigating.

 Sybase ASE, which I include only because it is one of the few systems
 that actually support the CREATE LOCAL TEMPORARY TABLE syntax, appears
 to give them the same semantics as our existing temp tables: session
 local.  Sybase ASE also includes two kinds of global temporary tables:
 non-shared - i.e. permanent tables with session-local contents - and
 shared - i.e. what we call unlogged tables, except that they don't
 survive a clean shutdown.
 
 http://dcx.sybase.com/1200/en/dbreference/create-local-temporary-table-statement.html
 http://dcx.sybase.com/1200/en/dbusage/temporary-tables.html

FWIW, that's SQL Anywhere, not ASE.  ASE is closer to Microsoft SQL Server in
this area.

 So I can't find any evidence that any database product in existence
 uses CREATE LOCAL TEMPORARY TABLE to mean anything other than what
 CREATE TEMPORARY TABLE does in PostgreSQL, and there's at least one
 where it means exactly the thing that we do.  Given that, I am
 inclined to think that we should only warn about using GLOBAL TEMP,
 and not LOCAL TEMP.  It seems needlessly hard-headed to warn about
 using a syntax for which there are no existing, incompatible
 implementations and for which we have no plans to change the existing
 semantics.  YMMV, of course.

Oracle Rdb implemented the SQL standard behavior:
http://www.oracle.com/technetwork/products/rdb/implementing-procedure-result-sets-091225.html

So, one implementation mirrors our current CREATE LOCAL TEMPORARY TABLE
semantics and another implements SQL standard semantics.  No classic migration
source product implements the syntax at all.  Given that, I think we should
make the decision independent of migration concerns.

Our continuing users will be quicker to accept the need to remove GLOBAL than
LOCAL; the table is not at all global but is, informally, local.  Future users
will benefit from a self-consistent system.  Though it's difficult to
quantify, future users also benefit from a system following the SQL standard.
Given that, how about warning on GLOBAL only but having the documentation
equally discourage use of both?
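
I.e., keep accepting all three spellings, which today are interchangeable,
but have only the GLOBAL one complain:

CREATE TEMPORARY TABLE t1 (x int);         -- unaffected
CREATE LOCAL TEMPORARY TABLE t2 (x int);   -- accepted, no warning
CREATE GLOBAL TEMPORARY TABLE t3 (x int);  -- would draw the warning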

nm



Re: [HACKERS] [PATCH 14/16] Add module to apply changes from an apply-cache using low-level functions

2012-06-13 Thread Christopher Browne
On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com wrote:
 From: Andres Freund and...@anarazel.de

 We decided to use low level functions to do the apply instead of producing sql
 statements containing the data (or using prepared statements) because both the
 text conversion and the full executor overhead introduce a significant
 overhead which is unnecessary if you're using the same version of pg on the
 same architecture.

 There are loads of use cases though that require different methods of applying
 - so the part doing the applying from an ApplyCache is just a bunch of
 well abstracted callbacks getting passed all the required knowledge to change
 the data representation into other formats.

It's important to make sure that it's not going to be *too* difficult
to jump through the hoops necessary to apply changes on a different
version.

While pg_upgrade has diminished the need to use replication to handle
cross-version/architecture upgrades, I don't think it has brought that
to zero.

One other complication I'll observe...  The assumption is being made
that UPDATE/DELETE will be handled via The Primary Key.  For the most
part, I approve of this.  Once upon a time, Slony had a misfeature
where you could tell it to add in a surrogate primary key, and that
caused no end of trouble.  However, the alternative, that *does* seem
to work alright, is to allow selecting a candidate primary key, that
is, a set of columns that have UNIQUE + NOT NULL characteristics.  I
could see people having a problem switching over to use this system if
they MUST begin with a 'Right Proper Primary Key.'  If they start with
a system with a 2TB table full of data that lacks that particular
constraint, that could render them unable to use the facility.
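
To make that concrete, by a candidate key I just mean something like the
following (a sketch; the table and column names are made up):

    -- UNIQUE + NOT NULL, but never declared as a formal PRIMARY KEY
    ALTER TABLE big_table ALTER COLUMN region    SET NOT NULL;
    ALTER TABLE big_table ALTER COLUMN item_code SET NOT NULL;
    CREATE UNIQUE INDEX big_table_identity ON big_table (region, item_code);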

 Missing:

 - TOAST handling. For physical apply not much needs to be done because the
  toast inserts will have been made beforehand. There needs to be an option in
  ApplyCache that helps reassembling TOAST datums to make it easier to write
  apply modules which convert to text.

Dumb question: Is it possible that two nodes would have a different
idea as to which columns need to get toasted?  I should think it
possible for nodes to be configured with different values for TOAST
policies, and while it's likely pretty dumb to set them to have
different handlings, it would seem undesirable to not bother looking,
and find the backend crashing due to an un-noticed mismatch.

-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?



Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

2012-06-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jun 13, 2012 at 1:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 ! DETAIL:  Character with value 0x0a must be escaped.
 ! CONTEXT:  JSON data, line 1: abc
 ! ...
 
 This seems an odd way to present this, especially if the goal is to
 NOT include the character needing escaping in the log unescaped, which
 I thought was the point of saying 0x0a.

 Do you think it would be better to present something that isn't what the
 user typed?  Again, I don't see an easy improvement here.  If you don't
 want newlines in the logged context, what will we do for something like
 
{"foo": {
"bar":44
}
]

 Hmm.  If your plan is to trace back to the opening brace you were
 expecting to match, I don't think that's going to work either.  What
 if there are three pages (or 3MB) of data in between?

No, that's not the proposal; I only anticipate printing a few dozen
characters of context.  But that could still mean printing something
like

DETAIL:  expected "," or "}", but found "]".
CONTEXT:  JSON data, line 123: ..."bar":44
   }
]

which I argue is much more useful than just seeing the ].  So the
question is whether it's still as useful if we mangle the whitespace.
I'm thinking it's not.  We don't mangle whitespace when printing SQL
statements into the log, anyway.

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Add ERROR msg for GLOBAL/LOCAL TEMP is not yet implemented

2012-06-13 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Given that, how about warning on GLOBAL only but having the documentation
 equally discourage use of both?

Yeah, that's about what I was thinking, too.

Any thoughts on the wording of the WARNING message?

regards, tom lane



Re: [HACKERS] [PATCH 14/16] Add module to apply changes from an apply-cache using low-level functions

2012-06-13 Thread Andres Freund
On Wednesday, June 13, 2012 08:50:42 PM Christopher Browne wrote:
 On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  From: Andres Freund and...@anarazel.de
  
  We decided to use low level functions to do the apply instead of
  producing sql statements containing the data (or using prepared
  statements) because both the text conversion and the full executor
  overhead introduce a significant overhead which is unnecessary if
  you're using the same version of pg on the same architecture.
  
  There are loads of use cases though that require different methods of
  applying - so the part doing the applying from an ApplyCache is
  just a bunch of well abstracted callbacks getting passed all the
  required knowledge to change the data representation into other formats.
 
 It's important to make sure that it's not going to be *too* difficult
 to jump through the hoops necessary to apply changes on a different
 version.
I agree. But I don't see it as a feature of the first version for the moment.
Getting a base set of features into 9.3 is going to be hard enough as-is. But 
I think there is enough interest from all sides to make that possible cross-
version.

 While pg_upgrade has diminished the need to use replication to handle
 cross-version/architecture upgrades, I don't think it has brought that
 to zero.
Agreed.

 One other complication I'll observe...  The assumption is being made
 that UPDATE/DELETE will be handled via The Primary Key.  For the most
 part, I approve of this.  Once upon a time, Slony had a misfeature
 where you could tell it to add in a surrogate primary key, and that
 caused no end of trouble.  However, the alternative, that *does* seem
 to work alright, is to allow selecting a candidate primary key, that
 is, a set of columns that have UNIQUE + NOT NULL characteristics.  I
 could see people having a problem switching over to use this system if
 they MUST begin with a 'Right Proper Primary Key.'  If they start with
 a system with a 2TB table full of data that lacks that particular
 constraint, that could render them unable to use the facility.
It wouldn't need that much code to allow candidate keys. The data 
representation in the catalogs is a bit unfriendly for that, but there has 
been talk about changing that for some time now. I am not convinced that it's
worth the cpu cycles though.

Btw, you can convert a unique key to a primary key since 9.1. The unique
index can be created CONCURRENTLY beforehand.
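
That is, roughly (a sketch with made-up names):

    CREATE UNIQUE INDEX CONCURRENTLY big_table_identity
        ON big_table (region, item_code);
    ALTER TABLE big_table
        ADD CONSTRAINT big_table_pkey PRIMARY KEY USING INDEX big_table_identity;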

  Missing:
  
  - TOAST handling. For physical apply not much needs to be done because
  the toast inserts will have been made beforehand. There needs to be an
  option in ApplyCache that helps reassembling TOAST datums to make it
  easier to write apply modules which convert to text.
 Dumb question: Is it possible that two nodes would have a different
 idea as to which columns need to get toasted?  I should think it
 possible for nodes to be configured with different values for TOAST
 policies, and while it's likely pretty dumb to set them to have
 different handlings, it would seem undesirable to not bother looking,
 and find the backend crashing due to an un-noticed mismatch.
I don't think it should be possible to configure the TOAST settings
differently if you use the binary apply mode. But even if it were, a value
which is toasted although the local policy says it should not be wouldn't
cause any problems as far as I can see.
The one thing that could actually cause problems there is different page sizes et
al, but that needs to be prohibited anyway.
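
(For reference, the per-column policy in question is set like this - a sketch
with made-up names:)

    -- node A
    ALTER TABLE payload ALTER COLUMN body SET STORAGE EXTERNAL;
    -- node B keeps the default
    ALTER TABLE payload ALTER COLUMN body SET STORAGE EXTENDED;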

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Backup docs

2012-06-13 Thread Dimitri Fontaine
Magnus Hagander mag...@hagander.net writes:
 I would like to see that page changed to list pg_basebackup as the
 default way of doing base backups, and then list the manual way as
 an option if you need more flexibility.

+1

 I'd also like to add pg_basebackup -x under standalone hot backups,
 again as the main option.

+1

 I also wonder if we need a tl;dr; section of that whole page that just
 goes through *what to do*, rather than why we do it? Of course not
 removing the details, just showing the simplest case in, um, a simpler
 way?

+1

Come to think of it, that is the perfect occasion to have the
tutorial open itself up to dealing with admin tasks, right?

Please let's apply that documentation patch to 9.2 too.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

2012-06-13 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 I thought I already pointed that out, but: we have *extensions*.  What
 we don't have is a convenient method of dealing with functions that need
 to be migrated across extensions, or from an extension to core, between
 one major release and the next.  It would clearly be nice to have that
 someday, but we don't have it now.  Designing on the assumption that 9.3

Well, my patch for 9.2 called "Finer Extension Dependencies" was all
about supporting that. The idea is to be able to name a set of functions
(or other objects) then set up a dependency graph towards that arbitrary
name. Then it's possible to migrate that named set of objects from one
extension to another, and it's possible for core to publish a list
of the names of the sets of objects it provides.

  https://commitfest.postgresql.org/action/patch_view?id=727

Taking inspiration from some other development environments I enjoy
spending my time with, I called such a set a "feature" and added ways for
extensions to provide and require them, as has been done in some
Lisps for more than 3 decades now. I'm not wedded to those terms, which
have caused confusion before; please suggest some
other ones if you otherwise like the feature.

I'm going to have this patch in the next CF so that we can talk about it
again, as I think it is actually designed to help us fix the problem
here.

The only missing part in the patch is allowing core to declare
a set of sets of objects (a set of features, in its current terminology)
that it brings to the table. Such a list already exists though, and it
uses the same terminology as my patch:

  http://www.postgresql.org/docs/9.2/static/features-sql-standard.html

We wouldn't publish only the standard-compliant feature list with such a
mechanism, though, or it would be quite useless for our purposes here.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



[HACKERS] Re: [COMMITTERS] pgsql: Add ERROR msg for GLOBAL/LOCAL TEMP is not yet implemented

2012-06-13 Thread Noah Misch
On Wed, Jun 13, 2012 at 02:56:58PM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  Given that, how about warning on GLOBAL only but having the documentation
  equally discourage use of both?
 
 Yeah, that's about what I was thinking, too.
 
 Any thoughts on the wording of the WARNING message?

My patch used "GLOBAL is deprecated in temporary table creation", which still
seems fine to me.  Here's an update based on this discussion.
*** a/doc/src/sgml/ref/create_table.sgml
--- b/doc/src/sgml/ref/create_table.sgml
***************
*** 163,169 **** CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     <para>
      Optionally, <literal>GLOBAL</literal> or <literal>LOCAL</literal>
      can be written before <literal>TEMPORARY</> or <literal>TEMP</>.
!     This makes no difference in <productname>PostgreSQL</>, but see
      <xref linkend="sql-createtable-compatibility"
      endterm="sql-createtable-compatibility-title">.
     </para>
--- 163,170 ----
     <para>
      Optionally, <literal>GLOBAL</literal> or <literal>LOCAL</literal>
      can be written before <literal>TEMPORARY</> or <literal>TEMP</>.
!     This presently makes no difference in <productname>PostgreSQL</>
!     and is deprecated; see
      <xref linkend="sql-createtable-compatibility"
      endterm="sql-createtable-compatibility-title">.
     </para>
***************
*** 1310,1316 **** CREATE TABLE employees OF employee_type (
   <productname>PostgreSQL</productname> does not have.
   For compatibility's sake, <productname>PostgreSQL</productname> will
   accept the <literal>GLOBAL</literal> and <literal>LOCAL</literal> keywords
! in a temporary table declaration, but they have no effect.
  </para>

  <para>
--- 1311,1318 ----
   <productname>PostgreSQL</productname> does not have.
   For compatibility's sake, <productname>PostgreSQL</productname> will
   accept the <literal>GLOBAL</literal> and <literal>LOCAL</literal> keywords
! in a temporary table declaration, but they have no effect.  Use in new
! applications is discouraged.
  </para>

  <para>
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
***************
*** 2514,2521 ****
  OptTemp:  TEMPORARY             { $$ = RELPERSISTENCE_TEMP; }
            | TEMP                { $$ = RELPERSISTENCE_TEMP; }
            | LOCAL TEMPORARY     { $$ = RELPERSISTENCE_TEMP; }
            | LOCAL TEMP          { $$ = RELPERSISTENCE_TEMP; }
!           | GLOBAL TEMPORARY    { $$ = RELPERSISTENCE_TEMP; }
!           | GLOBAL TEMP         { $$ = RELPERSISTENCE_TEMP; }
            | UNLOGGED            { $$ = RELPERSISTENCE_UNLOGGED; }
            | /*EMPTY*/           { $$ = RELPERSISTENCE_PERMANENT; }
        ;
--- 2514,2533 ----
            | TEMP                { $$ = RELPERSISTENCE_TEMP; }
            | LOCAL TEMPORARY     { $$ = RELPERSISTENCE_TEMP; }
            | LOCAL TEMP          { $$ = RELPERSISTENCE_TEMP; }
!           | GLOBAL TEMPORARY
!               {
!                   ereport(WARNING,
!                           (errmsg("GLOBAL is deprecated in temporary table creation"),
!                            parser_errposition(@1)));
!                   $$ = RELPERSISTENCE_TEMP;
!               }
!           | GLOBAL TEMP
!               {
!                   ereport(WARNING,
!                           (errmsg("GLOBAL is deprecated in temporary table creation"),
!                            parser_errposition(@1)));
!                   $$ = RELPERSISTENCE_TEMP;
!               }
            | UNLOGGED            { $$ = RELPERSISTENCE_UNLOGGED; }
            | /*EMPTY*/           { $$ = RELPERSISTENCE_PERMANENT; }
        ;
***************
*** 8930,8940 **** OptTempTableName:
--- 8942,8958 ----
            }
            | GLOBAL TEMPORARY opt_table qualified_name
                {
+                   ereport(WARNING,
+                           (errmsg("GLOBAL is deprecated in temporary table creation"),
+                            parser_errposition(@1)));
                    $$ = $4;
                    $$->relpersistence = RELPERSISTENCE_TEMP;

Re: [HACKERS] temporal support patch

2012-06-13 Thread Miroslav Šimulčík
It would be no problem to make my solution compatible with SQL 2011, but
the standard is not freely available. Can anybody provide me with this
standard?

2012/5/20 Pavel Stehule pavel.steh...@gmail.com

 Hello

 2012/5/18 Miroslav Šimulčík simulcik.m...@gmail.com:
  Hello.
 
  SQL 2011 standard wasn't available in time I started this project so I
 built
  my implementation on older standards TSQL2 and SQL/Temporal, that were
 only
  available. None of these were accepted by ANSI/ISO commissions however.
 
  There is different syntax in SQL 2011 and it looks like one that IBM DB2
 had
  been using even before this standard were published.
 
  So my implementation differs in syntax, but features are same as stated
 in
  system versioned tables part of slideshow.
 

 I would like to see temporal functionality in pg, but only in SQL 2011
 syntax. Using syntax from deprecated proposals makes no sense. I am not
 sure the history table concept is best from a performance view - it is
 simpler to implement, but you duplicate all indexes - there will
 be a lot of redundant fields in the history table. An important question is
 the difference in cost between some non-trivial query on actual data and the
 same query on historic data.

 Regards

 Pavel Stehule

  Regards
  Miroslav Simulcik
 
 
  2012/5/17 Pavel Stehule pavel.steh...@gmail.com
 
  Hello
 
  what is conformance of your solution with temporal extension in ANSI SQL
  2011
 
 
 
 http://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438
 
  Regards
 
  Pavel Stehule
 
  2012/5/16 Miroslav Šimulčík simulcik.m...@gmail.com:
   Hi all,
  
   as a part of my master's thesis I have created temporal support patch
   for
   PostgreSQL. It enables the creation of special temporal tables with
   entries
   versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on these
   tables
   don't cause permanent changes to entries, but create new versions of
   them.
   Thus user can easily get to the past states of the table.
  
   Basic information on temporal databases can be found
   on http://en.wikipedia.org/wiki/Temporal_database
  
   In field of temporal databases, there are only proprietary solution
   available. During the analysis I found these:
   - IBM DB2 10 for z/OS
   - Oracle 11g Workspace Manager
   - Teradata Database 13.10
  
   Primary goal of my work was the creation of opensource solution, that
 is
   easy to use and is backward compatible with existing applications,
   so that
   the change of the original tables to temporal ones, does not require
   changes
   to applications that work with them. This patch is built on standard
   SQL/Temporal with some minor modifications inspired by commercial
   temporal
   database systems. Currently it only deals with transaction time
 support.
  
   Here is simple description on how it works:
  
   1. user can create transaction time table using modified CREATE TABLE
   command:
  
   CREATE TABLE person(name varchar(50)) AS TRANSACTIONTIME;
  
   This command automatically creates all objects required for
   transaction
   time support:
  
              List of relations
    Schema |         Name         |   Type   |  Owner
   --------+----------------------+----------+----------
    public | person               | table    | tester
    public | person__entry_id_seq | sequence | tester
    public | person_hist          | table    | postgres


                                  Table "public.person"
      Column   |            Type             |                         Modifiers
   ------------+-----------------------------+-------------------------------------------------------------
    name       | character varying(50)       |
    _entry_id  | bigint                      | not null default nextval('person__entry_id_seq'::regclass)
    _sys_start | timestamp without time zone | not null default clock_timestamp()
    _sys_end   | timestamp without time zone | not null default '294276-12-31 23:59:59.99'::timestamp without time zone
   Indexes:
       "person__entry_id_idx" btree (_entry_id)
       "person__sys_start__sys_end_idx" btree (_sys_start, _sys_end)


                    Table "public.person_hist"
      Column   |            Type             | Modifiers
   ------------+-----------------------------+-----------
    name       | character varying(50)       |
    _entry_id  | bigint                      | not null
    _sys_start | timestamp without time zone | not null
    _sys_end   | timestamp without time zone | not null
   Indexes:
       "person_hist__entry_id_idx" btree (_entry_id)
       "person_hist__sys_start__sys_end_idx" btree (_sys_start, _sys_end)
  
  
  
  
   Table person stores current versions of entries. 3 additional
   columns
   are added:
   _entry_id - id of entry. It groups together 

Re: [HACKERS] [PATCH] Support for foreign keys with arrays

2012-06-13 Thread Gabriele Bartolini

Hi Noah,

On 10/06/12 22:53, Noah Misch wrote:

This has bitrotted; please refresh.

Also, please evaluate Peter's feedback:
http://archives.postgresql.org/message-id/1333693277.32606.9.ca...@vanquo.pezone.net


Our goal is to work on this patch from the next commit fest.

What we are about to do for this commit fest is to split the previous 
patch and send a small one just for the array_remove() and 
array_replace() functions.


Then we will sit down and organise the development of the feature 
according to Peter's feedback. It is important indeed that we find a 
commonly accepted terminology and syntax for this feature.


I hope this sounds like a reasonable plan. Thank you.

Cheers,
Gabriele

--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it




Re: [HACKERS] temporal support patch

2012-06-13 Thread Miroslav Šimulčík
2012/5/30 Jim Nasby j...@nasby.net

 On 5/18/12 2:06 AM, Miroslav Šimulčík wrote:

 - no data redundancy - in my extension current versions of entries are
 stored only once in original table (in table_log - entries are inserted to
 both original and log table)


 That's not necessarily a benefit... it makes querying for both history
 *and* current data a lot more complex. Table inheritance might be an
 elegant solution to that, but I doubt you could just bolt that on top of
 what you've created.



Yes, querying for history data is more complex, but I focused on preserving
the performance of current queries. That's the reason why I use a separate
table for old versions.

Table inheritance is a very good idea and it will not require much effort
to use it in my solution. Currently, when a user queries the whole history of
entries, the table reference in the FROM clause is replaced with a subselect,
which accesses data in both tables. For example, when the user executes the command:

NONSEQUENCED TRANSACTIONTIME SELECT * FROM person;

The actually executed command is:

SELECT * FROM (SELECT * FROM person UNION ALL SELECT * FROM person_hist) as
person

Use of table inheritance can make things simpler and more elegant, but I'm
not sure how it affects performance. Would it improve performance?
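
If I understand the suggestion correctly, the inheritance variant would look
roughly like this (a simplified sketch reusing the names from my example,
without the defaults and indexes):

    CREATE TABLE person_hist (
        name        varchar(50),
        _entry_id   bigint,
        _sys_start  timestamp,
        _sys_end    timestamp
    );
    CREATE TABLE person () INHERITS (person_hist);

    SELECT * FROM person_hist;        -- current + historical versions
    SELECT * FROM ONLY person_hist;   -- historical versions only
    SELECT * FROM person;             -- current versions only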


The timestamp fields need to have timezone info. If you change the timezone
 for a connection you will get inconsistent results without it.

 _sys_end should either be NULLable or if it's going to have a magic value
 that magic value should be Infinity:



Good point. I will use timestamp with time zone and the value Infinity
instead of the max timestamp value.
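
A quick sketch of how that behaves:

    SELECT 'infinity'::timestamptz;           -- infinity
    SELECT now() < 'infinity'::timestamptz;   -- true, so still-current rows always sort last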


Re: [HACKERS] Is cachedFetchXidStatus provably valid?

2012-06-13 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 It's probably an academic concern, but what happens if a backend saves
 off cachedFetchXidStatus and then sleeps for a very long time.  During
 that time an xid wraparound happens and the backend wakes up and
 happens to read another unhinted tuple with the same xid and a
 different commit status.  This is obviously incredibly unlikely, but
 shouldn't cachedFetchXid be cleared at some appropriate point --
 perhaps end of transaction?

Well, aside from what the odds might be of hitting the case if you did
manage to sleep through an XID wraparound, I think it's impossible for a
backend to sleep that long, because of cache inval signals.  (Or, to
put it differently, a backend has typically got a whole lot of XIDs
cached within tuples in its syscaches.  cachedFetchXidStatus is the
least of its worries if it fails to engage in cache inval activity.)

If we had a multiple-entry cache in place of the single-entry cache,
I think this would be a more realistic concern.  You'd need some way to
flush old entries from that cache, rather than being able to expect
that the single entry would get overwritten with newer values anytime
something happened.

regards, tom lane



Re: [HACKERS] temporal support patch

2012-06-13 Thread Miroslav Šimulčík


 * I'd very much like to see you make use of Range Types from 9.2; in
 particular, TSTZRANGE would be much better than holding two timestamps.
 If a standard requires you to display two timestamps in certain
 situations, perhaps you could use ranges internally and display the
 boundaries as timestamps when needed.



I agree, the new range types will be ideal for this.
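
For example, something along these lines (a sketch; the names are hypothetical):

    CREATE TABLE person_hist2 (
        name       varchar(50),
        sys_period tstzrange NOT NULL
    );
    -- versions valid at a given instant
    SELECT * FROM person_hist2 WHERE sys_period @> TIMESTAMPTZ '2011-12-10 00:00+00';
    -- versions overlapping an interval
    SELECT * FROM person_hist2 WHERE sys_period && tstzrange('2011-12-01', '2012-01-01');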




 * There is other useful information that could be recorded, such as the
 user who inserted/updated/deleted the record.



Yes, I considered adding user ID and transaction ID columns, because they
can be useful in some cases (for example to find all changes made by a
transaction). However it wasn't necessary, so I omitted them. They can be
easily added.




 * For some purposes, it's very useful to keep track of the columns that
 changed. For instance, a query like show me any time a salary was
 changed over the last month (or some other rare event) would be very
 slow to run if there was not some explicit annotation on the historical
 records (e.g. a columns changed bitmap or something).



Another useful feature. I can take a look at it.
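
A possible shape for that (just a sketch, assuming an extra array column on the
history table from my earlier example):

    ALTER TABLE person_hist ADD COLUMN _changed_cols text[];
    -- "any time a salary was changed over the last month"
    SELECT * FROM person_hist
     WHERE 'salary' = ANY (_changed_cols)
       AND _sys_start >= now() - interval '1 month';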



 * As Jim mentioned, it might make sense to use something resembling
 inheritance so that selecting from the historical table includes the
 current data (but with no upper bound for the range).



See reply to Jim's post.




 * It might make sense to hammer out as many of the details as we can
 with an extension. For instance, exactly what options will be available,
 what data types will be used, what objects will be created, the trigger
 code, etc. Then, it will be more obvious exactly what we need to add
 extra core support for (e.g. if we are going to use some inheritance
 like mechanism), and what we need to add syntax sugar for.


 I recommend that you start posting more detailed designs on
 http://wiki.postgresql.org



In which section of the wiki can I post the detailed design of my solution?



 If you already have code, feel free to submit it for the next commitfest
 ( http://commitfest.postgresql.org ), but this is a relatively large
 project, so it will most likely take several commitfest cycles.



I have a working patch for postgresql version 9.0.4, but it needs refactoring
before I can submit it, because some parts don't
meet the formatting requirements yet. And yes, the changes are large, so it will be
better to discuss the design first and then deal with code. Do you insist on
compatibility with the SQL 2011 standard, as Pavel wrote?
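
For reference, my understanding of the SQL 2011 system-versioning syntax
(going by the DB2-style implementation mentioned earlier, so the details should
be taken with a grain of salt) is roughly:

    CREATE TABLE person (
        name      varchar(50),
        sys_start timestamp GENERATED ALWAYS AS ROW START,
        sys_end   timestamp GENERATED ALWAYS AS ROW END,
        PERIOD FOR SYSTEM_TIME (sys_start, sys_end)
    ) WITH SYSTEM VERSIONING;

    SELECT * FROM person FOR SYSTEM_TIME AS OF TIMESTAMP '2011-12-10 00:00:00';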


Re: [HACKERS] temporal support patch

2012-06-13 Thread Merlin Moncure
On Wed, Jun 13, 2012 at 4:10 PM, Miroslav Šimulčík
simulcik.m...@gmail.com wrote:
 I have working patch for postgresql version 9.0.4, but it needs refactoring
 before i can submit it, because some parts don't
 meet formatting requirements yet. And yes, changes are large, so it will be
 better to discuss design first and then deal with code. Do you insist on
 compatibility with standard SQL 2011 as Pavel wrote?

Standards compliance is always going to make things easier in terms of
gaining community acceptance if you're targeting in-core adoption.  At
the very least it will remove one barrier, although you might be in for
a slog for other reasons.  You may not have known this, but postgres
had a time travel feature waaay back in the day (see:
http://www.postgresql.org/docs/6.3/static/c0503.htm).  It was removed
for performance reasons, and the first thing I'm wondering is how your
stuff performs in various scenarios, among various other interesting
things. Also, +1 on the use of range types.  Anyway, thanks for submitting
and good luck!

merlin



Re: [HACKERS] Is cachedFetchXidStatus provably valid?

2012-06-13 Thread Merlin Moncure
On Wed, Jun 13, 2012 at 3:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 It's probably an academic concern, but what happens if a backend saves
 off cachedFetchXidStatus and then sleeps for a very long time.  During
 that time an xid wraparound happens and the backend wakes up and
 happens to read another unhinted tuple with the same xid and a
 different commit status.  This is obviously incredibly unlikely, but
 shouldn't cachedFetchXid be cleared at some appropriate point --
 perhaps end of transaction?

 Well, aside from what the odds might be of hitting the case if you did
 manage to sleep through an XID wraparound, I think it's impossible for a
 backend to sleep that long, because of cache inval signals.  (Or, to
 put it differently, a backend has typically got a whole lot of XIDs
 cached within tuples in its syscaches.  cachedFetchXidStatus is the
 least of its worries if it fails to engage in cache inval activity.)

 If we had a multiple-entry cache in place of the single-entry cache,
 I think this would be a more realistic concern.  You'd need some way to
 flush old entries from that cache, rather than being able to expect
 that the single entry would get overwritten with newer values anytime
 something happened.

Right -- thanks for that -- I figured as much.

merlin



Re: [HACKERS] [COMMITTERS] pgsql: Add ERROR msg for GLOBAL/LOCAL TEMP is not yet implemented

2012-06-13 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Wed, Jun 13, 2012 at 02:56:58PM -0400, Tom Lane wrote:
 Any thoughts on the wording of the WARNING message?

 My patch used "GLOBAL is deprecated in temporary table creation", which still
 seems fine to me.  Here's an update based on this discussion.

Applied with some further wordsmithing on docs and comments.  We can
still tweak this if anyone objects, of course, but I thought it'd
probably save work to get it in before the branch.

regards, tom lane



Re: [HACKERS] [PATCH] Support for foreign keys with arrays

2012-06-13 Thread Noah Misch
On Wed, Jun 13, 2012 at 10:12:18PM +0200, Gabriele Bartolini wrote:
 Our goal is to work on this patch from the next commit fest.

 What we are about to do for this commit fest is to split the previous  
 patch and send a small one just for the array_remove() and  
 array_replace() functions.

 Then we will sit down and organise the development of the feature  
 according to Peter's feedback. It is important indeed that we find a  
 commonly accepted terminology and syntax for this feature.

 I hope this sounds like a reasonable plan. Thank you.

Sounds fine.  The 2012-01 CF entry for this patch has been moved to the
2012-06 CF.  Please mark that entry Returned with Feedback and create a new
entry for the subset you're repackaging for this CommitFest.

Thanks,
nm



Re: [HACKERS] Ability to listen on two unix sockets

2012-06-13 Thread Peter Eisentraut
On ons, 2012-06-13 at 15:14 +0200, Honza Horak wrote:
 Thus, I'd like to ask if anybody is aware of any issue connected with 
 simply patching pg_config_manual.h, same as Debian does it already?
 For example, is there any piece of software that simply relies on the /tmp
 location of the socket and doesn't use libpq for the communication?

If you're asking whether Red Hat/Fedora should simply fix the issue locally by
patching pg_config_manual.h, then yes, that would work (see Debian), but
it has its annoyances, especially with additional software compiled from
source or third-party binary installers.




Re: [HACKERS] Ability to listen on two unix sockets

2012-06-13 Thread Peter Eisentraut
On mån, 2012-06-11 at 18:07 -0400, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On sön, 2012-06-10 at 17:24 -0400, Robert Haas wrote:
  Why would that matter?  If you configure M ports and N Unix socket
  locations, you get M*N actual sockets created.
 
  ...I *seriously* doubt that this is the behavior anyone wants.
  Creating M sockets per directory seems patently silly.
 
  How else would it work?
 
  If I say, syntax aside, listen on ports 5432 and 5433, and use socket
  directories /tmp and /var/run/postgresql, then a libpq-using client
  would expect to be able to connect using
 
 This argument seems quite circular to me: you are assuming that we will
 adopt exactly the behavior that Robert is questioning.
 
 What would make more sense to me is
 
 (1) there is still a *single* port parameter, which is what we use for
 things like shared memory keys;
 
 (2) listen_addresses (and the hypothetical socket_directories list)
 grows the ability to specify a port number in any list element.  The
 primary port number parameter sets the default.
 
 So for instance
 
   port = 5432
   listen_addresses = '*, 127.0.0.1:5433'
 
 results in listening on *:5432 and 127.0.0.1:5433.
 
  So you do need to create M*N sockets.
  I don't really see a problem with that.
 
 I do: first, it's a lotta sockets, and second, it's not real hard to
 foresee cases where somebody actively doesn't want that cross-product.

Well, it's fine if we provide ways not to have the cross-product, but
there should also be an easy way to get it.  I can easily see cases in
systems I have administered where I would have liked to use two unix
sockets, two IP sockets, and two ports.  Maybe I actually would have
needed only 7 out of those 8 sockets, but it's far easier to configure,
document, and explain if I just set up all 8 of them.





[HACKERS] Re: [COMMITTERS] pgsql: Add ERROR msg for GLOBAL/LOCAL TEMP is not yet implemented

2012-06-13 Thread Noah Misch
On Wed, Jun 13, 2012 at 05:50:36PM -0400, Tom Lane wrote:
 Applied with some further wordsmithing on docs and comments.  We can
 still tweak this if anyone objects, of course, but I thought it'd
 probably save work to get it in before the branch.

Thanks.

  The SQL standard also distinguishes between global and local temporary
  tables, where a local temporary table is only visible within a specific SQL
  module, though its definition is still shared across sessions.  Since
  PostgreSQL does not support SQL modules, this distinction is not relevant in
  PostgreSQL.

That new documentation paragraph describes the standard behavior for DECLARE
LOCAL TEMPORARY TABLE.  CREATE LOCAL TEMPORARY TABLE produces a table
available to all modules but having one instance of its contents per module,
per session.  With GLOBAL, by contrast, all modules see the same table
contents during a given session.

nm



Re: [HACKERS] Ability to listen on two unix sockets

2012-06-13 Thread Peter Eisentraut
On mån, 2012-06-11 at 18:45 -0500, Michael Nolan wrote:
 What about entries in pg_hba.conf? Will they need to be able to specify
 both the directory and the port number?

I think the philosophy behind pg_hba.conf is that you distinguish client
*systems*.  So one client might be Kerberos-capable, or one client might
be Windows and use SSPI.  For that, it doesn't matter which of multiple
ports or local sockets it uses.  So I don't think there are any changes
needed in this area.




Re: [HACKERS] Ability to listen on two unix sockets

2012-06-13 Thread Peter Eisentraut
On tis, 2012-06-12 at 14:47 +0200, Honza Horak wrote:
 This could be true in case all listening ports are equal, which I
 guess isn't a good idea, because IIUIC the port number as a part of
 the socket name is used to distinguish sockets of various postmasters
 in the same directory. In that scenario every client should know which
 port to connect to and also which one is primary.

Why?  The client won't care which port is primary or secondary or
whatever as long as it reaches the server.




Re: [HACKERS] Ability to listen on two unix sockets

2012-06-13 Thread Peter Eisentraut
On ons, 2012-06-13 at 15:25 +0200, Honza Horak wrote:
 It seems unix_socket_directory could be turned into list and probably 
 renamed to unix_socket_directories, since it would be confusing if a 
 list value is in singular.

Well, it would also be annoying to rename the parameter for a
marginal change in functionality.

 On the other hand, we probably don't want to specify listening ports
 together with additional unix sockets in one 
 configuration option, so it seems better to add a new configuration 
 option to distinguish the primary listening port from additional
 ports.

I think it would be wiser if you left the port business out of this.
There are more issues hidden in there than you need to deal with.




Re: [HACKERS] [COMMITTERS] pgsql: Add ERROR msg for GLOBAL/LOCAL TEMP is not yet implemented

2012-06-13 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Wed, Jun 13, 2012 at 05:50:36PM -0400, Tom Lane wrote:
   The SQL standard also distinguishes between global and local temporary
   tables, where a local temporary table is only visible within a specific SQL
   module, though its definition is still shared across sessions.  Since
   PostgreSQL does not support SQL modules, this distinction is not relevant in
   PostgreSQL.

 That new documentation paragraph describes the standard behavior for DECLARE
 LOCAL TEMPORARY TABLE.  CREATE LOCAL TEMPORARY TABLE produces a table
 available to all modules but having one instance of its contents per module,
 per session.  With GLOBAL, by contrast, all modules see the same table
 contents during a given session.

[ reads spec more closely... ]  Yeah, you're right.  Will fix, thanks.

regards, tom lane



[HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree

2012-06-13 Thread Alexander Korotkov
Hackers,

The attached patch implements a quad-tree on ranges. Some performance results
in comparison with the current GiST indexing follow.
Index creation is slightly slower. That probably needs some investigation.
Search queries on SP-GiST touch many more pages. However, this comparison may
not be entirely fair, because SP-GiST can pin the same buffer several times
during one scan. In terms of CPU, search queries on SP-GiST seem to be slightly
faster. The dramatic difference in the "column <@ const" query is thanks to
the 2d mapping.

test=# create index period_idx on period_test using gist (period);
CREATE INDEX
Time: 49141,148 ms

test=# create index period_idx2 on period_test2 using spgist (period);
CREATE INDEX
Time: 59503,215 ms

test=# explain (analyze, buffers) select * from period_test where period && daterange('2011-12-10', '2011-12-11');
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on period_test  (cost=471.63..13716.45 rows=11866 width=32) (actual time=107.258..147.139 rows=365451 loops=1)
   Recheck Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
   Buffers: shared read=7636
   ->  Bitmap Index Scan on period_idx  (cost=0.00..468.67 rows=11866 width=0) (actual time=106.697..106.697 rows=365451 loops=1)
         Index Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
         Buffers: shared read=3694
 Total runtime: 160.670 ms
(7 rows)

test=# explain (analyze, buffers) select * from period_test2 where period && daterange('2011-12-10', '2011-12-11');
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on period_test2  (cost=414.52..13659.34 rows=11866 width=32) (actual time=88.793..129.608 rows=365451 loops=1)
   Recheck Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
   Buffers: shared hit=7623 read=7687
   ->  Bitmap Index Scan on period_idx2  (cost=0.00..411.55 rows=11866 width=0) (actual time=88.285..88.285 rows=365451 loops=1)
         Index Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
         Buffers: shared hit=7623 read=3745
 Total runtime: 142.971 ms
(7 rows)

test=# explain (analyze, buffers) select * from period_test where period <@ daterange('2011-12-10', '2011-12-11');
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on period_test  (cost=102.06..6140.66 rows=2373 width=32) (actual time=85.437..85.437 rows=0 loops=1)
   Recheck Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
   Buffers: shared read=3694
   ->  Bitmap Index Scan on period_idx  (cost=0.00..101.47 rows=2373 width=0) (actual time=85.431..85.431 rows=0 loops=1)
         Index Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
         Buffers: shared read=3694
 Total runtime: 85.493 ms
(7 rows)

test=# explain (analyze, buffers) select * from period_test2 where period <@ daterange('2011-12-10', '2011-12-11');
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on period_test2  (cost=88.95..6127.54 rows=2373 width=32) (actual time=18.666..18.666 rows=0 loops=1)
   Recheck Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
   Buffers: shared hit=1404
   ->  Bitmap Index Scan on period_idx2  (cost=0.00..88.35 rows=2373 width=0) (actual time=18.660..18.660 rows=0 loops=1)
         Index Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
         Buffers: shared hit=1404
 Total runtime: 18.717 ms
(7 rows)


test=# explain (analyze, buffers) select * from period_test where period <@ daterange('2011-08-10', '2011-12-31');
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on period_test  (cost=102.06..6140.66 rows=2373 width=32) (actual time=56.114..73.785 rows=118097 loops=1)
   Recheck Cond: (period <@ '[2011-08-10,2011-12-31)'::daterange)
   Buffers: shared read=4125
   ->  Bitmap Index Scan on period_idx  (cost=0.00..101.47 rows=2373 width=0) (actual time=55.740..55.740 rows=118097 loops=1)
         Index Cond: (period <@ '[2011-08-10,2011-12-31)'::daterange)
         Buffers: shared read=1391
 Total runtime: 78.469 ms
(7 rows)


test=# explain (analyze, buffers) select * from period_test2 where period <@ daterange('2011-08-10', '2011-12-31');
  QUERY PLAN



Re: [HACKERS] Minimising windows installer password confusion

2012-06-13 Thread Craig Ringer

On 06/13/2012 05:10 PM, Dave Page wrote:

The idea of storing the password in clear text in the registry gives
me nervous twitches.


Me too. It's horrible, and I really dislike the idea. I can't imagine 
that Microsoft don't have a better solution to this.


I talked to some Microsoft people at an event yesterday, and they said 
that they just don't use completely isolated user accounts for services. 
Microsoft's services install into the three standard service access levels:


  LocalService
  NetworkService
  LocalSystem

as mentioned:

  http://msdn.microsoft.com/en-us/library/ms143504.aspx


http://msdn.microsoft.com/en-us/library/windows/desktop/ms686005(v=vs.85).aspx


... so maybe the answer is that we're trying to do it too UNIX-ish (i.e.
securely) and we should use the NetworkService account by default, allowing
users to change the service account if they want to as an advanced feature.


Personally I think that'd be better than the current situation, which is 
not user friendly, and has a much lower squick-factor than storing 
passwords in the registry.


This'd also solve issues with other Pg installs; we just switch smoothly 
over to installing in NetworkService by default, giving users a radiobox 
to switch to custom service user account where the name postgres is 
prefilled.


--
Craig Ringer


POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088 Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/



Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

2012-06-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ! DETAIL:  Character with value 0x0a must be escaped.
 ! CONTEXT:  JSON data, line 1: abc
 ! ...

 This seems an odd way to present this, especially if the goal is to
 NOT include the character needing escaping in the log unescaped, which
 I thought was the point of saying 0x0a.

I thought of a simple way to address that objection for this particular
case: we can just truncate the context display *at* the offending
character, instead of *after* it.  This is playing a bit fast and loose
with the general principle that the context should end at the point of
detection of the error; but given that the character in question is
always unprintable, I think it's probably not going to bother anyone.

I've gone ahead and committed this before branching, though I'm
certainly still willing to entertain suggestions for further
improvement.

regards, tom lane



Re: [HACKERS] Minimising windows installer password confusion

2012-06-13 Thread Craig Ringer

On 06/13/2012 05:14 PM, Dave Page wrote:

On Wed, Jun 13, 2012 at 2:18 AM, Craig Ringer
cr...@postnewspapers.com.au wrote:

On 06/12/2012 08:08 PM, Dave Page wrote:

Some background: By default the installer will use 'postgres' for both
the service (OS) account, and the database superuser account. It will
use the same password for both (though, users have complete control at
the command line if they want it, which is why the account names are
shown in the installer). Unlike *nix installations, the service
account must have a password, which is required during both
installation and upgrade to configure the PostgreSQL service. We
therefore ask for the password during both installation and upgrade.
If the user account exists (which can happen if there has previously
been an installation of Postgres on the system), the user must specify
the existing password for the account during installation (and of
course, during all upgrades). This is where users normally get stuck,
as they've forgotten the password they set in the past.

They may not even have set it, because the EnterpriseDB installer can be run
unattended and may've been used by some 3rd party package.

I'd be interested in seeing what Microsoft installers that create isolated
user accounts do. I think .NET creates one for ASP, so that'd be one option
to look into.

They tend to use the localsystem or networkservice accounts for most
things, which don't have passwords. The reason we don't do that is
that since the early days of 8.0 we've said we want to enable users to
use the service account as if it were a regular account, so that they
can do things like access network resources (useful for server-side
copy for example).

I wasn't overly convinced back then that that was necessary, and I'm
still not now. I suppose we potentially could use the networkservice
account by default, and let advanced users override that on the
command line if they need to. Then remembering the password becomes
their responsibility.
+1 from me on this, though I think making the service account part of 
the install process makes sense.


SERVICE ACCOUNT
-

You can probably just accept the default here.

PostgreSQL runs in the background as a network service in a user account that
only has limited access to the files and programs on the computer. This is fine
for most purposes, but will prevent certain operations like server-side COPY
and direct access by the server to resources on shared folders from working.
If you need these capabilities, we recommend that you create a postgres user
account below and have the PostgreSQL server run in that instead of the default
NetworkService account.

-- [service account] ---
|  |
| [*] LocalService |
|  |
| [ ] Custom Service Account   |
|  |
| -- [custom account]--|   |
| ||   |
| | Username: [ ]  |   |
| | Password: [ ]  |   |
| | Domain:   [ THISCOMPUTER]  |   |
| ||   |
| ||   |
|--|

Reasonable option?

--
Craig Ringer



POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088 Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/



Re: [HACKERS] Ability to listen on two unix sockets

2012-06-13 Thread Bruce Momjian
On Thu, Jun 14, 2012 at 01:31:31AM +0300, Peter Eisentraut wrote:
 On ons, 2012-06-13 at 15:25 +0200, Honza Horak wrote:
  It seems unix_socket_directory could be turned into list and probably 
  renamed to unix_socket_directories, since it would be confusing if a 
  list value is in singular.
 
 Well, it would also be annoying to rename the parameter name for a
 marginal change in functionality.

We often rename admin-only config variables for clarity, and this seems
to be such a case.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Minimising windows installer password confusion

2012-06-13 Thread Craig Ringer

On 06/13/2012 05:18 PM, Dave Page wrote:

On Wed, Jun 13, 2012 at 3:07 AM, Craig Ringer



Why using the windows control panel ?


Because when I wrote the email I was looking for a simple solution
that wouldn't require writing code that has potential to fail
depending on how the users environment is configured (the user account
stuff tends to go wrong in weird ways, for example when used on
domains in unusual (or high security) configurations. We're spending a
lot of effort at the moment getting the 9.2 buildfarm together, and
updating all the StackBuilder add-on packages (think multiple
man-months) - I'm trying not to add to that too much.


Ah, sorry. I'm *not* trying to say that any of this is stuff that EDB 
should just up and do. I have no say over what you do and how, I'm 
just trying to raise possible usability points that might be useful, 
either soon or to inform design of later releases.


--
Craig Ringer

POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088 Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/



Re: [HACKERS] Minimising windows installer password confusion

2012-06-13 Thread Craig Ringer

On 06/13/2012 06:32 PM, Florian Pflug wrote:

Some further googling indicates that, yes, the service account passwords
are stored in the registry, but are only accessible to the LocalSystem
account [2]. Querying them from the postgres installer thus isn't really an
option. But what you could do, I guess, is to offer the user the ability to
change the password, and using the approach from [1] to update the affected
service definitions afterwards.


Yep, that fits with how MS SQL server does things:

Always use SQL Server tools such as SQL Server Configuration Manager to 
change the account used by the SQL Server Database Engine or SQL Server 
Agent services, or to change the password for the account. In addition 
to changing the account name, SQL Server Configuration Manager performs 
additional configuration such as updating the Windows local security 
store which protects the service master key for the Database Engine. 
Other tools such as the Windows Services Control Manager can change the 
account name but do not change all the required settings.


http://msdn.microsoft.com/en-us/library/ms143504.aspx

--
Craig Ringer



POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088 Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/



[HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

2012-06-13 Thread Greg Stark
On Mon, Jun 4, 2012 at 8:49 PM, Christopher Browne cbbro...@gmail.com wrote:
 What if the two servers are in different time zones?

 NTP shouldn't have any problem; it uses UTC underneath.  As does
 PostgreSQL, underneath.

As an aside, this is not strictly speaking true. NTP doesn't use UTC
-- afaik it doesn't know about time zones at all. Likewise Postgres's
underlying representation is not UTC either. They both use the number
of seconds that have passed since the epoch. That's simply a number,
not a time at all, and the number is the same regardless of what time
zone you're in.
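
You can see the same thing from SQL: extract(epoch ...) on a timestamptz gives
the same number no matter what TimeZone is set (a quick sketch):

    SET timezone = 'UTC';
    SELECT extract(epoch FROM TIMESTAMPTZ '2012-06-13 00:00:00+00');  -- 1339545600
    SET timezone = 'America/New_York';
    SELECT extract(epoch FROM TIMESTAMPTZ '2012-06-13 00:00:00+00');  -- still 1339545600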

-- 
greg



Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

2012-06-13 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 I thought I already pointed that out, but: we have *extensions*.  What
 we don't have is a convenient method of dealing with functions that need
 to be migrated across extensions, or from an extension to core, between
 one major release and the next.  It would clearly be nice to have that
 someday, but we don't have it now.  Designing on the assumption that 9.3

 Well, my patch for 9.2 called Finer Extension Dependencies was all
 about supporting that. The idea is to be able to name a set of functions
 (or other objects) then setup a dependency graph towards that arbitrary
 name. Then it's possible to migrate that named set of objects from an
 extension to another one, and it's possible for core to publish a list
 of provided names of set of objects provided.

Well, TBH I didn't think that concept was a useful solution for this at
all.  I can't imagine that we would define features at a sufficiently
fine granularity, or with enough advance foresight, to solve the sort of
problem that's being faced here.  How would you deal with the need for,
say, some of contrib/xml2's functions to get migrated to core in 9.4 or
so?  When you didn't know that would happen, much less exactly which
functions, when 9.3 came out?  AFAICS the only way that features could
fix this would be if we always created a feature for every exposed
function, which is unmanageable.

 The only missing part in the patch is allowing for the core to declare
 a set of set of objects (a set of features in its current terminology)
 that it brings on the table. Such a list already exists though, and is
 using the same terminology as in my patch:
   http://www.postgresql.org/docs/9.2/static/features-sql-standard.html
 We wouldn't only publish the standard compliant feature list with such a
 mechanism though or it would be quite useless for our operations here.

AFAICS, the SQL-standard features list is just about entirely irrelevant
to this discussion.  How often is it going to happen that we implement a
standard feature in a contrib module before migrating it into core?
I think almost every case in which we'll face this problem will involve
a PG-specific feature not mentioned in the SQL feature taxonomy.  The
case at hand (some proposed new functions for managing replication)
certainly isn't going to be found there.

And, quite aside from whether we could invent feature names that match
what we want to move from contrib to core, exactly how would having a
feature name help?  The problem we're actually facing is getting
pg_upgrade to not dump particular functions when it's doing a
binary-upgrade dump of an extension.  Maybe I've forgotten, but I do not
recall any exposed connection between feature names and particular SQL
objects in your proposal.

regards, tom lane



  1   2   >